# Cloud - Full Markdown Export

> This file contains all Cloud documentation pages in markdown format for AI agent consumption.
> Generated from 663 pages on 2026-04-10T19:52:50.622Z
> Component: redpanda-cloud | Version:
> Site: https://docs.redpanda.com

## About This Export

This export includes the **latest version** of the Cloud documentation.

### AI-Friendly Documentation Formats

We provide multiple formats optimized for AI consumption:

- **https://docs.redpanda.com/llms.txt**: Curated overview of all Redpanda documentation
- **https://docs.redpanda.com/llms-full.txt**: Complete documentation export with all components
- **https://docs.redpanda.com/redpanda-cloud-full.txt**: This file - Cloud documentation only
- **Individual markdown pages**: Each HTML page has a corresponding .md file

---

# Page 1: Redpanda Agentic Data Plane

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents.md

---

# Redpanda Agentic Data Plane

---
title: Redpanda Agentic Data Plane
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: index
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: index.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/index.adoc
description: Redpanda Agentic Data Plane (ADP) provides enterprise-grade infrastructure for building, deploying, and governing AI agents at scale with enterprise governance, cost controls, and compliance-grade audit trails.
page-git-created-date: "2025-10-21"
page-git-modified-date: "2026-02-19"
---

> ❗ **IMPORTANT**
>
> Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability).

- [Redpanda Agentic Data Plane Overview](adp-overview/) Enterprise-grade infrastructure for building, deploying, and governing AI agents at scale with compliance-grade audit trails.
- [Model Context Protocol (MCP)](mcp/) Give AI agents direct access to your databases, queues, CRMs, and other business systems without writing custom glue code.
- [AI Agents](agents/) Declare agent behavior using built-in connectors in Redpanda Cloud. No custom agent code required.
- [Transcripts](observability/) Govern agentic AI with complete execution transcripts built on Redpanda's immutable distributed log.
- [AI Gateway](ai-gateway/) Keep AI-powered apps running with automatic provider failover, prevent runaway spend with centralized budget controls, and govern access across teams, apps, and service accounts.

---

# Page 2: Redpanda Agentic Data Plane Overview

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/adp-overview.md

---

# Redpanda Agentic Data Plane Overview

---
title: Redpanda Agentic Data Plane Overview
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: adp-overview
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: adp-overview.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/adp-overview.adoc
description: Enterprise-grade infrastructure for building, deploying, and governing AI agents at scale with compliance-grade audit trails.
page-topic-type: overview
personas: evaluator, ai_agent_developer, platform_admin
learning-objective-1: Identify the key components of Redpanda ADP and their purposes
learning-objective-2: Describe how each component addresses enterprise governance and reliability requirements
learning-objective-3: Determine whether Redpanda ADP fits your organization's requirements for AI agent deployment
page-git-created-date: "2026-02-18"
page-git-modified-date: "2026-02-20"
---

> ❗ **IMPORTANT**
>
> Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability).

As [AI agents](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#ai-agent) evolve from experimental prototypes to business-critical systems, companies face new challenges. How do you ensure your AI agents are reliable? How do you maintain control over costs and compliance? And how do you scale them across your organization without creating technical debt?

Teams across your organization want AI agents in production with direct access to enterprise data, from real-time event streams to databases and business systems. Building an agent is the easy part. Running one safely at scale remains the challenge: every database, queue, and API needs its own access policies, creating security gaps and slowing deployment. When you manage high-volume, event-driven data, you need a centralized layer through which all agent interactions flow so that agents can contextualize and act on that data in real time without compromising governance.

Redpanda Agentic Data Plane (ADP) solves these problems by bringing together key capabilities: a solid data foundation, over 300 proven connectors, and a declarative approach to building AI agents. The result is a unified platform that automatically tracks every agent decision for compliance and audit requirements.

After reading this page, you will be able to:

- Identify the key components of Redpanda ADP and their purposes
- Describe how each component addresses enterprise governance and reliability requirements
- Determine whether Redpanda ADP fits your organization’s requirements for AI agent deployment

## AI agents

With Redpanda AI agents, you declare the agent behavior you want and Redpanda handles execution and orchestration. Instead of writing Python or JavaScript, you define behaviors in YAML. You can orchestrate multiple specialized [sub-agents](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#subagent), or bring your own frameworks like LangChain or LlamaIndex.

What makes this practical at scale is [Redpanda Connect](../../develop/connect/about/). More than 300 connectors with built-in filtering, enrichment, and routing give declarative definitions real power. Upcoming templates will provide default behaviors for common domains such as customer success, legal, and finance.

The result is faster time-to-production, lower maintenance (declarative definitions instead of imperative code), and organizational consistency across teams.

For more information, see [AI Agents Overview](../agents/overview/).

## MCP servers

[MCP servers](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-server) translate agent intent into connections to databases, queues, HRIS, CRMs, and other business systems. They are the simplest way to give agents context and capabilities without writing glue code.

Under the hood, MCP servers wrap the same proven connectors that power some of the world’s largest e-commerce, EV, electricity, and AI companies. Built on [Redpanda Connect](../../develop/connect/about/), they are lightweight, support OIDC-based authentication, and enforce deterministic policies at the tool level.
You define tools in YAML, and policy enforcement programmatically prevents prompt injection, SQL injection, and other agent-based attacks.

With over 300 connectors and real-time debugging capabilities, you reduce integration time while getting enterprise-grade security. You can reuse your existing infrastructure and data sources rather than building new integrations from scratch.

For more information, see [MCP Servers Overview](../mcp/overview/).

## Transcripts

Every agent action is recorded in an end-to-end execution log. A single [transcript](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#transcript) can span multiple agents, tools, and models, covering interactions that last minutes to days.

Transcripts are the keystone of agent governance. They are built on Redpanda’s immutable log with Raft consensus and TLA+ correctness proofs. No gaps, no tampering. For regulated industries that require multi-year audit trails, this provides a compliance-grade record of every decision an agent makes and every data source it uses.

Redpanda captures 100% of agent actions through OpenTelemetry standards, with end-to-end lineage across the entire execution chain. You can materialize execution logs to Iceberg tables for long-term retention and analysis, or replay them to evaluate and improve agent performance over time.

For more information, see [Transcripts Overview](../observability/concepts/).

## AI Gateway

The AI Gateway manages LLM provider access with two priorities: keeping your application up and keeping costs under control.

For high availability, the gateway provides provider-agnostic routing with intelligent failover. Your users don’t care which provider serves a request. They care that the application stays up.

For fiscal control, you get per-tenant budgets and rate limiting, so there are no runaway costs and no surprise bills.

The gateway also supports tenancy modeling for teams, individuals, applications, and service accounts, giving you chargeback transparency for internal cost allocation. You can proxy both models and MCP gateways, centralizing compliance for all LLM interactions without locking into any single provider.

For more information, see [AI Gateway Overview](../ai-gateway/what-is-ai-gateway/).

## Enterprise governance

Redpanda ADP addresses critical enterprise requirements across all components.

- **Security by design**: MCP servers enforce policies at the tool level, programmatically preventing prompt injection, SQL injection, and other agent-based attacks. Policy enforcement is deterministic and controlled. Agents cannot bypass security constraints even through creative prompting.
- **Unified authorization**: All components use OIDC-based authentication with an on-behalf-of authorization model. When a user invokes an agent, the agent inherits the intersection of its own permissions and the user’s permissions. This ensures proper data access scoping.
- **Complete observability**: Redpanda ADP provides two levels of inspection. Execution logs (transcripts) capture every agent action with 100% sampling using OpenTelemetry standards. Real-time debugging tools allow you to inspect individual MCP server calls down to individual tool invocations with full timing data. You can view detailed agent actions in [Redpanda Console](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#redpanda-console) and replay data for agent evaluations.
- **Compliance and audit**: For industries requiring multi-year audit trails, Redpanda ADP records every agent action and data source used in decision-making. Execution logs are stored in Redpanda topics and can be materialized to Iceberg tables for long-term retention and analysis.
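
The on-behalf-of model in the list above comes down to a set intersection. The following is a minimal sketch of that idea only, not Redpanda's implementation: the function name and the permission strings are hypothetical and chosen for illustration.

```python
def effective_permissions(agent: frozenset[str], user: frozenset[str]) -> frozenset[str]:
    """Return the permissions an agent holds while acting on behalf of a user.

    Assumption: permissions are modeled as plain strings. The real policy
    model is not described in this document.
    """
    # The agent can only do what BOTH it and the invoking user are allowed to do.
    return agent & user


# Hypothetical permission names, for illustration only.
agent_perms = frozenset({"orders:read", "orders:write", "inventory:read"})
user_perms = frozenset({"orders:read", "customers:read"})

print(sorted(effective_permissions(agent_perms, user_perms)))  # ['orders:read']
```

Under this reading, neither granting the agent broad access nor invoking it as a privileged user is sufficient on its own to widen what a single request can reach.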

## Use cases

Some ways organizations can leverage Redpanda ADP include:

- **Automate operational workflows**: Create specialized agents for building management, infrastructure monitoring, compliance reporting, and other domain-specific tasks.
- **Monitor manufacturing and operations**: Deploy multi-agent systems that analyze factory machine telemetry in real time, detect anomalies, search equipment manuals, and create maintenance tickets automatically.
- **Extend enterprise productivity tools**: Integrate Microsoft Copilot or other workplace agents with internal data sources and systems that are otherwise inaccessible.

## Next steps

- [AI Agents Overview](../agents/overview/)
- [MCP Servers Overview](../mcp/overview/)
- [Transcripts Overview](../observability/concepts/)
- [AI Gateway Overview](../ai-gateway/what-is-ai-gateway/)

---

# Page 3: AI Agents

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents.md

---

# AI Agents

---
title: AI Agents
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: agents/index
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: agents/index.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/index.adoc
description: Declare agent behavior using built-in connectors in Redpanda Cloud. No custom agent code required.
page-git-created-date: "2026-02-18"
page-git-modified-date: "2026-02-19"
---

> ❗ **IMPORTANT**
>
> Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability).

- [Get Started with AI Agents](get-started-index/) Get started with declarative AI agents in Redpanda Cloud.
Connect tools, configure behavior, and deploy without writing agent code.
- [Build Agents](build-index/) Create production AI agents with effective prompts and scalable architecture.
- [Monitor Agent Activity](monitor-agents/) Monitor agent execution, analyze conversation history, track token usage, and debug issues using Inspector, Transcripts, and agent data topics.
- [Agent Integration](integration-index/) Connect agents to external applications, pipelines, and other systems.

---

# Page 4: A2A Protocol

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/a2a-concepts.md

---

# A2A Protocol

---
title: A2A Protocol
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: agents/a2a-concepts
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: agents/a2a-concepts.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/a2a-concepts.adoc
description: Learn how the A2A protocol enables agent discovery and communication.
page-topic-type: concepts
personas: agent_developer, app_developer, streaming_developer
learning-objective-1: Describe the A2A protocol and its role in agent communication
learning-objective-2: Explain how agent cards enable discovery
learning-objective-3: Identify how authentication secures agent communication
page-git-created-date: "2026-02-18"
page-git-modified-date: "2026-02-19"
---

> ❗ **IMPORTANT**
>
> Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability).

The Agent-to-Agent (A2A) protocol is an open standard for agent communication and discovery. Redpanda Cloud uses A2A for both external integration and internal pipeline-to-agent communication.

After reading this page, you will be able to:

- Describe the A2A protocol and its role in agent communication
- Explain how agent cards enable discovery
- Identify how authentication secures agent communication

## What is the A2A protocol?

The Agent-to-Agent (A2A) protocol is an open standard that defines how agents discover, communicate with, and invoke each other.

Agents that implement A2A expose their capabilities through a standardized agent card. This allows other systems to interact with them without prior knowledge of their implementation.

The protocol provides:

- Standardized discovery: Agent cards describe capabilities in a machine-readable format.
- Platform independence: Any system can call any A2A-compliant agent.
- Version negotiation: Protocol versions ensure compatibility between agents.
- Communication mode flexibility: Supports synchronous request/response and streaming.

For the complete specification, see [a2a.ag/spec](https://a2a.ag/spec).

## Agent cards

Every A2A-compliant agent exposes an agent card at a well-known URL. The agent card is a JSON document that describes what the agent can do and how to interact with it.

For the complete agent card specification, see [Agent Card documentation](https://agent2agent.info/docs/concepts/agentcard/).

### Agent card location

Redpanda Cloud agents expose their agent cards at the `/.well-known/agent-card.json` subpath of the agent URL. You can find the agent URL on the agent overview page in the Redpanda Cloud Console under **Agentic AI** > **AI Agents**.

For example, if your agent URL is `https://my-agent.ai-agents.abc123.cloud.redpanda.com`, your agent card URL is `https://my-agent.ai-agents.abc123.cloud.redpanda.com/.well-known/agent-card.json`.

The `.well-known` path follows internet standards for service discovery, making agents discoverable without configuration.
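
As a small illustration of the rule above, the following sketch derives the agent card URL from an agent URL using only the Python standard library. The helper name `agent_card_url` is an assumption made for this example and is not part of any Redpanda SDK.

```python
from urllib.parse import urlsplit, urlunsplit

# Subpath taken from the text above.
WELL_KNOWN_SUFFIX = "/.well-known/agent-card.json"


def agent_card_url(agent_url: str) -> str:
    """Append the well-known agent card subpath to an agent URL."""
    parts = urlsplit(agent_url)
    # Drop a trailing slash so the result never contains a double slash.
    path = parts.path.rstrip("/") + WELL_KNOWN_SUFFIX
    # Query string and fragment are discarded: the card lives at a fixed path.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(agent_card_url("https://my-agent.ai-agents.abc123.cloud.redpanda.com"))
# https://my-agent.ai-agents.abc123.cloud.redpanda.com/.well-known/agent-card.json
```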

To configure the agent card, see [Configure A2A discovery metadata](../create-agent/#configure-a2a-discovery-metadata-optional).

## Where A2A is used in Redpanda Cloud

Redpanda Cloud uses the A2A protocol in two contexts:

### External integration

External applications and agents hosted outside Redpanda Cloud use A2A to call Redpanda Cloud agents. This includes backend services, CLI tools, custom UIs, and agents hosted on other platforms.

For integration pattern guidance, see [Integration Patterns Overview](../integration-overview/).

### Internal pipeline-to-agent integration

Redpanda Connect pipelines use the [`a2a_message`](../../../develop/connect/components/processors/a2a_message/) processor to invoke agents for each event in a stream. This lets streaming data and AI agents interact in real time, which supports use cases like:

- Real-time fraud detection on every transaction.
- Streaming data enrichment with AI-generated fields.
- Event-driven agent invocation for automated processing.

The `a2a_message` processor uses the A2A protocol internally to discover and call agents.

For pipeline patterns, see [Pipeline Integration Patterns](../pipeline-integration-patterns/).

## How agents discover each other

A2A enables dynamic discovery without hardcoded configuration:

1. The caller fetches the agent card from the well-known URL.
2. The caller checks the protocol version and supported communication modes.
3. The caller uses the input schema from the agent card to format the request properly.
4. The caller sends the request to the agent’s endpoint.
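
The first two steps above can be sketched as follows. This is an illustration under stated assumptions, not a reference client: the card field names (`protocolVersion`, `capabilities.streaming`, `url`) follow my reading of the public A2A agent card format and should be checked against the specification, the version `0.3.0` is a placeholder, and `plan_call` and its `fetch` parameter are hypothetical names. Step 3 is omitted because this page does not describe the input schema layout.

```python
import json
from typing import Callable

# Hypothetical: the protocol versions this caller has been written against.
SUPPORTED_VERSIONS = {"0.3.0"}


def plan_call(card_url: str, fetch: Callable[[str], str], want_streaming: bool) -> dict:
    """Run discovery steps 1 and 2 and return what the caller needs for step 4."""
    # Step 1: fetch the agent card from the well-known URL.
    # `fetch` is injected so the sketch runs without network access.
    card = json.loads(fetch(card_url))

    # Step 2: check the protocol version and supported communication modes.
    version = card.get("protocolVersion")  # assumed field name
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported protocol version: {version!r}")
    can_stream = bool(card.get("capabilities", {}).get("streaming", False))

    # Step 4 input: the endpoint to call, and whether streaming can be used.
    return {"endpoint": card["url"], "streaming": want_streaming and can_stream}


# Usage with a canned card instead of a real HTTP request.
sample_card = json.dumps({
    "protocolVersion": "0.3.0",
    "url": "https://agent.example/a2a",
    "capabilities": {"streaming": False},
})
print(plan_call("https://agent.example/.well-known/agent-card.json",
                lambda _url: sample_card, want_streaming=True))
```

Because the caller asks for streaming but the sample card does not advertise it, the returned plan falls back to request/response.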

This discovery model allows:

- New agents to become available immediately once deployed
- Existing agents to update their capabilities while callers adapt dynamically
- Callers to understand exactly what agents do through self-describing agent cards

## Authentication

A2A-compliant agents require authentication to prevent unauthorized access.

Redpanda Cloud agents use OAuth2 client credentials flow. When you create an agent, the system provisions a service account with a client ID and secret.

External callers use these credentials to obtain access tokens:

1. Agent creation automatically provisions a service account with credentials.
2. Applications exchange the client ID and secret for a time-limited access token via OAuth2.
3. Applications include the access token in the Authorization header when calling the agent endpoint.
4. When tokens expire, applications exchange credentials again for a new token.

This flow ensures:

- Credentials stay secure: Applications never send them directly to agents, only access tokens.
- Exposure is limited: Tokens expire, reducing the window for compromised credentials.
- Integration is standard: Applications can use existing OAuth2 libraries.

### External integration

External applications must authenticate using the service account credentials. Each agent has its own service account.

For step-by-step authentication instructions, see [Authentication](../../../security/cloud-authentication/).

### Internal integration

The `a2a_message` processor handles authentication automatically. Pipelines don’t need to manage credentials explicitly because they run within the Redpanda Cloud cluster with appropriate permissions.

## Protocol versions

The A2A protocol uses semantic versioning (major.minor.patch). Agents declare their supported version in the agent card.
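
A caller that reads the declared version might compare it with its own as in the sketch below. The compatibility rule shown (same major version, and for `0.x` releases the same minor version) is a common semantic versioning convention that I am assuming here; this page does not state the rule A2A or Redpanda Cloud applies. The names `Version`, `parse_version`, and `is_compatible` are hypothetical.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a plain 'major.minor.patch' string; pre-release tags are not handled."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def is_compatible(caller: str, agent: str) -> bool:
    """Assumed rule: same major version; for 0.x releases, same minor as well."""
    c, a = parse_version(caller), parse_version(agent)
    if c.major != a.major:
        return False
    # Before 1.0.0, semantic versioning allows breaking changes in minor releases.
    return c.major > 0 or c.minor == a.minor


print(is_compatible("1.2.0", "1.4.3"))  # True
print(is_compatible("0.2.5", "0.3.0"))  # False
```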

## Next steps

- [Integration Patterns Overview](../integration-overview/)
- [Create an Agent](../create-agent/)
- [A2A Protocol Specification](https://a2a.ag/spec)

---

# Page 5: Agent Architecture Patterns

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/architecture-patterns.md

---

# Agent Architecture Patterns

---
title: Agent Architecture Patterns
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: agents/architecture-patterns
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: agents/architecture-patterns.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/architecture-patterns.adoc
description: Design maintainable agent systems with single-agent and multi-agent patterns based on domain complexity.
page-topic-type: best-practices
personas: agent_developer, streaming_developer
learning-objective-1: Evaluate single-agent versus multi-agent architectures for your use case
learning-objective-2: Choose appropriate LLM models based on task requirements
learning-objective-3: Apply agent boundary design principles for maintainability
page-git-created-date: "2026-02-18"
page-git-modified-date: "2026-02-19"
---

> ❗ **IMPORTANT**
>
> Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability).

This topic helps you design agent systems that are maintainable, discoverable, and reliable by choosing the right architecture pattern and applying clear boundary principles.

After reading this page, you will be able to:

- Evaluate single-agent versus multi-agent architectures for your use case
- Choose appropriate LLM models based on task requirements
- Apply agent boundary design principles for maintainability

## Why architecture matters

Agent architecture determines how you manage complexity as your system grows. The right pattern depends on your domain complexity, organizational structure, and how you expect requirements to evolve.

Starting with a simple architecture is tempting, but can lead to unmaintainable systems as complexity increases. Planning for growth with clear boundaries prevents technical debt and costly refactoring later.

Warning signs that you need architectural boundaries, not just better prompts:

- System prompts exceeding 2000 words
- Too many tools for the LLM to select correctly
- Multiple teams modifying the same agent
- Changes in one domain breaking others

Match agent architecture to domain structure:

| Domain Characteristics | Architecture | Pros | Cons |
| --- | --- | --- | --- |
| Single business area, stable requirements | Single agent | Simple to build and maintain, one deployment, lower latency | Limited flexibility, difficult to scale to multi-domain problems |
| Multiple business areas, shared infrastructure | Root agent with internal subagents | Separation of concerns, easier debugging, shared resources reduce cost | Single point of failure, all subagents constrained to same model and budget |
| Cross-organization workflows, independent evolution | External agent-to-agent | Independent deployment and scaling, security isolation, flexible infrastructure | Network latency, authentication complexity, harder to debug across boundaries |

Every architecture pattern involves trade-offs.

- **Latency versus isolation:** Internal subagents have lower latency because they avoid network calls, but they share a failure domain.
External agents have higher latency due to network overhead, but they provide independent failure isolation.
- **Shared state versus independence:** Single deployments share model, budget, and policies but offer less flexibility. Multiple deployments allow independent scaling and updates but add coordination complexity.
- **Complexity now versus complexity later:** Starting simple means faster initial development but may require refactoring. Starting structured requires more upfront work but makes the system easier to extend.

For foundational concepts on how agents execute and manage complexity, see [Agent Concepts](../concepts/).

## Single-agent pattern

A single-agent architecture uses one agent with one system prompt and one tool set to handle all requests.

This pattern works best for narrow domains with limited scope, single data sources, and tasks that don’t require specialized subsystems.

### When to use single agents

Use single agents for focused problems that won’t expand significantly.

Examples include order lookup agents that retrieve history from a single topic, weather agents that query APIs and return formatted data, and inventory checkers that report stock levels.

### Trade-offs

Single agents are simpler to build and maintain. You have one system prompt, one tool set, and one deployment.

However, all capabilities must coexist in one agent. Adding features increases complexity rapidly, making single agents difficult to scale to multi-domain problems.

> 💡 **TIP**
>
> You can migrate from a single agent to a root agent with subagents without starting over. Add subagents to an existing agent using the Redpanda Cloud Console, then gradually move tools and responsibilities to the new subagents.

## Root agent with subagents pattern

A multi-agent architecture uses a root agent that delegates to specialized internal subagents.
This pattern works for complex domains spanning multiple areas, multiple data sources with different access patterns, and tasks requiring specialized expertise within one deployment.

> 📝 **NOTE**
>
> Subagents in Redpanda Cloud are internal specialists within a single agent. They share the parent agent’s model, budget, and policies, but each can have different names, descriptions, system prompts, and MCP tools.

### How it works

The root agent interprets user requests and routes them to appropriate subagents. Each subagent owns a specific business area with focused expertise. Subagents access only the MCP tools they need. All subagents share the same LLM model and budget from the parent agent.

### Example: E-commerce platform

A typical e-commerce agent includes a root agent that interprets requests and delegates to specialists, an order subagent for processing, history, and status updates, an inventory subagent for stock checks and warehouse operations, and a customer subagent for profiles, preferences, and history.

All subagents share the same model but have different system prompts and tool access.

### Why choose internal subagents

Internal subagents provide domain isolation, allowing you to update the order subagent without affecting inventory. Debugging is easier because each subagent has narrow scope and fewer potential failure points. All subagents share resources, reducing complexity and cost compared to separate deployments.

Use internal subagents when you need domain separation within a single agent deployment.

## External agent-to-agent pattern

External A2A integration connects agents across organizational boundaries, platforms, or independent systems.

> 📝 **NOTE**
>
> Cross-agent calling between separate Redpanda Cloud agents is not supported.
> This pattern only applies to connecting Redpanda Cloud agents with external agents you host elsewhere.

### When to use external A2A

Use external [Agent2Agent (A2A) protocol](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#agent2agent-a2a-protocol) for multi-organization workflows that coordinate agents across company boundaries, for platform integration connecting Redpanda Cloud agents with agents hosted elsewhere, and when agents require different deployment environments such as GPU clusters, air-gapped networks, or regional constraints.

### How it works

Agents communicate using the [A2A protocol](../a2a-concepts/), a standard HTTP-based protocol for discovery and invocation. Each agent manages its own credentials and access control independently, and can deploy, scale, and update without coordinating with other agents. Agent cards define capabilities without exposing implementation details.

### Example: Multi-platform customer service

A customer service workflow might span multiple platforms:

- Redpanda Cloud agent accesses real-time order and inventory data
- CRM agent hosted elsewhere manages customer profiles and support tickets
- Payment agent from a third party handles transactions in a secure environment

Each agent runs on its optimal infrastructure while coordinating through A2A.

### Why choose external A2A

External A2A lets different teams own and deploy their agents independently, with each agent choosing its own LLM, tools, and infrastructure. Sensitive operations stay in controlled environments with security isolation, and you can add agents incrementally without rewriting existing systems.

### Trade-offs

External A2A adds network latency on every cross-agent call, and authentication complexity multiplies with each agent requiring credential management.
Removing capabilities or changing contracts requires coordination across consuming systems, and debugging requires tracing requests across organizational boundaries.

For implementation details on external A2A integration, see [Integration Patterns Overview](../integration-overview/).

## Common anti-patterns

Avoid these architecture mistakes that lead to unmaintainable agent systems.

For examples of well-structured agents, see the [multi-tool orchestration tutorial](../tutorials/customer-support-agent/) and [multi-agent architecture tutorial](../tutorials/transaction-dispute-resolution/).

### The monolithic prompt

A monolithic prompt is a single 3000+ word system prompt covering multiple domains.

This pattern fails because:

- LLM confusion increases with prompt length
- Multiple teams modify the same prompt, creating conflicts and unclear ownership
- Changes to one domain risk breaking others

Split into domain-specific subagents instead. Each subagent gets a focused prompt under 500 words.

### The tool explosion

A tool explosion occurs when a single agent has too many tools from every MCP server in the cluster.

This pattern fails because:

- The LLM struggles to choose correctly from large tool sets
- Tool descriptions compete for limited prompt space
- The agent invokes wrong tools with similar names, wasting iteration budget on selection mistakes

Limit tools per agent to 10-15 for optimal performance. Agents with more than 20-25 tools often show degraded tool selection accuracy. Use subagents to partition tools by domain.

For tool design patterns, see [MCP Tool Patterns](../../mcp/remote/tool-patterns/).

### Premature A2A splitting

Premature splitting creates three separate A2A agents when all logic could fit in one agent with internal subagents.
This pattern fails because: - Network latency affects every cross-agent call - Authentication complexity multiplies with three sets of credentials - Debugging requires correlating logs across systems - You manage three deployments instead of one Start with internal subagents for domain separation. Split to external A2A only when you need organizational boundaries or different infrastructure. ### [](#unbounded-tool-chaining)Unbounded tool chaining Unbounded chaining sets max iterations to 100, returns hundreds of items from tools, and places no constraints on tool call frequency. This pattern fails because: - The context window fills with tool results - Requests time out before completion - Costs spiral with many iterations multiplied by large context - The agent loses track of the original goal For best results: - Design workflows to complete in 20-30 iterations - Return paginated results from tools - Add prompt constraints like "Never call the same tool more than 3 times per request" ## [](#model-selection-guide)Model selection guide Choose models based on task complexity, latency requirements, and cost constraints. The Redpanda Cloud Console displays available models with descriptions when creating agents. ### [](#match-models-to-task-complexity)Match models to task complexity For simple queries, choose cost-effective models such as GPT-5 Mini. For balanced workloads, choose mid-tier models such as Claude Sonnet 4.5 or GPT-5.2. For complex reasoning, choose premium models such as Claude Opus 4.5 or GPT-5.2. ### [](#balance-latency-and-model-size)Balance latency and model size For real-time responses, choose smaller models. Use models optimized for speed, such as Mini or base tiers. For batch processing, optimize for accuracy over speed. Use larger models when users aren’t waiting for results. ### [](#optimize-for-cost-and-volume)Optimize for cost and volume For high volume, use cost-effective models. 
Smaller tiers reduce costs while maintaining acceptable quality. For critical accuracy, use premium models. Higher costs are justified when errors are costly. ### [](#model-provider-documentation)Model provider documentation For complete model specifications, capabilities, and pricing: - [OpenAI Models](https://platform.openai.com/docs/models) - [Anthropic Claude Models](https://docs.anthropic.com/claude/docs/models-overview) - [Google Gemini Models](https://ai.google.dev/gemini-api/docs/models) ## [](#design-principles)Design principles Follow these principles to create maintainable agent systems. ### [](#explicit-agent-boundaries)Explicit agent boundaries Each agent should have clear scope and responsibilities. Define scope explicitly in the system prompt, assign a specific tool set for the agent’s domain, and specify well-defined inputs and outputs. Do not create agents with overlapping responsibilities. Overlapping domains create confusion about which agent handles which requests. ### [](#tool-scoping-per-agent)Tool scoping per agent Assign tools to the agent that needs them. Don’t give all agents access to all tools. Limit tool access based on agent purpose. Tool scoping reduces misuse risk and makes debugging easier. ### [](#error-handling-and-fallbacks)Error handling and fallbacks Design agents to handle failures gracefully. Use retry logic for transient failures like network timeouts. Report permanent failures like invalid parameters immediately. Provide clear error messages to users. Log errors for debugging. 
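
The distinction between transient and permanent failures can be made concrete with a small sketch. This is a minimal illustration of the principle, not how Redpanda Cloud handles errors internally; the exception classes `TransientError` and `PermanentError` and the helper `call_with_retry` are hypothetical names introduced here.

```python
import time


class TransientError(Exception):
    """Hypothetical: a failure that may succeed on retry, such as a network timeout."""


class PermanentError(Exception):
    """Hypothetical: a failure that will not succeed on retry, such as invalid parameters."""


def call_with_retry(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff.

    Permanent failures are re-raised immediately so the caller can report them.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except PermanentError:
            # Retrying cannot help; surface the error right away.
            raise
        except TransientError:
            if attempt == max_attempts:
                # Out of attempts; let the caller log and report the failure.
                raise
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. A tool implementation would typically wrap only the outbound call (for example, an HTTP request) in `call_with_retry` and convert a final failure into a clear error message.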
## [](#next-steps)Next steps - [Integration Patterns Overview](../integration-overview/) - [A2A Protocol](../a2a-concepts/) - [MCP Tool Patterns](../../mcp/remote/tool-patterns/) - [AI Agents Overview](../overview/) - [MCP Tool Design](../../mcp/remote/best-practices/) --- # Page 6: Build Agents **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/build-index.md --- # Build Agents --- title: Build Agents latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/build-index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/build-index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/build-index.adoc description: Create production AI agents with effective prompts and scalable architecture. page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Create agents, write effective prompts, and design scalable agent systems. - [Create an Agent](../create-agent/) Declaratively configure an agent by choosing an LLM, writing a system prompt, connecting tools from built-in connectors, and setting execution parameters. - [System Prompt Best Practices](../prompt-best-practices/) Write system prompts that produce reliable, predictable agent behavior through clear constraints and tool guidance. - [Agent Architecture Patterns](../architecture-patterns/) Design maintainable agent systems with single-agent and multi-agent patterns based on domain complexity. 
- [Troubleshoot AI Agents](../troubleshooting/) Diagnose and fix common issues with AI agents including deployment failures, runtime behavior problems, and tool execution errors. --- # Page 7: Agent Concepts **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/concepts.md --- # Agent Concepts --- title: Agent Concepts latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/concepts page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/concepts.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/concepts.adoc description: Understand how declaratively configured agents execute reasoning loops, manage context, invoke tools, and handle errors. page-topic-type: concepts personas: agent_developer, streaming_developer, data_engineer learning-objective-1: Explain how agents execute reasoning loops and make tool invocation decisions learning-objective-2: Describe how agents manage context and state across interactions learning-objective-3: Identify error handling strategies for agent failures page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). After you declaratively configure an agent’s behavior (its LLM, system prompt, and tools), the framework manages execution through a reasoning loop. The LLM analyzes context, decides which tools to invoke, processes results, and repeats until the task completes. Understanding this execution model helps you fine-tune agent settings like iteration limits and tool selection. 
After reading this page, you will be able to:

- Explain how agents execute reasoning loops and make tool invocation decisions
- Describe how agents manage context and state across interactions
- Identify error handling strategies for agent failures

## [](#agent-execution-model)Agent execution model

Every agent request follows a reasoning loop. The agent doesn’t execute all tool calls at once. Instead, it makes decisions iteratively.

### [](#the-reasoning-loop)The reasoning loop

The following diagram shows how agents process requests through iterative reasoning:

![Diagram showing the agent reasoning loop: User Request flows to LLM Receives Context](../../_images/agent-reasoning-loop.png)

Figure 1. Agent reasoning loop with tool integration

When an agent receives a request:

1. The LLM receives the context, including the system prompt, conversation history, user request, and previous tool results.
2. The LLM chooses to invoke a tool, request more information, or respond to the user.
3. If a tool is invoked, it runs and returns results.
4. The tool’s results are added to the conversation history.
5. The LLM reasons again with the expanded context.

The loop continues until one of these conditions is met:

![Diagram showing exit conditions: Task Complete returns response](../../_images/agent-exit-conditions.png)

Figure 2. Reasoning loop exit conditions

- The agent completes the task and responds to the user
- The agent reaches the max iterations limit
- The agent encounters an unrecoverable error

> 📝 **NOTE**
>
> If the agent encounters an unrecoverable error on the first iteration, it returns an error immediately. Unrecoverable errors include authentication failures, invalid tool configurations, or LLM API failures.

### [](#why-iterations-matter)Why iterations matter

Each iteration includes three phases:

1. **LLM reasoning**: The model processes the growing context to decide the next action.
2. **Tool invocation**: If the agent decides to call a tool, the tool executes and the agent waits for results.
3. **Context expansion**: Tool results are added to the conversation history for the next iteration.

With higher iteration limits, agents can complete complex tasks but cost more and take longer. With lower iteration limits, agents respond faster and are cheaper but may fail on complex requests.

#### [](#cost-calculation)Cost calculation

Calculate the approximate cost per request by estimating the average context tokens per iteration:

Cost per request = iterations x average context tokens x model price per token

Example with 30 iterations at $0.000002 per token, sampling three iterations as the context grows:

- Iteration 1: 500 tokens x $0.000002 = $0.001
- Iteration 15: 2000 tokens x $0.000002 = $0.004
- Iteration 30: 4000 tokens x $0.000002 = $0.008

These three sampled iterations alone sum to $0.013. Across all 30 iterations, assuming the context grows roughly linearly to an average of about 2250 tokens, the total is approximately 30 x 2250 x $0.000002 = $0.135 per request.

Actual costs vary based on:

- Tool result sizes (large results increase context)
- Model pricing (varies by provider and model tier)
- Task complexity (determines iteration count)

Setting max iterations creates a cost/capability trade-off:

| Limit | Range | Use Case | Cost |
| --- | --- | --- | --- |
| Low | 10-20 | Simple queries, single tool calls | Cost-effective |
| Medium | 30-50 | Multi-step workflows, tool chaining | Balanced |
| High | 50-100 | Complex analysis, exploratory tasks | Higher |

Iteration limits prevent runaway costs when agents encounter complex or ambiguous requests.

## [](#mcp-tool-invocation-patterns)MCP tool invocation patterns

MCP tools extend agent capabilities beyond text generation. Understanding when and how tools execute helps you design effective tool sets.

### [](#synchronous-tool-execution)Synchronous tool execution

In Redpanda Cloud, tool calls block the agent. When the agent decides to invoke a tool, it pauses and waits while the tool executes (querying a database, calling an API, or processing data). When the tool returns its result, the agent resumes reasoning.
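
The blocking behavior described above can be pictured with a small simulation. This is a sketch under stated assumptions, not the framework's actual implementation: `Decision`, `run_agent`, `scripted_decide`, and the `TOOLS` table are hypothetical names, and the LLM is replaced by a scripted stand-in so the example runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Decision:
    """One decision from the (stubbed) LLM: call a tool or give a final answer."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None


def run_agent(decide: Callable, tools: dict, user_request: str, max_iterations: int = 30):
    """Return (answer, iterations_used); answer is None if the limit was reached."""
    context = [("user", user_request)]
    for iteration in range(1, max_iterations + 1):
        decision = decide(context)  # LLM reasoning over the current context
        if decision.tool is None:
            return decision.answer, iteration
        # Synchronous call: the loop does nothing else until the tool returns.
        result = tools[decision.tool](**decision.args)
        context.append(("tool", (decision.tool, result)))  # context expansion
    return None, max_iterations


# Hypothetical tools that return canned data.
TOOLS = {
    "get_customer_info": lambda customer_id: {"email": "jane@example.com"},
    "get_order_history": lambda customer_email: ["order-1", "order-2"],
}


def scripted_decide(context):
    """Stand-in for the LLM: chains two tools, then answers."""
    results = [payload for role, payload in context if role == "tool"]
    if not results:
        return Decision(tool="get_customer_info", args={"customer_id": "c-42"})
    if len(results) == 1:
        email = results[0][1]["email"]
        return Decision(tool="get_order_history", args={"customer_email": email})
    orders = results[1][1]
    return Decision(answer=f"Found {len(orders)} orders")
```

Calling `run_agent(scripted_decide, TOOLS, "Show my orders")` uses three iterations: two tool calls and one final answer. With `max_iterations=2` the same request stops at the limit before an answer is produced, which is why chained workflows need enough iterations for every step.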
This synchronous model means latency adds up across multiple tool calls, the agent sees tool results sequentially rather than in parallel, and long-running tools can delay or fail agent requests due to timeouts. ### [](#tool-selection-decisions)Tool selection decisions The LLM decides which tool to invoke based on system prompt guidance (such as "Use get\_orders when customer asks about history"), tool descriptions from the MCP schema that define parameters and purpose, and conversation context where previous tool results influence the next tool choice. Agents can invoke the same tool multiple times with different parameters if the task requires it. ### [](#tool-chaining)Tool chaining Agents chain tools when one tool’s output feeds another tool’s input. For example, an agent might first call `get_customer_info(customer_id)` to retrieve details, then use that data to call `get_order_history(customer_email)`. Tool chaining requires sufficient max iterations because each step in the chain consumes one iteration. ### [](#tool-granularity-considerations)Tool granularity considerations Tool design affects agent behavior. Coarse-grained tools that do many things result in fewer tool calls but less flexibility and more complex implementation. Fine-grained tools that each do one thing require more tool calls but offer higher composability and simpler implementation. Choose granularity based on how often you’ll reuse tool logic across workflows, whether intermediate results help with debugging, and how much control you want over tool invocation order. For tool design guidance, see [MCP Tool Design](../../mcp/remote/best-practices/). ## [](#context-and-state-management)Context and state management Agents handle two types of information: conversation context (what’s been discussed) and state (persistent data across sessions). 
### [](#conversation-context)Conversation context The agent’s context includes the system prompt (always present), user messages, agent responses, tool invocation requests, and tool results. As the conversation progresses, context grows. Each tool result adds tokens to the context window, which the LLM uses for reasoning in subsequent iterations. ### [](#context-window-limits)Context window limits LLM context windows limit how much history fits. Small models support 8K-32K tokens, medium models support 32K-128K tokens, and large models support 128K-1M+ tokens. When context exceeds the limit, the oldest tool results get truncated, the agent loses access to early conversation details, and may ask for information it already retrieved. Design workflows to complete within context limits. Avoid unbounded tool chaining. ## [](#service-account-authorization)Service account authorization When you create an MCP server or AI agent, Redpanda Cloud automatically creates a service account to authenticate requests to your cluster. The service account is created with the following: **Name**: Prepopulated as `cluster----sa`, where `sa` stands for service account. For example: - MCP server: `cluster-d5tp5kntujt599ksadgg-mcp-my-test-server-sa` - AI agent: `cluster-d5tp5kntujt599ksadgg-agent-my-agent-sa` You can customize this name during creation. **Role binding**: Cluster scope with Writer role for the cluster where you created the resource. This allows the resource to read and write data, manage topics, and access cluster resources. ### [](#manage-service-accounts)Manage service accounts You can view and manage service accounts created for MCP servers and AI agents at the organization level in **Organization IAM** > **Service account**. 
This page shows additional details not visible during creation: | Field | Description | | --- | --- | | Client ID | Unique identifier for OAuth2 authentication | | Description | Optional description of the service account | | Created at | Timestamp when the service account was created | | Updated at | Timestamp of the last modification | From this page you can: - Edit the service account name or description - View and manage role bindings - Rotate credentials - Delete the service account > 📝 **NOTE** > > Deleting a service account removes authentication for the associated MCP server or AI agent. The resource can no longer access cluster data. ### [](#customize-role-bindings)Customize role bindings The default Writer role provides broad access suitable for most use cases. If you need more restrictive permissions: 1. Exit the cluster and navigate to **Organization IAM** > **Service account**. 2. Find the service account for your resource. 3. Edit the role bindings to use a more restrictive role or scope. For more information about roles and permissions, see [Role-based access control](../../../security/authorization/rbac/rbac/) or [Group-based access control](../../../security/authorization/gbac/gbac/). 
## [](#next-steps)Next steps - [Agent Architecture Patterns](../architecture-patterns/) - [AI Agent Quickstart](../quickstart/) - [System Prompt Best Practices](../prompt-best-practices/) - [MCP Tool Design](../../mcp/remote/best-practices/) --- # Page 8: Create an Agent **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/create-agent.md --- # Create an Agent --- title: Create an Agent latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/create-agent page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/create-agent.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/create-agent.adoc description: Declaratively configure an agent by choosing an LLM, writing a system prompt, connecting tools from built-in connectors, and setting execution parameters. page-topic-type: how-to personas: agent_developer, app_developer, streaming_developer learning-objective-1: Configure an agent with model selection and system prompt learning-objective-2: Connect MCP servers and select tools for your agent learning-objective-3: Set agent execution parameters including max iterations page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Create a new AI agent declaratively through the Redpanda Cloud Console. No Python or JavaScript code required. This guide walks you through configuring the agent’s model, writing the system prompt, connecting tools from built-in connectors, and setting execution parameters. 
After reading this page, you will be able to:

- Configure an agent with model selection and system prompt
- Connect MCP servers and select tools for your agent
- Set agent execution parameters including max iterations

## [](#prerequisites)Prerequisites

- A [BYOC cluster](../../../get-started/cluster-types/byoc/).
- [AI Gateway configured](../../ai-gateway/gateway-quickstart/) with at least one LLM provider enabled.
- At least one [Remote MCP server](../../mcp/remote/overview/) deployed with tools.
- System prompt prepared (see [System Prompt Best Practices](../prompt-best-practices/)).

## [](#access-the-agents-ui)Access the agents UI

1. Log in to the [Redpanda Cloud Console](https://cloud.redpanda.com).
2. Navigate to your cluster.
3. Click **Agentic AI** > **AI Agents** in the left navigation.

## [](#configure-basic-settings)Configure basic settings

1. Click **Create Agent**.
2. Enter a display name (3-128 characters, alphanumeric with spaces, hyphens, underscores, or slashes).
3. Optionally, add a description (maximum 256 characters).
4. Select a resource tier based on your workload characteristics:

   Resource tiers control CPU and memory allocated to your agent. Choose based on:

   - **Concurrency:** How many simultaneous requests the agent handles.
   - **Tool memory:** Whether tools process large datasets in memory.
   - **Response time:** How quickly the agent needs to respond.

   Available tiers:

   - XSmall: 100m CPU, 400M RAM (single-user testing, simple queries)
   - Small: 200m CPU, 800M RAM (light workloads, few concurrent users)
   - Medium: 300m CPU, 1200M RAM (recommended for most production use cases)
   - Large: 400m CPU, 1600M RAM (high concurrency or memory-intensive tools)
   - XLarge: 500m CPU, 2G RAM (very high concurrency or large data processing)

   Start with Medium for production workloads. Monitor CPU and memory usage, then adjust if you see resource constraints.
5. Optionally, add tags (maximum 16 tags) for organization and filtering:

   - Keys: Maximum 64 characters, must be unique
   - Values: Maximum 256 characters, allowed characters: letters, numbers, spaces, and `_.:/=+-@`

## [](#choose-a-model)Choose a model

Agents use large language models (LLMs) to interpret user intent and decide which tools to invoke.

1. Select your AI Gateway:

   Choose the gateway that contains your configured LLM providers and API keys. If you have multiple gateways, select the appropriate one for this agent’s workload (for example, production vs staging, or team-specific gateways).
2. Select your LLM provider from those available in the gateway:

   - OpenAI (GPT models)
   - Google (Gemini models)
   - Anthropic (Claude models)
   - OpenAI Compatible (custom OpenAI-compatible endpoints)
3. If using OpenAI Compatible, provide the base URL:

   - Base URL is required for OpenAI Compatible
   - Must start with `http://` or `https://`
   - Example: `https://api.example.com/v1`
4. Select the specific model version from the dropdown. The dropdown shows available models with descriptions.

For detailed model specifications and pricing:

- [OpenAI Models](https://platform.openai.com/docs/models)
- [Anthropic Claude Models](https://docs.anthropic.com/claude/docs/models-overview)
- [Google Gemini Models](https://ai.google.dev/gemini-api/docs/models)

For model selection based on architecture patterns, see [Model selection guide](../architecture-patterns/#model-selection-guide).

## [](#write-the-system-prompt)Write the system prompt

1. In the **System Prompt** section, enter your prompt (minimum 10 characters).
2. Follow these guidelines:

   - Define agent role and responsibilities
   - List available tools
   - Specify constraints and safety rules
   - Set output format expectations
3. Use the **Preview** button to review the formatted prompt.

Example system prompt structure:

```text
You are an [agent role].

Responsibilities:
- [Task 1]
- [Task 2]

Available tools:
- [tool_name]: [description]

Never:
- [Constraint 1]
- [Constraint 2]

Response format:
- [Format guideline]
```

For complete prompt guidelines, see [System Prompt Best Practices](../prompt-best-practices/).

## [](#add-mcp-servers-and-select-tools)Add MCP servers and select tools

1. In the **Tools** section, click **Add MCP Server**.
2. Select an MCP server from your cluster.
3. The UI displays all tools exposed by that server.
4. Select which tools this agent can use:

   - Check the box next to each tool
   - Review tool descriptions to confirm they match agent needs
5. Repeat to add tools from multiple MCP servers.
6. Verify your tool selection:

   - Ensure tools match those listed in your system prompt
   - Remove tools the agent doesn’t need (principle of least privilege)

## [](#add-subagents-optional)Add subagents (optional)

Subagents are internal specialists within a single agent. Each subagent can have its own name, description, system prompt, and MCP tools, but all subagents share the parent agent’s model, budget, and policies.

1. In the **Subagents** section, click **Add Subagent**.
2. Configure the subagent:

   - **Name**: 1-64 characters, only letters, numbers, hyphens, and underscores (for example: `order-agent` or `Order_Agent`)
   - **Description**: Maximum 256 characters (optional)
   - **System Prompt**: Minimum 10 characters, domain-specific instructions
   - **MCP Tools**: Select tools this subagent can access

The root agent orchestrates and delegates work to appropriate subagents based on the request.

For multi-agent design patterns, see [Agent Architecture Patterns](../architecture-patterns/).

### [](#set-max-iterations)Set max iterations

Max iterations determine how many reasoning loops the agent can perform before stopping. Each iteration consumes tokens and adds latency.

For detailed cost calculations and the cost/capability/latency trade-off, see [Agent Concepts](../concepts/).
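
To get a rough feel for how the iteration limit bounds spend, the following sketch estimates the worst-case cost of a request that uses every allowed iteration. It assumes each iteration re-sends the full context and that the context grows by a fixed number of tokens per iteration. The function name `estimate_request_cost` and all the numbers are illustrative assumptions, not Redpanda Cloud values or real model prices.

```python
def estimate_request_cost(iterations, initial_tokens, tokens_added_per_iteration, price_per_token):
    """Estimate the cost of one request when context grows linearly.

    Iteration i (starting at 0) processes
    initial_tokens + i * tokens_added_per_iteration tokens.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # Sum of an arithmetic series: n * a + d * n * (n - 1) / 2
    total_tokens = (
        iterations * initial_tokens
        + tokens_added_per_iteration * iterations * (iterations - 1) // 2
    )
    return total_tokens * price_per_token


if __name__ == "__main__":
    # Illustrative inputs: 500-token starting context, 120 tokens added per
    # iteration, $0.000002 per token.
    for limit in (10, 30, 100):
        cost = estimate_request_cost(limit, 500, 120, 0.000002)
        print(f"max iterations {limit}: up to ${cost:.4f} per request")
```

Because the context is re-read on every iteration, the estimate grows faster than linearly with the limit, so raising max iterations from 30 to 100 can raise the worst-case cost by much more than a factor of three.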
In the **Execution Settings** section, configure **Max Iterations** (range: 10-100, default: 30). Choose based on task complexity: - **Simple queries** (10-20): Single tool call, direct answers, minimal reasoning - **Balanced workflows** (20-40): Multiple tool calls, data aggregation, moderate analysis - **Complex analysis** (40-100): Exploratory queries, extensive tool chaining, deep reasoning Start with 30 for most use cases. ### [](#configure-a2a-discovery-metadata-optional)Configure A2A discovery metadata (optional) After creating your agent, configure discovery metadata for external integrations. For detailed agent card design guidance, see [Create an Agent Card](https://agent2agent.info/docs/guides/create-agent-card/). 1. Click on your agent. 2. Open the **A2A** tab. 3. Configure identity fields: - **Icon URL**: A publicly accessible image URL (recommended: 256x256px PNG or SVG) - **Documentation URL**: Link to comprehensive agent documentation 4. Configure provider information: - **Organization**: Your organization or team name - **URL**: Website or contact URL 5. Configure capabilities by adding skills: Skills describe what your agent can do for capability-based discovery. External systems use skills to find agents with the right capabilities. 1. Click **\+ Add Skill** to define what this agent can do. 2. For each skill, configure: - **Skill ID** (required): Unique identifier using lowercase letters, numbers, and hyphens (for example, `fraud-analysis`, `order-lookup`) - **Skill Name** (required): Human-readable name displayed in agent directories (for example, "Fraud Analysis", "Order Lookup") - **Description** (required): Explain what this skill does and when to use it. Be specific about inputs, outputs, and use cases. - **Tags** (optional): Add tags for categorization and search. Use common terms like `fraud`, `security`, `finance`, `orders`. 
- **Examples** (optional): Click **\+ Add Example** to provide sample queries demonstrating how to invoke this skill. Examples help users understand how to interact with your agent. 3. Add multiple skills if your agent handles different types of requests. For example, a customer service agent might have separate skills for "Order Status Lookup", "Shipping Tracking", and "Returns Processing". 6. Click **Save Changes**. The updated metadata appears immediately at `https://your-agent-url/.well-known/agent-card.json`. For more about what these fields mean and how they’re used, see [Agent cards](../a2a-concepts/#agent-cards). ### [](#review-and-create)Review and create 1. Review all settings. 2. Configure the service account name (optional): A service account is automatically created to authenticate your agent with cluster resources. The default name follows the pattern `cluster--agent--sa`. You can customize this name (3-128 characters, cannot contain `<` or `>` characters). For details about default permissions and how to manage service accounts, see [Service account authorization](../concepts/#service-account-authorization). 3. Click **Create Agent**. 4. Wait for agent creation to complete. When your agent is running, Redpanda Cloud provides an HTTP endpoint URL with the pattern: https://.ai-agents.. You can use this URL to call your agent programmatically or integrate it with external systems. The **Inspector** tab in the Cloud Console automatically uses this URL to connect to your agent for testing. For programmatic access or external agent integration, see [Integration Patterns Overview](../integration-overview/). ## [](#test-your-agent)Test your agent 1. In the agent details view, click the **Inspector** tab. 2. Enter a test prompt. 3. Verify the agent: - Selects appropriate tools - Follows system prompt constraints - Returns expected output format 4. Iterate on the system prompt or tool selection as needed. 
For detailed testing strategies, see [Monitor Agent Activity](../monitor-agents/). ## [](#example-configurations)Example configurations Here are example configurations for different agent types: ### [](#simple-query-agent)Simple query agent - **Model**: GPT-5 Mini (fast, cost-effective) - **Tools**: Single MCP server with `get_orders` tool - **Max iterations**: 10 - **Use case**: Customer order lookups ### [](#complex-analytics-agent)Complex analytics agent - **Model**: Claude Sonnet 4.5 (balanced) - **Tools**: Multiple servers with data query, aggregation, and formatting tools - **Max iterations**: 30 - **Use case**: Multi-step data analysis ### [](#multi-agent-orchestrator)Multi-agent orchestrator - **Model**: Claude Opus 4.5 (advanced reasoning) - **Tools**: Agent delegation tools - **Subagents**: Order Agent, Inventory Agent, Customer Agent - **Max iterations**: 20 - **Use case**: E-commerce operations ## [](#next-steps)Next steps - [Integration Patterns Overview](../integration-overview/) - [System Prompt Best Practices](../prompt-best-practices/) - [Create an MCP Tool](../../mcp/remote/create-tool/) - [Agent Architecture Patterns](../architecture-patterns/) - [Troubleshoot AI Agents](../troubleshooting/) --- # Page 9: Get Started with AI Agents **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/get-started-index.md --- # Get Started with AI Agents --- title: Get Started with AI Agents latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/get-started-index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/get-started-index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/get-started-index.adoc description: Get started with declarative AI agents in Redpanda Cloud. 
Connect tools, configure behavior, and deploy without writing agent code. page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Start here to create your first declarative AI agent. Select an LLM, define behavior, and connect tools from built-in connectors. - [AI Agents Overview](../overview/) Learn how Redpanda Cloud agents use a declarative approach backed by 300+ built-in connectors to replace custom agent code. - [Agent Concepts](../concepts/) Understand how declaratively configured agents execute reasoning loops, manage context, invoke tools, and handle errors. - [AI Agent Quickstart](../quickstart/) Create your first AI agent in Redpanda Cloud that generates and publishes event data through natural language commands. - [Learn Multi-Tool Agent Orchestration](../tutorials/customer-support-agent/) Learn how agents coordinate multiple tools, make decisions based on conversation context, and handle errors through building a customer support agent. - [Build Multi-Agent Systems for Transaction Dispute Resolution](../tutorials/transaction-dispute-resolution/) Learn how to build multi-agent systems with domain separation, handle sensitive financial data, and monitor multi-agent execution through transaction investigation. 
--- # Page 10: Agent Integration **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/integration-index.md --- # Agent Integration --- title: Agent Integration latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/integration-index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/integration-index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/integration-index.adoc description: Connect agents to external applications, pipelines, and other systems. page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Choose integration patterns and connect agents to your systems. - [Integration Patterns Overview](../integration-overview/) Choose the right integration pattern for connecting agents, pipelines, and external applications. - [Pipeline Integration Patterns](../pipeline-integration-patterns/) Build Redpanda Connect pipelines that invoke agents for event-driven processing and streaming enrichment. - [A2A Protocol](../a2a-concepts/) Learn how the A2A protocol enables agent discovery and communication. 
--- # Page 11: Integration Patterns Overview **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/integration-overview.md --- # Integration Patterns Overview --- title: Integration Patterns Overview latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/integration-overview page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/integration-overview.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/integration-overview.adoc description: Choose the right integration pattern for connecting agents, pipelines, and external applications. page-topic-type: best-practices personas: agent_developer, streaming_developer, app_developer, data_engineer learning-objective-1: Choose the integration pattern that fits your use case learning-objective-2: Apply appropriate authentication for internal versus external integration learning-objective-3: Select the right communication protocol for your integration scenario page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Redpanda Cloud supports multiple integration patterns for agents, pipelines, and external applications. Choose the pattern that matches your integration scenario. 
After reading this page, you will be able to:

- Choose the integration pattern that fits your use case
- Apply appropriate authentication for internal versus external integration
- Select the right communication protocol for your integration scenario

## [](#integration-scenarios)Integration scenarios

Redpanda Cloud supports three primary integration scenarios based on who initiates the call and where the caller is located:

| Scenario | Description | When to Use | Guide |
| --- | --- | --- | --- |
| Agent needs capabilities | Your agent invokes MCP tools to fetch data, call APIs, or access external systems on-demand | Agent-initiated, synchronous, interactive workflows | MCP Tool Patterns |
| Pipeline processes events | Your Redpanda Connect pipeline invokes agents for each event in a stream using the a2a_message processor | Event-driven, automated, high-volume stream processing | Pipeline Integration Patterns |
| External system calls agent | Your application or agent (hosted outside Redpanda Cloud) calls Redpanda Cloud agents using the A2A protocol | Backend services, CLI tools, custom UIs, multi-platform agent workflows | A2A Protocol |

## [](#common-use-cases-by-pattern)Common use cases by pattern

Each integration pattern serves different scenarios based on how data flows and who initiates the interaction.

### [](#agent-needs-capabilities)Agent needs capabilities (MCP tools)

Use MCP tools when your agent needs on-demand access to data or capabilities. The agent decides when to invoke tools as part of its reasoning process. It waits for responses before continuing.

This pattern works well for interactive workflows: customer support lookups, approval flows, or context-aware chatbots.

Avoid MCP tools for high-volume stream processing or automated workflows without user interaction. Use pipeline-initiated integration instead.

For implementation details, see [MCP Tool Patterns](../../mcp/remote/tool-patterns/).
### [](#pipeline-processes-events)Pipeline processes events (`a2a_message`) Use the `a2a_message` processor when your pipeline needs to invoke agents for every event in a stream. The pipeline controls when agents execute. This pattern is ideal for automated, high-volume processing where each event requires AI reasoning. Common scenarios include real-time fraud detection, sentiment scoring for customer reviews, and content moderation that classifies and routes content. For implementation details, see [Pipeline Integration Patterns](../pipeline-integration-patterns/). ### [](#external-system-calls-agent)External system calls agent Use external integration when your applications, services, or agents hosted outside Redpanda Cloud need to call Redpanda Cloud agents. External systems send requests using the A2A protocol and receive responses synchronously. This works for backend services, CLI tools, custom UIs, and agents hosted on other platforms. Common scenarios include backend services analyzing data as part of workflows, CLI tools invoking agents for batch tasks, custom UIs displaying agent responses, CRM agents coordinating with Redpanda agents, and multi-platform workflows spanning different infrastructure. To learn how the A2A protocol enables this integration, see [A2A Protocol](../a2a-concepts/). 
## [](#pattern-comparison)Pattern comparison

The following table compares the two primary internal integration patterns:

| Criterion | Agents Invoking MCP Tools | Pipelines Calling Agents |
| --- | --- | --- |
| Trigger | User question or agent decision | Event arrival in topic |
| Frequency | Ad-hoc, irregular, as needed | Continuous, every event |
| Latency | Low (agent waits for response) | Higher (async acceptable) |
| Control Flow | Agent decides when to invoke | Pipeline decides when to invoke |
| Use Case | "Fetch me data", "Run this query" | "Process this stream", "Enrich all events" |
| Human in Loop | Often yes (user-driven) | Often no (automated) |

## [](#security-considerations-for-external-integration)Security considerations for external integration

When integrating external applications with Redpanda Cloud agents, protect credentials and tokens.

### [](#protect-service-account-credentials)Protect service account credentials

Store the client ID and secret in secure credential stores, not in code. Use environment variables or [secrets management](../../../security/secrets/). Rotate credentials if compromised and restrict access based on the principle of least privilege.

### [](#protect-access-tokens)Protect access tokens

Access tokens grant full access to the agent. Anyone with a valid token can send requests, receive responses, and consume agent resources (subject to rate limits). Treat access tokens like passwords and never log them or include them in error messages.
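
As an illustration of the two practices above, the following is a minimal sketch of a client-side helper, assuming the service account credentials are supplied through environment variables. The variable names `AGENT_CLIENT_ID` and `AGENT_CLIENT_SECRET` and both function names are hypothetical and are not part of any Redpanda API.

```python
import os


def load_credentials(env=os.environ):
    """Read the service account credentials from the environment, not from code.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    try:
        return env["AGENT_CLIENT_ID"], env["AGENT_CLIENT_SECRET"]
    except KeyError as missing:
        # Name the missing variable, but never echo any secret value.
        raise RuntimeError(f"missing environment variable: {missing.args[0]}") from None


def redact(token, visible=4):
    """Return a log-safe form of a token that keeps only a short prefix."""
    if not token:
        return "<empty>"
    if len(token) <= visible:
        return "***"
    return token[:visible] + "***"
```

Calling `redact(access_token)` before writing to logs or building error messages keeps enough of the value to correlate requests without exposing a usable token.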
## [](#next-steps)Next steps - [A2A Protocol](../a2a-concepts/) - [MCP Tool Patterns](../../mcp/remote/tool-patterns/) - [Pipeline Integration Patterns](../pipeline-integration-patterns/) --- # Page 12: Monitor Agent Activity **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/monitor-agents.md --- # Monitor Agent Activity --- title: Monitor Agent Activity latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/monitor-agents page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/monitor-agents.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/monitor-agents.adoc description: Monitor agent execution, analyze conversation history, track token usage, and debug issues using Inspector, Transcripts, and agent data topics. page-topic-type: how-to personas: agent_developer, platform_admin learning-objective-1: Verify agent behavior using the Inspector tab learning-objective-2: Track token usage and performance metrics learning-objective-3: Debug agent execution using Transcripts page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Use monitoring to track agent performance, analyze conversation patterns, debug execution issues, and optimize token costs. After reading this page, you will be able to: - Verify agent behavior using the **Inspector** tab - Track token usage and performance metrics - Debug agent execution using **Transcripts** For conceptual background on traces and observability, see [Transcripts and AI Observability](../../observability/concepts/). 
## [](#prerequisites)Prerequisites You must have a running agent. If you do not have one, see [AI Agent Quickstart](../quickstart/). ## [](#debug-agent-execution-with-transcripts)Debug agent execution with Transcripts The **Transcripts** view shows execution traces with detailed timing, errors, and performance metrics. Use this view to debug issues, verify agent behavior, and monitor performance in real-time. ### [](#navigate-the-transcripts-view)Navigate the transcripts view 1. Click **Transcripts**. 2. Select a recent transcript from your agent executions. The transcripts view displays: - **Timeline**: Visual history of recent executions with success/error indicators - **Trace list**: Hierarchical view of traces and spans - **Summary panel**: Detailed metrics when you select a transcript #### [](#timeline-visualization)Timeline visualization The timeline shows execution patterns over time: - Green bars: Successful executions - Red bars: Failed executions with errors - Gray bars: Incomplete traces or traces still loading - Time range: Displays the last few hours by default Use the timeline to spot patterns like error clusters, performance degradation over time, or gaps indicating downtime. #### [](#trace-hierarchy)Trace hierarchy The trace list shows nested operations with visual duration bars indicating how long each operation took. Click the expand arrows (▶) to drill into nested spans and see the complete execution flow. For details on span types, see [Agent trace hierarchy](../../observability/concepts/#agent-trace-hierarchy). 
#### [](#summary-panel)Summary panel

When you select a transcript, the summary panel shows:

- Duration: Total execution time for this request
- Total Spans: Number of operations in the trace
- Token Usage: Input tokens, output tokens, and total (critical for cost tracking)
- LLM Calls: How many times the agent called the language model
- Service: The agent identifier
- Conversation ID: Links to session data topics

### [](#check-agent-health)Check agent health

Use the **Transcripts** view to verify your agent is healthy. Look for consistent green bars in the timeline, which indicate successful executions. Duration should stay within your expected range, while token usage remains stable without unexpected growth.

Several warning signs indicate problems. Red bars in the timeline mean errors or failures that need investigation. When duration increases over time, your context window may be growing or tool calls could be slowing down. Many LLM calls for simple requests often signal that the agent is stuck in loops or making unnecessary iterations. If you see missing transcripts, the agent may be stopped or encountering deployment issues.

Pay attention to patterns across multiple executions. When all recent transcripts show errors, start by checking agent status, MCP server connectivity, and system prompt configuration. A spiky timeline that alternates between success and error typically points to intermittent tool failures or external API issues. If duration increases steadily over a session, your context window is likely filling up. Clear the conversation history to reset it. High token usage combined with relatively few LLM calls usually means tool results are large or your system prompts are verbose.

### [](#debug-with-transcripts)Debug with Transcripts

Use **Transcripts** to diagnose specific issues:

If the agent is not responding:

1. Check the timeline for recent transcripts. If none appear, the agent may be stopped.
2. Verify agent status in the main **AI Agents** view.
3. Look for error transcripts with deployment or initialization failures.

If the agent fails during execution:

1. Select the failed transcript (red bar in timeline).
2. Expand the trace hierarchy to find the tool invocation span.
3. Check the span details for error messages.
4. Cross-reference with MCP server status.

If performance is slow:

1. Compare duration across multiple transcripts in the summary panel.
2. Look for specific spans with long durations (wide bars in trace list).
3. Check if LLM calls are taking longer than expected.
4. Verify tool execution time by examining nested spans.

### [](#track-token-usage-and-costs)Track token usage and costs

View token consumption in the **Summary** panel when you select a transcript. The breakdown shows:

- Input tokens: Everything sent to the LLM, including the system prompt, conversation history, and tool results
- Output tokens: What the LLM generates in agent responses
- Total tokens: The sum of both

Calculate cost per request:

Cost = (input\_tokens × input\_price) + (output\_tokens × output\_price)

Example: GPT-5.2 with 4,302 input tokens and 1,340 output tokens at $0.00000175 per input token and $0.000014 per output token costs (4,302 × $0.00000175) + (1,340 × $0.000014) ≈ $0.0075 + $0.0188 ≈ $0.026 per request.

For cost optimization strategies, see [Cost calculation](../concepts/#cost-calculation).

## [](#test-agent-behavior-with-inspector)Test agent behavior with Inspector

The **Inspector** tab provides real-time conversation testing. Use it to test agent responses interactively and verify behavior before deploying changes.

### [](#access-inspector)Access Inspector

1. Navigate to **Agentic AI** > **AI Agents** in the Redpanda Cloud Console.
2. Click your agent name.
3. Open the **Inspector** tab.
4. Enter test queries and review responses.
5. Check the conversation panel to see tool calls.
6. Start a new session to test fresh conversations or click **Clear context** to reset history.
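
When you estimate costs from Inspector sessions or transcripts, the per-request formula from [Track token usage and costs](#track-token-usage-and-costs) can be sketched in a few lines. This is a minimal illustration: the function name `request_cost` is hypothetical, and the prices are the example figures from that section, not current provider pricing.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost = (input_tokens x input_price) + (output_tokens x output_price)."""
    return input_tokens * input_price + output_tokens * output_price


# Figures from the example in "Track token usage and costs".
cost = request_cost(4302, 1340, input_price=0.00000175, output_price=0.000014)
print(f"${cost:.3f} per request")  # prints "$0.026 per request"
```

Multiplying the result by your expected request volume gives a rough budget estimate. Output tokens are priced eight times higher than input tokens in this example, so verbose responses cost disproportionately more than long prompts.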
### [](#testing-best-practices)Testing best practices Test your agents systematically by exploring edge cases and potential failure scenarios. Begin with boundary testing. Requests at the edge of agent capabilities verify that scope enforcement works correctly. Error handling becomes clear when you request unavailable data and observe whether the agent degrades gracefully. Even with proper system prompt constraints, testing confirms that your agent responds appropriately to edge cases. Monitor iteration counts during complex requests to ensure they complete within your configured limits. Ambiguous or vague queries reveal whether the agent asks clarifying questions or makes risky assumptions. Throughout testing, track token usage per request to estimate costs and identify which query patterns consume the most resources. ## [](#next-steps)Next steps - [Transcripts and AI Observability](../../observability/concepts/) - [Troubleshoot AI Agents](../troubleshooting/) - [Agent Concepts](../concepts/) --- # Page 13: AI Agents Overview **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/overview.md --- # AI Agents Overview --- title: AI Agents Overview latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/overview page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/overview.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/overview.adoc description: Learn how Redpanda Cloud agents use a declarative approach backed by 300+ built-in connectors to replace custom agent code. 
page-topic-type: overview personas: evaluator, agent_developer, app_developer, streaming_developer learning-objective-1: Describe what AI agents are and their essential components learning-objective-2: Explain how Redpanda Cloud streaming infrastructure benefits agent architectures learning-objective-3: Identify use cases where Redpanda Cloud agents provide value page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). AI agents in Redpanda Cloud take a declarative approach: instead of writing Python or JavaScript agent code, you declare the behavior you want by selecting an LLM, writing a system prompt, and connecting tools drawn from 300+ built-in Redpanda Connect connectors. The framework handles execution, tool orchestration, and scaling, backed by real-time streaming infrastructure and built-in filtering and data enrichment. After reading this page, you will be able to: - Describe what AI agents are and their essential components - Explain how Redpanda Cloud streaming infrastructure benefits agent architectures - Identify use cases where Redpanda Cloud agents provide value ## [](#what-is-an-ai-agent)What is an AI agent? An AI agent is a system built around a [large language model (LLM)](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#large-language-model-llm) that interprets user intent, selects the right tools, and chains multiple steps into a workflow. In Redpanda Cloud, agents are declarative: you configure what the agent should do (its role, constraints, and available tools) rather than writing imperative agent code. This is possible because Redpanda Connect provides the connectors and robust data processing capabilities that the framework orchestrates for you. 
## [](#how-agents-work)How agents work When you create an agent, you configure the components through the Redpanda Cloud Console rather than writing code: - **System prompt**: Defines the agent’s role, responsibilities, and constraints - **LLM**: Interprets user intent and decides which tools to invoke - **Tools**: External capabilities exposed through the [Model Context Protocol (MCP)](../../mcp/remote/overview/) - **Context**: Conversation history, tool results, and real-time events from Redpanda topics Agents can invoke Redpanda Connect components as tools on-demand. Redpanda Connect pipelines can also invoke agents for event-driven processing. This bidirectional integration supports both interactive workflows and automated streaming. When a user makes a request, the LLM receives the system prompt and context, decides which tools to invoke, and processes the results. This cycle repeats until the task completes. For a deeper understanding of how agents execute, manage context, and maintain state, see [Agent Concepts](../concepts/). ## [](#key-benefits)Key benefits A declarative approach means you configure agent behavior instead of coding it, with access to 300+ built-in Redpanda Connect connectors for data sources, APIs, and services. Real-time streaming data ensures agents access live events instead of batch snapshots. [Remote MCP](../../mcp/remote/overview/) support enables standardized tool access. Managed infrastructure handles deployment, scaling, and security for you. Low-latency execution means tools run close to your data. Integrated secrets management securely stores API keys and credentials. ## [](#use-cases)Use cases AI agents in Redpanda Cloud unlock new capabilities across multiple fields. ### [](#for-ai-agent-developers)For AI agent developers Build agents grounded in real-time data instead of static snapshots. 
Connect your agent to live order status, inventory levels, and customer history so responses reflect current business state, not stale training data. ### [](#for-application-developers)For application developers Add conversational AI to existing applications without rebuilding your backend. Expose your services as MCP tools and let agents orchestrate complex multi-step workflows through natural language. ### [](#for-streaming-developers)For streaming developers Process every event with AI reasoning at scale. Invoke agents automatically from pipelines for fraud detection, content moderation, or sentiment analysis. No batch jobs, no delayed insights. ## [](#limitations)Limitations - Agents are available only on [BYOC clusters](../../../get-started/cluster-types/byoc/) - MCP servers must be hosted in Redpanda Cloud clusters - Cross-agent calling between separate agents hosted in Redpanda Cloud is not currently supported (use internal subagents for delegation within a single agent) ## [](#next-steps)Next steps - [AI Agent Quickstart](../quickstart/) - [Agent Concepts](../concepts/) - [Agent Architecture Patterns](../architecture-patterns/) - [Integration Patterns Overview](../integration-overview/) - [Create an Agent](../create-agent/) --- # Page 14: Pipeline Integration Patterns **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/pipeline-integration-patterns.md --- # Pipeline Integration Patterns --- title: Pipeline Integration Patterns latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/pipeline-integration-patterns page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/pipeline-integration-patterns.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/pipeline-integration-patterns.adoc description: Build Redpanda Connect 
pipelines that invoke agents for event-driven processing and streaming enrichment. page-topic-type: best-practices personas: streaming_developer, agent_developer learning-objective-1: Identify when pipelines should call agents for stream processing learning-objective-2: Design event-driven agent invocation using the a2a_message processor learning-objective-3: Implement streaming enrichment with AI-generated fields page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Build Redpanda Connect pipelines that invoke agents for automated, event-driven processing. Pipelines use the `a2a_message` processor to call agents for each event in a stream when you need AI reasoning, classification, or enrichment at scale. After reading this page, you will be able to: - Identify when pipelines should call agents for stream processing - Design event-driven agent invocation using the `a2a_message` processor - Implement streaming enrichment with AI-generated fields This page focuses on pipelines calling agents (pipeline-initiated integration). For agents invoking MCP tools, see [Agent needs capabilities](../integration-overview/#agent-needs-capabilities). For external applications calling agents, see [External system calls agent](../integration-overview/#external-system-calls-agent). ## [](#how-pipelines-invoke-agents)How pipelines invoke agents Pipelines use the [`a2a_message`](../../../develop/connect/components/processors/a2a_message/) processor to invoke agents for each event in a stream. The processor uses the [A2A protocol](../a2a-concepts/) to discover and communicate with agents. 
When the `a2a_message` processor receives an event, it sends the event data to the specified agent along with any prompt you provide. The agent processes the event using its reasoning capabilities and returns a response. The processor then adds the agent’s response to the event for further processing or output.

The pipeline determines when to invoke agents based on events, not agent reasoning.

## [](#when-to-use-this-pattern)When to use this pattern

Use the `a2a_message` processor when pipelines need AI reasoning for every event in a stream.

The `a2a_message` processor is appropriate when:

- **Every event needs AI analysis:** Each message requires reasoning, classification, or decision-making.
- **You need streaming enrichment:** Add AI-generated fields to events at scale.
- **Processing is fully automated:** No human in the loop, event-driven workflows.
- **Batch latency is acceptable:** Agent reasoning time is tolerable for your use case.
- **You’re handling high-volume streams:** Processing thousands or millions of events.

## [](#use-cases)Use cases

Use the `a2a_message` processor in pipelines for these common patterns.

### [](#event-driven-agent-invocation)Event-driven agent invocation

Invoke agents automatically for each event:

```yaml
# Event-driven agent invocation pipeline
# Invokes an agent for each event in a stream
input:
  redpanda:
    seed_brokers: ["${REDPANDA_BROKERS}"]
    topics: [transactions]
    consumer_group: fraud-detector
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-256
        username: "${REDPANDA_USERNAME}"
        password: "${REDPANDA_PASSWORD}"

pipeline:
  processors:
    - a2a_message:
        agent_card_url: "${AGENT_CARD_URL}"
        prompt: "Analyze this transaction: ${!content()}"

output:
  redpanda:
    seed_brokers: ["${REDPANDA_BROKERS}"]
    topic: fraud-alerts
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-256
        username: "${REDPANDA_USERNAME}"
        password: "${REDPANDA_PASSWORD}"
```

Replace `AGENT_CARD_URL` with your actual agent card URL. See [Agent card location](../a2a-concepts/#agent-card-location).

**Use case:** Real-time fraud detection on every transaction.

### [](#streaming-data-enrichment)Streaming data enrichment

Add AI-generated metadata to events:

```yaml
processors:
  - branch:
      request_map: 'root = this.text'
      processors:
        - a2a_message:
            agent_card_url: "${AGENT_CARD_URL}"
      result_map: 'root.sentiment = content()'
```

Replace `AGENT_CARD_URL` with your actual agent card URL. See [Agent card location](../a2a-concepts/#agent-card-location).

**Use case:** Add sentiment scores to every customer review in real-time.

### [](#asynchronous-workflows)Asynchronous workflows

Process events in the background:

```yaml
input:
  redpanda:
    seed_brokers: ["${REDPANDA_BROKERS}"]
    topics: [daily-reports]
    consumer_group: report-analyzer
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-256
        username: "${REDPANDA_USERNAME}"
        password: "${REDPANDA_PASSWORD}"

pipeline:
  processors:
    - a2a_message:
        agent_card_url: "${AGENT_CARD_URL}"
        prompt: "Summarize this report: ${!content()}"
```

Replace `AGENT_CARD_URL` with your actual agent card URL. See [Agent card location](../a2a-concepts/#agent-card-location).

**Use case:** Nightly batch summarization of reports where latency is acceptable.

### [](#multi-agent-pipeline-orchestration)Multi-agent pipeline orchestration

Chain multiple agents in sequence:

```yaml
processors:
  - a2a_message:
      agent_card_url: "${TRANSLATOR_AGENT_URL}"
  - a2a_message:
      agent_card_url: "${SENTIMENT_AGENT_URL}"
  - a2a_message:
      agent_card_url: "${ROUTER_AGENT_URL}"
```

Replace the agent URL variables with your actual agent card URLs. See [Agent card location](../a2a-concepts/#agent-card-location).

**Use case:** Translate feedback, analyze sentiment, then route to appropriate team.
### [](#agent-as-transformation-node)Agent as transformation node

Use agent reasoning for complex transformations:

```yaml
processors:
  - a2a_message:
      agent_card_url: "${AGENT_CARD_URL}"
      prompt: "Convert to SQL: ${!this.natural_language_query}"
```

Replace `AGENT_CARD_URL` with your actual agent card URL. See [Agent card location](../a2a-concepts/#agent-card-location).

**Use case:** Convert natural language queries to SQL for downstream processing.

## [](#when-not-to-use-this-pattern)When not to use this pattern

Do not use the `a2a_message` processor when:

- Users need to interact with agents interactively.
- The transformation is simple and does not require AI reasoning.
- Agents need to dynamically decide what data to fetch based on context.

For a detailed comparison between pipeline-initiated and agent-initiated integration patterns, see [Pattern comparison](../integration-overview/#pattern-comparison).

## [](#example-real-time-fraud-detection)Example: Real-time fraud detection

This example shows a complete pipeline that analyzes every transaction with an agent.

### [](#pipeline-configuration)Pipeline configuration

```yaml
# Fraud detection pipeline with score-based routing
# Analyzes every transaction and routes to different topics based on fraud score
input:
  redpanda:
    seed_brokers: ["${REDPANDA_BROKERS}"]
    topics: [transactions]
    consumer_group: fraud-detector
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-256
        username: "${REDPANDA_USERNAME}"
        password: "${REDPANDA_PASSWORD}"

pipeline:
  processors:
    - branch:
        request_map: |
          root.transaction_id = this.id
          root.amount = this.amount
          root.merchant = this.merchant
          root.user_id = this.user_id
        processors:
          - a2a_message:
              agent_card_url: "${AGENT_CARD_URL}"
              prompt: |
                Analyze this transaction for fraud:
                Amount: ${! json("amount") }
                Merchant: ${! json("merchant") }
                User: ${! json("user_id") }

                Return JSON:
                {
                  "fraud_score": 0-100,
                  "reason": "explanation",
                  "recommend_block": true/false
                }
        result_map: |
          root = this
          root.fraud_analysis = content().parse_json().catch({})
    - mapping: |
        root = this
        meta fraud_score = this.fraud_analysis.fraud_score

output:
  switch:
    cases:
      - check: 'meta("fraud_score") >= 80'
        output:
          redpanda:
            seed_brokers: ["${REDPANDA_BROKERS}"]
            topic: fraud-alerts-high
            tls:
              enabled: true
            sasl:
              - mechanism: SCRAM-SHA-256
                username: "${REDPANDA_USERNAME}"
                password: "${REDPANDA_PASSWORD}"
      - check: 'meta("fraud_score") >= 50'
        output:
          redpanda:
            seed_brokers: ["${REDPANDA_BROKERS}"]
            topic: fraud-alerts-medium
            tls:
              enabled: true
            sasl:
              - mechanism: SCRAM-SHA-256
                username: "${REDPANDA_USERNAME}"
                password: "${REDPANDA_PASSWORD}"
      - output:
          redpanda:
            seed_brokers: ["${REDPANDA_BROKERS}"]
            topic: transactions-cleared
            tls:
              enabled: true
            sasl:
              - mechanism: SCRAM-SHA-256
                username: "${REDPANDA_USERNAME}"
                password: "${REDPANDA_PASSWORD}"
```

Replace `AGENT_CARD_URL` with your agent card URL. See [Agent card location](../a2a-concepts/#agent-card-location).

This pipeline:

- Consumes every transaction from the `transactions` topic.
- Sends each transaction to the fraud detection agent using `a2a_message`.
- Routes transactions to different topics based on fraud score.
- Runs continuously, analyzing every transaction in real-time.
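
To check routing thresholds against sample agent responses before deploying, the score-based routing above can be mirrored in a small offline sketch. The function names `parse_analysis` and `route_topic` are hypothetical. Treating a missing or non-numeric score as 0 is an assumption about how you might want unparseable responses handled, not a description of how the pipeline itself behaves.

```python
import json


def parse_analysis(raw):
    """Parse the agent's JSON reply, falling back to {} as `.catch({})` does."""
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


def route_topic(analysis):
    """Pick the output topic using the same thresholds as the switch cases."""
    score = analysis.get("fraud_score")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        score = 0  # assumption: unusable scores are routed as cleared
    if score >= 80:
        return "fraud-alerts-high"
    if score >= 50:
        return "fraud-alerts-medium"
    return "transactions-cleared"
```

Running recorded agent replies through `route_topic(parse_analysis(reply))` shows how many transactions each threshold would send to each topic, which helps when tuning the 80 and 50 cutoffs.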
## [](#next-steps)Next steps - [MCP Tool Patterns](../../mcp/remote/tool-patterns/) - [Integration Patterns Overview](../integration-overview/) - [A2A Protocol](../a2a-concepts/) - [Processors](../../../develop/connect/components/processors/about/) --- # Page 15: System Prompt Best Practices **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/prompt-best-practices.md --- # System Prompt Best Practices --- title: System Prompt Best Practices latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/prompt-best-practices page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/prompt-best-practices.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/prompt-best-practices.adoc description: Write system prompts that produce reliable, predictable agent behavior through clear constraints and tool guidance. page-topic-type: best-practices personas: agent_developer, app_developer, streaming_developer learning-objective-1: Identify effective system prompt patterns for agent reliability learning-objective-2: Apply constraint patterns to prevent unintended agent behavior learning-objective-3: Evaluate system prompts for clarity and completeness page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Write system prompts that produce reliable, predictable agent behavior. Good prompts define scope, specify constraints, and guide tool usage. 
After reading this page, you will be able to: - Identify effective system prompt patterns for agent reliability - Apply constraint patterns to prevent unintended agent behavior - Evaluate system prompts for clarity and completeness ## [](#role-definition)Role definition Define what your agent does and the boundaries of its responsibilities. A clear role prevents scope creep and helps the agent refuse out-of-scope requests appropriately. ### [](#be-specific-about-agent-identity)Be specific about agent identity Define what the agent does, not what it is. Do ```text You are an order fulfillment agent for an e-commerce platform. You help customers track orders, update shipping addresses, and process returns. ``` Don’t ```text You are a helpful assistant. ``` ### [](#define-what-the-agent-does-and-doesnt-do)Define what the agent does and doesn’t do Explicitly state boundaries: what tasks the agent handles, what tasks it should refuse or delegate, and when to ask for human assistance. ```text Responsibilities: - Track customer orders - Update shipping addresses - Process returns up to $500 Do not: - Provide product recommendations (redirect to website) - Process refunds above $500 (escalate to manager) - Access orders from other customers ``` ## [](#tool-specification)Tool specification Tell the agent which tools are available and when to use them. Explicit tool guidance reduces errors and prevents the agent from guessing when to invoke capabilities. ### [](#list-available-tools)List available tools Name each tool the agent can use: ```text Available tools: - get_customer_orders: Retrieve order history for a customer - update_order_status: Change order state (shipped, delivered, canceled) - calculate_refund: Compute refund amount based on return policy ``` ### [](#explain-when-to-use-each-tool)Explain when to use each tool Provide decision criteria for tool selection. 
Do ```text Use get_customer_orders when: - Customer asks about order history - You need order details to answer a question Use update_order_status only when: - Customer explicitly requests a cancellation - You have confirmed the order is eligible for status changes ``` Don’t ```text Use the tools as needed. ``` ## [](#constraints-and-safety)Constraints and safety Set explicit boundaries to prevent unintended agent behavior. ### [](#define-data-boundaries)Define data boundaries Specify what data the agent can access: ```text Data access: - Only orders from the last 90 days - Only data for the authenticated customer - No access to employee records or internal systems ``` ### [](#set-response-guidelines)Set response guidelines Control output format and content: ```text Response guidelines: - Present order details as tables - Always include order numbers in responses - State the analysis time window when showing trends - If you cannot complete a task, explain why and suggest alternatives ``` ## [](#context-and-conversation-management)Context and conversation management Guide the agent on how to handle unclear requests and stay within conversation scope. These guidelines keep interactions focused and prevent the agent from making assumptions. ### [](#handle-ambiguous-requests)Handle ambiguous requests Guide the agent on how to clarify: ```text When request is unclear: 1. Ask clarifying questions 2. Suggest common interpretations 3. Do not guess customer intent ``` ### [](#define-conversation-boundaries)Define conversation boundaries Set limits on conversation scope: ```text Conversation scope: - Answer questions about orders, shipping, and returns - Do not provide product recommendations (redirect to website) - Do not engage in general conversation unrelated to orders ``` ## [](#error-handling)Error handling Guide agents to handle failures gracefully through clear prompt instructions. 
Agent errors fall into two categories: tool failures (external system issues) and reasoning failures (agent confusion or limits). ### [](#tool-failure-types)Tool failure types Tools can fail for multiple reasons. Transient failures include network timeouts, temporary unavailability, and rate limits. Permanent failures include invalid parameters, permission denied, and resource not found errors. Partial failures occur when tools return incomplete data or warnings. ### [](#graceful-degradation)Graceful degradation Design prompts so agents continue when tools fail: Example prompt guidance for graceful degradation ```text When a tool fails: 1. Attempt an alternative tool if available 2. If no alternative exists, explain the limitation 3. Offer partial results if you retrieved some data before failure 4. Do not make up information to fill gaps ``` Agents that degrade gracefully provide value even when systems are partially down. Implement retries in tools, not in agent prompts. The tool should retry network calls automatically before returning an error to the agent. ### [](#escalation-patterns)Escalation patterns Some failures require human intervention. Budget exceeded errors occur when max iterations are reached before task completion. Insufficient tools means no tool is available for the required action. Ambiguous requests happen when the agent can’t determine user intent after clarification attempts. Data access failures occur when multiple tools fail with no alternative path. Design prompts to recognize escalation conditions: Example prompt guidance for escalation ```text When you cannot complete the task: 1. Explain what you tried and why it didn't work 2. Tell the user what information or capability is missing 3. Suggest how they can help (provide more details, contact support, etc.) 
``` ### [](#common-error-scenarios)Common error scenarios Include guidance for specific error types in your system prompt: **Timeout during tool execution:** When a tool takes longer than the agent timeout limit, the agent receives a timeout error in context. The agent should explain the delay to the user and suggest a retry. **Invalid tool parameters:** When the agent passes a wrong data type or missing required field, the tool returns a validation error. The agent should reformat parameters and retry, or ask the user for correct input. **Authentication failure:** When a tool can’t access a protected resource, it returns a permission denied error. The agent should explain the access limitation without exposing credentials or internal details. ## [](#output-formatting)Output formatting Control how the agent presents information to users. Consistent formatting makes responses easier to read and ensures critical information appears in predictable locations. ### [](#specify-structure)Specify structure Define how the agent presents information: ```text Output format: - Use tables for multiple items - Use bulleted lists for steps or options - Use code blocks for tracking numbers or order IDs - Include units (dollars, kilograms) in all numeric values ``` ## [](#evaluation-and-testing)Evaluation and testing Test system prompts systematically to verify behavior matches intent. 
Follow this process to validate prompts:

| Test Type | What to Test | Example |
| --- | --- | --- |
| Boundary cases | Requests at edge of agent scope | Just inside: "Track order 123" (should work)<br>Just outside: "Recommend products" (should refuse)<br>Ambiguous: "Help with my order" (should clarify) |
| Tool selection | Agent chooses correct tools | Create requests requiring each tool<br>Test multiple applicable tools (verify best choice)<br>Test no applicable tools (verify explanation) |
| Constraint compliance | Agent follows "never" rules | Explicit forbidden: "Show payment methods"<br>Indirect forbidden: "What’s the credit card number?"<br>Verify refusal with explanation |
| Error handling | Tool failures and limitations | Disable MCP server tool temporarily<br>Send request requiring disabled tool<br>Verify graceful response (no fabricated data) |
| Ambiguous requests | Clarification behavior | Vague: "Check my stuff"<br>Verify specific questions: "Orders, returns, or account?"<br>Ensure no guessing of user intent |

## [](#design-principles)Design principles

Apply these principles when writing system prompts to create reliable agent systems.

### [](#design-for-inspectability)Design for inspectability

Make agent reasoning transparent so you can debug by reading conversation history. Your system prompt should encourage clear explanations:

```text
Response format:
- State what you're doing before calling each tool
- Explain why you chose this tool over alternatives
- If a tool fails, describe what went wrong and what you tried
```

Log all tool invocations with parameters, record tool results in structured format, and store agent responses with reasoning traces. Opaque agents that "just work" are impossible to fix when they break.
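One way to picture "log all tool invocations with parameters" and "record tool results in structured format" is a thin wrapper around each tool call. The following Python sketch is illustrative only: `call_with_trace`, the record fields, and the stand-in `get_customer_orders` function are hypothetical names chosen for this example, not part of any Redpanda API.

```python
import json
import time
from typing import Any, Callable


def call_with_trace(tool_name: str, tool: Callable[..., Any], **params: Any) -> dict:
    """Invoke a tool and return a structured record of the call."""
    started = time.time()
    record: dict = {"tool": tool_name, "params": params, "started_at": started}
    try:
        record["result"] = tool(**params)
        record["status"] = "ok"
    except Exception as exc:
        # Record the failure instead of hiding it, so it can be debugged later.
        record["status"] = "error"
        record["error"] = str(exc)
    record["duration_s"] = round(time.time() - started, 3)
    return record


def get_customer_orders(customer_id: str) -> list:
    # Hypothetical stand-in for a real tool call.
    return [{"order_id": "ORD-12345", "customer_id": customer_id}]


trace = call_with_trace("get_customer_orders", get_customer_orders, customer_id="C-1")
print(json.dumps(trace, indent=2))
```

Each record carries the tool name, the parameters, the outcome, and the duration, which is the minimum needed to reconstruct what the agent did and why a step failed.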
### [](#design-for-testability)Design for testability Test agents with boundary cases (requests at the edge of agent capability), error injection (simulate tool failures to verify graceful degradation), context limits (long conversations approaching token limits), and ambiguous requests (unclear user input to verify clarification behavior). Use the systematic testing approach in [Evaluation and testing](#evaluation-and-testing). ### [](#design-for-cost-control)Design for cost control Write clear system prompts that reduce wasted iterations. Vague prompts cause agent confusion and unnecessary tool calls. Each wasted iteration costs tokens. Guide agents to: - Request only needed data from tools (use pagination, filters) - Avoid redundant tool calls (check context before calling) - Stop when the task completes (don’t continue exploring) For cost management strategies including iteration limits and monitoring, see [Agent Concepts](../concepts/). ## [](#example-system-prompt-with-all-best-practices)Example: System prompt with all best practices This complete example demonstrates all the patterns described in this guide: ```text You are an order analytics agent for Acme E-commerce. 
Responsibilities: - Answer questions about customer order trends - Analyze order data from Redpanda topics - Provide insights on order patterns Available tools: - get_customer_orders: Retrieve order history (parameters: customer_id, start_date, end_date) - analyze_recent_orders: Compute order statistics (parameters: time_window, group_by) When to use tools: - Use get_customer_orders for individual customer queries - Use analyze_recent_orders for trend analysis across multiple orders Never: - Expose customer payment information or addresses - Analyze data older than 90 days unless explicitly requested - Make business recommendations without data to support them Data access: - Only orders from the authenticated customer account - Maximum of 90 days of historical data Response guidelines: - Present structured data as tables - Always state the analysis time window - Include order counts in trend summaries - If data is unavailable, explain the limitation When request is unclear: - Ask which time period to analyze - Confirm whether to include canceled orders - Do not assume customer intent ``` ## [](#common-anti-patterns)Common anti-patterns Avoid these patterns that lead to unpredictable agent behavior. ### [](#vague-role-definition)Vague role definition Define specific agent responsibilities and scope. Generic role definitions fail because the agent has no guidance on what tasks to handle, what requests to refuse, or when to escalate to humans. Don’t ```text You are a helpful AI assistant. ``` This doesn’t constrain behavior or set expectations. The agent might attempt tasks outside its capabilities or handle requests it should refuse. Do ```text You are an order fulfillment agent for an e-commerce platform. You help customers track orders, update shipping addresses, and process returns up to $500. 
Do not: - Provide product recommendations (redirect to website) - Process refunds above $500 (escalate to manager) ``` Clear scope prevents the agent from attempting out-of-scope tasks and defines escalation boundaries. ### [](#missing-constraints)Missing constraints Set explicit boundaries on data access and operations. Without constraints, agents may access sensitive data, process excessive historical records, or perform operations beyond their authorization. Don’t ```text You can access customer data to help answer questions. ``` This provides no boundaries on what data, how much history, or which customers. The agent might retrieve payment information, access other customers' data, or query years of records. Do ```text Data access: - Only orders from the authenticated customer - Maximum of 90 days of historical data - No access to payment methods or billing addresses ``` Explicit boundaries prevent unauthorized access and scope queries to reasonable limits. ### [](#implicit-tool-selection)Implicit tool selection Specify when to use each tool with clear decision criteria. Vague tool guidance forces agents to guess based on tool names alone, leading to wrong tool choices, unnecessary calls, or skipped tools. Don’t ```text Use the available tools to complete tasks. ``` The agent must guess which tool applies when. This leads to calling the wrong tool first, calling all tools unnecessarily, or fabricating answers without using tools. Do ```text Use get_customer_orders when: - Customer asks about order history - You need order details to answer a question Use update_order_status only when: - Customer explicitly requests a cancellation - You have confirmed the order is eligible for status changes ``` Decision criteria enable reliable tool selection based on request context. 
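The three anti-patterns above can be turned into a rough pre-review checklist. The following Python sketch is a crude keyword heuristic under stated assumptions: the check names, the phrases it looks for, and the function `missing_prompt_elements` are all hypothetical, and passing these checks does not replace the systematic testing described in [Evaluation and testing](#evaluation-and-testing).

```python
# Each check receives the lowercased prompt and returns True when the element seems present.
CHECKS = {
    "specific role": lambda p: (
        "you are" in p
        and "helpful assistant" not in p
        and "helpful ai assistant" not in p
    ),
    "explicit boundaries": lambda p: "do not" in p or "never" in p,
    "data access limits": lambda p: "data access" in p,
    "tool decision criteria": lambda p: "use " in p and " when" in p,
}


def missing_prompt_elements(prompt: str) -> list[str]:
    """Return the names of recommended elements that the prompt appears to lack."""
    text = prompt.lower()
    return [name for name, check in CHECKS.items() if not check(text)]


vague = "You are a helpful AI assistant."
specific = """You are an order fulfillment agent for an e-commerce platform.
Do not:
- Process refunds above $500 (escalate to manager)
Data access:
- Only orders from the authenticated customer
Use get_customer_orders when:
- Customer asks about order history
"""

print(missing_prompt_elements(vague))
print(missing_prompt_elements(specific))
```

The vague prompt is flagged for every element, while the specific prompt passes all four checks. Treat the result as a reminder of what to review, not as a verdict on prompt quality.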
## [](#next-steps)Next steps - [AI Agent Quickstart](../quickstart/) - [AI Agents Overview](../overview/) - [MCP Tool Design](../../mcp/remote/best-practices/) --- # Page 16: AI Agent Quickstart **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/quickstart.md --- # AI Agent Quickstart --- title: AI Agent Quickstart latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/quickstart page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/quickstart.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/quickstart.adoc description: Create your first AI agent in Redpanda Cloud that generates and publishes event data through natural language commands. page-topic-type: tutorial personas: agent_developer, evaluator learning-objective-1: Create an AI agent in Redpanda Cloud that uses MCP tools learning-objective-2: Configure the agent with a system prompt and model selection learning-objective-3: Test the agent by generating and publishing events through natural language page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). This quickstart helps you build your first AI agent in Redpanda Cloud. You’ll create an agent that understands natural language requests and uses MCP tools to generate and publish event data to Redpanda topics. 
After completing this quickstart, you will be able to: - Create an AI agent in Redpanda Cloud that uses MCP tools - Configure the agent with a system prompt and model selection - Test the agent by generating and publishing events through natural language ## [](#prerequisites)Prerequisites - A [BYOC cluster](../../../get-started/cluster-types/byoc/) (agents are not available on Dedicated or Serverless clusters) - [AI Gateway configured](../../ai-gateway/gateway-quickstart/) with at least one LLM provider enabled (OpenAI, Anthropic, or Google AI) - Completed the [Remote MCP Quickstart](../../mcp/remote/quickstart/) to create an MCP server with the following tools deployed: - `generate_input`: Generates fake user event data - `redpanda_output`: Publishes data to Redpanda topics ## [](#what-youll-build)What you’ll build An Event Data Manager agent that: - Generates fake user event data (logins, purchases, page views) - Publishes events to Redpanda topics - Understands natural language requests like "Generate 5 login events and publish them" The agent orchestrates the `generate_input` and `redpanda_output` tools you created in the Remote MCP quickstart. ## [](#create-the-agent)Create the agent 1. Log in to the [Redpanda Cloud Console](https://cloud.redpanda.com/). 2. Navigate to your cluster and click **Agentic AI** > **AI Agents** in the left navigation. 3. Click **Create Agent**. 4. Configure basic settings: - **Display Name**: `event-data-manager` - **Description**: `Generates and publishes fake user event data to Redpanda topics` - **Resource Tier**: Select **XSmall** (sufficient for this quickstart) 5. Select your AI Gateway and model: - **AI Gateway**: Select the gateway you configured (contains provider and API key configuration) - **Provider**: Select a provider available in your gateway (OpenAI, Anthropic, or Google) - **Model**: Choose any balanced model from the dropdown 6. 
Write the system prompt:

   ```text
   You are an Event Data Manager agent for Redpanda Cloud.

   Your responsibilities:
   - Generate realistic fake user event data
   - Publish events to Redpanda topics
   - Help users test streaming data pipelines

   Available tools:
   - generate_input: Creates fake user events (login, logout, purchase, view)
   - redpanda_output: Publishes data to the events topic

   When a user asks you to generate events:
   1. Use generate_input to create the event data
   2. Use redpanda_output to publish the events to Redpanda
   3. Confirm how many events were published

   Always publish events after generating them unless the user explicitly says not to.

   Response format:
   - State what you're doing before calling each tool
   - Show the generated event data
   - Confirm successful publication with a count
   ```

7. Select MCP tools:
   - Click **Add MCP Server**
   - Select the `event-data-generator` server (created in the MCP quickstart)
   - Check both tools:
     - `generate_input`
     - `redpanda_output`
8. Set execution parameters:
   - **Max Iterations**: `30` (allows multiple tool calls per request)
9. Review your configuration and click **Create Agent**.

   > 💡 **TIP**
   >
   > A service account is automatically created to authenticate your agent with cluster resources. For details about default permissions and how to manage service accounts, see [Service account authorization](../concepts/#service-account-authorization).

10. Wait for the agent status to change from **Starting** to **Running**.

## [](#test-your-agent)Test your agent

Now test your agent with natural language requests.

1. In the agent details view, open the **Inspector** tab.
2. Try these example requests:

   **Generate and publish 3 events**

   > Generate 3 user events and publish them to the events topic.

   The agent should respond with these steps:

   1. Call `generate_input` to create 3 fake user events.
   2. Call `redpanda_output` to publish them to the `events` topic.
   3. Confirm the events were published.

   You should see the agent’s reasoning and the tool execution results.

   **Generate specific event types**

   > Create 5 login events for testing and publish them to Redpanda.

   The agent understands the request requires login events specifically and generates appropriate test data.

   **Generate events without publishing**

   > Show me what 3 sample purchase events would look like, but don't publish them yet.

   The agent calls only `generate_input` and displays the data without publishing.

3. Navigate to **Topics** in the left navigation to verify events were published to the `events` topic.

## [](#iterate-on-your-agent)Iterate on your agent

Try modifying the agent to change its behavior:

1. Click **Edit configuration** in the agent details view.
2. Update the system prompt to change how the agent responds. For example:
   - Add constraints: "Never publish more than 10 events at once"
   - Change output format: "Always format events as a table"
   - Add validation: "Before publishing, show the user the generated data and ask for confirmation"
3. Click **Save** to update the agent.
4. Test your changes in the **Inspector** tab.

## [](#troubleshoot)Troubleshoot

For comprehensive troubleshooting guidance, see [Troubleshoot AI Agents](../troubleshooting/).

Common quickstart issue:

**Events not appearing in topic:** Verify the `events` topic exists and review the MCP server logs for publishing errors.

## [](#next-steps)Next steps

You’ve created an agent that orchestrates MCP tools through natural language.
Explore more: - [AI Agents Overview](../overview/) - [Create an Agent](../create-agent/) - [System Prompt Best Practices](../prompt-best-practices/) - [Agent Architecture Patterns](../architecture-patterns/) - [MCP Tool Patterns](../../mcp/remote/tool-patterns/) --- # Page 17: Troubleshoot AI Agents **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/troubleshooting.md --- # Troubleshoot AI Agents --- title: Troubleshoot AI Agents latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/troubleshooting page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/troubleshooting.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/troubleshooting.adoc description: Diagnose and fix common issues with AI agents including deployment failures, runtime behavior problems, and tool execution errors. page-topic-type: troubleshooting personas: agent_developer, app_developer, streaming_developer learning-objective-1: Diagnose deployment failures and resource allocation errors learning-objective-2: Resolve runtime behavior issues including tool selection and iteration limits learning-objective-3: Fix tool execution problems and authentication failures page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Use this page to diagnose and fix common issues with AI agents, including deployment failures, runtime behavior problems, tool execution errors, and integration issues. 
## [](#deployment-issues)Deployment issues Fix issues that prevent agents from connecting to required resources. ### [](#mcp-server-connection-failures)MCP server connection failures **Symptoms:** Agent starts but the tools don’t respond or return connection errors. **Causes:** - MCP server stopped or crashed after agent creation - Network connectivity issues between agent and MCP server - MCP server authentication or permission issues **Solution:** 1. Verify MCP server status in **Agentic AI** > **Remote MCP**. 2. Check MCP server logs for errors. 3. Restart the MCP server if needed. 4. Verify agent has permission to access the MCP server. **Prevention:** - Monitor MCP server health - Use appropriate retry logic in tools ## [](#runtime-behavior-issues)Runtime behavior issues Resolve problems with agent decision-making, tool selection, and response generation. ### [](#agent-not-calling-tools)Agent not calling tools **Symptoms:** Agent responds without calling any tools, or fabricates information instead of using tools. **Causes:** - System prompt doesn’t clearly specify when to use tools - Tool descriptions are vague or missing - LLM model lacks sufficient reasoning capability - Max iterations is too low **Solution:** 1. Strengthen tool usage guidance in your system prompt: ```text ALWAYS use get_order_status when customer mentions an order ID. NEVER respond about order status without calling the tool first. ``` 2. Review tool descriptions in your MCP server configuration. 3. Use a more capable model from the supported list for your gateway. 4. Increase max iterations if the agent is stopping before reaching tools. 
**Prevention:** - Write explicit tool selection criteria in system prompts - Test agents with the [systematic testing approach](../prompt-best-practices/#evaluation-and-testing) - Use models appropriate for your task complexity ### [](#calling-wrong-tools)Calling wrong tools **Symptoms:** Agent selects incorrect tools for the task, or calls tools with invalid parameters. **Causes:** - Tool descriptions are ambiguous or overlap - Too many similar tools confuse the LLM - System prompt doesn’t provide clear tool selection guidance **Solution:** 1. Make tool descriptions more specific and distinct. 2. Add "when to use" guidance to your system prompt: ```text Use get_order_status when: - Customer provides an order ID (ORD-XXXXX) - You need to check current order state Use get_shipping_info when: - Order status is "shipped" - Customer asks about delivery or tracking ``` 3. Reduce the number of tools you expose to the agent. 4. Use subagents to partition tools by domain. **Prevention:** - Follow tool design patterns in [MCP Tool Patterns](../../mcp/remote/tool-patterns/) - Limit each agent to 10-15 tools maximum - Test boundary cases where multiple tools might apply ### [](#stuck-in-loops-or-exceeding-max-iterations)Stuck in loops or exceeding max iterations **Symptoms:** Agent reaches max iterations without completing the task, or repeatedly calls the same tool with the same parameters. **Causes:** - Tool returns errors that the agent doesn’t know how to handle - Agent doesn’t recognize when the task is complete - Tool returns incomplete data that prompts another call - System prompt encourages exhaustive exploration **Solution:** 1. Add completion criteria to your system prompt: ```text When you have retrieved all requested information: 1. Present the results to the user 2. Stop calling additional tools 3. Do not explore related data unless asked ``` 2. 
Add error handling guidance: ```text If a tool fails after 2 attempts: - Explain what went wrong - Do not retry the same tool again - Move on or ask for user guidance ``` 3. Review tool output to ensure it signals completion clearly. 4. Increase max iterations if the task legitimately requires many steps. **Prevention:** - Design tools to return complete information in one call - Set max iterations appropriate for task complexity (see [Why iterations matter](../concepts/#why-iterations-matter)) - Test with ambiguous requests that might cause loops ### [](#making-up-information)Making up information **Symptoms:** Agent provides plausible-sounding answers without calling tools, or invents data when tools fail. **Causes:** - System prompt doesn’t explicitly forbid fabrication - Agent treats tool failures as suggestions rather than requirements - Model is hallucinating due to lack of constraints **Solution:** 1. Add explicit constraints to your system prompt: ```text Critical rules: - NEVER make up order numbers, tracking numbers, or customer data - If a tool fails, explain the failure - do not guess - If you don't have information, say so explicitly ``` 2. Test error scenarios by temporarily disabling tools. 3. Use a more capable model that follows instructions better. **Prevention:** - Include "never fabricate" rules in all system prompts - Test with requests that require unavailable data - Monitor **Transcripts** and session topic for fabricated responses ### [](#analyzing-conversation-patterns)Analyzing conversation patterns **Symptoms:** Agent behavior is inconsistent or produces unexpected results. 
**Solution:** Review conversation history in **Transcripts** to identify problematic patterns:

- Agents calling the same tool repeatedly: Indicates loop detection is needed
- Large gaps between messages: Suggests tool timeout or slow execution
- Agent responses without tool calls: Indicates a tool selection issue
- Fabricated information: Suggests a missing "never make up data" constraint
- Truncated early messages: Indicates the context window was exceeded

**Analysis workflow:**

1. Use **Inspector** to reproduce the issue.
2. Review the full conversation, including tool invocations.
3. Identify where agent behavior diverged from what you expected.
4. Check the system prompt for missing guidance.
5. Verify tool responses are formatted correctly.

## [](#performance-issues)Performance issues

Diagnose and fix issues related to agent speed and resource consumption.

### [](#slow-response-times)Slow response times

**Symptoms:** Agent takes 10+ seconds to respond to simple queries.

**Causes:**

- The model is slow (large context processing)
- Too many tool calls in sequence
- Tools themselves are slow (database queries, API calls)
- Large context window from long conversation history

**Solution:**

1. Use a faster, lower-latency model tier for simple queries and reserve larger models for complex reasoning.
2. Review conversation history in the **Inspector** tab to identify unnecessary tool calls.
3. Optimize tool implementations:
   1. Add caching where appropriate
   2. Reduce query complexity
   3. Return only needed data (use pagination, filters)
4. Clear the conversation history if the context is very large.

**Prevention:**

- Right-size model selection based on task complexity
- Design tools to execute quickly (< 2 seconds ideal)
- Set appropriate max iterations to prevent excessive exploration
- Monitor token usage and conversation length

### [](#high-token-costs)High token costs

**Symptoms:** Token usage is higher than expected, or costs are increasing rapidly.
**Causes:**

- Max iterations configured too high
- Agent making unnecessary tool calls
- Large tool results filling context window
- Long conversation history not being managed
- Using expensive models for simple tasks

**Solution:**

1. Review token usage in **Transcripts**.
2. Lower max iterations for this agent.
3. Optimize tool responses to return less data:

   ```text
   Bad: Return all 10,000 customer records
   Good: Return paginated results, 20 records at a time
   ```

4. Add cost control guidance to system prompt:

   ```text
   Efficiency guidelines:
   - Request only the data you need
   - Stop when you have enough information
   - Do not call tools speculatively
   ```

5. Switch to a more cost-effective model for simple queries.
6. Clear conversation history periodically in the **Inspector** tab.

**Prevention:**

- Set appropriate max iterations (10-20 for simple, 30-40 for complex)
- Design tools to return minimal necessary data
- Monitor token usage trends
- See cost calculation guidance in [Cost calculation](../concepts/#cost-calculation)

## [](#tool-execution-issues)Tool execution issues

Fix problems with timeouts, invalid parameters, and error responses.

### [](#tool-timeouts)Tool timeouts

**Symptoms:** Tools fail with timeout errors, agent receives incomplete results.

**Causes:**

- External API is slow or unresponsive
- Database query is too complex
- Network latency between tool and external system
- Tool processing large datasets in memory

**Solution:**

1. Add timeout handling to tool implementation:

   ```yaml
   http:
     url: https://api.example.com/data
     timeout: "5s" # Set explicit timeout
   ```

2. Optimize external queries:
   1. Add database indexes
   2. Reduce query scope
   3. Cache frequent queries
3. Increase tool timeout if operation legitimately takes longer.
4. Add retry logic for transient failures.
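The retry step above can be sketched as exponential backoff around a single call. This Python example is a minimal illustration, not how any particular tool runtime implements retries: `TransientError`, `call_with_retries`, and the default values are hypothetical, and deciding which failures count as transient (timeouts, rate limits, temporary unavailability) is left to the tool author.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as timeouts or rate limits."""


def call_with_retries(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying only transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Out of attempts: surface the error so the agent can explain it.
                raise
            # Wait base_delay, then 2x, then 4x, and so on between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

Permanent failures, such as invalid parameters or permission errors, are deliberately not caught, so they fail fast instead of consuming the timeout budget. The `sleep` parameter exists only so the delay can be replaced in tests.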
**Prevention:**

- Set explicit timeouts in all tool configurations
- Test tools under load
- Monitor external API performance
- Design tools to fail fast on unavailable services

### [](#invalid-parameters)Invalid parameters

**Symptoms:** Tools return validation errors about missing or incorrectly formatted parameters.

**Causes:**

- Tool schema doesn’t match implementation
- Agent passes wrong data types
- Required parameters not marked as required in schema
- Agent misunderstands parameter purpose

**Solution:**

1. Verify tool schema matches implementation:

   ```yaml
   input_schema:
     properties:
       order_id:
         type: string # Must match what tool expects
         description: "Order ID in format ORD-12345"
   ```

2. Add parameter validation to tools.
3. Improve parameter descriptions in tool schema.
4. Add examples to tool descriptions:

   ```yaml
   description: |
     Get order status by order ID.
     Example: get_order_status(order_id="ORD-12345")
   ```

**Prevention:**

- Write detailed parameter descriptions
- Include format requirements and examples
- Test tools with invalid inputs to verify error messages
- Use JSON Schema validation in tool implementations

### [](#tool-returns-errors)Tool returns errors

**Symptoms:** Tools execute but return error responses or unexpected data formats.

**Causes:**

- External API returned error
- Tool implementation has bugs
- Data format changed in external system
- Tool lacks error handling

**Solution:**

1. Check tool logs in MCP server.
2. Test tool directly (outside agent context).
3. Verify external system is operational.
4. Add error handling to tool implementation:

   ```yaml
   processors:
     - try:
         - http:
             url: ${API_URL}
     # catch is its own processor in the list; it runs only for messages that failed above
     - catch:
         - mapping: |
             root.error = "API unavailable: " + error()
   ```

5. Update agent system prompt to handle this error type.
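The same idea, turning a failure into a structured error that the agent can reason about, can be sketched outside a pipeline. This Python example is an assumption-laden illustration: `run_tool`, the `ok`/`error`/`retryable` field names, and the choice to treat `TimeoutError` as the only retryable failure are hypothetical and not a Redpanda convention.

```python
def run_tool(fetch, **params):
    """Run a tool function and always return a structured result, even on failure."""
    try:
        return {"ok": True, "data": fetch(**params)}
    except TimeoutError as exc:
        # Likely transient: tell the caller that a retry may succeed.
        return {"ok": False, "error": f"API unavailable: {exc}", "retryable": True}
    except Exception as exc:
        # Anything else is treated as permanent in this sketch.
        return {"ok": False, "error": f"Tool failed: {exc}", "retryable": False}
```

Returning a consistent shape for both success and failure also makes it easier to test the tool directly, outside the agent context, and to describe the error type in the system prompt.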
**Prevention:**

- Implement comprehensive error handling in tools
- Monitor external system health
- Add retries for transient failures
- Log all tool errors for analysis

## [](#integration-issues)Integration issues

Fix problems with external applications calling agents and pipeline-to-agent integration.

### [](#agent-card-does-not-contain-a-url)Agent card does not contain a URL

**Symptoms:** Pipeline fails with error: `agent card does not contain a URL` or `failed to init processor path root.pipeline.processors.0`

**Causes:**

- The `agent_card_url` points to the base agent endpoint instead of the agent card JSON file

**Solution:** The `agent_card_url` must point to the agent card JSON file, not the base agent endpoint.

**Incorrect configuration:**

```yaml
processors:
  - a2a_message:
      agent_card_url: "https://your-agent-id.ai-agents.your-cluster-id.cloud.redpanda.com"
      prompt: "Analyze this transaction: ${!content()}"
```

**Correct configuration:**

```yaml
processors:
  - a2a_message:
      agent_card_url: "https://your-agent-id.ai-agents.your-cluster-id.cloud.redpanda.com/.well-known/agent-card.json"
      prompt: "Analyze this transaction: ${!content()}"
```

The agent card is always available at `/.well-known/agent-card.json` according to the A2A protocol standard.

**Prevention:**

- Always append `/.well-known/agent-card.json` to the agent endpoint URL
- Test the agent card URL in a browser before using it in pipeline configuration
- See [Agent card location](../a2a-concepts/#agent-card-location) for details

### [](#pipeline-integration-failures)Pipeline integration failures

**Symptoms:** Pipelines that use the `a2a_message` processor fail or time out.

**Causes:**

- Agent is not running or is restarting
- Agent timeout is too low for pipeline workload
- Authentication issues between pipeline and agent
- High event volume overwhelming agent

**Solution:**

1. Check agent status and resource allocation.
2. Increase agent resource tier for high-volume pipelines.
3. Add error handling in pipeline:

   ```yaml
   processors:
     - try:
         - a2a_message:
             agent_card_url: "https://your-agent-url/.well-known/agent-card.json"
     # catch is a separate processor that runs only when an earlier step failed
     - catch:
         - log:
             message: "Agent invocation failed: ${! error() }"
   ```

**Prevention:**

- Test pipeline-agent integration with low volume first
- Size agent resources appropriately for event rate
- See integration patterns in [Pipeline Integration Patterns](../pipeline-integration-patterns/)

## [](#monitor-and-debug-agents)Monitor and debug agents

For comprehensive guidance on monitoring agent activity, analyzing conversation history, tracking token usage, and debugging issues, see [Monitor Agent Activity](../monitor-agents/).

## [](#next-steps)Next steps

- [System Prompt Best Practices](../prompt-best-practices/)
- [Agent Concepts](../concepts/)
- [MCP Tool Patterns](../../mcp/remote/tool-patterns/)
- [Agent Architecture Patterns](../architecture-patterns/)

---

# Page 18: Learn Multi-Tool Agent Orchestration

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/tutorials/customer-support-agent.md

---

# Learn Multi-Tool Agent Orchestration

---
title: Learn Multi-Tool Agent Orchestration
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: agents/tutorials/customer-support-agent
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: agents/tutorials/customer-support-agent.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/tutorials/customer-support-agent.adoc
description: Learn how agents coordinate multiple tools, make decisions based on conversation context, and handle errors through building a customer support agent.
page-topic-type: tutorial personas: agent_developer, streaming_developer learning-objective-1: Explain how agents use conversation context to decide which tools to invoke learning-objective-2: Apply tool orchestration patterns to handle multi-step workflows learning-objective-3: Evaluate how system prompt design affects agent tool selection page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). This tutorial shows you how to build a customer support agent to learn how agents orchestrate multiple tools, make context-aware decisions, and handle incomplete data. After completing this tutorial, you will be able to: - Explain how agents use conversation context to decide which tools to invoke - Apply tool orchestration patterns to handle multi-step workflows - Evaluate how system prompt design affects agent tool selection ## [](#why-multi-tool-orchestration-matters)Why multi-tool orchestration matters Agents become powerful when they coordinate multiple tools to solve complex problems. A single-tool agent can retrieve order status. A multi-tool agent can check order status, fetch tracking information, look up customer history, and decide which tools to invoke based on conversation context. This tutorial teaches multi-tool orchestration through a customer support scenario. The patterns you practice here apply to any multi-tool scenario: data analysis agents coordinating query and visualization tools, workflow automation agents chaining approval and notification tools, or research agents combining search and summarization tools. ## [](#the-scenario)The scenario Customer support teams handle repetitive questions: "Where is my order?", "What’s my tracking number?", "Show me my order history." 
Human agents often waste time on lookups that could be automated.

An effective support agent needs three capabilities:

- **Order status lookup**: Check current order state and contents
- **Shipping information**: Retrieve tracking numbers and delivery estimates
- **Order history**: Show past purchases for a customer

The challenge: users phrase requests differently ("Where’s my package?", "Track order ORD-12345", "My recent orders"), and agents must choose the right tool based on context.

## [](#prerequisites)Prerequisites

- A [BYOC cluster](../../../../get-started/cluster-types/byoc/).
- [AI Gateway configured](../../../ai-gateway/gateway-quickstart/) with at least one LLM provider enabled (this tutorial uses OpenAI).

## [](#design-the-mcp-tools)Design the MCP tools

Before an agent can orchestrate tools, you need tools to orchestrate. Each tool should do one thing well, returning structured data the agent can reason about.

You could create a single `handle_customer_request` tool that takes a natural language query and returns an answer. However, this approach fails because:

- The agent can’t inspect intermediate results
- Tool chaining becomes impossible (no way to pass order status to shipping lookup)
- Error handling is opaque

Instead, create focused tools:

- `get_order_status`: Returns order state and contents
- `get_shipping_info`: Returns tracking data
- `get_customer_history`: Returns past orders

This granularity enables the agent to chain tools (check order status, see it’s shipped, fetch tracking info) and handle errors at each step.

### [](#deploy-the-tools)Deploy the tools

Create a Remote MCP server with the three tools.

1. Navigate to your cluster in the [Redpanda Cloud Console](https://cloud.redpanda.com).
2. Go to **Agentic AI** > **Remote MCP**.
3. Click **Create MCP Server**.
4. Configure the server:
   - **Name**: `customer-support-tools`
   - **Description**: `Tools for customer support agent`
5. Add the following tools. For each tool, select **Processor** from the component type dropdown, then click **Lint** to validate:

#### get_order_status

This tool uses the `mapping` processor to return mock data. The mock approach enables testing without external dependencies. The agent must interpret the structured response to extract order details.

```yaml
label: get_order_status
mapping: |
  let order_id = this.order_id
  root = if $order_id == "ORD-12345" {
    {
      "order_id": $order_id,
      "status": "shipped",
      "items": [{"name": "Laptop", "quantity": 1, "price": 1299.99}],
      "total": 1299.99,
      "order_date": "2025-01-10",
      "customer_id": "CUST-100"
    }
  } else if $order_id == "ORD-67890" {
    {
      "order_id": $order_id,
      "status": "processing",
      "items": [{"name": "Headphones", "quantity": 2, "price": 149.99}],
      "total": 299.98,
      "order_date": "2025-01-14",
      "customer_id": "CUST-100"
    }
  } else if $order_id == "ORD-99999" {
    {
      "error": "order_not_found",
      "message": "Order not found"
    }
  } else {
    {
      "order_id": $order_id,
      "status": "pending",
      "items": [{"name": "Generic Item", "quantity": 1, "price": 49.99}],
      "total": 49.99,
      "order_date": "2025-01-15",
      "customer_id": "CUST-999"
    }
  }
meta:
  mcp:
    enabled: true
    description: "Retrieve order status and details. Use ORD-12345 (shipped), ORD-67890 (processing), or ORD-99999 (not found) for testing."
    properties:
      - name: order_id
        type: string
        description: "The order ID (format ORD-XXXXX)"
        required: true
```

#### get_shipping_info

This tool demonstrates conditional data: it only returns tracking information when the order has shipped. When an order hasn’t shipped yet, the tool returns an error object with an explanatory message instead of tracking data. The system prompt instructs the agent to explain that shipping info is unavailable for orders that haven’t shipped.
```yaml label: get_shipping_info processors: - mapping: | let order_id = this.order_id root = if $order_id == "ORD-12345" { { "order_id": $order_id, "tracking_number": "FX1234567890", "carrier": "Example Shipping", "status": "in_transit", "estimated_delivery": "2025-01-17", "last_location": "San Francisco Distribution Center", "last_update": "2025-01-15T14:30:00Z" } } else if $order_id == "ORD-67890" { { "order_id": $order_id, "error": true, "message": "Order has not shipped yet" } } else { { "order_id": $order_id, "error": true, "message": "No shipping information available" } } meta: mcp: enabled: true description: "Get tracking and shipping information. ORD-12345 has shipping info, ORD-67890 has not shipped yet." properties: - name: order_id type: string description: "The order ID to track" required: true ``` #### get_customer_history This tool returns multiple orders, demonstrating list-handling. The agent must format multiple results clearly for users. ```yaml label: get_customer_history processors: - mapping: | let customer_id = this.customer_id root = if $customer_id == "CUST-100" { { "customer_id": $customer_id, "orders": [ {"order_id": "ORD-12345", "status": "shipped", "total": 1299.99, "order_date": "2025-01-10"}, {"order_id": "ORD-67890", "status": "processing", "total": 299.98, "order_date": "2025-01-14"}, {"order_id": "ORD-11111", "status": "delivered", "total": 89.99, "order_date": "2024-12-20"} ], "total_orders": 3 } } else if $customer_id == "CUST-999" { { "customer_id": $customer_id, "orders": [], "total_orders": 0, "message": "No orders found for this customer" } } else { { "error": true, "message": "Customer not found" } } meta: mcp: enabled: true description: "Retrieve order history. Use CUST-100 (has orders) or CUST-999 (no orders) for testing." properties: - name: customer_id type: string description: "The customer ID (format CUST-XXX)" required: true ``` 6. Click **Create MCP Server**. Wait for the server status to show **Running**. 
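The mock `mapping` tools above keep the tutorial free of external dependencies. If you later back a tool with a real service, one possible shape is sketched here. This is a sketch under stated assumptions: the endpoint `https://api.example.com/orders/...` and the `order_lookup_failed` error code are hypothetical, and the service is assumed to return JSON in the same structure as the mock data.

```yaml
label: get_order_status
processors:
  - try:
      # Hypothetical endpoint; replace with your order service.
      - http:
          url: 'https://api.example.com/orders/${! this.order_id }'
          verb: GET
          timeout: "5s"
  # Runs only if the lookup failed; returns the same error/message
  # shape as the mock tool so the system prompt rules still apply.
  - catch:
      - mapping: |
          root = {
            "error": "order_lookup_failed",
            "message": error()
          }
meta:
  mcp:
    enabled: true
    description: "Retrieve order status and details."
    properties:
      - name: order_id
        type: string
        description: "The order ID (format ORD-XXXXX)"
        required: true
```

Keeping the tool name, input property, and error shape unchanged means the agent's system prompt does not need to change when the mock is replaced.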
You now have three focused tools the agent can orchestrate. ## [](#write-the-system-prompt)Write the system prompt The system prompt teaches the agent how to orchestrate tools. Without explicit guidance, the agent must guess when to use each tool, often choosing incorrectly or ignoring tools entirely. ### [](#create-the-agent)Create the agent Create the customer support agent with the system prompt. 1. Go to **Agentic AI** > **AI Agents**. 2. Click **Create Agent**. 3. Configure the agent: - **Name**: `customer-support-agent` - **Description**: `Helps customers track orders and shipping` - **Resource Tier**: Medium - **AI Gateway**: Select the gateway you configured - **Provider**: OpenAI or Anthropic - **Model**: OpenAI GPT-5.2 or Claude Sonnet 4.5 (models with strong reasoning) - **MCP Server**: Select `customer-support-tools` - **Max Iterations**: 15 4. In the **System Prompt** field, enter this configuration: ```text You are a customer support agent for Acme E-commerce. Responsibilities: - Help customers track their orders - Provide shipping information and estimated delivery dates - Look up customer order history - Answer questions about order status Available tools: - get_order_status: Use when customer asks about a specific order - get_shipping_info: Use when customer needs tracking or delivery information - get_customer_history: Use when customer asks about past orders or "my orders" When to use each tool: - If customer provides an order ID (ORD-XXXXX), use get_order_status first - If customer asks "where is my order?", ask for the order ID before using tools - If order is "shipped", follow up with get_shipping_info to provide tracking details - If customer asks about "all my orders" or past purchases, use get_customer_history Never: - Expose customer payment information (credit cards, billing addresses) - Make up tracking numbers or delivery dates - Guarantee delivery dates (use "estimated" language) - Process refunds or cancellations (escalate to human 
agent) Error handling: - If order not found, ask customer to verify the order ID - If shipping info unavailable, explain the order may not have shipped yet - If customer history is empty, confirm the customer ID and explain no orders found Response format: - Start with a friendly greeting - Present order details in a clear, structured way - For order status, include: order ID, status, items, total - For shipping, include: carrier, tracking number, estimated delivery, last known location - Always include next steps or offer additional help Example response structure: 1. Acknowledge the customer's question 2. Present the information from tools 3. Provide next steps or additional context 4. Ask if they need anything else ``` 5. Click **Create Agent**. Wait for the agent status to show **Running**. ## [](#observe-orchestration-in-action)Observe orchestration in action Open the **Inspector** tab in the Redpanda Cloud Console to interact with the agent. Testing reveals how the agent makes decisions. Watch the conversation panel in the built-in chat interface to see the agent’s reasoning process unfold. ### [](#tool-chaining-based-on-status)Tool chaining based on status Test how the agent chains tools based on order status. Enter this query in **Inspector**: Hi, I'd like to check on order ORD-12345 Watch the conversation panel. The agent calls `get_order_status` first, sees the status is "shipped", then automatically follows up with `get_shipping_info` to provide tracking details. The agent uses the first tool’s result to decide whether to invoke the second tool. Now try this query with a different order: Check order ORD-67890 This order has status "processing", so the agent calls only `get_order_status`. Since the order hasn’t shipped yet, the agent skips `get_shipping_info`. The agent chains tools only when appropriate. ### [](#clarification-before-tool-invocation)Clarification before tool invocation Test how the agent handles incomplete information. 
Click **Clear context** to clear the conversation history. Then enter this query:

Where is my order?

The agent recognizes the request is missing an order ID and asks the customer to provide it. Watch the conversation panel and see that the agent calls zero tools. Instead of guessing or fabricating information, it asks a clarifying question.

This demonstrates pre-condition checking. Effective orchestration includes knowing when NOT to invoke tools.

### [](#list-handling)List handling

Test how the agent formats multiple results. Enter this query:

Can you show me my recent orders? My customer ID is CUST-100.

The agent calls `get_customer_history` and receives multiple orders. Watch how it formats the list clearly for the customer, showing details for each order.

Now test the empty results case with this query:

Show my order history for customer ID CUST-999

The agent receives an empty list and explains that no orders were found, asking the customer to verify their ID.

### [](#error-recovery)Error recovery

Test how the agent handles missing data. Enter this query:

Check order ORD-99999

The tool returns an `order_not_found` error for this order ID instead of order details. Watch how the agent responds. It explains the order wasn’t found and asks the customer to verify the order ID. Critically, the agent does not fabricate tracking numbers or order details.

This demonstrates error recovery without hallucination. The "Never make up tracking numbers" constraint in the system prompt prevents the agent from inventing plausible-sounding but incorrect information.

## [](#troubleshoot)Troubleshoot

For comprehensive troubleshooting guidance, see [Troubleshoot AI Agents](../../troubleshooting/).

### [](#test-with-mock-data)Test with mock data

The mock tools in this tutorial only recognize specific test IDs:

- Orders: ORD-12345, ORD-67890, ORD-99999
- Customers: CUST-100, CUST-999

Use these documented test IDs when testing the agent.
If you replace the mock tools with real API calls, verify that your API endpoints return the expected data structures. ## [](#next-steps)Next steps - [Call external APIs](../../../mcp/remote/tool-patterns/#call-external-apis) - [System Prompt Best Practices](../../prompt-best-practices/) - [Agent Architecture Patterns](../../architecture-patterns/) - [Troubleshoot AI Agents](../../troubleshooting/) --- # Page 19: Build Multi-Agent Systems for Transaction Dispute Resolution **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/agents/tutorials/transaction-dispute-resolution.md --- # Build Multi-Agent Systems for Transaction Dispute Resolution --- title: Build Multi-Agent Systems for Transaction Dispute Resolution latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: agents/tutorials/transaction-dispute-resolution page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: agents/tutorials/transaction-dispute-resolution.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/agents/tutorials/transaction-dispute-resolution.adoc description: Learn how to build multi-agent systems with domain separation, handle sensitive financial data, and monitor multi-agent execution through transaction investigation. page-topic-type: tutorial personas: agent_developer, platform_admin learning-objective-1: Design multi-agent systems with domain-specific sub-agents learning-objective-2: Monitor multi-agent execution using Transcripts learning-objective-3: Integrate agents with streaming pipelines for event-driven processing page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- This tutorial shows you how to build a transaction dispute resolution system using multi-agent architecture, secure data handling, and execution monitoring. 
> ❗ **IMPORTANT**
>
> Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability).

After completing this tutorial, you will be able to:

- Design multi-agent systems with domain-specific sub-agents
- Monitor multi-agent execution using **Transcripts**
- Integrate agents with streaming pipelines for event-driven processing

## [](#what-youll-learn)What you’ll learn

This tutorial advances from [basic multi-tool orchestration](../customer-support-agent/) to multi-agent systems. You’ll build a transaction dispute resolution system where a root agent delegates to specialized sub-agents (account, fraud, merchant, compliance), each with focused responsibilities and PII-protected data access. You’ll also monitor execution using [transcripts](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#transcript) and process disputes from transaction streams for automated detection.

These patterns apply beyond banking to any domain requiring specialized expertise and data security: healthcare systems, insurance claims processing, or regulatory compliance workflows.

## [](#the-scenario)The scenario

Banks handle thousands of dispute calls daily. Customers report unauthorized charges, billing errors, or unrecognized transactions. Each investigation requires cross-referencing multiple systems and applying consistent fraud detection logic.

Traditionally, human agents manually open multiple systems, cross-reference data, and take notes. This 10-15 minute process is prone to inconsistencies and incomplete compliance logging.

Multi-agent automation transforms this workflow by enabling instant data aggregation from all sources, consistent logic applied every time, 10-15 second resolution, and structured results for compliance. Human agents handle only complex escalations.
When a customer calls saying "I see a $247.83 charge from 'ACME CORP' but I never shopped there. Is this fraud?", the system must investigate account history, calculate fraud scores, verify merchant legitimacy, and make a recommendation with structured results.

## [](#prerequisites)Prerequisites

- A [BYOC cluster](../../../../get-started/cluster-types/byoc/).
- [AI Gateway configured](../../../ai-gateway/gateway-quickstart/) with at least one LLM provider enabled (this tutorial uses OpenAI GPT-5.2 or Claude Sonnet 4.5 for reasoning).
- The [Redpanda CLI (`rpk`)](../../../../manage/rpk/rpk-install/) installed (for testing the pipeline with sample data).
- Completed [Learn Multi-Tool Agent Orchestration](../customer-support-agent/) (foundational multi-tool concepts).

## [](#create-mcp-tools-for-each-domain)Create MCP tools for each domain

Before creating agents, create the tools they’ll use. You’ll organize tools by domain, matching each sub-agent’s responsibility.

### [](#account-tools)Account tools

Account tools retrieve customer and transaction data with PII protection.

1. Navigate to your cluster in the [Redpanda Cloud Console](https://cloud.redpanda.com).
2. Go to **Agentic AI** > **Remote MCP**.
3. Click **Create MCP Server**.
4. Configure the server:
   - **Name**: `account-tools`
   - **Description**: `Customer account and transaction data retrieval`
   - **Resource Tier**: XSmall
5. Add the following tools. For each tool, select **Processor** from the component type dropdown, then click **Lint** to validate:

#### get_customer_account

This mock tool returns account data with sensitive fields already protected. Card numbers only include the last 4 digits, email addresses and phone numbers are masked, and names are shortened to a first name and last initial that can still be used for verification. In production, implement similar protections in your data layer.
```yaml label: get_customer_account mapping: | root = match { this.customer_id == "CUST-1001" => { "customer_id": "CUST-1001", "name": "Dana A.", "email": "s****@example.com", "account_type": "premium_checking", "card_last_four": "4532", "card_status": "active", "member_since": "2019-03-15", "location": "Seattle, WA, USA", "phone_masked": "***-***-7890" }, this.customer_id == "CUST-1002" => { "customer_id": "CUST-1002", "name": "Alex T.", "email": "m****@example.com", "account_type": "standard_checking", "card_last_four": "8821", "card_status": "active", "member_since": "2021-07-22", "location": "San Francisco, CA, USA", "phone_masked": "***-***-4521" }, this.customer_id == "CUST-1003" => { "customer_id": "CUST-1003", "name": "Quinn N.", "email": "e****@example.com", "account_type": "premium_credit", "card_last_four": "2193", "card_status": "active", "member_since": "2020-11-08", "location": "Austin, TX, USA", "phone_masked": "***-***-3344" }, _ => { "error": "customer_not_found", "message": "No account found for customer ID: " + this.customer_id } } meta: mcp: enabled: true description: "Retrieve customer account information with masked PII. Use CUST-1001, CUST-1002, or CUST-1003 for testing." properties: - name: customer_id type: string description: "Customer identifier (format CUST-XXXX)" required: true ``` #### get_transaction_details This tool returns complete transaction details including merchant information, location, and timestamp. Notice how it returns structured data the fraud agent can analyze. 
```yaml label: get_transaction_details mapping: | root = match { this.transaction_id == "TXN-89012" => { "transaction_id": "TXN-89012", "customer_id": "CUST-1001", "amount": 1847.99, "currency": "USD", "merchant": { "name": "LUXURY WATCHES INT", "category": "jewelry", "country": "SG", "mcc": "5944" }, "card_last_four": "4532", "date": "2026-01-18T14:22:00Z", "location": { "city": "Singapore", "country": "SG", "coordinates": "1.3521,103.8198" }, "status": "posted" }, this.transaction_id == "TXN-89013" => { "transaction_id": "TXN-89013", "customer_id": "CUST-1001", "amount": 47.83, "currency": "USD", "merchant": { "name": "EXAMPLE MKTPLACE", "category": "online_retail", "country": "US", "mcc": "5942" }, "card_last_four": "4532", "date": "2026-01-15T10:15:00Z", "location": { "city": "Seattle", "country": "US", "coordinates": "47.6062,-122.3321" }, "status": "posted" }, this.transaction_id == "TXN-89014" => { "transaction_id": "TXN-89014", "customer_id": "CUST-1002", "amount": 29.99, "currency": "USD", "merchant": { "name": "EXAMPLE STREAMING", "category": "subscription_service", "country": "US", "mcc": "4899" }, "card_last_four": "8821", "date": "2025-12-15T00:00:01Z", "location": { "city": "San Francisco", "country": "US", "coordinates": "37.7749,-122.4194" }, "status": "posted", "recurring": true }, this.transaction_id == "TXN-89015" => { "transaction_id": "TXN-89015", "customer_id": "CUST-1003", "amount": 312.50, "currency": "EUR", "merchant": { "name": "HOTEL PARIS", "category": "lodging", "country": "FR", "mcc": "7011" }, "card_last_four": "2193", "date": "2026-01-10T20:30:00Z", "location": { "city": "Paris", "country": "FR", "coordinates": "48.8566,2.3522" }, "status": "posted" }, _ => { "error": "transaction_not_found", "message": "No transaction found with ID: " + this.transaction_id } } meta: mcp: enabled: true description: "Retrieve detailed transaction information including merchant, location, and amount. Use TXN-89012 through TXN-89015 for testing." 
properties: - name: transaction_id type: string description: "Transaction identifier (format TXN-XXXXX)" required: true ``` #### get_transaction_history This tool returns aggregated spending patterns instead of raw transaction lists. This privacy-preserving approach gives fraud analysis what it needs (typical spending by category, location patterns) without exposing individual transaction details unnecessarily. ```yaml label: get_transaction_history mapping: | root = match { this.customer_id == "CUST-1001" => { "customer_id": "CUST-1001", "analysis_period": "last_90_days", "spending_patterns": { "average_transaction": 127.45, "median_transaction": 65.20, "total_transactions": 87, "total_amount": 11088.15 }, "category_breakdown": [ {"category": "online_retail", "count": 42, "avg_amount": 78.50}, {"category": "groceries", "count": 28, "avg_amount": 95.30}, {"category": "restaurants", "count": 12, "avg_amount": 45.80}, {"category": "gas_stations", "count": 5, "avg_amount": 62.00} ], "location_patterns": { "primary_region": "US_West_Coast", "international_transactions": 0, "cities": ["Seattle", "Bellevue", "Tacoma"] }, "merchant_patterns": { "recurring_merchants": ["EXAMPLE MKTPLACE", "EXAMPLE WHOLESALE", "EXAMPLE COFFEE"], "first_time_merchants_this_period": 3 }, "risk_indicators": { "unusual_activity": false, "velocity_flags": 0, "declined_transactions": 1 } }, this.customer_id == "CUST-1002" => { "customer_id": "CUST-1002", "analysis_period": "last_90_days", "spending_patterns": { "average_transaction": 95.33, "median_transaction": 52.10, "total_transactions": 64, "total_amount": 6101.12 }, "category_breakdown": [ {"category": "subscription_service", "count": 15, "avg_amount": 29.99}, {"category": "restaurants", "count": 25, "avg_amount": 68.40}, {"category": "online_retail", "count": 18, "avg_amount": 110.20}, {"category": "entertainment", "count": 6, "avg_amount": 45.00} ], "location_patterns": { "primary_region": "US_West_Coast", "international_transactions": 0, 
"cities": ["San Francisco", "Oakland", "San Jose"] }, "merchant_patterns": { "recurring_merchants": ["EXAMPLE STREAMING", "EXAMPLE MEDIA", "EXAMPLE AUDIO"], "first_time_merchants_this_period": 7 }, "risk_indicators": { "unusual_activity": false, "velocity_flags": 0, "declined_transactions": 0 } }, this.customer_id == "CUST-1003" => { "customer_id": "CUST-1003", "analysis_period": "last_90_days", "spending_patterns": { "average_transaction": 215.67, "median_transaction": 145.00, "total_transactions": 52, "total_amount": 11214.84 }, "category_breakdown": [ {"category": "travel", "count": 8, "avg_amount": 650.00}, {"category": "lodging", "count": 6, "avg_amount": 380.50}, {"category": "restaurants", "count": 22, "avg_amount": 85.20}, {"category": "online_retail", "count": 16, "avg_amount": 95.75} ], "location_patterns": { "primary_region": "US_South", "international_transactions": 3, "cities": ["Austin", "Houston", "Dallas", "Paris", "London"] }, "merchant_patterns": { "recurring_merchants": ["EXAMPLE AIRLINES", "EXAMPLE HOTEL", "EXAMPLE TRAVEL"], "first_time_merchants_this_period": 12 }, "risk_indicators": { "unusual_activity": false, "velocity_flags": 0, "declined_transactions": 0 } }, _ => { "error": "customer_not_found", "message": "No transaction history found for customer ID: " + this.customer_id } } meta: mcp: enabled: true description: "Retrieve customer transaction history with spending patterns, category breakdown, and risk indicators. Use CUST-1001, CUST-1002, or CUST-1003 for testing." properties: - name: customer_id type: string description: "Customer identifier (format CUST-XXXX)" required: true ``` 6. Click **Create MCP Server**. Wait for the server status to show **Running**. > 📝 **NOTE** > > This tutorial uses XSmall resource tier for all MCP servers because the mock tools run lightweight Bloblang transformations. Production deployments with external API calls require larger tiers based on throughput needs. 
See [Scale Remote MCP Server Resources](../../../mcp/remote/scale-resources/). ### [](#fraud-tools)Fraud tools Fraud tools calculate risk scores and identify fraud indicators. 1. Click **Create MCP Server**. 2. Configure the server: - **Name**: `fraud-tools` - **Description**: `Fraud detection and risk scoring` - **Resource Tier**: XSmall 3. Add the following tools. For each tool, select **Processor** from the component type dropdown, then click **Lint** to validate: #### calculate_fraud_score This tool implements multi-factor fraud scoring with location risk (0-35 for international/unusual cities), merchant risk (0-30 for reputation/fraud reports), amount risk (0-25 for deviation from averages), velocity risk (0-15 for rapid transactions), and category risk (0-20 for unusual spending categories). The tool returns both the total score and breakdown, allowing agents to explain their reasoning. ```yaml label: calculate_fraud_score mapping: | root = match { this.transaction_id == "TXN-89012" && this.customer_id == "CUST-1001" => { "transaction_id": "TXN-89012", "customer_id": "CUST-1001", "fraud_score": 95, "risk_level": "critical", "score_breakdown": { "location_risk": 35, "merchant_risk": 30, "amount_risk": 25, "velocity_risk": 0, "category_risk": 20 }, "factors_detected": [ "unusual_location", "questionable_merchant", "unusual_amount", "unusual_category" ], "reasoning": "International transaction from Singapore with no customer history of international purchases. High-value jewelry purchase (14.5x customer average). 
Merchant has significant fraud indicators.", "recommendation": "block_and_investigate" }, this.transaction_id == "TXN-89013" && this.customer_id == "CUST-1001" => { "transaction_id": "TXN-89013", "customer_id": "CUST-1001", "fraud_score": 8, "risk_level": "minimal", "score_breakdown": { "location_risk": 0, "merchant_risk": 0, "amount_risk": 0, "velocity_risk": 0, "category_risk": 0 }, "factors_detected": [], "reasoning": "Local transaction from trusted merchant in customer's typical spending category and amount range.", "recommendation": "approve" }, this.transaction_id == "TXN-89014" && this.customer_id == "CUST-1002" => { "transaction_id": "TXN-89014", "customer_id": "CUST-1002", "fraud_score": 52, "risk_level": "medium", "score_breakdown": { "location_risk": 0, "merchant_risk": 15, "amount_risk": 0, "velocity_risk": 8, "category_risk": 0 }, "factors_detected": [ "questionable_merchant", "high_velocity" ], "reasoning": "Recurring subscription service with known billing issues. Multiple charges detected from same merchant. Moderate merchant reputation score.", "recommendation": "monitor_closely" }, this.transaction_id == "TXN-89015" && this.customer_id == "CUST-1003" => { "transaction_id": "TXN-89015", "customer_id": "CUST-1003", "fraud_score": 12, "risk_level": "minimal", "score_breakdown": { "location_risk": 0, "merchant_risk": 0, "amount_risk": 5, "velocity_risk": 0, "category_risk": 0 }, "factors_detected": [ "slightly_elevated_amount" ], "reasoning": "International hotel charge consistent with customer's frequent travel patterns. 
Amount within expected range for lodging category.", "recommendation": "approve" }, _ => { "transaction_id": this.transaction_id, "customer_id": this.customer_id, "fraud_score": 50, "risk_level": "medium", "score_breakdown": { "location_risk": 0, "merchant_risk": 0, "amount_risk": 0, "velocity_risk": 0, "category_risk": 0 }, "factors_detected": [], "reasoning": "Insufficient data to calculate accurate fraud score for this transaction/customer combination.", "recommendation": "monitor_closely" } } meta: mcp: enabled: true description: "Calculate fraud risk score based on transaction patterns and risk indicators. Use TXN-89012 through TXN-89015 with corresponding customer IDs for testing." properties: - name: transaction_id type: string description: "Transaction identifier to analyze (format TXN-XXXXX)" required: true - name: customer_id type: string description: "Customer identifier for historical analysis (format CUST-XXXX)" required: true ``` #### get_risk_indicators This tool provides detailed fraud signals with severity levels. Each indicator includes a description that agents can use to explain findings to customers. 
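Alongside the indicator list, the tool's mock responses report per-severity counts. As a rough illustration of how those counts relate to the list, the following Python sketch tallies them. It is not part of the tutorial: `summarize_indicators` is a hypothetical helper, and the field names are assumed to match the mock responses in the tool configuration.

```python
from collections import Counter

# Severity labels used by the mock responses (assumed to be the full set).
SEVERITIES = ("critical", "high", "medium", "low", "none")


def summarize_indicators(indicators):
    """Build the count fields from a list of risk indicator objects."""
    counts = Counter(item["severity"] for item in indicators)
    summary = {"total_indicators": len(indicators)}
    for severity in SEVERITIES:
        # Report every severity, including those with zero matches.
        summary[f"{severity}_count"] = counts.get(severity, 0)
    return summary


# Severities taken from the TXN-89012 mock response.
sample = [
    {"indicator": "international_transaction", "severity": "high"},
    {"indicator": "first_time_merchant", "severity": "medium"},
    {"indicator": "unusual_category", "severity": "high"},
    {"indicator": "high_amount", "severity": "high"},
    {"indicator": "merchant_flagged", "severity": "critical"},
]

print(summarize_indicators(sample))
```

Reporting a count for every severity, including zeros, keeps the response shape the same across transactions, which makes it easier for an agent or a downstream parser to rely on the fields.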
```yaml label: get_risk_indicators mapping: | root = match { this.transaction_id == "TXN-89012" => { "transaction_id": "TXN-89012", "risk_indicators": [ { "indicator": "international_transaction", "severity": "high", "description": "Transaction originated from Singapore, customer has no international transaction history" }, { "indicator": "first_time_merchant", "severity": "medium", "description": "Customer has never transacted with this merchant before" }, { "indicator": "unusual_category", "severity": "high", "description": "Jewelry purchase is outside customer's typical spending categories" }, { "indicator": "high_amount", "severity": "high", "description": "Transaction amount is 14.5x customer's average transaction" }, { "indicator": "merchant_flagged", "severity": "critical", "description": "Merchant has been flagged in fraud databases" } ], "total_indicators": 5, "critical_count": 1, "high_count": 3, "medium_count": 1, "low_count": 0, "none_count": 0, "overall_assessment": "high_fraud_probability" }, this.transaction_id == "TXN-89013" => { "transaction_id": "TXN-89013", "risk_indicators": [ { "indicator": "known_merchant", "severity": "none", "description": "Example Marketplace is a recognized and trusted merchant" } ], "total_indicators": 1, "critical_count": 0, "high_count": 0, "medium_count": 0, "low_count": 0, "none_count": 1, "overall_assessment": "low_fraud_probability" }, this.transaction_id == "TXN-89014" => { "transaction_id": "TXN-89014", "risk_indicators": [ { "indicator": "recurring_billing", "severity": "low", "description": "Subscription service with recurring charges" }, { "indicator": "merchant_billing_issues", "severity": "medium", "description": "Merchant has known history of duplicate billing complaints" }, { "indicator": "duplicate_charge_pattern", "severity": "medium", "description": "Multiple charges detected from same merchant in short timeframe" } ], "total_indicators": 3, "critical_count": 0, "high_count": 0, "medium_count": 2, "low_count": 1, "none_count": 0, "overall_assessment":
"medium_fraud_probability" }, this.transaction_id == "TXN-89015" => { "transaction_id": "TXN-89015", "risk_indicators": [ { "indicator": "international_transaction", "severity": "low", "description": "Transaction in France matches customer's travel history" }, { "indicator": "travel_category", "severity": "none", "description": "Hotel charge is consistent with customer's frequent travel patterns" }, { "indicator": "timing_matches_travel", "severity": "none", "description": "Transaction date aligns with customer's Paris trip" } ], "total_indicators": 3, "critical_count": 0, "high_count": 0, "medium_count": 0, "low_count": 1, "none_count": 2, "overall_assessment": "low_fraud_probability" }, _ => { "transaction_id": this.transaction_id, "risk_indicators": [], "total_indicators": 0, "critical_count": 0, "high_count": 0, "medium_count": 0, "low_count": 0, "none_count": 0, "overall_assessment": "insufficient_data" } } meta: mcp: enabled: true description: "Retrieve fraud risk indicators for a transaction including severity levels and overall assessment. Use TXN-89012 through TXN-89015 for testing." properties: - name: transaction_id type: string description: "Transaction identifier to analyze (format TXN-XXXXX)" required: true ``` 4. Click **Create MCP Server**. Wait for the server status to show **Running**. ### [](#merchant-tools)Merchant tools Merchant tools verify business legitimacy and analyze merchant categories. 1. Click **Create MCP Server**. 2. Configure the server: - **Name**: `merchant-tools` - **Description**: `Merchant verification and category analysis` - **Resource Tier**: XSmall 3. Add the following tools. For each tool, select **Processor** from the component type dropdown, then click **Lint** to validate: #### verify_merchant This tool returns reputation scores, fraud report counts, business verification status, and red flags. 
Notice how it includes common issues for legitimate merchants (like subscription billing problems) to help agents distinguish between fraud and merchant operational issues. ```yaml label: verify_merchant mapping: | root = match { this.merchant_name == "LUXURY WATCHES INT" => { "merchant_name": "LUXURY WATCHES INT", "merchant_id": "MER-99912", "reputation_score": 12, "reputation_level": "high_risk", "verification_status": "unverified", "fraud_reports": { "total_reports": 247, "recent_reports_30d": 42, "confirmed_fraud_cases": 89 }, "business_details": { "country": "Singapore", "years_in_operation": 1, "registration_verified": false }, "red_flags": [ "High volume of fraud reports", "Recently established business", "Unverified business registration", "Operates in high-risk category", "Pattern of chargebacks" ], "recommendation": "block_merchant" }, this.merchant_name == "EXAMPLE MKTPLACE" => { "merchant_name": "EXAMPLE MKTPLACE", "merchant_id": "MER-00001", "reputation_score": 98, "reputation_level": "excellent", "verification_status": "verified", "fraud_reports": { "total_reports": 1203, "recent_reports_30d": 15, "confirmed_fraud_cases": 0 }, "business_details": { "country": "USA", "years_in_operation": 20, "registration_verified": true, "parent_company": "Example Organization" }, "red_flags": [], "recommendation": "trusted_merchant" }, this.merchant_name == "EXAMPLE STREAMING" => { "merchant_name": "EXAMPLE STREAMING", "merchant_id": "MER-45678", "reputation_score": 65, "reputation_level": "moderate", "verification_status": "verified", "fraud_reports": { "total_reports": 892, "recent_reports_30d": 67, "confirmed_fraud_cases": 12 }, "business_details": { "country": "USA", "years_in_operation": 5, "registration_verified": true }, "red_flags": [ "Known billing system issues", "Frequent duplicate charge complaints", "Difficult cancellation process" ], "common_issues": [ "Duplicate subscriptions", "Failed cancellation processing", "Unclear billing descriptors" ], 
"recommendation": "verify_subscription_details" }, this.merchant_name == "HOTEL PARIS" => { "merchant_name": "HOTEL PARIS", "merchant_id": "MER-78234", "reputation_score": 88, "reputation_level": "trusted", "verification_status": "verified", "fraud_reports": { "total_reports": 45, "recent_reports_30d": 2, "confirmed_fraud_cases": 0 }, "business_details": { "country": "France", "years_in_operation": 15, "registration_verified": true, "chain": "Independent Boutique Hotels" }, "red_flags": [], "pricing": { "average_room_rate_eur": 280, "typical_range_eur": "220-350" }, "recommendation": "legitimate_merchant" }, _ => { "merchant_name": this.merchant_name, "reputation_score": 50, "reputation_level": "unknown", "verification_status": "not_found", "fraud_reports": { "total_reports": 0, "recent_reports_30d": 0, "confirmed_fraud_cases": 0 }, "business_details": {}, "red_flags": [], "message": "Merchant not found in verification database", "recommendation": "manual_review_required" } } meta: mcp: enabled: true description: "Verify merchant reputation and fraud history. Use LUXURY WATCHES INT (high risk), EXAMPLE MKTPLACE (trusted), EXAMPLE STREAMING (moderate), or HOTEL PARIS (trusted) for testing." properties: - name: merchant_name type: string description: "Merchant name as it appears on transaction" required: true ``` #### get_merchant_category This tool decodes MCC (Merchant Category Codes) and provides typical transaction ranges for each category. This helps identify mismatches (like a grocery store charging $2000). 
```yaml label: get_merchant_category mapping: | root = match { this.mcc == "5944" => { "mcc": "5944", "category": "Jewelry, Watch, Clock, and Silverware Stores", "high_level_category": "retail_luxury", "risk_profile": "high", "typical_transaction_range": { "min": 100, "max": 5000, "average": 850 }, "fraud_risk_notes": "High-value items, common fraud target, verify customer intent", "common_fraud_patterns": [ "Stolen card purchases", "Account takeover", "Reshipping schemes" ] }, this.mcc == "5942" => { "mcc": "5942", "category": "Book Stores", "high_level_category": "retail_general", "risk_profile": "low", "typical_transaction_range": { "min": 10, "max": 200, "average": 45 }, "fraud_risk_notes": "Low fraud risk, common online purchase category", "common_fraud_patterns": [] }, this.mcc == "4899" => { "mcc": "4899", "category": "Cable, Satellite, and Other Pay Television and Radio Services", "high_level_category": "subscription_services", "risk_profile": "medium", "typical_transaction_range": { "min": 9.99, "max": 99.99, "average": 29.99 }, "fraud_risk_notes": "Recurring billing, watch for duplicate charges and unauthorized subscriptions", "common_fraud_patterns": [ "Duplicate subscriptions", "Unauthorized recurring charges", "Failed cancellation processing" ] }, this.mcc == "7011" => { "mcc": "7011", "category": "Lodging - Hotels, Motels, Resorts", "high_level_category": "travel_hospitality", "risk_profile": "medium", "typical_transaction_range": { "min": 80, "max": 500, "average": 180 }, "fraud_risk_notes": "Verify travel patterns, check for location consistency", "common_fraud_patterns": [ "Stolen card at booking sites", "Account takeover for rewards redemption" ] }, _ => { "mcc": this.mcc, "category": "Unknown Category", "high_level_category": "unclassified", "risk_profile": "unknown", "typical_transaction_range": { "min": 0, "max": 0, "average": 0 }, "fraud_risk_notes": "MCC not recognized, manual review recommended", "common_fraud_patterns": [] } } meta: mcp: 
enabled: true description: "Retrieve merchant category information including fraud risk level and common patterns based on MCC code." properties: - name: mcc type: string description: "Merchant Category Code (5944 for jewelry, 5942 for books, 4899 for streaming, 7011 for hotels)" required: true ``` 4. Click **Create MCP Server**. Wait for the server status to show **Running**. ### [](#compliance-tools)Compliance tools Compliance tools handle audit logging and regulatory requirements. 1. Click **Create MCP Server**. 2. Configure the server: - **Name**: `compliance-tools` - **Description**: `Audit logging and regulatory compliance` - **Resource Tier**: XSmall 3. Add the following tools. For each tool, select **Processor** from the component type dropdown, then click **Lint** to validate: #### log_audit_event This tool creates audit records for every investigation. In production, this would write to an immutable audit log. For this tutorial, it returns a confirmation with the audit ID. ```yaml label: log_audit_event processors: - mapping: | root = { "audit_id": uuid_v4(), "timestamp": now(), "event_type": "dispute_investigation", "transaction_id": this.transaction_id, "customer_id": this.customer_id, "agent_decision": this.decision, "risk_score": this.risk_score, "evidence_reviewed": this.evidence, "outcome": this.outcome, "escalated": this.escalated, "compliance_notes": this.notes, "logged_by": "dispute-resolution-agent", "status": "recorded" } - log: level: INFO message: "Compliance audit event: ${!json()}" meta: mcp: enabled: true description: "Log compliance audit events for dispute resolution. Records customer ID, transaction details, decision, and notes." 
properties: - name: customer_id type: string description: "Customer identifier (format CUST-XXXX)" required: true - name: transaction_id type: string description: "Transaction identifier (format TXN-XXXXX)" required: true - name: decision type: string description: "Dispute resolution decision (approve_refund, deny_claim, etc.)" required: true - name: risk_score type: number description: "Calculated fraud risk score (0-100)" required: true - name: evidence type: object description: "Evidence reviewed during investigation" required: true - name: outcome type: string description: "Final outcome of the dispute (approved, denied, escalated, pending)" required: true - name: escalated type: boolean description: "Whether case was escalated for manual review" required: false - name: notes type: string description: "Additional compliance notes" required: false ``` #### check_regulatory_requirements This tool returns applicable regulations, customer rights, bank obligations, and required documentation for different dispute types. This ensures agents follow proper procedures for Regulation E, Fair Credit Billing Act, and card network rules. 
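The timelines this tool returns are expressed as counts of hours or days. As a hedged sketch of how a caller might turn them into concrete due dates, the Python below adds each value to the time the dispute was opened. `deadlines_from_timeline` is a hypothetical helper, the key suffixes `_hours` and `_days` are assumed from the mock responses, and calendar days are used for simplicity even though some regulatory deadlines count business days.

```python
from datetime import datetime, timedelta


def deadlines_from_timeline(opened_at, timeline):
    """Map each timeline entry to a due date relative to opened_at."""
    deadlines = {}
    for key, value in timeline.items():
        if key.endswith("_hours"):
            deadlines[key[: -len("_hours")]] = opened_at + timedelta(hours=value)
        elif key.endswith("_days"):
            # Assumption: calendar days, not business days.
            deadlines[key[: -len("_days")]] = opened_at + timedelta(days=value)
    return deadlines


# Timeline values from the mock response for fraud disputes.
fraud_timeline = {
    "acknowledge_dispute_hours": 24,
    "provisional_credit_days": 10,
    "final_decision_days": 90,
}

opened = datetime(2026, 1, 5, 9, 0)
for name, due in deadlines_from_timeline(opened, fraud_timeline).items():
    print(f"{name}: {due.isoformat()}")
```

A production system would need a business-day calendar and the exact counting rules of each regulation; this sketch only shows the shape of the calculation.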
```yaml label: check_regulatory_requirements mapping: | root = match { this.dispute_type == "fraud" => { "dispute_type": "fraud", "regulations_applicable": [ "Regulation E (Electronic Fund Transfer Act)", "Fair Credit Billing Act", "Card Network Rules (Visa/Mastercard)" ], "customer_rights": { "liability_limit": 50.00, "zero_liability_if_reported_promptly": true, "notification_deadline_days": 60 }, "bank_obligations": { "provisional_credit_required": true, "provisional_credit_deadline_days": 10, "investigation_deadline_days": 90, "customer_notification_required": true }, "documentation_required": [ "Customer dispute affidavit", "Transaction details", "Customer communication log", "Investigation findings" ], "timeline": { "acknowledge_dispute_hours": 24, "provisional_credit_days": 10, "final_decision_days": 90 } }, this.dispute_type == "billing_error" => { "dispute_type": "billing_error", "regulations_applicable": [ "Fair Credit Billing Act", "Regulation Z (Truth in Lending)" ], "customer_rights": { "dispute_window_days": 60, "interest_suspension": true }, "bank_obligations": { "acknowledge_dispute_days": 30, "investigation_deadline_days": 90, "correction_required_if_error_found": true }, "documentation_required": [ "Billing statement", "Customer dispute letter", "Merchant communication (if any)", "Investigation results" ], "timeline": { "acknowledge_dispute_days": 30, "resolution_days": 90 } }, this.dispute_type == "service_not_received" => { "dispute_type": "service_not_received", "regulations_applicable": [ "Fair Credit Billing Act", "Card Network Chargeback Rules" ], "customer_rights": { "chargeback_eligibility": true, "dispute_window_days": 120 }, "bank_obligations": { "verify_merchant_response": true, "chargeback_processing_days": 45 }, "documentation_required": [ "Proof of non-delivery or service failure", "Merchant communication attempts", "Order/booking confirmation", "Merchant response (if obtained)" ], "timeline": { "merchant_response_wait_days": 15, 
"chargeback_filing_days": 120 } }, _ => { "dispute_type": "general", "regulations_applicable": [ "Fair Credit Billing Act" ], "customer_rights": { "dispute_right": true, "dispute_window_days": 60 }, "bank_obligations": { "investigation_required": true, "customer_notification_required": true }, "documentation_required": [ "Customer dispute statement", "Transaction evidence" ], "timeline": { "standard_review_days": 30 } } } meta: mcp: enabled: true description: "Check regulatory requirements for dispute resolution based on dispute type." properties: - name: dispute_type type: string description: "Type of dispute (fraud, billing_error, service_not_received)" required: true ``` 4. Click **Create MCP Server**. Wait for the server status to show **Running**. You now have four MCP servers with nine total tools, organized by domain. ## [](#create-the-root-agent-with-subagents)Create the root agent with subagents The root agent orchestrates sub-agents and makes final recommendations. You’ll configure the root agent first, then add four specialized sub-agents within the same form. > ❗ **IMPORTANT** > > Sub-agents inherit the LLM provider, model, resource tier, and max iterations from the root agent. This tutorial uses GPT-5 Mini and max iterations of 15 to optimize performance. Using slower models (GPT-5.2, Claude Sonnet 4.5) or high max iterations (50+) will cause sub-agents to execute slowly. Each sub-agent call could take 60-90 seconds instead of 10-15 seconds. 1. Go to **Agentic AI** > **AI Agents**. 2. Click **Create Agent**. 3. Configure the root agent: - **Name**: `dispute-resolution-agent` - **Description**: `Orchestrates transaction dispute investigations` - **Resource Tier**: Large - **AI Gateway**: Select the gateway you configured - **Provider**: OpenAI - **Model**: GPT-5 Mini (fast, cost-effective for structured workflows) - **Max Iterations**: 15 4. 
In the **System Prompt** field, enter: ```text You are the root agent for a transaction dispute resolution system at ACME Bank. Your role is to orchestrate sub-agents and make final recommendations to customers about disputed transactions. ## Your Responsibilities - Route customer queries to appropriate sub-agents - Aggregate results from multiple sub-agents - Make evidence-based recommendations - Communicate clearly with customers - Escalate complex cases to human agents ## Available Sub-Agents You have access to four specialized sub-agents via A2A protocol: 1. **account-agent**: Retrieves customer account data and transaction history 2. **fraud-agent**: Analyzes fraud risk and calculates risk scores 3. **merchant-agent**: Verifies merchant legitimacy and reputation 4. **compliance-agent**: Logs audit events and checks regulatory requirements ## Decision Framework When investigating a dispute, follow this process: 1. Start with account-agent to get customer and transaction details 2. Route to fraud-agent if fraud is suspected 3. Route to merchant-agent to verify merchant legitimacy 4. Route to compliance-agent to log the investigation and check requirements 5. 
Aggregate all evidence and make a recommendation ## Risk-Based Recommendations Based on aggregated evidence, take these actions: - **Fraud score 80-100 + high merchant risk**: Block the transaction immediately, block the card, issue new card - **Fraud score 60-79**: Hold for specialist review, temporary card block - **Fraud score 40-59**: Ask customer to verify with merchant first before taking action - **Fraud score 0-39**: Likely legitimate transaction, help customer recall the purchase ## Escalation Criteria Escalate to human agent when: - Fraud score is medium (40-59) and evidence is conflicting - Customer disputes the recommendation strongly - Regulatory requirements exceed available tools - Subscription or recurring billing issues require merchant intervention ## Compliance Constraints Never: - Expose full credit card numbers or SSNs (use masked versions only) - Make guarantees about dispute outcomes (use "likely" or "recommend") - Process disputes without logging to compliance-agent - Reveal internal fraud detection logic or merchant scoring details to customers - Make decisions without sub-agent evidence - Ask customers for screenshots or additional proof (you have the transaction records) ## Customer Communication Style **Clear, bank-appropriate language:** - Use "I've reviewed your account" not "I called the account-agent" - Use "this charge doesn't match your typical spending" not "fraud score is 95/100" - Use "We're blocking this card" not "I recommend you freeze it" - Don't reveal merchant reputation scores or fraud report counts **Proactive protection:** For likely fraud (score 80+): - Block the card immediately: "We're blocking your card ending in [XXXX] right now to prevent additional fraudulent charges" - Issue replacement: "We'll send you a replacement card with a new number" - Process the claim: "You'll see the credit for this charge within 10 business days" For uncertain cases (score 40-79): - Temporary block: "I'm placing a temporary hold on this
card while we investigate" - Escalate: "A specialist will contact you within 24 hours" **Concise responses:** Keep responses to 3-4 short paragraphs maximum. Customers want action, not detailed analysis. ## Example Investigation Flow Customer: "I see a $1,847.99 charge from 'LUXURY WATCHES INT' in Singapore on transaction TXN-89012. This is fraud. My customer ID is CUST-1001." **Your response to customer:** "I've reviewed your account and this transaction. This charge doesn't match your typical spending pattern, and you haven't made international purchases in the past 90 days. Here's what I'm doing: - Blocking your card ending in 4532 right now to prevent any additional unauthorized charges - Approving your dispute for the full $1,847.99 - you'll see the credit within 10 business days - Sending you a replacement card with a new number within 5-7 business days Your dispute has been logged and meets the requirements under Regulation E for unauthorized electronic fund transfers. Is there anything else I can help you with today?" **What you actually did behind the scenes:** 1. Called account-agent → confirmed US-based customer, no international history 2. Called fraud-agent → received score 95/100 (critical risk) 3. Called merchant-agent → confirmed high fraud indicators 4. Called compliance-agent → logged under Regulation E 5. Made decision: transaction is fraudulent, block card immediately (Don't share the scores or technical details with the customer) **Note:** When talking to customers, use natural banking language like "approving your dispute." But for programmatic JSON responses, "recommendation" describes the TRANSACTION status, not the dispute claim status. ## Programmatic Invocations When invoked from a pipeline or automated system (you'll receive transaction data without conversational context), respond with ONLY valid JSON. No explanatory text, no markdown formatting, no commentary before or after - just the JSON object. 
Required JSON format: { "recommendation": "block_and_investigate" | "hold_for_review" | "approve", "fraud_score": <number 0-100>, "confidence": "high" | "medium" | "low", "reasoning": "<brief explanation>" } **Recommendation field definitions:** - **"block_and_investigate"**: Transaction is fraudulent. Block the card immediately and investigate. - **"hold_for_review"**: Unclear if fraudulent. Place temporary hold and escalate to human specialist. - **"approve"**: Transaction is legitimate. Customer likely forgot about it or needs clarification. **Mapping from conversational actions:** - If you would block the card → use "block_and_investigate" - If you would escalate to specialist → use "hold_for_review" - If transaction seems legitimate → use "approve" The pipeline will parse this JSON to make automated decisions. Any non-JSON response will cause processing failures. ``` 5. Skip the **MCP Tools** section (the root agent uses the A2A protocol to call sub-agents, not direct tools). 6. In the **Subagents** section, click **\+ Add Subagent**. ### [](#add-account-agent-subagent)Add account agent subagent The account agent retrieves customer account and transaction data. 1. Configure the subagent: - **Name**: `account-agent` - **Description**: `Retrieves customer account and transaction data` 2. In the subagent’s **System Prompt** field, enter: ```text You are the account agent for ACME Bank's dispute resolution system. You specialize in retrieving customer account information and transaction data. ## Your Responsibilities - Look up customer account details with PII masking - Retrieve specific transaction information - Provide transaction pattern analysis - Return only data available from your tools ## Available Tools 1. **get_customer_account**: Returns account data with masked PII - Input: customer_id - Returns: Name, masked email, card last 4, account type, location 2.
**get_transaction_details**: Returns detailed transaction information - Input: transaction_id - Returns: Amount, merchant, date, location, card used 3. **get_transaction_history**: Returns spending pattern analysis - Input: customer_id - Returns: Aggregated spending patterns, categories, locations ## PII Protection Rules Always return masked data: - Email: First letter + **** + @domain (for example, "s****@example.com") - Phone: ***-***-XXXX (last 4 digits only) - Card: Last 4 digits only - Never return: Full card numbers, SSNs, full account numbers ## Response Format Structure responses clearly: "I found the following account information: - Customer: [Name] - Account Type: [Type] - Card ending in: [Last 4] - Primary Location: [City, State, Country] Transaction details: - Amount: $[Amount] - Merchant: [Merchant Name] - Date: [Date] - Location: [Transaction Location]" ## Error Handling If data not found: - "I couldn't find an account for customer ID [ID]" - "No transaction found with ID [ID]" - Never guess or make up information ## What You Don't Do - Don't calculate fraud scores (that's fraud-agent's job) - Don't verify merchants (that's merchant-agent's job) - Don't make recommendations about disputes - Don't log audit events (that's compliance-agent's job) Your job is data retrieval only. Provide accurate, masked data and let the root agent make decisions. ``` 3. In the subagent’s **MCP Tools** section, select `account-tools`. ### [](#add-fraud-agent-subagent)Add fraud agent subagent The fraud agent calculates fraud risk scores and identifies fraud indicators. 1. Click **\+ Add Subagent** again. 2. Configure the subagent: - **Name**: `fraud-agent` - **Description**: `Calculates fraud risk scores and identifies fraud indicators` 3. In the subagent’s **System Prompt** field, enter: ```text You are the fraud detection agent for ACME Bank's dispute resolution system. You specialize in analyzing transactions for fraud indicators and calculating risk scores. 
## Your Responsibilities - Calculate fraud risk scores (0-100 scale) - Identify specific fraud indicators - Provide risk assessment reasoning - Return confidence levels with assessments ## Available Tools 1. **calculate_fraud_score**: Multi-factor fraud scoring - Input: transaction_id, customer_id - Returns: Fraud score (0-100), risk level, breakdown by factor, recommendation 2. **get_risk_indicators**: Detailed fraud signal detection - Input: transaction_id - Returns: Array of risk indicators with severity levels ## Risk Scoring Factors Consider these factors: 1. **Location Risk** (0-30 points) - International vs. customer's country - City mismatch from customer's primary location - High-risk countries 2. **Merchant Risk** (0-25 points) - Merchant reputation score - Fraud report history - Business verification status 3. **Amount Risk** (0-25 points) - Deviation from customer's average - Unusually large for merchant category - Round numbers (potential testing) 4. **Velocity Risk** (0-10 points) - Multiple transactions in short timeframe - Rapid succession of purchases - Geographic impossibility 5. 
**Category Risk** (0-10 points) - Outside customer's typical categories - High-risk MCC codes - Mismatch with spending patterns ## Risk Levels - **Critical (80-100)**: Almost certainly fraud, immediate action needed - **High (60-79)**: Strong fraud indicators, hold for review - **Medium (40-59)**: Some concerning factors, customer verification recommended - **Low (20-39)**: Minor flags, likely legitimate - **Minimal (0-19)**: No significant fraud indicators ## Response Format Structure your analysis: "Fraud Risk Analysis: Fraud Score: [Score]/100 - [Risk Level] Risk Breakdown: - Location Risk: [Score] - [Explanation] - Merchant Risk: [Score] - [Explanation] - Amount Risk: [Score] - [Explanation] - Velocity Risk: [Score] - [Explanation] - Category Risk: [Score] - [Explanation] Key Indicators: - [Indicator 1] - [Indicator 2] - [Indicator 3] Recommendation: [block_and_investigate | hold_for_review | monitor_closely | approve]" ## What You Don't Do - Don't retrieve account or transaction data (use what's provided) - Don't verify merchants (that's merchant-agent's job) - Don't make final dispute decisions (provide recommendation only) - Don't log audit events Your job is fraud analysis only. Provide objective risk assessment based on available data. ``` 4. In the subagent’s **MCP Tools** section, select `fraud-tools`. ### [](#add-merchant-agent-subagent)Add merchant agent subagent The merchant agent verifies merchant legitimacy and reputation. 1. Click **\+ Add Subagent** again. 2. Configure the subagent: - **Name**: `merchant-agent` - **Description**: `Verifies merchant legitimacy and reputation` 3. In the subagent’s **System Prompt** field, enter: ```text You are the merchant verification agent for ACME Bank's dispute resolution system. You specialize in verifying merchant legitimacy and reputation. 
## Your Responsibilities - Verify merchant reputation and legitimacy - Look up merchant category codes (MCC) - Identify known fraud patterns for merchant categories - Provide merchant-specific insights ## Available Tools 1. **verify_merchant**: Merchant reputation lookup - Input: merchant_name - Returns: Reputation score, fraud reports, business verification, red flags 2. **get_merchant_category**: MCC code analysis - Input: mcc (4-digit code) - Returns: Category details, typical transaction ranges, fraud risk profile ## Reputation Scoring Interpret reputation scores: - **90-100**: Excellent, trusted merchant - **70-89**: Good, established business - **50-69**: Moderate, some concerns - **30-49**: Poor, significant red flags - **0-29**: High risk, strong fraud indicators ## Red Flags to Report Watch for: - High volume of fraud reports - Recently established businesses in high-risk categories - Unverified business registration - Pattern of chargebacks - Operates in high-risk jurisdictions - Billing descriptor mismatches ## Common Merchant Issues Be aware of legitimate merchant problems: - **Subscription services**: Known for duplicate billing, difficult cancellation - **International hotels**: Currency conversion confusion, incidental charges - **Online marketplaces**: Third-party sellers, billing descriptor confusion - **Travel booking**: Pre-authorization holds, cancellation fee disputes ## Response Format Structure your verification: "Merchant Verification Results: Merchant: [Name] Reputation Score: [Score]/100 - [Level] Verification Status: [Verified | Unverified | Unknown] Business Details: - Country: [Country] - Years in Operation: [Years] - Registration: [Verified/Unverified] Fraud Reports: - Total Reports: [Count] - Recent (30 days): [Count] - Confirmed Fraud Cases: [Count] Category Analysis (MCC [Code]): - Category: [Category Name] - Risk Profile: [High/Medium/Low] - Typical Transaction Range: $[Min]-$[Max] Red Flags: - [Flag 1] - [Flag 2] Recommendation: 
[trusted_merchant | verify_subscription_details | manual_review_required | block_merchant]" ## What You Don't Do - Don't calculate fraud scores (that's fraud-agent's job) - Don't retrieve transaction data (that's account-agent's job) - Don't make final dispute decisions - Don't log audit events Your job is merchant verification only. Provide objective assessment of merchant legitimacy. ``` 4. In the subagent’s **MCP Tools** section, select `merchant-tools`. ### [](#add-compliance-agent-subagent)Add compliance agent subagent The compliance agent handles audit logging and regulatory requirements. 1. Click **\+ Add Subagent** again. 2. Configure the subagent: - **Name**: `compliance-agent` - **Description**: `Handles audit logging and regulatory requirements` 3. In the subagent’s **System Prompt** field, enter: ```text You are the compliance agent for ACME Bank's dispute resolution system. You specialize in regulatory requirements and audit logging. ## Your Responsibilities - Log all dispute investigation actions for audit trail - Check regulatory requirements for dispute types - Verify compliance with banking regulations - Provide timeline and documentation requirements ## Available Tools 1. **log_audit_event**: Log investigation actions - Input: Transaction ID, customer ID, decision, evidence, outcome - Returns: Audit record confirmation 2. **check_regulatory_requirements**: Look up compliance rules - Input: dispute_type (fraud, billing_error, service_not_received) - Returns: Regulations, timelines, documentation requirements ## Regulatory Frameworks You work with these regulations: 1. **Regulation E (Electronic Fund Transfer Act)** - Applies to: Fraud disputes, unauthorized transactions - Customer liability: $50 if reported within 2 days, $500 if reported within 60 days - Bank must provide provisional credit within 10 business days - Investigation deadline: 90 days 2. 
**Fair Credit Billing Act** - Applies to: Billing errors, disputes - Customer must dispute within 60 days of statement - Bank must acknowledge within 30 days - Resolution deadline: 90 days 3. **Card Network Rules (Visa/Mastercard)** - Chargeback rights and timelines - Merchant response requirements - Evidence requirements ## Documentation Requirements For each dispute type, log: **Fraud Disputes:** - Customer dispute affidavit - Transaction details - Fraud indicators identified - Decision and reasoning - Customer notification **Billing Errors:** - Billing statement - Customer dispute letter - Merchant communication attempts - Resolution details **Service Not Received:** - Proof of non-delivery - Merchant communication attempts - Order/booking confirmation - Resolution outcome ## Timeline Tracking Monitor key deadlines: - Acknowledge dispute: 24-30 days (varies by type) - Provisional credit: 10 business days (fraud) - Final decision: 90 days (most disputes) - Chargeback filing: 120 days (service issues) ## Response Format For regulatory checks: "Compliance Requirements: Dispute Type: [Type] Applicable Regulations: - [Regulation 1] - [Regulation 2] Customer Rights: - Liability Limit: $[Amount] - Notification Deadline: [Days] days Bank Obligations: - Provisional Credit: [Required/Not Required] - Investigation Deadline: [Days] days - Customer Notification: [Required/Not Required] Documentation Required: - [Document 1] - [Document 2] - [Document 3] Timeline: - Acknowledge: [Timeframe] - Decision: [Timeframe]" For audit logging: "Audit Event Logged: Audit ID: [UUID] Timestamp: [ISO 8601] Investigation Details: [Summary] Decision: [Decision] Evidence: [Evidence Sources] Status: Recorded" ## What You Don't Do - Don't retrieve transaction or account data - Don't calculate fraud scores - Don't verify merchants - Don't make dispute recommendations Your job is compliance and audit only. Ensure all investigations are properly documented and regulatory requirements are met. 
``` 4. In the subagent’s **MCP Tools** section, select `compliance-tools`. 5. Click **Create Agent** to create the root agent with all four subagents. Wait for the agent status to show **Running**. ## [](#test-investigation-scenarios)Test investigation scenarios Test the multi-agent system with realistic dispute scenarios. Each scenario demonstrates different patterns: clear fraud, legitimate transactions, escalation cases, and edge cases. 1. Go to **Agentic AI** > **AI Agents**. 2. Click on `dispute-resolution-agent`. 3. Open the **Inspector** tab. ### [](#clear-fraud-case)Clear fraud case Test how the system handles obvious fraud. Enter this query: ```text I see a $1,847.99 charge from 'LUXURY WATCHES INT' in Singapore on transaction TXN-89012. I've never been to Singapore and don't buy watches. My customer ID is CUST-1001. This is fraud. ``` Watch the conversation panel as the investigation progresses. You’ll see the root agent call each sub-agent in sequence. After all sub-agents complete (30-90 seconds), the agent sends its final response to the chat. The final response should clearly state the transaction is fraudulent, summarize findings from each sub-agent, and provide a list of actions the agent is going to take. This flow demonstrates multi-agent coordination for high-confidence fraud decisions with realistic banking communication. ### [](#escalation-required)Escalation required Test how the system handles ambiguous cases requiring human review. Click **Clear context**. Then enter: ```text I see three $29.99 charges from 'EXAMPLE STREAMING' last month, but I only subscribed once. My customer ID is CUST-1002 and one of the transactions is TXN-89014. ``` Watch the conversation panel as the agent investigates. After the sub-agent calls complete, the agent should send a response with a realistic escalation. This demonstrates the escalation pattern when evidence is ambiguous and requires human review. 
## [](#monitor-multi-agent-execution)Monitor multi-agent execution **Inspector** shows real-time progress in the conversation panel, but **Transcripts** provides detailed post-execution analysis with timing, token usage, and full trace hierarchy. 1. In the left navigation, click **Transcripts**. 2. Select a recent transcript from your fraud case test. In the trace hierarchy, you’ll see: - Root agent invocation (top-level span) - Multiple `invoke_agent` spans for each sub-agent call - Individual LLM calls within each agent - MCP tool invocations within sub-agents In the summary panel, check: - **Duration**: Total investigation time (typically 5-15 seconds) - **Token Usage**: Cost tracking across all agents - **LLM Calls**: How many reasoning steps were needed This visibility helps you: - Verify sub-agents are being called in the right order - Identify slow sub-agents that need optimization - Track costs per investigation for budgeting For detailed trace structure, see [Agent trace hierarchy](../../../observability/concepts/#agent-trace-hierarchy). ## [](#integrate-with-streaming-pipeline)Integrate with streaming pipeline Process disputes automatically from transaction streams. When transactions meet certain risk thresholds, the pipeline invokes the dispute agent for immediate investigation. ### [](#create-a-secret-for-the-agent-card-url)Create a secret for the agent card URL The pipeline needs the agent card URL to invoke the dispute resolution agent. 1. Go to **Agentic AI** > **AI Agents**. 2. Click on `dispute-resolution-agent`. 3. Open the **A2A** tab. 4. Copy the agent URL displayed at the top. 5. Go to **Connect** > **Secrets**. 6. Click **Create Secret**. 7. 
Create the secret: - **Name**: `DISPUTE_AGENT_CARD_URL` - **Value**: Paste the agent URL and append `/.well-known/agent-card.json` to the end For example, if the agent URL is: https://abc123.ai-agents.def456.cloud.redpanda.com The secret value should be: https://abc123.ai-agents.def456.cloud.redpanda.com/.well-known/agent-card.json 8. Click **Create Secret**. ### [](#create-topics-for-transaction-data)Create topics for transaction data Create the topics the pipeline will use for input and output. 1. Go to **Topics** in the Redpanda Cloud Console. 2. Click **Create Topic**. 3. Create the input topic: - **Name**: `bank.transactions` - **Partitions**: 3 - **Replication factor**: 3 4. Click **Create Topic** again. 5. Create the output topic: - **Name**: `bank.dispute_results` - **Partitions**: 3 - **Replication factor**: 3 ### [](#create-a-sasl-user-for-topic-access)Create a SASL user for topic access The pipeline needs SASL credentials to read from and write to Redpanda topics. 1. Go to **Security** > **Users** in the Redpanda Cloud Console. 2. Click **Create User**. 3. Configure the user: - **Username**: `dispute-pipeline-user` - **Password**: Generate a secure password - **Mechanism**: SCRAM-SHA-256 4. Save the username and password. You’ll need them for the pipeline secrets. 5. Click **Create**. 6. Click **Create ACLs** to grant permissions. 7. Click the **Clusters** tab for cluster permissions and select **Allow all**. 8. Click **Add rule** to add another ACL. 9. Click the **Topics** tab for topic permissions: - **Principal**: `dispute-pipeline-user` - **Host**: Allow all hosts (`*`) - **Resource Type**: Topic - **Selector**: Topic names starting with `bank.` - **Operations**: Allow all 10. Click **Add rule** to add another ACL. 11. Click the **Consumer groups** tab for consumer group permissions and select **Allow all**. 12. Click **Create**. 
### [](#create-secrets-for-sasl-authentication)Create secrets for SASL authentication The pipeline needs SASL credentials stored as secrets to authenticate with Redpanda topics. 1. Go to **Connect** > **Secrets** in the Redpanda Cloud Console (if not already there). 2. Click **Create Secret**. 3. Create two secrets with these values: - **Name**: `DISPUTE_PIPELINE_USERNAME`, **Value**: `dispute-pipeline-user` - **Name**: `DISPUTE_PIPELINE_PASSWORD`, **Value**: The password you created for `dispute-pipeline-user` ### [](#create-the-pipeline)Create the pipeline 1. Go to **Connect** in the Redpanda Cloud Console. 2. Click **Create a pipeline**. 3. In the numbered steps, click **4 Add permissions**. 4. Select **Service Account**. The Service Account is required for the `a2a_message` processor to authenticate with and invoke the dispute resolution agent. Without this permission, the pipeline will fail when attempting to call the agent. 5. Click **Next**. 6. Name the pipeline `dispute-pipeline`. 7. 
Paste this configuration and click **Create Pipeline**: ```yaml # Event-driven transaction dispute processing pipeline # Automatically flags high-risk transactions and routes them to dispute agent input: kafka: addresses: ["${REDPANDA_BROKERS}"] topics: ["bank.transactions"] consumer_group: dispute-processor tls: enabled: true sasl: mechanism: SCRAM-SHA-256 user: "${secrets.DISPUTE_PIPELINE_USERNAME}" password: "${secrets.DISPUTE_PIPELINE_PASSWORD}" pipeline: processors: # Filter for high-value or suspicious transactions - branch: request_map: | # Only process transactions above $500 or flagged by upstream systems root = if this.amount > 500 || this.preliminary_flag == true { this } else { deleted() } processors: # Calculate preliminary risk score based on transaction attributes - mapping: | # Preserve original transaction root = this # Location risk: international transactions get higher score let location_risk = if this.merchant.country != this.card.billing_country { 40 } else { 0 } # Amount risk: large amounts relative to account averages let amount_risk = if this.amount > 1000 { 30 } else if this.amount > 500 { 15 } else { 0 } # Velocity risk: check for multiple recent transactions let velocity_risk = if this.recent_transaction_count > 5 { 20 } else { 0 } # Category risk: luxury goods and high-risk categories let category_risk = match this.merchant.mcc { "5944" => 20, # Jewelry "5094" => 25, # Precious stones _ => 0 } # Calculate total score let total_score = $location_risk + $amount_risk + $velocity_risk + $category_risk root.preliminary_risk_score = $total_score root.risk_level = if $total_score > 70 { "high" } else if $total_score > 40 { "medium" } else { "low" } # Route high and medium risk transactions to dispute agent for investigation - branch: request_map: | # Only send to agent if risk is medium or higher root = if this.preliminary_risk_score >= 40 { this } else { deleted() } processors: # Invoke dispute resolution agent via A2A protocol - a2a_message: 
agent_card_url: "${secrets.DISPUTE_AGENT_CARD_URL}" prompt: | Investigate this potentially fraudulent transaction and respond with ONLY a JSON object (no additional text): Transaction ID: ${! this.transaction_id } Customer ID: ${! this.customer_id } Amount: $${! this.amount } ${! this.currency } Merchant: ${! this.merchant.name } Location: ${! this.merchant.city }, ${! this.merchant.country } Date: ${! this.transaction_date } Preliminary Risk Score: ${! this.preliminary_risk_score }/100 Risk Level: ${! this.risk_level } Return ONLY this JSON format with no other text: { "recommendation": "block_and_investigate" | "hold_for_review" | "approve", "fraud_score": , "confidence": "high" | "medium" | "low", "reasoning": "" } # Map agent response back to transaction record result_map: | # By default, result_map preserves the original message that entered the branch # Just add the agent investigation field root.agent_investigation = if content().string().parse_json().catch(null) != null { content().string().parse_json() } else { { "recommendation": "manual_review_required", "fraud_score": 50, "confidence": "low", "reasoning": "Agent returned unparseable response: " + content().string().slice(0, 100) } } # Merge risk scoring and agent results back to original transaction result_map: | root = content() # Enrich with final decision and tracing metadata - mapping: | # Preserve original transaction and all computed fields root = this # Only set final_decision and alert_level if agent investigation occurred root.final_decision = if this.agent_investigation.exists("recommendation") { match { this.agent_investigation.recommendation == "block_and_investigate" => "blocked", this.agent_investigation.recommendation == "hold_for_review" => "pending_review", this.agent_investigation.recommendation == "approve" => "approved", _ => "manual_review_required" } } else { "low_risk_no_investigation" } root.alert_level = if this.agent_investigation.exists("fraud_score") { match { 
this.agent_investigation.fraud_score >= 80 => "critical", this.agent_investigation.fraud_score >= 60 => "high", this.agent_investigation.fraud_score >= 40 => "medium", _ => "low" } } else { "low" } # Add execution metadata for tracing back to agent transcripts root.pipeline_metadata = { "processed_at": now().ts_format("2006-01-02T15:04:05.000Z"), "transaction_id": this.transaction_id, "customer_id": this.customer_id, "agent_invoked": this.agent_investigation.exists("fraud_score") } output: kafka: addresses: ["${REDPANDA_BROKERS}"] topic: bank.dispute_results key: "${! this.transaction_id }" tls: enabled: true sasl: mechanism: SCRAM-SHA-256 user: "${secrets.DISPUTE_PIPELINE_USERNAME}" password: "${secrets.DISPUTE_PIPELINE_PASSWORD}" ``` This pipeline: - Consumes transactions from `bank.transactions` topic - Filters for high-value transactions (>$500) or pre-flagged transactions - Calculates preliminary risk score based on location, amount, velocity, and category - Routes transactions with risk score ≥40 to the dispute-resolution-agent via A2A - Outputs investigation results to `bank.dispute_results` topic ### [](#test-the-pipeline)Test the pipeline 1. Authenticate with your Redpanda Cloud cluster: ```bash rpk cloud login ``` 2. 
Create a test transaction that will trigger the agent investigation: ```bash echo '{ "transaction_id": "TXN-89012", "customer_id": "CUST-1001", "amount": 1847.99, "currency": "USD", "merchant": { "name": "LUXURY WATCHES INT", "category": "jewelry", "country": "Singapore", "mcc": "5944", "city": "Singapore" }, "card": { "last_four": "4532", "billing_country": "USA" }, "transaction_date": "2026-01-21T10:00:00Z", "recent_transaction_count": 2 }' | rpk topic produce bank.transactions ``` This transaction will trigger agent investigation because: - International transaction (Singapore vs USA): +40 risk points - Amount is greater than $1000: +30 risk points - Jewelry category (MCC 5944): +20 risk points - **Total preliminary risk score: 90** (well above the 40 threshold) 3. Wait a minute for the pipeline to process the transaction. You can monitor the progress in **Transcripts**. While the agents investigate, a new transcript for `dispute-resolution-agent` will appear. Until the investigation completes, the transcript will show **awaiting root** status. 4. 
Consume the results: ```bash rpk topic consume bank.dispute_results --offset end -n 1 ``` You’ll see the complete transaction with agent investigation results: ```json { "agent_investigation": { "confidence": "high", "fraud_score": 91, "reasoning": "Transaction is an international purchase with no recent international activity, from a merchant with strong fraud indicators, and the amount is a large outlier for this account; immediate block and investigation recommended.", "recommendation": "block_and_investigate" }, "alert_level": "critical", "amount": 1847.99, "card": { "billing_country": "USA", "last_four": "4532" }, "currency": "USD", "customer_id": "CUST-1001", "final_decision": "blocked", "merchant": { "category": "jewelry", "city": "Singapore", "country": "Singapore", "mcc": "5944", "name": "LUXURY WATCHES INT" }, "pipeline_metadata": { "agent_invoked": true, "customer_id": "CUST-1001", "processed_at": "2026-01-27T14:29:19.436Z", "transaction_id": "TXN-89012" }, "preliminary_risk_score": 90, "recent_transaction_count": 2, "risk_level": "high", "transaction_date": "2026-01-21T10:00:00Z", "transaction_id": "TXN-89012" } ``` This output contains everything that downstream systems, such as fraud monitoring, customer alerts, and audit logging, need. The pipeline uses a two-stage filter: - It scores only transactions with `amount > 500` or `preliminary_flag == true` - It sends a transaction to the agent only if `preliminary_risk_score >= 40` Transactions that pass the first filter but not the second (for example, a $600 domestic transaction with low risk) will appear in the output with: - `final_decision: "low_risk_no_investigation"` - `alert_level: "low"` - No `agent_investigation` field Only transactions meeting the risk threshold invoke the dispute resolution agent. ### [](#trace-pipeline-execution-to-agent-transcripts)Trace pipeline execution to agent transcripts Use the pipeline metadata timestamp to find the corresponding agent execution in the **Transcripts** view. 1.
Note the `processed_at` timestamp from the pipeline output (for example: `2026-01-26T18:30:45.000Z`). 2. Go to **Agentic AI** > **Transcripts**. 3. Find transcripts for `dispute-resolution-agent` that match your timestamp. > 📝 **NOTE** > > The search function does not search through prompt content or attribute values. Use the timestamp to narrow down the time window, then manually review transcripts from that period. In the transcript details, you’ll see: - The full prompt sent to the agent (including transaction ID and details) - Each sub-agent invocation (account-agent, fraud-agent, merchant-agent, compliance-agent) - Token usage and execution time for the investigation - The complete JSON response returned to the pipeline ## [](#troubleshoot)Troubleshoot For comprehensive troubleshooting guidance, see [Troubleshoot AI Agents](../../troubleshooting/). ### [](#test-with-mock-data)Test with mock data The mock tools in this tutorial use hardcoded customer and transaction IDs for testing: - Customer IDs: `CUST-1001`, `CUST-1002`, `CUST-1003` - Transaction IDs: `TXN-89012`, `TXN-89013`, `TXN-89014`, `TXN-89015` Use these documented test IDs when testing in **Inspector** or the pipeline. The sub-agents' mock tools require valid IDs to return transaction details, account history, and fraud indicators. Using other IDs (like `TXN-TEST-001` or `CUST-9999`) will cause the tools to return "not found" errors, and the root agent won’t be able to complete its investigation. For production deployments, replace the mock tools with API calls to your account, fraud detection, merchant verification, and compliance systems. 
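If a test transaction does not reach the agent, it can help to check the score the pipeline would assign before producing the message. The following is a minimal Python sketch that mirrors the preliminary scoring rules from the pipeline configuration in this tutorial. It is an illustration only, not part of the pipeline: the function names (`preliminary_risk_score`, `risk_level`, `invokes_agent`) are hypothetical, and treating missing optional fields as `0` or `False` is an assumption made for convenience.

```python
"""Mirror of the tutorial pipeline's preliminary risk scoring (illustrative only)."""

# Category risk by MCC, copied from the pipeline's `match this.merchant.mcc` block.
CATEGORY_RISK = {"5944": 20, "5094": 25}  # jewelry, precious stones


def preliminary_risk_score(txn: dict) -> int:
    """Return the score the pipeline's scoring mapping would assign."""
    # Location risk: merchant country differs from the card's billing country.
    location = 40 if txn["merchant"]["country"] != txn["card"]["billing_country"] else 0
    # Amount risk: fixed thresholds at $1000 and $500.
    amount = 30 if txn["amount"] > 1000 else 15 if txn["amount"] > 500 else 0
    # Velocity risk: more than five recent transactions (assumed 0 when absent).
    velocity = 20 if txn.get("recent_transaction_count", 0) > 5 else 0
    # Category risk: lookup by MCC, 0 for any other category.
    category = CATEGORY_RISK.get(txn["merchant"].get("mcc"), 0)
    return location + amount + velocity + category


def risk_level(score: int) -> str:
    """Label a score using the same strict thresholds as the pipeline."""
    return "high" if score > 70 else "medium" if score > 40 else "low"


def passes_first_filter(txn: dict) -> bool:
    """First stage: amount above $500 or flagged upstream."""
    return txn["amount"] > 500 or txn.get("preliminary_flag") is True


def invokes_agent(txn: dict) -> bool:
    """Second stage: scored transactions at or above 40 go to the agent."""
    return passes_first_filter(txn) and preliminary_risk_score(txn) >= 40


if __name__ == "__main__":
    # The test transaction from "Test the pipeline".
    sample = {
        "transaction_id": "TXN-89012",
        "amount": 1847.99,
        "merchant": {"country": "Singapore", "mcc": "5944"},
        "card": {"billing_country": "USA"},
        "recent_transaction_count": 2,
    }
    score = preliminary_risk_score(sample)
    print(score, risk_level(score), invokes_agent(sample))
```

For the tutorial's test transaction this gives 40 + 30 + 20 = 90, matching the `preliminary_risk_score` in the sample output. One detail the sketch makes visible: a score of exactly 40 is labeled `low` by the `risk_level` thresholds but still meets the `>= 40` routing condition, so it would be sent to the agent.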
## [](#next-steps)Next steps - [Agent Architecture Patterns](../../architecture-patterns/) - [Integration Patterns Overview](../../integration-overview/) - [Pipeline Integration Patterns](../../pipeline-integration-patterns/) - [Monitor Agent Activity](../../monitor-agents/) - [MCP Tool Design](../../../mcp/remote/best-practices/) --- # Page 20: AI Gateway **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/ai-gateway.md --- # AI Gateway --- title: AI Gateway latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: ai-gateway/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: ai-gateway/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/ai-gateway/index.adoc description: Keep AI-powered apps running with automatic provider failover, prevent runaway spend with centralized budget controls, and govern access across teams, apps, and service accounts. personas: platform_admin, app_developer, evaluator page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). - [What is an AI Gateway?](what-is-ai-gateway/) Understand how AI Gateway keeps AI-powered apps highly available across providers and prevents runaway AI spend with centralized cost governance. - [AI Gateway Quickstart](gateway-quickstart/) Get started with AI Gateway. Configure providers, create your first gateway with failover and budget controls, and route your first request. 
- [AI Gateway Architecture](gateway-architecture/) Technical architecture of Redpanda AI Gateway, including how the control plane, data plane, and observability plane deliver high availability, cost governance, and multi-tenant isolation. - For Administrators - [AI Gateway Setup Guide](admin/setup-guide/) Set up AI Gateway for your organization. Enable providers, configure failover for high availability, set budget controls, and create gateways with team-level isolation. - For Builders - [Discover Available Gateways](builders/discover-gateways/) Find which AI Gateways you can access and their configurations. - [Connect Your Agent](builders/connect-your-agent/) Integrate your AI agent or application with Redpanda Agentic Data Plane for unified LLM access. - [CEL Routing Cookbook](cel-routing-cookbook/) CEL routing cookbook for Redpanda AI Gateway with common patterns, examples, and best practices. - [MCP Gateway](mcp-aggregation-guide/) Learn how to use the MCP Gateway to aggregate MCP servers, configure deferred tool loading, create orchestrator workflows, and manage security. --- # Page 21: AI Gateway Setup Guide **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/ai-gateway/admin/setup-guide.md --- # AI Gateway Setup Guide --- title: AI Gateway Setup Guide latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: ai-gateway/admin/setup-guide page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: ai-gateway/admin/setup-guide.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/ai-gateway/admin/setup-guide.adoc description: Set up AI Gateway for your organization. Enable providers, configure failover for high availability, set budget controls, and create gateways with team-level isolation.
page-topic-type: how-to personas: platform_admin learning-objective-1: Enable LLM providers and models in the catalog learning-objective-2: Create and configure gateways with routing policies, rate limits, and spend limits learning-objective-3: Set up MCP tool aggregation for AI agents page-git-created-date: "2026-02-18" page-git-modified-date: "2026-03-02" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). This guide walks administrators through the setup process for AI Gateway, from enabling LLM providers to configuring routing policies and MCP tool aggregation. After completing this guide, you will be able to: - Enable LLM providers and models in the catalog - Create and configure gateways with routing policies, rate limits, and spend limits - Set up MCP tool aggregation for AI agents ## [](#prerequisites)Prerequisites - Access to the Redpanda Cloud Console with administrator privileges - API keys for at least one LLM provider (OpenAI, Anthropic, Google AI) - (Optional) MCP server endpoints if you plan to use tool aggregation ## [](#enable-a-provider)Enable a provider Providers represent upstream services (Anthropic, OpenAI, Google AI) and associated credentials. Providers are disabled by default and must be enabled explicitly by an administrator. 1. In the Redpanda Cloud Console, navigate to **Agentic** → **AI Gateway** → **Providers**. 2. Select a provider (for example, Anthropic). 3. On the Configuration tab for the provider, click **Add configuration**. 4. Enter your API Key for the provider. > 💡 **TIP** > > Store provider API keys securely. Each provider configuration can have multiple API keys for rotation and redundancy. 5. Click **Save** to enable the provider. Repeat this process for each LLM provider you want to make available through AI Gateway. 
## [](#enable-models)Enable models The model catalog is the set of models made available through the gateway. Models are disabled by default. After enabling a provider, you can enable its models. The infrastructure that serves the model differs based on the provider you select. For example, OpenAI has different reliability and availability metrics than Anthropic. When you consider all metrics, you can design your gateway to use different providers for different use cases. 1. Navigate to **Agentic** → **AI Gateway** → **Models**. 2. Review the list of available models from enabled providers. 3. For each model you want to expose through gateways, toggle it to **Enabled**. For example: - `openai/gpt-5.2` - `openai/gpt-5.2-mini` - `anthropic/claude-sonnet-4.5` - `anthropic/claude-opus-4.6` 4. Click **Save changes**. Only enabled models will be accessible through gateways. You can enable or disable models at any time without affecting existing gateways. ### [](#model-naming-convention)Model naming convention Model requests must use the `vendor/model_id` format in the model property of the request body. This format allows AI Gateway to route requests to the appropriate provider. For example: - `openai/gpt-5.2` - `anthropic/claude-sonnet-4.5` - `openai/gpt-5.2-mini` ## [](#create-a-gateway)Create a gateway A gateway is a logical configuration boundary (policies + routing + observability) on top of a single deployment. It’s a "virtual gateway" that you can create per team, environment (staging/production), product, or customer. 1. Navigate to **Agentic** → **AI Gateway** → **Gateways**. 2. Click **Create Gateway**. 3. Configure the gateway: - **Name**: Choose a descriptive name (for example, `production-gateway`, `team-ml-gateway`, `staging-gateway`) - **Workspace**: Select the workspace this gateway belongs to > 💡 **TIP** > > A workspace is conceptually similar to a resource group in Redpanda streaming. 
- **Description** (optional): Add context about this gateway’s purpose - **Tags** (optional): Add metadata for organization and filtering 4. Click **Create**. 5. After creation, note the following information: - **Gateway endpoint**: URL for API requests (for example, `https://example/gateways/d633lffcc16s73ct95mg/v1`) The gateway ID is embedded in the URL (in this example, `d633lffcc16s73ct95mg`). You’ll share the gateway endpoint with users who need to access this gateway. ## [](#configure-llm-routing)Configure LLM routing On the gateway details page, select the **LLM** tab to configure rate limits, spend limits, routing, and provider pools with fallback options. The LLM routing pipeline visually represents the request lifecycle: 1. **Rate Limit**: Global rate limit (for example, 100 requests/second) 2. **Spend Limit / Monthly Budget**: Monthly budget with blocking enforcement (for example, $15K/month) 3. **Routing**: Primary provider pool with optional fallback provider pools ### [](#configure-rate-limits)Configure rate limits Rate limits control how many requests can be processed within a time window. 1. In the **LLM** tab, locate the **Rate Limit** section. 2. Click **Add rate limit**. 3. Configure the limit: - **Requests per second**: Maximum requests per second (for example, `100`) - **Burst allowance** (optional): Allow temporary bursts above the limit 4. Click **Save**. Rate limits apply to all requests through this gateway, regardless of model or provider. ### [](#configure-spend-limits-and-budgets)Configure spend limits and budgets Spend limits prevent runaway costs by blocking requests, or raising alerts, after a monthly budget is exceeded. 1. In the **LLM** tab, locate the **Spend Limit** section. 2. Click **Configure budget**. 3.
Set the budget: - **Monthly budget**: Maximum spend per month (for example, `$15000`) - **Enforcement**: Choose **Block** to reject requests after the budget is exceeded, or **Alert** to notify but allow requests - **Notification threshold** (optional): Alert when X% of budget is consumed (for example, `80%`) 4. Click **Save**. Budget tracking uses estimated costs based on token usage and public provider pricing. ### [](#configure-routing-and-provider-pools)Configure routing and provider pools Provider pools define which LLM providers handle requests, with support for primary and fallback configurations. 1. In the **LLM** tab, locate the **Routing** section. 2. Click **Add provider pool**. 3. Configure the primary pool: - **Name**: For example, `primary-anthropic` - **Providers**: Select one or more providers (for example, Anthropic) - **Models**: Choose which models to include (for example, `anthropic/claude-sonnet-4.5`) - **Load balancing**: If multiple providers are selected, choose distribution strategy (round-robin, weighted, etc.) 4. (Optional) Click **Add fallback pool** to configure automatic failover: - **Name**: For example, `fallback-openai` - **Providers**: Select fallback provider (for example, OpenAI) - **Models**: Choose fallback models (for example, `openai/gpt-5.2`) - **Trigger conditions**: When to activate fallback: - Rate limit exceeded (429 from primary) - Timeout (primary provider slow) - Server errors (5xx from primary) 5. Configure routing rules using CEL expressions (optional): For simple routing, select **Route all requests to primary pool**. For advanced routing based on request properties, use CEL expressions. See [CEL Routing Cookbook](../../cel-routing-cookbook/) for examples. Example CEL expression for tier-based routing: ```cel request.headers["x-user-tier"] == "premium" ? "anthropic/claude-opus-4.6" : "anthropic/claude-sonnet-4.5" ``` 6. Click **Save routing configuration**. 
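The routing rules above can be read as two small decisions: which model a request targets, and whether a primary-pool result should trigger the fallback pool. The following Python sketch models only those two decisions, under stated assumptions. It is not AI Gateway code: the function names are hypothetical, the pool names reuse the examples from the steps above, and a missing `x-user-tier` header is assumed to select the default model (the CEL expression itself may behave differently when the header is absent).

```python
"""Illustrative model of the routing rules described above (not gateway code)."""
from typing import Optional

# Pool names taken from the examples in the configuration steps.
PRIMARY_POOL = "primary-anthropic"
FALLBACK_POOL = "fallback-openai"


def select_model(headers: dict) -> str:
    """Python reading of the tier-based CEL expression.

    Assumption: a missing header falls through to the default model.
    """
    if headers.get("x-user-tier") == "premium":
        return "anthropic/claude-opus-4.6"
    return "anthropic/claude-sonnet-4.5"


def should_fall_back(status_code: Optional[int], timed_out: bool = False) -> bool:
    """Return True when a primary-pool result matches a listed trigger condition."""
    if timed_out:
        # Timeout: the primary provider was too slow to respond.
        return True
    if status_code is None:
        return False
    # Rate limit exceeded (429) or a server error (5xx) from the primary.
    return status_code == 429 or 500 <= status_code <= 599


def choose_pool(status_code: Optional[int], timed_out: bool = False) -> str:
    """Pick the pool that should serve the request after a primary attempt."""
    return FALLBACK_POOL if should_fall_back(status_code, timed_out) else PRIMARY_POOL


if __name__ == "__main__":
    print(select_model({"x-user-tier": "premium"}))
    print(choose_pool(429))
    print(choose_pool(200))
```

Keeping the trigger check in one function makes it easy to compare against the conditions you selected in the UI. Client errors other than 429 (for example, 400 or 401) are deliberately not treated as fallback triggers here, because they are not in the list above.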
> 💡 **TIP** > > A provider pool in the UI is called a backend pool in the API. ### [](#load-balancing-and-multi-provider-distribution)Load balancing and multi-provider distribution If a provider pool contains multiple providers, you can distribute traffic to balance load or optimize for cost/performance: - Round-robin: Distribute evenly across all providers - Weighted: Assign weights (for example, 80% to Anthropic, 20% to OpenAI) - Least latency: Route to the fastest provider based on recent performance - Cost-optimized: Route to the cheapest provider for each model ## [](#configure-mcp-tools-optional)Configure MCP tools (optional) If your users will build [AI agents](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#ai-agent) that need access to [tools](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-tool) via [Model Context Protocol (MCP)](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#model-context-protocol-mcp), configure MCP tool aggregation. On the gateway details page, select the **MCP** tab to configure tool discovery and execution. The MCP proxy aggregates multiple [MCP servers](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-server), allowing agents to find and call tools through a single endpoint. ### [](#configure-mcp-rate-limits)Configure MCP rate limits Rate limits for MCP work the same way as LLM rate limits. 1. In the **MCP** tab, locate the **Rate Limit** section. 2. Click **Add rate limit**. 3. Configure the maximum requests per second and optional burst allowance. 4. Click **Save**. ### [](#add-mcp-servers)Add MCP servers 1. In the **MCP** tab, click **Create MCP Server**. 2. Configure the server: - **Server ID**: Unique identifier for this server - **Display Name**: Human-readable name (for example, `database-server`, `slack-server`) - **Server Address**: Endpoint URL for the MCP server (for example, `https://mcp-database.example.com`) 3.
Configure server settings: - **Timeout (seconds)**: Maximum time to wait for a response from this server - **Enabled**: Whether this server is active and accepting requests - **Defer Loading Override**: Controls whether tools from this server are loaded upfront or on demand | Option | Description | | --- | --- | | Inherit from gateway | Use the gateway-level deferred loading setting (default) | | Enabled | Always defer loading from this server. Agents receive only a search tool initially and query for specific tools when needed. This can reduce token usage by 80-90%. | | Disabled | Always load all tools from this server upfront. | - **Forward OIDC Token Override**: Controls whether the client’s OIDC token is forwarded to this MCP server | Option | Description | | --- | --- | | Inherit from gateway | Use the gateway-level OIDC forwarding setting (default) | | Enabled | Always forward the OIDC token to this server | | Disabled | Never forward the OIDC token to this server | 4. Click **Save** to add the server to this gateway. Repeat for each MCP server you want to aggregate. See [MCP Gateway](../../mcp-aggregation-guide/) for detailed information about MCP aggregation. ### [](#configure-the-mcp-orchestrator)Configure the MCP orchestrator The MCP orchestrator is a built-in MCP server that enables programmatic tool calling. Agents can generate JavaScript code to call multiple tools in a single orchestrated step, reducing the number of round trips. Example: A workflow requiring 47 file reads can be reduced from 49 round trips to just 1 round trip using the orchestrator. The orchestrator is pre-configured when you initialize the MCP gateway. Its server configuration (Server ID, Display Name, Transport, Command, and Timeout) is system-managed and cannot be modified. You can configure blocked tool patterns to prevent specific tools from being called through the orchestrator: 1. In the **MCP** tab, select the orchestrator server to edit it. 2. 
Under **Blocked Tools**, click **Add Pattern** to add glob patterns for tools that should be blocked from execution. Example patterns: - `server_id:*` - Block all tools from a specific server - `*:dangerous_tool` - Block a specific tool across all servers - `specific:tool` - Block a single tool on a specific server > 📝 **NOTE** > > The orchestrator’s own tools are blocked by default to prevent recursive execution. 3. Click **Save**. ## [](#verify-your-setup)Verify your setup After completing the setup, verify that the gateway is working correctly: ### [](#test-the-gateway-endpoint)Test the gateway endpoint ```bash curl ${GATEWAY_ENDPOINT}/models \ -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" ``` Expected result: List of enabled models. ### [](#send-a-test-request)Send a test request ```bash curl ${GATEWAY_ENDPOINT}/chat/completions \ -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ -H "Content-Type: application/json" \ -d '{ "model": "openai/gpt-5.2-mini", "messages": [{"role": "user", "content": "Hello, AI Gateway!"}], "max_tokens": 50 }' ``` Expected result: Successful completion response. ### [](#check-the-gateway-overview)Check the gateway overview 1. Navigate to **Gateways** → Select your gateway → **Overview**. 2. Check the aggregate metrics to verify your test request was processed: - Total Requests: Should have incremented - Total Tokens: Should show tokens consumed - Total Cost: Should show estimated cost ## [](#share-access-with-users)Share access with users Now that your gateway is configured, share access with users (builders): 1. Provide the **Gateway Endpoint** (for example, `[https://example/gateways/gw_abc123/v1](https://example/gateways/gw_abc123/v1)`) 2. Share API credentials (Redpanda Cloud tokens with appropriate permissions) 3. (Optional) Document available models and any routing policies 4. (Optional) Share rate limits and budget information Users can then discover and connect to the gateway using the information provided. 
See [Discover Available Gateways](../../builders/discover-gateways/) for user documentation. ## [](#next-steps)Next steps **Configure and optimize:** - [CEL Routing Cookbook](../../cel-routing-cookbook/) - Advanced routing patterns --- # Page 22: Connect Your Agent **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/ai-gateway/builders/connect-your-agent.md --- # Connect Your Agent --- title: Connect Your Agent latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: ai-gateway/builders/connect-your-agent page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: ai-gateway/builders/connect-your-agent.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/ai-gateway/builders/connect-your-agent.adoc description: Integrate your AI agent or application with Redpanda Agentic Data Plane for unified LLM access. page-topic-type: how-to personas: app_developer learning-objective-1: Configure your application to use AI Gateway with OpenAI-compatible SDKs learning-objective-2: Make LLM requests through the gateway and handle responses appropriately learning-objective-3: Validate your integration end-to-end page-git-created-date: "2026-02-18" page-git-modified-date: "2026-03-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). This guide shows you how to connect your [AI agent](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#ai-agent) or application to Redpanda Agentic Data Plane. This is also called "Bring Your Own Agent" (BYOA). You’ll configure your client SDK, make your first request, and validate the integration.
After completing this guide, you will be able to:

- Configure your application to use AI Gateway with OpenAI-compatible SDKs
- Make LLM requests through the gateway and handle responses appropriately
- Validate your integration end-to-end

## [](#prerequisites)Prerequisites

- You have discovered an available gateway and noted its Gateway ID and endpoint. If not, see [Discover Available Gateways](../discover-gateways/).
- You have a service account with OIDC client credentials. See [Authentication](../../../../security/cloud-authentication/).
- You have a development environment with your chosen programming language.

## [](#integration-overview)Integration overview

Connecting to AI Gateway requires two configuration changes:

1. **Change the base URL**: Point to the gateway endpoint instead of the provider’s API. The gateway ID is embedded in the endpoint URL.
2. **Add authentication**: Use an OIDC access token from your service account instead of provider API keys.

## [](#authenticate-with-oidc)Authenticate with OIDC

AI Gateway authenticates requests with OIDC. A service account uses the `client_credentials` grant to exchange its client ID and client secret for access and ID tokens.

### [](#create-a-service-account)Create a service account

1. In the Redpanda Cloud UI, go to [**Organization IAM** > **Service account**](https://cloud.redpanda.com/organization-iam?tab=service-accounts).
2. Create a new service account and note the **Client ID** and **Client Secret**.

For details, see [Authenticate to the Cloud API](../../../../security/cloud-authentication/#authenticate-to-the-cloud-api).
### [](#configure-your-oidc-client)Configure your OIDC client

Use the following OIDC configuration:

| Parameter | Value |
| --- | --- |
| Discovery URL | https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration |
| Token endpoint | https://auth.prd.cloud.redpanda.com/oauth/token |
| Audience | cloudv2-production.redpanda.cloud |
| Grant type | client_credentials |

The discovery URL returns OIDC metadata, including the token endpoint and other configuration details. Use an OIDC client library that supports metadata discovery (such as `openid-client` for Node.js) so that endpoints are resolved automatically. If your library does not support discovery, you can fetch the discovery URL directly and extract the required endpoints from the JSON response.

#### cURL

```bash
AUTH_TOKEN=$(curl -s --request POST \
  --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \
  --header 'content-type: application/x-www-form-urlencoded' \
  --data grant_type=client_credentials \
  --data client_id=<client-id> \
  --data client_secret=<client-secret> \
  --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token)
```

Replace `<client-id>` and `<client-secret>` with your service account credentials.

#### Python (authlib)

```python
import requests
from authlib.integrations.requests_client import OAuth2Session

client = OAuth2Session(
    client_id="<client-id>",
    client_secret="<client-secret>",
)

# Discover token endpoint from OIDC metadata
metadata = requests.get(
    "https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration"
).json()
token_endpoint = metadata["token_endpoint"]

token = client.fetch_token(
    token_endpoint,
    grant_type="client_credentials",
    audience="cloudv2-production.redpanda.cloud",
)
access_token = token["access_token"]
```

This example performs a one-time token fetch. For automatic token renewal on subsequent requests, pass `token_endpoint` to the `OAuth2Session` constructor. Note that for `client_credentials` grants, `authlib` obtains a new token rather than using a refresh token.
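If you prefer to manage renewal yourself, one option is to cache the token and fetch a new one shortly before it expires. The following is a minimal sketch under stated assumptions: `fetch_token` is a callable you supply (for example, a wrapper around the token request shown above) that returns a dict with `access_token` and `expires_in`. The class name `TokenCache`, the `refresh_fraction` default, and the injectable `clock` (included only to make the logic easy to test) are illustrative choices, not part of any Redpanda or `authlib` API.

```python
import time
from typing import Callable, Dict, Optional


class TokenCache:
    """Caches an access token and re-fetches it before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Dict],
        refresh_fraction: float = 0.8,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_token = fetch_token
        self._refresh_fraction = refresh_fraction
        self._clock = clock
        self._token: Optional[str] = None
        self._refresh_at = 0.0

    def get(self) -> str:
        """Return a cached token, fetching a new one when the refresh time has passed."""
        now = self._clock()
        if self._token is None or now >= self._refresh_at:
            response = self._fetch_token()
            self._token = response["access_token"]
            # Schedule the next fetch at a fraction of the reported lifetime
            self._refresh_at = now + response["expires_in"] * self._refresh_fraction
        return self._token
```

Call `cache.get()` immediately before each request and use the result as the bearer token, so a fresh token is fetched only when the cached one is close to expiry.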
#### Node.js (openid-client) ```javascript import { Issuer } from 'openid-client'; const issuer = await Issuer.discover( 'https://auth.prd.cloud.redpanda.com' ); const client = new issuer.Client({ client_id: '<client-id>', client_secret: '<client-secret>', }); const tokenSet = await client.grant({ grant_type: 'client_credentials', audience: 'cloudv2-production.redpanda.cloud', }); const accessToken = tokenSet.access_token; ``` ### [](#make-authenticated-requests)Make authenticated requests Requests require two headers: - `Authorization: Bearer <access-token>` - your OIDC access token - `rp-aigw-id: <gateway-id>` - your AI Gateway ID Set these environment variables for consistent configuration: ```bash export REDPANDA_GATEWAY_URL="<gateway-endpoint>" export REDPANDA_GATEWAY_ID="<gateway-id>" ``` #### Python (OpenAI SDK) ```python import os from openai import OpenAI # Configure client to use AI Gateway with OIDC token client = OpenAI( base_url=os.getenv("REDPANDA_GATEWAY_URL"), api_key=access_token, # OIDC access token obtained above ) # Make a request response = client.chat.completions.create( model="openai/gpt-5.2-mini", # Note: vendor/model_id format messages=[{"role": "user", "content": "Hello, AI Gateway!"}], max_tokens=100 ) print(response.choices[0].message.content) ``` #### Python (Anthropic SDK) The Anthropic SDK can also route through AI Gateway using the OpenAI-compatible endpoint: ```python import os from anthropic import Anthropic client = Anthropic( base_url=os.getenv("REDPANDA_GATEWAY_URL"), api_key=access_token, # OIDC access token obtained above ) # Make a request message = client.messages.create( model="anthropic/claude-sonnet-4.5", max_tokens=100, messages=[{"role": "user", "content": "Hello, AI Gateway!"}] ) print(message.content[0].text) ``` #### Node.js (OpenAI SDK) ```javascript import OpenAI from 'openai'; const openai = new OpenAI({ baseURL: process.env.REDPANDA_GATEWAY_URL, apiKey: accessToken, // OIDC access token obtained above }); // Make a request const response = await openai.chat.completions.create({ model:
'openai/gpt-5.2-mini', messages: [{ role: 'user', content: 'Hello, AI Gateway!' }], max_tokens: 100 }); console.log(response.choices[0].message.content); ``` #### cURL ```bash curl ${REDPANDA_GATEWAY_URL}/chat/completions \ -H "Authorization: Bearer ${AUTH_TOKEN}" \ -H "Content-Type: application/json" \ -H "rp-aigw-id: ${REDPANDA_GATEWAY_ID}" \ -d '{ "model": "openai/gpt-5.2-mini", "messages": [{"role": "user", "content": "Hello, AI Gateway!"}], "max_tokens": 100 }' ``` ### [](#token-lifecycle-management)Token lifecycle management > ❗ **IMPORTANT** > > Your agent is responsible for refreshing tokens before they expire. OIDC access tokens have a limited time-to-live (TTL), determined by the identity provider, and are not automatically renewed by the AI Gateway. Check the `expires_in` field in the token response for the exact duration. - Proactively refresh tokens at approximately 80% of the token’s TTL to avoid failed requests. - `authlib` (Python) can handle token renewal automatically when you pass `token_endpoint` to the `OAuth2Session` constructor. For `client_credentials` grants, it obtains a new token rather than using a refresh token. - For other languages, cache the token and its expiry time, then request a new token before the current one expires. ## [](#model-naming-convention)Model naming convention When making requests through AI Gateway, use the `vendor/model_id` format for the model parameter: - `openai/gpt-5.2` - `openai/gpt-5.2-mini` - `anthropic/claude-sonnet-4.5` - `anthropic/claude-opus-4.6` This format tells AI Gateway which provider to route the request to. For example: ```python # Route to OpenAI response = client.chat.completions.create( model="openai/gpt-5.2", messages=[...] ) # Route to Anthropic (same client, different model) response = client.chat.completions.create( model="anthropic/claude-sonnet-4.5", messages=[...] 
) ```

## [](#handle-responses)Handle responses

Responses from AI Gateway follow the OpenAI API format:

```python
response = client.chat.completions.create(
    model="openai/gpt-5.2-mini",
    messages=[{"role": "user", "content": "Explain AI Gateway"}],
    max_tokens=200
)

# Access the response
message_content = response.choices[0].message.content
finish_reason = response.choices[0].finish_reason  # 'stop', 'length', etc.

# Token usage
prompt_tokens = response.usage.prompt_tokens
completion_tokens = response.usage.completion_tokens
total_tokens = response.usage.total_tokens

print(f"Response: {message_content}")
print(f"Tokens: {prompt_tokens} prompt + {completion_tokens} completion = {total_tokens} total")
```

## [](#handle-errors)Handle errors

AI Gateway returns standard HTTP status codes:

```python
import os

from openai import OpenAI, APIConnectionError, APIStatusError

client = OpenAI(
    base_url=os.getenv("REDPANDA_GATEWAY_URL"),
    api_key=access_token,  # OIDC access token
)

try:
    response = client.chat.completions.create(
        model="openai/gpt-5.2-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)
except APIStatusError as e:
    # APIStatusError carries the HTTP status code; the base OpenAIError does not
    if e.status_code == 400:
        print("Bad request - check model name and parameters")
    elif e.status_code == 401:
        print("Authentication failed - check OIDC token")
    elif e.status_code == 403:
        print("Forbidden - no access to this gateway")
    elif e.status_code == 404:
        print("Model not found - check available models")
    elif e.status_code == 429:
        print("Rate limit exceeded - slow down requests")
    elif e.status_code >= 500:
        print("Gateway or provider error - retry with exponential backoff")
    else:
        print(f"Error: {e}")
except APIConnectionError as e:
    # No HTTP response was received (network failure or timeout)
    print(f"Connection error: {e}")
```

Common error codes:

- **400**: Bad request (invalid parameters, malformed JSON)
- **401**: Authentication failed (invalid or expired OIDC token)
- **403**: Forbidden (no access to this gateway)
- **404**: Model not found (model not enabled in gateway)
- **429**: Rate limit exceeded (too many requests)
- **500/502/503**: Server error (gateway or provider issue)

## [](#streaming-responses)Streaming responses

AI Gateway
supports streaming for real-time token generation: ```python response = client.chat.completions.create( model="openai/gpt-5.2-mini", messages=[{"role": "user", "content": "Write a short poem"}], stream=True # Enable streaming ) # Process chunks as they arrive for chunk in response: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end='', flush=True) print() # New line after streaming completes ``` ## [](#switch-between-providers)Switch between providers One of AI Gateway’s key benefits is easy provider switching without code changes: ```python # Try OpenAI response = client.chat.completions.create( model="openai/gpt-5.2", messages=[{"role": "user", "content": "Explain quantum computing"}] ) # Try Anthropic (same code, different model) response = client.chat.completions.create( model="anthropic/claude-sonnet-4.5", messages=[{"role": "user", "content": "Explain quantum computing"}] ) ``` Compare responses, latency, and cost to determine the best model for your use case. 
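To make that comparison concrete, the sketch below sends the same prompt to each model and records latency and token usage. It assumes `client` is an OpenAI-compatible client configured as shown earlier in this guide. `compare_models` is a hypothetical helper, not part of the gateway or the SDK, and cost is left out because the response does not include pricing.

```python
import time


def compare_models(client, models, prompt, max_tokens=100):
    """Send the same prompt to each model and collect latency and token usage."""
    results = []
    for model in models:
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        results.append({
            "model": model,
            "latency_seconds": time.perf_counter() - start,
            "total_tokens": response.usage.total_tokens,
            "content": response.choices[0].message.content,
        })
    return results
```

A single request per model is a rough signal at best. Repeating the comparison over several representative prompts gives a steadier picture of latency and token usage.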
## [](#validate-your-integration)Validate your integration ### [](#test-connectivity)Test connectivity ```python import os from openai import OpenAI def test_gateway_connection(access_token): """Test basic connectivity to AI Gateway""" client = OpenAI( base_url=os.getenv("REDPANDA_GATEWAY_URL"), api_key=access_token, # OIDC access token ) try: # Simple test request response = client.chat.completions.create( model="openai/gpt-5.2-mini", messages=[{"role": "user", "content": "test"}], max_tokens=10 ) print("✓ Gateway connection successful") return True except Exception as e: print(f"✗ Gateway connection failed: {e}") return False if __name__ == "__main__": token = get_oidc_token() # Your OIDC token retrieval test_gateway_connection(token) ``` ### [](#test-multiple-models)Test multiple models ```python def test_models(): """Test multiple models through the gateway""" models = [ "openai/gpt-5.2-mini", "anthropic/claude-sonnet-4.5" ] for model in models: try: response = client.chat.completions.create( model=model, messages=[{"role": "user", "content": "Say hello"}], max_tokens=10 ) print(f"✓ {model}: {response.choices[0].message.content}") except Exception as e: print(f"✗ {model}: {e}") ``` ## [](#integrate-with-ai-development-tools)Integrate with AI development tools ### Claude Code Configure Claude Code to use AI Gateway: ```bash claude mcp add --transport http redpanda-aigateway ${REDPANDA_GATEWAY_URL}/mcp \ --header "Authorization: Bearer ${AUTH_TOKEN}" ``` Or edit `~/.claude/config.json`: ```json { "mcpServers": { "redpanda-ai-gateway": { "transport": "http", "url": "/mcp", "headers": { "Authorization": "Bearer " } } } } ``` ### VS Code Continue Extension Edit `~/.continue/config.json`: ```json { "models": [ { "title": "AI Gateway - GPT-5.2", "provider": "openai", "model": "openai/gpt-5.2", "apiBase": "", "apiKey": "" } ] } ``` ### Cursor IDE 1. Open Cursor Settings (**Cursor** → **Settings** or `Cmd+,`) 2. Navigate to **AI** settings 3. 
Add custom OpenAI-compatible provider: ```json { "cursor.ai.providers.openai.apiBase": "" } ``` ## [](#best-practices)Best practices ### [](#use-environment-variables)Use environment variables Store configuration in environment variables rather than hardcoding it: ```python # Good base_url = os.getenv("REDPANDA_GATEWAY_URL") # Bad base_url = "https://gw.ai.panda.com" # Don't hardcode URLs or credentials ``` ### [](#implement-retry-logic)Implement retry logic Implement exponential backoff for transient errors: ```python import time from openai import APIStatusError def make_request_with_retry(client, max_retries=3): for attempt in range(max_retries): try: return client.chat.completions.create( model="openai/gpt-5.2-mini", messages=[{"role": "user", "content": "Hello"}] ) except APIStatusError as e: if e.status_code >= 500 and attempt < max_retries - 1: wait_time = 2 ** attempt # Exponential backoff print(f"Retrying in {wait_time}s...") time.sleep(wait_time) else: raise ``` ### [](#monitor-your-usage)Monitor your usage Regularly check your usage to avoid unexpected costs: ```python # Track tokens in your application total_tokens = 0 request_count = 0 for request in requests: response = client.chat.completions.create(...) total_tokens += response.usage.total_tokens request_count += 1 print(f"Total tokens: {total_tokens} across {request_count} requests") ``` ### [](#handle-rate-limits-gracefully)Handle rate limits gracefully Respect rate limits and implement backoff: ```python try: response = client.chat.completions.create(...) except APIStatusError as e: if e.status_code == 429: # Rate limited - wait and retry retry_after = int(e.response.headers.get('Retry-After', 60)) print(f"Rate limited.
Waiting {retry_after}s...") time.sleep(retry_after) # Retry request ``` ## [](#troubleshooting)Troubleshooting ### [](#authentication-failed)"Authentication failed" Problem: 401 Unauthorized Solutions: - Check that your OIDC token has not expired and refresh it if necessary - Verify the audience is set to `cloudv2-production.redpanda.cloud` - Check that the service account has access to the specified gateway - Ensure the `Authorization` header is formatted correctly: `Bearer ` ### [](#model-not-found)"Model not found" Problem: 404 Model not found Solutions: - Verify the model name uses `vendor/model_id` format - Confirm the model is enabled in your gateway (contact administrator) ### [](#rate-limit-exceeded)"Rate limit exceeded" Problem: 429 Too Many Requests Solutions: - Reduce request rate - Implement exponential backoff - Contact administrator to review rate limits - Consider using a different gateway if available ### [](#connection-timeout)"Connection timeout" Problem: Request times out Solutions: - Check network connectivity to the gateway endpoint - Verify the gateway endpoint URL is correct - Check if the gateway is operational (contact administrator) - Increase client timeout if processing complex requests --- # Page 23: Discover Available Gateways **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/ai-gateway/builders/discover-gateways.md --- # Discover Available Gateways --- title: Discover Available Gateways latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: ai-gateway/builders/discover-gateways page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: ai-gateway/builders/discover-gateways.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/ai-gateway/builders/discover-gateways.adoc description: Find which AI Gateways you can access and their 
configurations. page-topic-type: how-to personas: app_developer learning-objective-1: List all AI Gateways you have access to and retrieve their endpoints and IDs learning-objective-2: View which models and MCP tools are available through each gateway learning-objective-3: Test gateway connectivity before integration page-git-created-date: "2026-02-18" page-git-modified-date: "2026-03-02" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). As a builder, you need to know which gateways are available to you before integrating your agent or application. This page shows you how to discover accessible gateways, understand their configurations, and verify connectivity. After reading this page, you will be able to: - List all AI Gateways you have access to and retrieve their endpoints and IDs - View which models and MCP tools are available through each gateway - Test gateway connectivity before integration ## [](#before-you-begin)Before you begin - You have a Redpanda Cloud account with access to at least one AI Gateway - You have access to the Redpanda Cloud Console or API credentials ## [](#list-your-accessible-gateways)List your accessible gateways ### Using the Console 1. Navigate to **Agentic** > **AI Gateway** > **Gateways** in the Redpanda Cloud Console. 2. Review the list of gateways you can access. For each gateway, you’ll see the gateway name, ID, endpoint URL, status, available models, and provider performance. Click the Configuration, API, MCP Tools, and Changelog tabs for additional information. 
### Using the API To list gateways programmatically: ```bash curl https://api.redpanda.com/v1/gateways \ -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" ``` Response: ```json { "gateways": [ { "id": "gw_abc123", "name": "production-gateway", "mode": "ai_hub", "endpoint": "https://gw.ai.panda.com", "status": "active", "workspace_id": "ws_xyz789", "created_at": "2025-01-15T10:30:00Z" }, { "id": "gw_def456", "name": "staging-gateway", "mode": "custom", "endpoint": "https://gw-staging.ai.panda.com", "status": "active", "workspace_id": "ws_xyz789", "created_at": "2025-01-10T08:15:00Z" } ] } ``` ## [](#understand-gateway-information)Understand gateway information Each gateway provides specific information you’ll need for integration: ### [](#gateway-endpoint)Gateway endpoint The gateway endpoint is the URL where you send all API requests. It replaces direct provider URLs (like `api.openai.com` or `api.anthropic.com`). The gateway ID is embedded directly in the endpoint URL. Example: ```bash https://example/gateways/gw_abc123/v1 ``` Your application configures this as the `base_url` in your SDK client. ### [](#available-models)Available models Each gateway exposes specific models based on administrator configuration. Models use the `vendor/model_id` format: - `openai/gpt-5.2` - `anthropic/claude-sonnet-4.5` - `openai/gpt-5.2-mini` To see which models are available through a specific gateway: ```bash curl ${GATEWAY_ENDPOINT}/models \ -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" ``` Response: ```json { "object": "list", "data": [ { "id": "openai/gpt-5.2", "object": "model", "owned_by": "openai" }, { "id": "anthropic/claude-sonnet-4.5", "object": "model", "owned_by": "anthropic" }, { "id": "openai/gpt-5.2-mini", "object": "model", "owned_by": "openai" } ] } ``` ### [](#rate-limits-and-quotas)Rate limits and quotas Each gateway may have configured rate limits and monthly budgets. 
Check the console or contact your administrator to understand: - Requests per minute/hour/day - Monthly spend limits - Token usage quotas These limits help control costs and ensure fair resource allocation across teams. ### [](#mcp-tools)MCP Tools If [Model Context Protocol (MCP)](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#model-context-protocol-mcp) aggregation is enabled for your gateway, you can access [tools](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-tool) from multiple [MCP servers](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-server) through a single endpoint. To discover available MCP tools: ```bash curl ${GATEWAY_ENDPOINT}/mcp/tools \ -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ -H "rp-aigw-mcp-deferred: true" ``` With deferred loading enabled, you’ll receive search and orchestrator tools initially. You can then query for specific tools as needed. ## [](#check-gateway-availability)Check gateway availability Before integrating your application, verify that you can successfully connect to the gateway: ### [](#test-connectivity)Test connectivity ```bash curl ${GATEWAY_ENDPOINT}/models \ -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ -v ``` Expected result: HTTP 200 response with a list of available models. ### [](#test-a-simple-request)Test a simple request Send a minimal chat completion request to verify end-to-end functionality: ```bash curl ${GATEWAY_ENDPOINT}/chat/completions \ -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ -H "Content-Type: application/json" \ -d '{ "model": "openai/gpt-5.2-mini", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 10 }' ``` Expected result: HTTP 200 response with a completion. ### [](#troubleshoot-connectivity-issues)Troubleshoot connectivity issues If you cannot connect to a gateway: 1. **Verify authentication**: Ensure your API token is valid and has not expired 2. 
**Check gateway endpoint**: Confirm the endpoint URL includes the correct gateway ID 3. **Verify endpoint URL**: Check for typos in the gateway endpoint 4. **Check permissions**: Confirm with your administrator that you have access to this gateway 5. **Review network connectivity**: Ensure your network allows outbound HTTPS connections ## [](#choose-the-right-gateway)Choose the right gateway If you have access to multiple gateways, consider which one to use based on your needs: ### [](#by-environment)By environment Organizations often create separate gateways for different environments: - Production gateway: Higher rate limits, access to all models, monitoring enabled - Staging gateway: Lower rate limits, restricted models, aggressive cost controls - Development gateway: Minimal limits, all models for experimentation Choose the gateway that matches your deployment environment. ### [](#by-team-or-project)By team or project Gateways may be organized by team or project for cost tracking and isolation: - team-ml-gateway: For machine learning team - team-product-gateway: For product team - customer-facing-gateway: For production customer workloads Use the gateway designated for your team to ensure proper cost attribution. 
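If gateway names follow a convention like the ones above, selection can be scripted against the gateway list. The following is a minimal sketch, assuming each entry has the `id`, `name`, `endpoint`, and `status` fields shown in the list response earlier on this page. The helper `pick_gateway` and the substring-matching convention are assumptions for illustration, and the sample data mirrors the earlier example rather than real gateways.

```python
def pick_gateway(gateways, name_contains):
    """Return the first active gateway whose name contains the given text, or None."""
    for gateway in gateways:
        if gateway.get("status") == "active" and name_contains in gateway.get("name", ""):
            return gateway
    return None


# Sample entries shaped like the list response (illustrative values only)
gateways = [
    {"id": "gw_abc123", "name": "production-gateway",
     "endpoint": "https://gw.ai.panda.com", "status": "active"},
    {"id": "gw_def456", "name": "staging-gateway",
     "endpoint": "https://gw-staging.ai.panda.com", "status": "active"},
]

selected = pick_gateway(gateways, "staging")
print(selected["id"] if selected else "no matching gateway")
```

Returning `None` instead of raising lets the caller decide whether a missing gateway is an error or a signal to try another name.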
### [](#by-capability)By capability Different gateways may have different features enabled: - Gateway with MCP tools: Use if your agent needs to call tools - Gateway without MCP: Use for simple LLM completions - Gateway with specific models: Use if you need access to particular models ## [](#example-complete-discovery-workflow)Example: Complete discovery workflow Here’s a complete workflow to discover and validate gateway access: ```bash #!/bin/bash # Set your API token export REDPANDA_CLOUD_TOKEN="your-token-here" # Step 1: List all accessible gateways echo "=== Discovering gateways ===" curl -s https://api.redpanda.com/v1/gateways \ -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ | jq '.gateways[] | {name: .name, id: .id, endpoint: .endpoint}' # Step 2: Select a gateway (example) export GATEWAY_ENDPOINT="https://example/gateways/gw_abc123/v1" # Step 3: List available models echo -e "\n=== Available models ===" curl -s ${GATEWAY_ENDPOINT}/models \ -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ | jq '.data[] | .id' # Step 4: Test with a simple request echo -e "\n=== Testing request ===" curl -s ${GATEWAY_ENDPOINT}/chat/completions \ -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ -H "Content-Type: application/json" \ -d '{ "model": "openai/gpt-5.2-mini", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 10 }' \ | jq '.choices[0].message.content' echo -e "\n=== Gateway validated successfully ===" ``` ## [](#next-steps)Next steps - [Connect Your Agent](../connect-your-agent/) - Integrate your application --- # Page 24: CEL Routing Cookbook **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/ai-gateway/cel-routing-cookbook.md --- # CEL Routing Cookbook --- title: CEL Routing Cookbook latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: ai-gateway/cel-routing-cookbook page-component-name: redpanda-cloud page-version: master page-component-version: 
master page-component-title: Cloud page-relative-src-path: ai-gateway/cel-routing-cookbook.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/ai-gateway/cel-routing-cookbook.adoc description: CEL routing cookbook for Redpanda AI Gateway with common patterns, examples, and best practices. page-topic-type: cookbook personas: app_developer, platform_admin learning-objective-1: Write CEL expressions to route requests based on user tier or custom headers learning-objective-2: Test CEL routing logic using the UI editor or test requests learning-objective-3: Troubleshoot common CEL errors using safe patterns page-git-created-date: "2026-02-18" page-git-modified-date: "2026-03-02" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Redpanda AI Gateway uses CEL (Common Expression Language) for dynamic request routing. CEL expressions evaluate request properties (headers, body, context) and determine which model or provider should handle each request. CEL enables: - User-based routing (free vs premium tiers) - Content-based routing (by prompt topic, length, complexity) - Environment-based routing (staging vs production models) - Cost controls (reject expensive requests in test environments) - A/B testing (route percentage of traffic to new models) - Geographic routing (by region header) - Custom business logic (any condition you can express) ## [](#cel-basics)CEL basics ### [](#what-is-cel)What is CEL? CEL (Common Expression Language) is a non-Turing-complete expression language designed for fast, safe evaluation. It’s used by Google (Firebase, Cloud IAM), Kubernetes, Envoy, and other systems. 
Key properties:

- Safe: Cannot loop infinitely or access system resources
- Fast: Evaluates in microseconds
- Readable: Similar to Python/JavaScript expressions
- Type-safe: Type errors are caught at configuration time. Missing fields and out-of-range indexes can still fail at evaluation time (see Common CEL errors).

### [](#cel-syntax-primer)CEL syntax primer

Comparison operators:

```cel
==    // Equal
!=    // Not equal
<     // Less than
>     // Greater than
<=    // Less than or equal
>=    // Greater than or equal
```

Logical operators:

```cel
&&    // AND
||    // OR
!     // NOT
```

Ternary operator (most common pattern):

```cel
condition ? value_if_true : value_if_false
```

Functions:

```cel
.size()             // Length of string or array
.contains("text")   // String contains substring
.startsWith("x")    // String starts with
.endsWith("x")      // String ends with
.matches("regex")   // Regex match
has(field)          // Check if field exists
```

Examples:

```cel
// Simple comparison
request.headers["tier"] == "premium"

// Ternary (if-then-else)
request.headers["tier"] == "premium" ? "openai/gpt-5.2" : "openai/gpt-5.2-mini"

// Logical AND
request.headers["tier"] == "premium" && request.headers["region"] == "us"

// String contains
request.body.messages[0].content.contains("urgent")

// Size check
request.body.messages.size() > 10
```

## [](#request-object-schema)Request object schema

CEL expressions evaluate against the `request` object, which contains:

### [](#request-headers-mapstring-string)`request.headers` (map<string, string>)

All HTTP headers (lowercase keys).

```cel
request.headers["x-user-tier"]     // Custom header
request.headers["x-customer-id"]   // Custom header
request.headers["user-agent"]      // Standard header
request.headers["x-request-id"]    // Standard header
```

> 📝 **NOTE**
>
> Header names are case-insensitive in HTTP, but CEL map lookups are case-sensitive, so always use lowercase keys.

### [](#request-body-object)`request.body` (object)

The JSON request body (for `/chat/completions`).
```cel request.body.model // String: Requested model request.body.messages // Array: Conversation messages request.body.messages[0].role // String: "system", "user", "assistant" request.body.messages[0].content // String: Message content request.body.messages.size() // Int: Number of messages request.body.max_tokens // Int: Max completion tokens (if set) request.body.temperature // Float: Temperature (if set) request.body.stream // Bool: Streaming enabled (if set) ``` > 📝 **NOTE** > > Fields are optional. Use `has()` to check existence: ```cel has(request.body.max_tokens) ? request.body.max_tokens : 1000 ``` ### [](#request-path-string)`request.path` (string) The request path. ```cel request.path == "/v1/chat/completions" request.path.startsWith("/v1/") ``` ### [](#request-method-string)`request.method` (string) The HTTP method. ```cel request.method == "POST" ``` ## [](#cel-routing-patterns)CEL routing patterns Each pattern follows this structure: - When to use: Scenario description - Expression: CEL code - What happens: Routing behavior - Verify: How to test - Cost/performance impact: Implications ### [](#tier-based-routing)Tier-based routing When to use: Different user tiers (free, pro, enterprise) should get different model quality Expression: ```cel request.headers["x-user-tier"] == "enterprise" ? "openai/gpt-5.2" : request.headers["x-user-tier"] == "pro" ? 
"anthropic/claude-sonnet-4.5" : "openai/gpt-5.2-mini" ``` What happens: - Enterprise users → GPT-5.2 (best quality) - Pro users → Claude Sonnet 4.5 (balanced) - Free users → GPT-5.2-mini (cost-effective) Verify: ```python # Test enterprise response = client.chat.completions.create( model="openai/gpt-5.2", # CEL routing rules override model selection messages=[{"role": "user", "content": "Test"}], extra_headers={"x-user-tier": "enterprise"} ) # Check logs: Should route to openai/gpt-5.2 # Test free response = client.chat.completions.create( model="openai/gpt-5.2", # CEL routing rules override model selection messages=[{"role": "user", "content": "Test"}], extra_headers={"x-user-tier": "free"} ) # Check logs: Should route to openai/gpt-5.2-mini ``` Cost impact: - Enterprise: ~$5.00 per 1K requests - Pro: ~$3.50 per 1K requests - Free: ~$0.50 per 1K requests Use case: SaaS product with tiered pricing where model quality is a differentiator ### [](#environment-based-routing)Environment-based routing When to use: Prevent staging from using expensive models Expression: ```cel request.headers["x-environment"] == "production" ? 
"openai/gpt-5.2" : "openai/gpt-5.2-mini" ``` What happens: - Production → GPT-5.2 (best quality) - Staging/dev → GPT-5.2-mini (10x cheaper) Verify: ```python # Set environment header response = client.chat.completions.create( model="openai/gpt-5.2", # CEL routing rules override model selection messages=[{"role": "user", "content": "Test"}], extra_headers={"x-environment": "staging"} ) # Check logs: Should route to gpt-5.2-mini ``` Cost impact: - Prevents staging from inflating costs - Example: Staging with 100K test requests/day - GPT-5.2: $500/day ($15K/month) - GPT-5.2-mini: $50/day ($1.5K/month) - **Savings: $13.5K/month** Use case: Protect against runaway staging costs ### [](#content-length-guard-rails)Content-length guard rails When to use: Block or downgrade long prompts to prevent cost spikes Expression (Downgrade): ```cel request.body.messages.size() > 10 || request.body.max_tokens > 4000 ? "openai/gpt-5.2-mini" // Cheaper model : "openai/gpt-5.2" // Normal model ``` What happens: - Long conversations → Downgraded to cheaper model - Short conversations → Premium model Verify: ```python # Test rejection response = client.chat.completions.create( model="openai/gpt-5.2", # CEL routing rules override model selection messages=[{"role": "user", "content": f"Message {i}"} for i in range(15)], max_tokens=5000 ) # Should return 400 error (rejected) # Test normal response = client.chat.completions.create( model="openai/gpt-5.2", # CEL routing rules override model selection messages=[{"role": "user", "content": "Short message"}], max_tokens=100 ) # Should route to gpt-5.2 ``` Cost impact: - Prevents unexpected bills from verbose prompts - Example: Block requests >10K tokens (would cost $0.15 each) Use case: Staging cost controls, prevent prompt injection attacks that inflate token usage ### [](#topic-based-routing)Topic-based routing When to use: Route different question types to specialized models Expression: ```cel request.body.messages[0].content.contains("code") 
|| request.body.messages[0].content.contains("debug") || request.body.messages[0].content.contains("programming") ? "openai/gpt-5.2" // Better at code : "anthropic/claude-sonnet-4.5" // Better at general writing ``` What happens: - Coding questions → GPT-5.2 (optimized for code) - General questions → Claude Sonnet (better prose) Verify: ```python # Test code question response = client.chat.completions.create( model="openai/gpt-5.2", # CEL routing rules override model selection messages=[{"role": "user", "content": "Debug this Python code: ..."}] ) # Check logs: Should route to gpt-5.2 # Test general question response = client.chat.completions.create( model="openai/gpt-5.2", # CEL routing rules override model selection messages=[{"role": "user", "content": "Write a blog post about AI"}] ) # Check logs: Should route to claude-sonnet-4.5 ``` Cost impact: - Optimize model selection for task type - Could improve quality without increasing costs Use case: Multi-purpose chatbot with both coding and general queries ### [](#geographicregional-routing)Geographic/regional routing When to use: Route by user region to different providers or gateways for compliance or latency optimization Expression: ```cel request.headers["x-user-region"] == "eu" ? "anthropic/claude-sonnet-4.5" // EU traffic to Anthropic : "openai/gpt-5.2" // Other traffic to OpenAI ``` What happens: - EU users → Anthropic (for EU data processing requirements) - Other users → OpenAI (default provider) > 📝 **NOTE** > > To achieve true data residency, configure separate gateways per region with provider pools that meet your compliance requirements. 
Verify:

```python
response = client.chat.completions.create(
    model="openai/gpt-5.2",  # CEL routing rules override model selection
    messages=[{"role": "user", "content": "Test"}],
    extra_headers={"x-user-region": "eu"}
)
# Check logs: Should route to anthropic/claude-sonnet-4.5
```

Cost impact: Varies by provider pricing

Use case: GDPR compliance, data residency requirements

### [](#customer-specific-routing)Customer-specific routing

When to use: Different customers have different model access (enterprise features)

Expression:

```cel
request.headers["x-customer-id"] == "customer_vip_123"
  ? "anthropic/claude-opus-4.6"    // Most expensive, best quality
  : "anthropic/claude-sonnet-4.5"  // Standard
```

What happens:

- VIP customer → Best model
- Standard customers → Normal model

Verify:

```python
response = client.chat.completions.create(
    model="openai/gpt-5.2",  # CEL routing rules override model selection
    messages=[{"role": "user", "content": "Test"}],
    extra_headers={"x-customer-id": "customer_vip_123"}
)
# Check logs: Should route to anthropic/claude-opus-4.6
```

Cost impact:

- VIP: ~$7.50 per 1K requests
- Standard: ~$3.50 per 1K requests

Use case: Enterprise contracts with premium model access

### [](#complexity-based-routing)Complexity-based routing

When to use: Route simple queries to cheap models, complex queries to expensive models

Expression: ```cel request.body.messages.size() == 1 && request.body.messages[0].content.size() < 100 ? 
"openai/gpt-5.2-mini" // Simple, short question : "openai/gpt-5.2" // Complex or long conversation ``` What happens: - Single short message (<100 chars) → Cheap model - Multi-turn or long messages → Premium model Verify: ```python # Test simple response = client.chat.completions.create( model="openai/gpt-5.2", # CEL routing rules override model selection messages=[{"role": "user", "content": "Hi"}] # 2 chars ) # Check logs: Should route to gpt-5.2-mini # Test complex response = client.chat.completions.create( model="openai/gpt-5.2", # CEL routing rules override model selection messages=[ {"role": "user", "content": "Long question here..." * 10}, {"role": "assistant", "content": "Response"}, {"role": "user", "content": "Follow-up"} ] ) # Check logs: Should route to gpt-5.2 ``` Cost impact: - Can reduce costs significantly if simple queries are common - Example: 50% of queries are simple, save 90% on those = 45% total savings Use case: FAQ chatbot with mix of simple lookups and complex questions ### [](#fallback-chain-multi-level)Fallback chain (multi-level) When to use: Complex fallback logic beyond simple primary/secondary Expression: ```cel request.headers["x-priority"] == "critical" ? "openai/gpt-5.2" // First choice for critical : request.headers["x-user-tier"] == "premium" ? "anthropic/claude-sonnet-4.5" // Second choice for premium : "openai/gpt-5.2-mini" // Default for everyone else ``` What happens: - Critical requests → Always GPT-5.2 - Premium non-critical → Claude Sonnet - Everyone else → GPT-5.2-mini Verify: Test with different header combinations Cost impact: Ensures SLA for critical requests while optimizing costs elsewhere Use case: Production systems with SLA requirements ## [](#advanced-cel-patterns)Advanced CEL patterns ### [](#default-values-with-has)Default values with `has()` Problem: Field might not exist in request Expression: ```cel has(request.body.max_tokens) && request.body.max_tokens > 2000 ? 
"openai/gpt-5.2" // Long response expected : "openai/gpt-5.2-mini" // Short response ``` What happens: Safely checks if `max_tokens` exists before comparing ### [](#multiple-conditions-with-parentheses)Multiple conditions with parentheses Expression: ```cel (request.headers["x-user-tier"] == "premium" || request.headers["x-customer-id"] == "vip_123") && request.headers["x-environment"] == "production" ? "openai/gpt-5.2" : "openai/gpt-5.2-mini" ``` What happens: Premium users OR VIP customer, AND production → GPT-5.2 ### [](#regex-matching)Regex matching Expression: ```cel request.body.messages[0].content.matches("(?i)(urgent|asap|emergency)") ? "openai/gpt-5.2" // Route urgent requests to best model : "openai/gpt-5.2-mini" ``` What happens: Messages containing "urgent", "ASAP", or "emergency" (case-insensitive) → GPT-5.2 ### [](#string-array-contains)String array contains Expression: ```cel ["customer_1", "customer_2", "customer_3"].exists(c, c == request.headers["x-customer-id"]) ? "openai/gpt-5.2" // Whitelist of customers : "openai/gpt-5.2-mini" ``` What happens: Only specific customers get premium model ## [](#test-cel-expressions)Test CEL expressions ### [](#option-1-cel-editor-in-ui-if-available)Option 1: CEL editor in UI (if available) 1. Navigate to **Agentic** → **AI Gateway** → **Gateways** → **Routing Rules** 2. Enter CEL expression 3. Click "Test" 4. Input test headers/body 5. 
View evaluated result ### [](#option-2-send-test-requests)Option 2: Send test requests ```python def test_cel_routing(headers, messages): """Test CEL routing with specific headers and messages""" response = client.chat.completions.create( model="openai/gpt-5.2", # CEL routing rules override model selection messages=messages, extra_headers=headers, max_tokens=10 # Keep it cheap ) # Check logs to see which model was used print(f"Headers: {headers}") print(f"Routed to: {response.model}") # Test tier-based routing test_cel_routing( {"x-user-tier": "premium"}, [{"role": "user", "content": "Test"}] ) test_cel_routing( {"x-user-tier": "free"}, [{"role": "user", "content": "Test"}] ) ``` ## [](#common-cel-errors)Common CEL errors ### [](#error-unknown-field)Error: "unknown field" Symptom: ```text Error: Unknown field 'request.headers.x-user-tier' ``` Cause: Wrong syntax (dot notation instead of bracket notation for headers) Fix: ```cel // Wrong request.headers.x-user-tier // Correct request.headers["x-user-tier"] ``` ### [](#error-type-mismatch)Error: "type mismatch" Symptom: ```text Error: Type mismatch: expected bool, got string ``` Cause: Forgot comparison operator Fix: ```cel // Wrong (returns string) request.headers["tier"] // Correct (returns bool) request.headers["tier"] == "premium" ``` ### [](#error-field-does-not-exist)Error: "field does not exist" Symptom: ```text Error: No such key: max_tokens ``` Cause: Accessing field that doesn’t exist in request Fix: ```cel // Wrong (crashes if max_tokens not in request) request.body.max_tokens > 1000 // Correct (checks existence first) has(request.body.max_tokens) && request.body.max_tokens > 1000 ``` ### [](#error-index-out-of-bounds)Error: "index out of bounds" Symptom: ```text Error: Index 0 out of bounds for array of size 0 ``` Cause: Accessing array element that doesn’t exist Fix: ```cel // Wrong (crashes if messages empty) request.body.messages[0].content.contains("test") // Correct (checks size first) 
request.body.messages.size() > 0 && request.body.messages[0].content.contains("test") ```

## [](#cel-performance-considerations)CEL performance considerations

### [](#expression-complexity)Expression complexity

Fast (<1ms evaluation):

```cel
request.headers["tier"] == "premium" ? "openai/gpt-5.2" : "openai/gpt-5.2-mini"
```

Slower (~5-10ms evaluation):

```cel
request.body.messages[0].content.matches("complex.*regex.*pattern")
```

Recommendation: Keep expressions simple. Complex regex can add latency.

### [](#number-of-evaluations)Number of evaluations

Each request evaluates the CEL expression once. Total latency impact:

- Simple expression: <1ms
- Complex expression: ~5-10ms

**Acceptable for most use cases.**

## [](#cel-function-reference)CEL function reference

### [](#string-functions)String functions

| Function | Description | Example |
| --- | --- | --- |
| size() | String length | "hello".size() == 5 |
| contains(s) | String contains | "hello".contains("ell") |
| startsWith(s) | String starts with | "hello".startsWith("he") |
| endsWith(s) | String ends with | "hello".endsWith("lo") |
| matches(regex) | Regex match | "hello".matches("h.*o") |

### [](#array-functions)Array functions

| Function | Description | Example |
| --- | --- | --- |
| size() | Array length | [1,2,3].size() == 3 |
| exists(x, cond) | Any element matches | [1,2,3].exists(x, x > 2) |
| all(x, cond) | All elements match | [1,2,3].all(x, x > 0) |

### [](#utility-functions)Utility functions

| Function | Description | Example |
| --- | --- | --- |
| has(field) | Field exists | has(request.body.max_tokens) |

## [](#next-steps)Next steps

- **Apply CEL routing**: See the gateway configuration options available in the Redpanda Cloud console.
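
The cost figures quoted in the routing patterns on this page can be checked with simple arithmetic. The following is a minimal sketch, not a pricing tool: the per-1K-request prices are the approximate figures from this page, the 30-day month is an assumption, and the function names are hypothetical.

```python
def monthly_cost(requests_per_day: int, cost_per_1k: float, days: int = 30) -> float:
    """Estimated monthly spend in dollars for a flat per-1K-request price."""
    return requests_per_day / 1000 * cost_per_1k * days


def blended_savings(simple_share: float, cheap_per_1k: float, premium_per_1k: float) -> float:
    """Fraction saved by sending `simple_share` of traffic to the cheaper model."""
    blended = simple_share * cheap_per_1k + (1 - simple_share) * premium_per_1k
    return 1 - blended / premium_per_1k


# Approximate prices quoted on this page (dollars per 1K requests).
GPT_52 = 5.00
GPT_52_MINI = 0.50

# Environment-based routing: 100K staging requests per day (30-day month assumed).
premium = monthly_cost(100_000, GPT_52)
cheap = monthly_cost(100_000, GPT_52_MINI)
print(f"staging on gpt-5.2:      ${premium:,.0f}/month")
print(f"staging on gpt-5.2-mini: ${cheap:,.0f}/month")
print(f"savings:                 ${premium - cheap:,.0f}/month")

# Complexity-based routing: half of all queries are simple.
print(f"blended savings: {blended_savings(0.5, GPT_52_MINI, GPT_52):.0%}")
```

Swapping in your own request volumes and provider prices gives a quick estimate of what a routing rule is worth before you deploy it.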
--- # Page 25: AI Gateway Architecture **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/ai-gateway/gateway-architecture.md --- # AI Gateway Architecture --- title: AI Gateway Architecture latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: ai-gateway/gateway-architecture page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: ai-gateway/gateway-architecture.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/ai-gateway/gateway-architecture.adoc description: Technical architecture of Redpanda AI Gateway, including how the control plane, data plane, and observability plane deliver high availability, cost governance, and multi-tenant isolation. page-topic-type: concept personas: app_developer, platform_admin learning-objective-1: Describe the three architectural planes of AI Gateway learning-objective-2: Explain the request lifecycle through policy evaluation stages learning-objective-3: Identify supported providers, features, and current limitations page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). This page provides technical details about AI Gateway’s architecture, request processing, and capabilities. 
For an overview of AI Gateway, see [What is an AI Gateway?](../what-is-ai-gateway/) ## [](#architecture-overview)Architecture overview AI Gateway consists of a [control plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#control-plane) for configuration and management, a [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) for request processing and routing, and an observability plane for monitoring and analytics. ### [](#control-plane)Control plane The control plane manages gateway configuration and policy definition: - **Workspace management**: Multi-tenant isolation with separate namespaces for different teams or environments - **Provider configuration**: Enable and configure LLM providers (such as OpenAI and Anthropic) - **Gateway creation**: Define gateways with specific routing rules, budgets, and rate limits - **Policy definition**: Create CEL-based routing policies, spend limits, and rate limits - **MCP server registration**: Configure which MCP servers are available to agents ### [](#data-plane)Data plane The data plane handles all runtime request processing: - **Request ingestion**: Accept requests via OpenAI-compatible API endpoints - **Authentication**: Validate API keys and gateway access - **Policy evaluation**: Apply rate limits, spend limits, and routing policies - **Provider pool management**: Select primary or fallback providers based on availability - **MCP proxy**: Aggregate tools from multiple MCP servers with deferred loading - **Response transformation**: Normalize provider-specific responses to OpenAI format - **Metrics collection**: Record token usage, latency, and cost for every request ### [](#observability-plane)Observability plane The observability plane provides monitoring and analytics: - **Request logs**: Store full request/response history with prompt and completion content - **Metrics aggregation**: Calculate token usage, costs, latency percentiles, and error rates - **Dashboard 
UI**: Display real-time and historical analytics per gateway, model, or provider
- **Cost tracking**: Estimate spend based on provider pricing and token consumption

## [](#request-lifecycle)Request lifecycle

When a request flows through AI Gateway, it passes through several policy and routing stages before reaching the LLM provider. Understanding this lifecycle helps you configure policies effectively and troubleshoot issues:

1. Application sends request to gateway endpoint
2. Gateway authenticates request
3. Rate limit policy evaluates (allow/deny)
4. Spend limit policy evaluates (allow/deny)
5. Routing policy evaluates (which model/provider to use)
6. Provider pool selects backend (primary/fallback)
7. Request forwarded to LLM provider
8. Response returned to application
9. Request logged with tokens, cost, latency, status

Each policy evaluation happens synchronously in the request path. If rate limits or spend limits reject the request, the gateway returns an error immediately without calling the LLM provider, which helps you control costs.

### [](#mcp-tool-request-lifecycle)MCP tool request lifecycle

For MCP tool requests, the lifecycle differs slightly to support deferred tool loading:

1. Application discovers tools via `/mcp` endpoint
2. Gateway aggregates tools from approved MCP servers
3. Application receives search + orchestrator tools (deferred loading)
4. Application invokes specific tool
5. Gateway routes to appropriate MCP server
6. Tool execution result returned
7. Request logged with execution time, status

The gateway only loads and exposes specific tools when requested, which dramatically reduces the token overhead compared to loading all tools upfront.
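
The ordering of the LLM request lifecycle described on this page can be sketched as a small in-memory pipeline. This is an illustration of the stage order only, not the gateway's implementation: the `Gateway` class, its fields, the status strings, and the provider callables are all hypothetical, and logging rejected requests is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Gateway:
    """Hypothetical model of the policy stages; not the real gateway."""
    rate_limit: int                           # max requests allowed in the window
    spend_limit: float                        # budget cap in dollars
    route: Callable[[dict], str]              # stands in for the CEL routing policy
    pools: List[Callable[[str, dict], str]]   # primary first, then fallbacks
    requests_seen: int = 0
    spend: float = 0.0
    log: List[Dict] = field(default_factory=list)

    def handle(self, request: dict, est_cost: float) -> str:
        # Stages 3-4: limits are checked before any provider is called.
        if self.requests_seen >= self.rate_limit:
            return self._finish("rejected: rate limit")
        if self.spend + est_cost > self.spend_limit:
            return self._finish("rejected: spend limit")
        self.requests_seen += 1

        # Stage 5: the routing policy picks a model.
        model = self.route(request)

        # Stages 6-7: try the primary pool, then each fallback in order.
        for provider in self.pools:
            try:
                reply = provider(model, request)
            except RuntimeError:
                continue  # this provider failed; try the next pool
            self.spend += est_cost
            return self._finish("ok", model, reply)
        return self._finish("error: all providers failed", model)

    def _finish(self, status, model=None, reply=None) -> str:
        # Stage 9: record the outcome (assumption: rejections are logged too).
        self.log.append({"status": status, "model": model})
        return reply if reply is not None else status


def failing_primary(model: str, request: dict) -> str:
    raise RuntimeError("simulated 5xx from primary provider")


def fallback(model: str, request: dict) -> str:
    return f"{model} via fallback"


gw = Gateway(
    rate_limit=2,
    spend_limit=1.00,
    route=lambda r: "openai/gpt-5.2"
    if r.get("headers", {}).get("x-user-tier") == "premium"
    else "openai/gpt-5.2-mini",
    pools=[failing_primary, fallback],
)
print(gw.handle({"headers": {"x-user-tier": "premium"}}, est_cost=0.40))
print(gw.handle({"headers": {}}, est_cost=0.40))
print(gw.handle({"headers": {}}, est_cost=0.40))  # third request exceeds the rate limit
```

The point of the sketch is that a rejected request returns before the routing policy or any provider runs, which is why limits protect spend.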
## [](#next-steps)Next steps - [AI Gateway Quickstart](../gateway-quickstart/): Route your first request through AI Gateway - [MCP Gateway](../mcp-aggregation-guide/): Configure MCP server aggregation for AI agents --- # Page 26: AI Gateway Quickstart **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/ai-gateway/gateway-quickstart.md --- # AI Gateway Quickstart --- title: AI Gateway Quickstart latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: ai-gateway/gateway-quickstart page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: ai-gateway/gateway-quickstart.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/ai-gateway/gateway-quickstart.adoc description: Get started with AI Gateway. Configure providers, create your first gateway with failover and budget controls, and route your first request. page-topic-type: quickstart personas: evaluator, app_developer, platform_admin learning-objective-1: Enable an LLM provider and create your first gateway learning-objective-2: Route your first request through AI Gateway and verify it works learning-objective-3: Verify request routing and token usage in the gateway overview page-git-created-date: "2026-02-18" page-git-modified-date: "2026-03-02" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Redpanda AI Gateway keeps your AI-powered applications running and your costs under control by routing all LLM and MCP traffic through a single managed layer with automatic failover and budget enforcement. This quickstart walks you through configuring your first gateway and routing requests through it. 
## [](#prerequisites)Prerequisites

Before starting, ensure you have:

- Access to the AI Gateway UI (provided by your administrator)
- Admin permissions to configure providers and models
- API key for at least one LLM provider (OpenAI, Anthropic, or Google AI)
- Python 3.8+, Node.js 18+, or cURL (for testing)

## [](#configure-a-provider)Configure a provider

Providers represent upstream LLM services and their associated credentials. Providers are disabled by default and must be enabled explicitly.

1. Navigate to **Agentic** > **AI Gateway** > **Providers**.
2. Select a provider (for example, OpenAI, Anthropic, Google AI).
3. On the Configuration tab, click **Add configuration** and enter your API key.
4. Verify the provider status shows "Active".

## [](#enable-models)Enable models

After enabling a provider, enable the specific models you want to make available through your gateways.

1. Navigate to **Agentic** > **AI Gateway** > **Models**.
2. Enable the models you want to use (for example, `gpt-5.2-mini`, `claude-sonnet-4.5`, `claude-opus-4.6`).
3. Verify the models appear as "Enabled" in the model catalog.

> 💡 **TIP**
>
> Different providers have different reliability and cost characteristics. When choosing models, consider your use case requirements for quality, speed, and cost.

### [](#model-naming-convention)Model naming convention

Requests through AI Gateway must use the `vendor/model_id` format. For example:

- OpenAI models: `openai/gpt-5.2`, `openai/gpt-5.2-mini`
- Anthropic models: `anthropic/claude-sonnet-4.5`, `anthropic/claude-opus-4.6`
- Google Gemini models: `google/gemini-2.0-flash`, `google/gemini-2.0-pro`

This format allows the gateway to route requests to the correct provider.

## [](#create-a-gateway)Create a gateway

A gateway is a logical configuration boundary that defines routing policies, rate limits, spend limits, and observability scope.
Common gateway patterns include the following: - Environment separation: Create separate gateways for staging and production - Team isolation: One gateway per team for budget tracking - Customer multi-tenancy: One gateway per customer for isolated policies 1. Navigate to **Agentic** > **AI Gateway** > **Gateways**. 2. Click **Create Gateway**. 3. Configure the gateway: - Display name: Choose a descriptive name (for example, `my-first-gateway`) - Workspace: Select a workspace (conceptually similar to a resource group) - Description: Add context about this gateway’s purpose - Optional metadata for documentation After creation, copy the gateway endpoint from the overview page. You’ll need this for sending requests. The gateway ID is embedded in the endpoint URL. For example: ```bash Endpoint: https://example/gateways/d633lffcc16s73ct95mg/v1 Gateway ID: d633lffcc16s73ct95mg ``` ## [](#send-your-first-request)Send your first request Now that you’ve configured a provider and created a gateway, send a test request to verify everything works. #### Python ```python from openai import OpenAI client = OpenAI( base_url="", api_key="", # Or use gateway's auth ) response = client.chat.completions.create( model="openai/gpt-5.2", # Use vendor/model format messages=[ {"role": "user", "content": "Hello!"} ], ) print(response.choices[0].message.content) ``` Expected output: ```text Hello! How can I help you today? ``` #### Node.js ```javascript import OpenAI from 'openai'; const client = new OpenAI({ baseURL: '', apiKey: '', // Or use gateway's auth }); const response = await client.chat.completions.create({ model: 'anthropic/claude-sonnet-4-5-20250929', // Use vendor/model format messages: [ { role: 'user', content: 'Hello!' } ], }); console.log(response.choices[0].message.content); ``` Expected output: ```text Hello! How can I help you today? 
``` #### cURL ```bash curl /chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer " \ -d '{ "model": "openai/gpt-5.2", "messages": [ {"role": "user", "content": "Hello!"} ] }' ``` Expected output: ```json { "id": "chatcmpl-abc123", "object": "chat.completion", "model": "openai/gpt-5.2", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Hello! How can I help you today?" }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 9, "completion_tokens": 9, "total_tokens": 18 } } ``` ### [](#troubleshooting)Troubleshooting If your request fails, check these common issues: - 401 Unauthorized: Verify your API key is valid - 404 Not Found: Confirm the base URL matches your gateway endpoint - Model not found: Ensure the model is enabled in the model catalog and that you’re using the correct `vendor/model` format. ## [](#verify-in-the-gateway-overview)Verify in the gateway overview Confirm your request was routed through AI Gateway. 1. On the **Overview** tab, check the aggregate metrics: - **Total Requests**: Should have incremented - **Total Tokens**: Shows combined input and output tokens - **Total Cost**: Estimated spend across all requests - **Avg Latency**: Average response time in milliseconds 2. Scroll to the **Models** table to see per-model statistics: The model you used in your request should appear with its request count, token usage (input/output), estimated cost, latency, and error rate. ## [](#configure-llm-routing-optional)Configure LLM routing (optional) Configure rate limits, spend limits, and provider pools with failover. On the Gateways page, select the **LLM** tab to configure routing policies. The LLM routing pipeline represents the request lifecycle: 1. **Rate Limit**: Control request throughput (for example, 100 requests/second) 2. **Spend Limit**: Set monthly budget caps (for example, $15K/month with blocking enforcement) 3. 
**Provider Pools**: Define primary and fallback providers ### [](#configure-provider-pool-with-fallback)Configure provider pool with fallback For high availability, configure a fallback provider that activates when the primary fails: 1. Add a second provider (for example, Anthropic). 2. In your gateway’s **LLM** routing configuration: - **Primary pool**: OpenAI (preferred for quality) - **Fallback pool**: Anthropic (activates on rate limits, timeouts, or errors) 3. Save the configuration. The gateway automatically routes to the fallback when it detects: - Rate limit exceeded - Request timeout - 5xx server errors from primary provider ## [](#configure-mcp-tools-optional)Configure MCP tools (optional) If you’re using [AI agents](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#ai-agent), configure [Model Context Protocol (MCP)](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#model-context-protocol-mcp) tool aggregation. On the Gateways page, select the **MCP** tab to configure tool discovery and execution. The MCP proxy aggregates multiple [MCP servers](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-server) behind a single endpoint, allowing agents to discover and call [tools](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-tool) through the gateway. 
Configure the MCP settings:

- **Display name**: Descriptive name for the provider pool
- **Model**: Choose which model handles tool execution
- **Load balancing**: If multiple providers are available, select a strategy (for example, round robin)

### [](#available-mcp-tools)Available MCP tools

The gateway provides these built-in MCP tools:

- **Data catalog API**: Query your data catalog
- **Memory store**: Persistent storage for agent state
- **Vector search**: Semantic search over embeddings
- **MCP Orchestrator**: Built-in tool for programmatic multi-tool workflows

The **MCP Orchestrator** enables agents to generate JavaScript code that calls multiple tools in a single orchestrated step, reducing round trips. For example, a workflow requiring 47 file reads can be reduced from 49 round trips to just 1.

To add external tools (for example, Slack, GitHub), add their MCP server endpoints to your gateway configuration.

### [](#deferred-tool-loading)Deferred tool loading

When many tools are aggregated, listing all tools upfront can consume significant tokens. With deferred tool loading, the MCP gateway initially returns only:

- A tool search capability
- The MCP Orchestrator

Agents then search for specific tools they need, retrieving only that subset. This can reduce token usage by 80-90% when you have many tools configured.

## [](#configure-cel-routing-rule-optional)Configure CEL routing rule (optional)

Use CEL (Common Expression Language) expressions to route requests dynamically based on headers, content, or other request properties.

The AI Gateway uses CEL for flexible routing without code changes. Use CEL to:

- Route premium users to better models
- Apply different rate limits based on user tiers
- Enforce policies based on request content

### [](#add-a-routing-rule)Add a routing rule

In your gateway’s routing configuration:

1. Add a CEL expression to route based on user tier: ```cel // Route based on user tier header request.headers["x-user-tier"] == "premium" ? 
"openai/gpt-5.2" : "openai/gpt-5.2-mini" ``` 2. Save the rule. The gateway editor helps you discover available request fields (headers, path, body, and so on). ### [](#test-the-routing-rule)Test the routing rule Send requests with different headers to verify routing: **Premium user request**: ```python response = client.chat.completions.create( model="openai/gpt-5.2", # Will be routed based on CEL rule messages=[{"role": "user", "content": "Hello"}], extra_headers={"x-user-tier": "premium"} ) # Should route to gpt-5.2 (premium model) ``` **Free user request**: ```python response = client.chat.completions.create( model="openai/gpt-5.2-mini", messages=[{"role": "user", "content": "Hello"}], extra_headers={"x-user-tier": "free"} ) # Should route to gpt-5.2-mini (cost-effective model) ``` ### [](#common-cel-patterns)Common CEL patterns Route based on model family: ```cel request.body.model.startsWith("anthropic/") ``` Apply a rule to all requests: ```cel true ``` Guard for field existence: ```cel has(request.body.max_tokens) && request.body.max_tokens > 1000 ``` For more CEL examples, see [CEL Routing Cookbook](../cel-routing-cookbook/). ## [](#connect-ai-tools-to-your-gateway)Connect AI tools to your gateway The AI Gateway provides standardized endpoints that work with various AI development tools. This section shows how to configure popular tools. ### [](#mcp-endpoint)MCP endpoint If you’ve configured MCP tools in your gateway, AI agents can connect to the aggregated MCP endpoint: - **MCP endpoint URL**: `/mcp` - **Required headers**: - `Authorization: Bearer ` This endpoint aggregates all MCP servers configured in your gateway. 
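As a minimal sketch of how a client might assemble these connection details, the following Python helper builds the MCP endpoint URL and the `Authorization` header. The function name `mcp_connection` and the example values are hypothetical. The sketch assumes only what this section states (the `/mcp` path and a bearer token) and makes no network calls.

```python
def mcp_connection(gateway_url: str, token: str) -> tuple[str, dict[str, str]]:
    """Return the aggregated MCP endpoint URL and the headers it requires."""
    if not gateway_url or not token:
        raise ValueError("gateway_url and token are both required")
    # Avoid a double slash when the gateway URL already ends with "/".
    url = gateway_url.rstrip("/") + "/mcp"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers


# Hypothetical values for illustration only.
url, headers = mcp_connection("https://gateway.example.com/", "example-token")
print(url)  # https://gateway.example.com/mcp
```

Keeping the URL and header construction in one place means each tool configuration below can reuse the same two values.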
### [](#environment-variables)Environment variables For consistent configuration, set these environment variables: ```bash export REDPANDA_GATEWAY_URL="" export REDPANDA_API_KEY="" ``` ### [](#claude-code)Claude Code Configure Claude Code using HTTP transport for the MCP connection: ```bash claude mcp add --transport http redpanda-aigateway /mcp \ --header "Authorization: Bearer " ``` Alternatively, edit `~/.claude/config.json`: ```json { "mcpServers": { "redpanda-ai-gateway": { "transport": "http", "url": "/mcp", "headers": { "Authorization": "Bearer " } } }, "apiProviders": { "redpanda": { "baseURL": "" } } } ``` ### [](#continue-dev)Continue.dev Edit your Continue config file (`~/.continue/config.json`): ```json { "models": [ { "title": "Redpanda AI Gateway - GPT-5.2", "provider": "openai", "model": "openai/gpt-5.2", "apiBase": "", "apiKey": "" }, { "title": "Redpanda AI Gateway - Claude", "provider": "anthropic", "model": "anthropic/claude-sonnet-4.5", "apiBase": "", "apiKey": "" }, { "title": "Redpanda AI Gateway - Gemini", "provider": "google", "model": "google/gemini-2.0-flash", "apiBase": "", "apiKey": "" } ] } ``` ### [](#cursor-ide)Cursor IDE Configure Cursor in Settings (**Cursor** → **Settings** or `Cmd+,`): ```json { "cursor.ai.providers.openai.apiBase": "" } ``` ### [](#custom-applications)Custom applications For custom applications using OpenAI, Anthropic, or Google Gemini SDKs: **Python with OpenAI SDK**: ```python from openai import OpenAI client = OpenAI( base_url="", api_key="", ) ``` **Python with Anthropic SDK**: ```python from anthropic import Anthropic client = Anthropic( base_url="", api_key="", ) ``` **Node.js with OpenAI SDK**: ```javascript import OpenAI from 'openai'; const openai = new OpenAI({ baseURL: '', apiKey: process.env.REDPANDA_API_KEY, }); ``` ## [](#next-steps)Next steps Explore advanced AI Gateway features: - [CEL Routing Cookbook](../cel-routing-cookbook/): Advanced CEL routing patterns for traffic distribution and cost 
optimization - [MCP Gateway](../mcp-aggregation-guide/): Configure MCP server aggregation and deferred tool loading Learn about the architecture: - [AI Gateway Architecture](../gateway-architecture/): Technical architecture, request lifecycle, and deployment models - [What is an AI Gateway?](../what-is-ai-gateway/): Problems AI Gateway solves and common use cases --- # Page 27: MCP Gateway **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/ai-gateway/mcp-aggregation-guide.md --- # MCP Gateway --- title: MCP Gateway latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: ai-gateway/mcp-aggregation-guide page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: ai-gateway/mcp-aggregation-guide.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/ai-gateway/mcp-aggregation-guide.adoc description: Learn how to use the MCP Gateway to aggregate MCP servers, configure deferred tool loading, create orchestrator workflows, and manage security. page-topic-type: guide personas: app_developer, platform_admin learning-objective-1: Configure MCP aggregation with deferred tool loading to reduce token costs learning-objective-2: Write orchestrator workflows to reduce multi-step interactions learning-objective-3: Manage approved MCP servers with security controls and audit trails page-git-created-date: "2026-02-18" page-git-modified-date: "2026-03-02" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). 
The MCP Gateway provides [Model Context Protocol (MCP)](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#model-context-protocol-mcp) aggregation, allowing [AI agents](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#ai-agent) to access [tools](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-tool) from multiple MCP servers through a single unified endpoint. This eliminates the need for agents to manage multiple MCP connections and significantly reduces token costs through deferred tool loading. MCP Gateway benefits: - Single endpoint: One MCP endpoint aggregates all approved MCP servers - Token reduction: Fewer tokens through deferred tool loading (depending on configuration) - Centralized governance: Admin-approved MCP servers only - Orchestration: JavaScript-based orchestrator reduces multi-step round trips - Security: Controlled tool execution environment ## [](#what-is-mcp)What is MCP? Model Context Protocol (MCP) is a standard for exposing tools (functions) that AI agents can discover and invoke. MCP servers provide tools like: - Database queries - File system operations - API integrations (CRM, payment, analytics) - Search (web, vector, enterprise) - Code execution - Workflow automation | Without AI Gateway | With AI Gateway | | --- | --- | | Agent connects to each MCP server individually | Agent connects to gateway’s unified /mcp endpoint | | Agent loads ALL tools from ALL servers upfront (high token cost) | Gateway aggregates tools from approved MCP servers | | No centralized governance or security | Deferred loading: Only search + orchestrator tools sent initially | | Complex configuration | Agent queries for specific tools when needed (token savings) | | | Centralized governance and observability | ## [](#architecture)Architecture ```text ┌─────────────────┐ │ AI Agent │ │ (Claude, GPT) │ └────────┬────────┘ │ │ 1. Discover tools with /mcp endpoint │ 2. 
Invoke specific tool │ ┌────────▼────────────────────────────────┐ │ AI Gateway (MCP Aggregator) │ │ │ │ ┌─────────────────────────────────┐ │ │ │ Deferred tool loading │ │ │ │ (Send search + orchestrator │ │ │ │ initially, defer others) │ │ │ └─────────────────────────────────┘ │ │ │ │ ┌─────────────────────────────────┐ │ │ │ Orchestrator (JavaScript) │ │ │ │ (Reduce round trips for │ │ │ │ multi-step workflows) │ │ │ └─────────────────────────────────┘ │ │ │ │ ┌─────────────────────────────────┐ │ │ │ Approved MCP Server Registry │ │ │ │ (Admin-controlled) │ │ │ └─────────────────────────────────┘ │ └────────┬────────────────────────────────┘ │ │ Routes to appropriate MCP server │ ┌────▼─────┬──────────┬─────────┐ │ │ │ │ ┌───▼────┐ ┌──▼─────┐ ┌──▼──────┐ ┌▼──────┐ │ MCP │ │ MCP │ │ MCP │ │ MCP │ │Database│ │Filesystem│ │ Slack │ │Search │ │Server │ │ Server │ │ Server │ │Server │ └────────┘ └────────┘ └─────────┘ └───────┘ ``` ## [](#mcp-request-lifecycle)MCP request lifecycle ### [](#tool-discovery-initial-connection)Tool discovery (initial connection) Agent request: ```http GET /mcp/tools Headers: Authorization: Bearer {TOKEN} rp-aigw-mcp-deferred: true # Enable deferred loading ``` Gateway response (with deferred loading): ```json { "tools": [ { "name": "search_tools", "description": "Query available tools by keyword or category", "input_schema": { "type": "object", "properties": { "query": {"type": "string"}, "category": {"type": "string"} } } }, { "name": "orchestrator", "description": "Execute multi-step workflows with JavaScript logic", "input_schema": { "type": "object", "properties": { "workflow": {"type": "string"}, "context": {"type": "object"} } } } ] } ``` Note: Only 2 tools returned initially (search + orchestrator), not all 50+ tools from all MCP servers. 
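To check on the client side whether a discovery response used deferred loading, an agent could inspect the tool names it received. The following is a minimal sketch that assumes the response shape shown above. The helper names `tool_names` and `is_deferred` are hypothetical, and the embedded JSON is a trimmed copy of the sample response rather than live gateway output.

```python
import json

# Trimmed copy of the deferred-loading sample response (schemas omitted).
DISCOVERY_RESPONSE = """
{
  "tools": [
    {"name": "search_tools", "description": "Query available tools by keyword or category"},
    {"name": "orchestrator", "description": "Execute multi-step workflows with JavaScript logic"}
  ]
}
"""


def tool_names(response_body: str) -> list[str]:
    """Extract tool names from a discovery response body."""
    return [tool["name"] for tool in json.loads(response_body)["tools"]]


def is_deferred(names: list[str]) -> bool:
    """True when only the search and orchestrator tools were returned."""
    return set(names) == {"search_tools", "orchestrator"}


names = tool_names(DISCOVERY_RESPONSE)
print(names, is_deferred(names))  # ['search_tools', 'orchestrator'] True
```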
Token savings: - Without deferred loading: ~5,000-10,000 tokens (all tool definitions) - With deferred loading: ~500-1,000 tokens (2 tool definitions) - Typically 80-90% reduction ### [](#tool-query-when-agent-needs-specific-tool)Tool query (when agent needs specific tool) Agent request: ```http POST /mcp/tools/search_tools Headers: Authorization: Bearer {TOKEN} Body: { "query": "database query" } ``` Gateway response: ```json { "tools": [ { "name": "execute_sql", "description": "Execute SQL query against the database", "mcp_server": "database-server", "input_schema": { "type": "object", "properties": { "query": {"type": "string"}, "database": {"type": "string"} }, "required": ["query"] } }, { "name": "list_tables", "description": "List all tables in the database", "mcp_server": "database-server", "input_schema": { "type": "object", "properties": { "database": {"type": "string"} } } } ] } ``` Agent receives only relevant tools based on query. ### [](#tool-execution)Tool execution Agent request: ```http POST /mcp/tools/execute_sql Headers: Authorization: Bearer {TOKEN} Body: { "query": "SELECT * FROM users WHERE tier = 'premium' LIMIT 10", "database": "prod" } ``` Gateway: 1. Routes to appropriate MCP server (database-server) 2. Executes tool 3. Returns result Gateway response: ```json { "result": [ {"id": 1, "name": "Alice", "tier": "premium"}, {"id": 2, "name": "Bob", "tier": "premium"}, ... ] } ``` Agent receives result and can continue reasoning. ## [](#deferred-tool-loading)Deferred tool loading ### [](#how-it-works)How it works Traditional MCP (No deferred loading): 1. Agent connects to MCP endpoint 2. Gateway sends all tools from all MCP servers (50+ tools) 3. Agent includes all tool definitions in every LLM request 4. High token cost: ~5,000-10,000 tokens per request Deferred loading (AI Gateway): 1. Agent connects to MCP endpoint with `rp-aigw-mcp-deferred: true` header 2. Gateway sends only 2 tools: `search_tools` + `orchestrator` 3. 
Agent includes only 2 tool definitions in LLM request (~500-1,000 tokens) 4. When agent needs specific tool: - Agent calls `search_tools` with query (for example, "database") - Gateway returns matching tools - Agent calls specific tool (for example, `execute_sql`) 5. Total token cost: Initial 500-1,000 + per-query ~200-500 ### [](#when-to-use-deferred-loading)When to use deferred loading Use deferred loading when: - You have 10+ tools across multiple MCP servers - Agents don’t need all tools for every request - Token costs are a concern - Agents can handle multi-step workflows (search → execute) Don’t use deferred loading when: - You have <5 tools total (overhead not worth it) - Agents need all tools for every request (rare) - Latency is more important than token costs (deferred adds 1 round trip) ### [](#configure-deferred-loading)Configure deferred loading Deferred loading is configured for each MCP server through the **Defer Loading Override** setting in the Create MCP Server dialog. 1. Navigate to your gateway’s **MCP** tab. 2. Create or edit an MCP server. 3. Under **Server Settings**, set **Defer Loading Override**: | Option | Description | | --- | --- | | Inherit from gateway | Use the gateway-level deferred loading setting (default) | | Enabled | Always defer loading from this server. Agents receive only a search tool initially and query for specific tools when needed. | | Disabled | Always load all tools from this server upfront. | 4. Click **Save**. ### [](#measure-token-savings)Measure token savings Compare token usage before/after deferred loading: 1. Check logs without deferred loading: - Filter: Gateway = your-gateway, Model = your-model, Date = before enabling - Note the average tokens per request 2. Enable deferred loading 3. Check logs after deferred loading: - Filter: Same gateway/model, Date = after enabling - Note the average tokens per request 4. 
Calculate savings: ```text Savings % = ((Before - After) / Before) × 100 ``` Expected results: Typically 80-90% reduction in average tokens per request ## [](#orchestrator-multi-step-workflows)Orchestrator: multi-step workflows ### [](#what-is-the-orchestrator)What is the orchestrator? The **orchestrator** is a special tool that executes JavaScript workflows, reducing multi-step interactions from multiple round trips to a single request. Without Orchestrator: 1. Agent: "Search vector database for relevant docs" → Round trip 1 2. Agent receives results, evaluates: "Results insufficient" 3. Agent: "Fallback to web search" → Round trip 2 4. Agent receives results, processes → Round trip 3 5. **Total: 3 round trips** (high latency, 3x token cost) With Orchestrator: 1. Agent: "Execute workflow: Search vector DB → if insufficient, fallback to web search" 2. Gateway executes entire workflow in JavaScript 3. Agent receives final result → **1 round trip** Benefits: - **Latency Reduction**: 1 round trip vs 3+ - **Token Reduction**: No intermediate LLM calls needed - **Reliability**: Workflow logic executes deterministically - **Cost**: Single LLM call instead of multiple ### [](#when-to-use-orchestrator)When to use orchestrator Use orchestrator when: - Multi-step workflows with conditional logic (if/else) - Fallback patterns (try A, if fails, try B) - Sequential tool calls with dependencies - Loop-based operations (iterate, aggregate) Don’t use orchestrator when: - Single tool call (no benefit) - Agent needs to reason between steps (orchestrator is deterministic) - Workflow requires LLM judgment at each step ### [](#orchestrator-example-search-with-fallback)Orchestrator example: search with fallback Scenario: Search vector database; if results insufficient, fallback to web search. 
Without Orchestrator (3 round trips): ```python # Agent's internal reasoning (3 separate LLM calls) # Round trip 1: Search vector DB vector_results = call_tool("vector_search", {"query": "Redpanda pricing"}) # Round trip 2: Agent evaluates results if len(vector_results) < 3: # Round trip 3: Fallback to web search web_results = call_tool("web_search", {"query": "Redpanda pricing"}) results = web_results else: results = vector_results # Agent processes final results ``` With Orchestrator (1 round trip): ```python # Agent invokes orchestrator once results = call_tool("orchestrator", { "workflow": """ // JavaScript workflow const vectorResults = await tools.vector_search({ query: context.query }); if (vectorResults.length < 3) { // Fallback to web search const webResults = await tools.web_search({ query: context.query }); return webResults; } return vectorResults; """, "context": { "query": "Redpanda pricing" } }) # Agent receives final results directly ``` Savings: - Latency: ~3-5 seconds (3 round trips) → ~1-2 seconds (1 round trip) - Tokens: ~1,500 tokens (3 LLM calls) → ~500 tokens (1 LLM call) - Cost: ~$0.0075 → ~$0.0025 (67% reduction) ### [](#orchestrator-api)Orchestrator API Tool name: `orchestrator` Input schema: ```json { "workflow": "string (JavaScript code)", "context": "object (variables available to workflow)" } ``` Available in workflow: - `tools.{tool_name}(params)`: Call any tool from approved MCP servers - `context.{variable}`: Access context variables - Standard JavaScript: `if`, `for`, `while`, `try/catch`, `async/await` Security: - Sandboxed execution (no file system, network, or system access) - Timeout and memory limits are system-managed and cannot be modified Limitations: - Cannot call external APIs directly (must use MCP tools) - Cannot import npm packages (built-in JS only) ### [](#orchestrator-example-data-aggregation)Orchestrator example: data aggregation Scenario: Fetch user data from database, calculate summary statistics. 
```python
results = call_tool("orchestrator", {
    "workflow": """
        // Fetch all premium users
        const users = await tools.execute_sql({
            query: "SELECT * FROM users WHERE tier = 'premium'",
            database: "prod"
        });

        // Calculate statistics
        const stats = {
            total: users.length,
            by_region: {},
            avg_spend: 0
        };

        let totalSpend = 0;
        for (const user of users) {
            // Count by region
            if (!stats.by_region[user.region]) {
                stats.by_region[user.region] = 0;
            }
            stats.by_region[user.region]++;

            // Sum spend
            totalSpend += user.monthly_spend;
        }

        // Guard against division by zero when no users match
        stats.avg_spend = users.length > 0 ? totalSpend / users.length : 0;

        return stats;
    """,
    "context": {}
})
```

Output:

```json
{
  "total": 1250,
  "by_region": {
    "us-east": 600,
    "us-west": 400,
    "eu": 250
  },
  "avg_spend": 149.50
}
```

Comparison:

- Without orchestrator: All users are fetched to the agent, which then processes them → 2 round trips
- With orchestrator: All processing happens in the gateway → 1 round trip

### [](#orchestrator-best-practices)Orchestrator best practices

DO:

- Use for deterministic workflows (same input → same output)
- Use for sequential operations with dependencies
- Use for fallback patterns
- Handle errors with `try/catch`
- Keep workflows readable (add comments)

DON’T:

- Use for workflows requiring LLM reasoning at each step (let agent handle that)
- Execute long-running operations (timeout will hit)
- Access external resources (use MCP tools instead)
- Execute untrusted user input (security risk)

## [](#mcp-server-administration)MCP server administration

### [](#add-mcp-servers)Add MCP servers

Prerequisites:

- MCP server URL
- Authentication method (if required)
- List of tools to enable

Steps:

1. Navigate to MCP servers:
   - In the sidebar, navigate to **Agentic** > **AI Gateway** > **Gateways**, select your gateway, then select the **MCP** tab.
2. Configure server:

   ```yaml
   # PLACEHOLDER: Actual configuration format
   name: database-server
   url: https://mcp-database.example.com
   authentication:
     type: bearer_token
     token: ${SECRET_REF}  # Reference to secret
   enabled_tools:
     - execute_sql
     - list_tables
     - describe_table
   ```

3. Test connection:
   - Gateway attempts connection to MCP server
   - Verifies authentication
   - Retrieves tool list
4. Enable server:
   - Server status: Active
   - Tools available to agents

Common MCP servers:

- Database: PostgreSQL, MySQL, MongoDB query tools
- Filesystem: Read/write/search files
- API integrations: Slack, GitHub, Salesforce, Stripe
- Search: web search, vector search, enterprise search
- Code execution: Python, JavaScript sandboxes
- Workflow: Zapier, n8n integrations

### [](#mcp-server-approval-workflow)MCP server approval workflow

Why approval is required:

- Security: Prevent agents from accessing unauthorized systems
- Governance: Control which tools are available
- Cost: Some tools are expensive (API calls, compute)
- Compliance: Audit trail of approved tools

Typical approval process:

1. Request: User/team requests MCP server
2. Review: Admin reviews security, cost, necessity
3. Approval/Rejection: Admin decision
4. Configuration: If approved, admin adds server to gateway

> 📝 **NOTE**
>
> The exact approval workflow may vary by organization. In some cases, admins may directly enable servers without a formal workflow.
Rejected server behavior:

- Server not listed in tool discovery
- Agent cannot query or invoke tools from this server
- Requests return `403 Forbidden`

### [](#restrict-mcp-server-access)Restrict MCP server access

Per-gateway restrictions:

```yaml
# PLACEHOLDER: Actual configuration format
gateways:
  - name: production-gateway
    mcp_servers:
      allowed:
        - database-server  # Only this server allowed
      denied:
        - filesystem-server  # Explicitly denied
  - name: staging-gateway
    mcp_servers:
      allowed:
        - "*"  # All approved servers allowed
```

Use cases:

- Production gateway: Only production-safe tools
- Staging gateway: All tools for testing
- Customer-specific gateway: Only tools relevant to customer

### [](#mcp-server-versioning)MCP server versioning

Challenge: MCP server updates may change tool schemas.

Best practices for version management:

1. Pin versions (if supported):

   ```yaml
   mcp_servers:
     - name: database-server
       version: "1.2.3"  # Pin to specific version
   ```

2. Test in staging first:
   - Update MCP server in staging gateway
   - Test agent workflows
   - Promote to production when validated
3. Monitor breaking changes:
   - Subscribe to MCP server changelogs
   - Set up alerts for schema changes

## [](#mcp-observability)MCP observability

### [](#logs)Logs

MCP tool invocations appear in request logs with:

- Tool name
- MCP server
- Input parameters
- Output result
- Execution time
- Errors (if any)

Filter logs by MCP:

```text
Filter: request.path.startsWith("/mcp")
```

Common log fields:

| Field | Description | Example |
| --- | --- | --- |
| Tool | Tool invoked | execute_sql |
| MCP Server | Which server handled it | database-server |
| Input | Parameters sent | {"query": "SELECT …"} |
| Output | Result returned | [{"id": 1, …}] |
| Latency | Tool execution time | 250ms |
| Status | Success/failure | 200, 500 |

### [](#metrics)Metrics

The following MCP-specific metrics may be available depending on your gateway configuration:

- MCP requests per second
- Tool invocation count (by tool, by MCP server)
- MCP latency (p50, p95, p99)
- MCP error rate (by server, by tool)
- Orchestrator execution count
- Orchestrator execution time

Dashboard: MCP Analytics

- Top tools by usage
- Top MCP servers by latency
- Error rate by MCP server
- Token savings from deferred loading

### [](#debug-mcp-issues)Debug MCP issues

Issue: "Tool not found"

Possible causes:

1. MCP server not added to gateway
2. Tool not enabled in MCP server configuration
3. Deferred loading enabled but agent didn’t query for tool first

Solution:

1. Verify MCP server is active in the Redpanda Cloud console
2. Verify tool is in the `enabled_tools` list
3. If deferred loading is enabled, the agent must call `search_tools` first

Issue: "MCP server timeout"

Possible causes:

1. MCP server is down/unreachable
2. Tool execution is slow (for example, expensive database query)
3. Gateway timeout too short

Solution:

1. Check MCP server health
2. Optimize tool (for example, add database index)
3. Contact support if you need to adjust timeout limits

Issue: "Orchestrator workflow failed"

Possible causes:

1. JavaScript syntax error
2. 
   Tool invocation failed inside workflow
3. Timeout exceeded
4. Memory limit exceeded

Solution:

1. Test workflow syntax in JavaScript playground
2. Check logs for tool error inside orchestrator
3. Simplify the workflow so that it completes within the timeout (timeout limits are system-managed)
4. Reduce data processing in workflow

## [](#security-considerations)Security considerations

### [](#authentication)Authentication

Gateway → MCP server:

- Bearer token (most common)
- API key
- mTLS (for high-security environments)

Agent → Gateway:

- Standard gateway authentication (Redpanda Cloud token)
- Gateway endpoint URL identifies the gateway (and its approved MCP servers)

### [](#audit-trail)Audit trail

All MCP operations are logged:

- Who (agent/user) invoked tool
- When (timestamp)
- What tool was invoked
- What parameters were sent
- What result was returned
- Whether it succeeded or failed

Use case: Compliance, security investigation, debugging

### [](#restrict-dangerous-tools)Restrict dangerous tools

Recommendation: Don’t enable destructive tools in production gateways

Examples of dangerous tools:

- File deletion (`delete_file`)
- Database writes without safeguards (`execute_sql` with UPDATE/DELETE)
- Payment operations (`charge_customer`)
- System commands (`execute_bash`)

Best practice:

- Read-only tools in production gateway
- Write tools only in staging gateway (with approval workflows)
- Wrap dangerous operations in MCP server with safeguards (for example, "require confirmation token")

## [](#mcp-llm-routing)MCP + LLM routing

### [](#combine-mcp-with-cel-routing)Combine MCP with CEL routing

Use case: Route agents to different MCP servers based on customer tier

CEL expression:

```cel
request.headers["x-customer-tier"] == "enterprise" ? "gateway-with-premium-mcp-servers" : "gateway-with-basic-mcp-servers"
```

Result:

- Enterprise customers: Access to proprietary data, expensive APIs
- Basic customers: Access to public data, free APIs

### [](#mcp-with-provider-pools)MCP with provider pools

Scenario: Different agents use different models + different tools

Configuration:

- Gateway A: GPT-5.2 + database + CRM MCP servers
- Gateway B: Claude Sonnet + web search + analytics MCP servers

Use case: Optimize model-tool pairing (some models better at certain tools)

## [](#integration-examples)Integration examples

### Python (OpenAI SDK)

```python
import json
import os

import requests
from openai import OpenAI

# Initialize client with MCP endpoint
client = OpenAI(
    base_url=os.getenv("GATEWAY_ENDPOINT"),
    api_key=os.getenv("REDPANDA_CLOUD_TOKEN"),
    default_headers={
        "rp-aigw-mcp-deferred": "true"  # Enable deferred loading
    }
)

# Discover tools
tools_response = requests.get(
    f"{os.getenv('GATEWAY_ENDPOINT')}/mcp/tools",
    headers={
        "Authorization": f"Bearer {os.getenv('REDPANDA_CLOUD_TOKEN')}",
        "rp-aigw-mcp-deferred": "true"
    }
)
tools = tools_response.json()["tools"]

# Agent uses tools
messages = [
    {"role": "user", "content": "Query the database for premium users"}
]
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=messages,
    tools=tools,  # Pass MCP tools to agent
    tool_choice="auto"
)

# Handle tool calls
assistant_message = response.choices[0].message
if assistant_message.tool_calls:
    # Keep the assistant turn so each tool result can reference its tool call
    messages.append(assistant_message)
    for tool_call in assistant_message.tool_calls:
        # Execute tool via gateway
        tool_result = requests.post(
            f"{os.getenv('GATEWAY_ENDPOINT')}/mcp/tools/{tool_call.function.name}",
            headers={
                "Authorization": f"Bearer {os.getenv('REDPANDA_CLOUD_TOKEN')}",
            },
            json=json.loads(tool_call.function.arguments)
        )
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(tool_result.json())
        })

    # Continue conversation with all tool results
    response = client.chat.completions.create(
        model="anthropic/claude-sonnet-4.5",
        messages=messages
    )
```

### Claude Code CLI

```bash
# Configure gateway with MCP
export CLAUDE_API_BASE="https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/v1"
export ANTHROPIC_API_KEY="your-redpanda-token"

# Claude Code automatically discovers MCP tools from gateway
claude code

# Agent can now use aggregated MCP tools
```

### LangChain

```python
import os

from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, Tool

# Initialize LLM with gateway
llm = ChatOpenAI(
    base_url=os.getenv("GATEWAY_ENDPOINT"),
    api_key=os.getenv("REDPANDA_CLOUD_TOKEN"),
)

# Fetch MCP tools from gateway
# PLACEHOLDER: LangChain-specific integration code (must define mcp_tools)

# Create agent with MCP tools
agent = initialize_agent(
    tools=mcp_tools,
    llm=llm,
    agent="openai-tools",
    verbose=True
)

# Agent can now use MCP tools
response = agent.run("Find all premium users in the database")
```

---

# Page 28: What is an AI Gateway?

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/ai-gateway/what-is-ai-gateway.md

---

# What is an AI Gateway?

--- title: What is an AI Gateway? latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: ai-gateway/what-is-ai-gateway page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: ai-gateway/what-is-ai-gateway.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/ai-gateway/what-is-ai-gateway.adoc description: Understand how AI Gateway keeps AI-powered apps highly available across providers and prevents runaway AI spend with centralized cost governance.
page-topic-type: concept personas: evaluator, app_developer, platform_admin learning-objective-1: Explain how AI Gateway keeps AI-powered apps highly available through governed provider failover learning-objective-2: Describe how AI Gateway prevents runaway AI spend with centralized budget controls and tenancy-based governance learning-objective-3: Identify when AI Gateway fits your use case based on availability requirements, cost governance needs, and multi-provider or MCP tool usage page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Redpanda AI Gateway keeps your AI-powered applications highly available and your AI spend under control. It sits between your applications and the LLM providers and AI tools they depend on. If a provider goes down, the gateway provides automatic failover to keep your apps running. It also offers centralized budget controls to prevent runaway costs. For platform teams, it adds governance at the model-fallback level, tenancy modeling for teams, individuals, apps, and service accounts, and a single proxy layer for both LLM models and [MCP servers](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-server). ## [](#the-problem)The problem Modern AI applications face two business-critical challenges: staying up and staying on budget. First, applications typically hardcode provider-specific SDKs. An application using OpenAI’s SDK cannot easily switch to Anthropic or Google without code changes and redeployment. When a provider hits rate limits, suffers an outage, or degrades in performance, your application goes down with it. Your end users don’t care which provider you use; they care that the app works. 
Second, costs can spiral without centralized controls. Without a single view of token consumption across teams and applications, it’s difficult to attribute costs to specific customers, features, or environments. Testing and debugging can generate unexpected bills, and there’s no way to enforce budgets or rate limits per team, application, or service account. The result: runaway spend that finance discovers only after the fact. These two challenges are compounded by fragmented observability across provider dashboards, which makes it harder to detect availability issues or cost anomalies in time to act. And as organizations adopt [AI agents](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#ai-agent) that call [MCP tools](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-tool), the lack of centralized tool governance adds another dimension of uncontrolled cost and risk. ## [](#what-ai-gateway-solves)What AI Gateway solves Redpanda AI Gateway delivers two core business outcomes, high availability and cost governance, backed by platform-level controls that set it apart from simple proxy layers. ### [](#high-availability-through-governed-failover)High availability through governed failover Your end users don’t care whether you use OpenAI, Anthropic, or Google: they care that your app stays up. AI Gateway lets you configure provider pools with automatic failover, so when your primary provider hits rate limits, times out, or returns errors, the gateway routes requests to a fallback provider with no code changes and no downtime for your users. Unlike simple retry logic, AI Gateway provides governance at the failover level: you define which providers fail over to which, under what conditions, and with what priority. This controlled failover can significantly improve uptime even during extended provider outages. 
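The idea of governed failover (an ordered pool, with explicit conditions that trigger a switch) can be sketched in a few lines of Python. This is an illustrative model only, not the gateway's implementation: the `Provider`, `ProviderError`, and `route_with_failover` names are hypothetical, and the trigger conditions (rate limit, timeout, server error) are taken from the conditions described in this documentation.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider call; `kind` names the failure category."""

    def __init__(self, kind: str):
        super().__init__(kind)
        self.kind = kind


# Failure categories that trigger failover in this sketch.
FAILOVER_KINDS = {"rate_limit", "timeout", "server_error"}


@dataclass
class Provider:
    name: str
    priority: int  # Lower number is tried first
    call: Callable[[str], str]


def route_with_failover(pool: list[Provider], prompt: str) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, response)."""
    last_error = None
    for provider in sorted(pool, key=lambda p: p.priority):
        try:
            return provider.name, provider.call(prompt)
        except ProviderError as err:
            if err.kind not in FAILOVER_KINDS:
                raise  # For example, an invalid request: do not fail over
            last_error = err
    raise RuntimeError("all providers in the pool failed") from last_error


def rate_limited(prompt: str) -> str:
    raise ProviderError("rate_limit")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


pool = [Provider("fallback", 2, healthy), Provider("primary", 1, rate_limited)]
print(route_with_failover(pool, "Hello"))  # ('fallback', 'echo: Hello')
```

The difference from plain retry logic is visible in the sketch: the order of providers and the set of conditions that justify a switch are declared as data, so they can be reviewed and changed without touching application code.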
### [](#cost-governance-and-budget-controls)Cost governance and budget controls

AI Gateway gives you centralized fiscal control over AI spend. Set monthly budget caps for each gateway, enforce them automatically, and set rate limits per team, environment, or application. No more runaway costs discovered after the fact.

You can route requests to different models based on user attributes. For example, a CEL expression can direct premium users to a more capable model while routing free-tier users to a cost-effective option:

```cel
// Route premium users to best model, free users to cost-effective model
request.headers["x-user-tier"] == "premium"
  ? "anthropic/claude-opus-4.6"
  : "anthropic/claude-sonnet-4.5"
```

You can also set different rate limits and spend limits for each environment to prevent staging or development traffic from consuming production budgets.

### [](#tenancy-and-access-governance)Tenancy and access governance

AI Gateway provides multi-tenant isolation by design. Create separate gateways for teams, individual developers, applications, or service accounts, each with their own budgets, rate limits, routing policies, and observability scope. This tenancy model lets platform teams govern who uses what, how much they spend, and which models and tools they can access, without building custom authorization layers.

### [](#unified-llm-access-single-endpoint-for-all-providers)Unified LLM access (single endpoint for all providers)

AI Gateway provides a single OpenAI-compatible endpoint that routes requests to multiple LLM providers. Instead of integrating with each provider’s SDK separately, you configure your application once and switch providers by changing only the model parameter.
Without AI Gateway, you need different SDKs and patterns for each provider:

```python
# OpenAI
from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic (different SDK, different patterns)
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")
response = client.messages.create(
    model="claude-sonnet-4.5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
```

With AI Gateway, you use the OpenAI SDK for all providers:

```python
from openai import OpenAI

# Single configuration, multiple providers
client = OpenAI(
    base_url="",
    api_key="your-redpanda-token",
)

# Route to OpenAI
response = client.chat.completions.create(
    model="openai/gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}]
)

# Route to Anthropic (same code, different model string)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello"}]
)

# Route to Google Gemini (same code, different model string)
response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
```

To switch providers, you change only the `model` parameter, for example from `openai/gpt-5.2` to `anthropic/claude-sonnet-4.5`. No SDK swap or other code changes are needed.

### [](#proxy-for-llm-models-and-mcp-servers)Proxy for LLM models and MCP servers

AI Gateway acts as a single proxy layer for both LLM model requests and MCP servers. For LLM traffic, it provides a unified endpoint. For AI agents that use MCP tools, it aggregates multiple MCP servers and provides deferred tool loading, which dramatically reduces token costs.

Without AI Gateway, agents typically load all available MCP tools from multiple MCP servers at startup.
This approach sends 50+ tool definitions with every request, creating high token costs (thousands of tokens per request), slow agent startup times, and no centralized governance over which tools agents can access. With AI Gateway, you configure approved MCP servers once, and the gateway loads only search and orchestrator tools initially. Agents query for specific tools only when needed, which often reduces token usage by 80-90% depending on your configuration and the number of tools aggregated. You also gain centralized approval and governance over which MCP servers your agents can access. For complex workflows, AI Gateway provides a JavaScript-based orchestrator tool that reduces multi-step workflows from multiple round trips to a single call. For example, you can create a workflow that searches a vector database and, if the results are insufficient, falls back to web search—all in one orchestration step. ### [](#unified-observability-and-cost-tracking)Unified observability and cost tracking AI Gateway provides a single dashboard that tracks all LLM traffic across providers, eliminating the need to switch between multiple provider dashboards. The dashboard tracks request volume for each gateway, model, and provider, along with token usage for both prompt and completion tokens. You can view estimated spend per model with cross-provider comparisons, latency metrics (p50, p95, p99), and errors broken down by type, provider, and model. This unified view helps you answer critical questions such as which model is the most cost-effective for your use case, why a specific user request failed, how much your staging environment costs each week, and what the latency difference is between providers for your workload. ## [](#common-gateway-patterns)Common gateway patterns Some common patterns for configuring gateways include: - **Team isolation**: When multiple teams share infrastructure but need separate budgets and policies, create one gateway for each team. 
For example, you might configure Team A’s gateway with a $5K/month budget for both staging and production environments, while Team B’s gateway has a $10K/month budget with different rate limits. Each team sees only their own traffic in the observability dashboards, providing clear cost attribution and isolation. - **Environment separation**: To prevent staging traffic from affecting production metrics, create separate gateways for each environment. Configure the staging gateway with lower rate limits, restricted model access, and aggressive cost controls to prevent runaway expenses. The production gateway can have higher rate limits, access to all models, and alerting configured to detect anomalies. - **Primary and fallback for reliability**: To ensure uptime during provider outages, configure provider pools with automatic failover. For example, you can set OpenAI as your primary provider (preferred for quality) and configure Anthropic as the fallback that activates when the gateway detects rate limits or timeouts from OpenAI. Monitor the fallback rate to detect primary provider issues early, before they impact your users. - **A/B testing models**: To compare model quality and cost without dual integration, route a percentage of traffic to different models. For example, you can send 80% of traffic to `claude-sonnet-4.5` and 20% to `claude-opus-4.6`, then compare quality metrics and costs in the observability dashboard before adjusting the split. - **Customer-based routing**: For SaaS products with tiered pricing (for example, free, pro, enterprise), use CEL routing based on request headers to match users with appropriate models: ```cel request.headers["x-customer-tier"] == "enterprise" ?
"anthropic/claude-opus-4.6" : request.headers["x-customer-tier"] == "pro" ? "anthropic/claude-sonnet-4.5" : "anthropic/claude-haiku" ``` ## [](#when-to-use-ai-gateway)When to use AI Gateway AI Gateway is ideal for organizations that: - Use or plan to use multiple LLM providers - Need centralized cost tracking and budgeting - Want to experiment with different models without code changes - Require high availability during provider outages - Have multiple teams or customers using AI services - Build AI agents that need MCP tool aggregation - Need unified observability across all AI traffic AI Gateway may not be necessary if: - You only use a single provider with simple requirements - You have minimal AI traffic (< 1000 requests/day) - You don’t need cost tracking or policy enforcement - Your application doesn’t require provider switching ## [](#next-steps)Next steps - [Gateway Quickstart](../gateway-quickstart/) - Get started quickly with a basic gateway setup **For Administrators:** - [Setup Guide](../admin/setup-guide/) - Enable providers, models, and create gateways - [Architecture Deep Dive](../gateway-architecture/) - Technical architecture details **For Builders:** - [Discover Available Gateways](../builders/discover-gateways/) - Find which gateways you can access - [Connect Your Agent](../builders/connect-your-agent/) - Integrate your application --- # Page 29: Model Context Protocol (MCP) **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp.md --- # Model Context Protocol (MCP) --- title: Model Context Protocol (MCP) latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: mcp/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: mcp/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/index.adoc description: Give AI agents direct 
access to your databases, queues, CRMs, and other business systems without writing custom glue code. page-git-created-date: "2026-01-22" page-git-modified-date: "2026-02-18" --- AI agents need context from your business systems. The Model Context Protocol (MCP) translates agent intent into real connections to databases, queues, CRMs, HRIS, and other systems of record, without you writing custom integration code. Redpanda’s MCP servers are built on the same proven connectors that power the world’s largest e-commerce, electric vehicle, energy, and AI companies. Redpanda Cloud offers two complementary MCP options: - **Remote MCP**: Deploy MCP servers directly in Redpanda Cloud for scalable, managed AI agent integrations - **Redpanda Cloud Management MCP Server**: Connect your local AI development environment to manage Redpanda Cloud resources - [MCP Servers for Redpanda Cloud Overview](overview/) Connect AI agents to your databases, queues, CRMs, and other business systems without writing glue code, using Redpanda's proven connectors. - [Remote MCP Servers for Redpanda Cloud](remote/) Build MCP tools that connect AI agents to databases, queues, CRMs, and other business systems using Redpanda's proven connectors. - [Redpanda Cloud Management MCP Server](local/) Manage your Redpanda Cloud clusters, topics, and users through AI agents using natural language commands. --- # Page 30: Redpanda Cloud Management MCP Server **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/local.md --- # Redpanda Cloud Management MCP Server --- title: Redpanda Cloud Management MCP Server page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. 
latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: mcp/local/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: mcp/local/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/local/index.adoc # Beta release status page-beta: "true" description: Manage your Redpanda Cloud clusters, topics, and users through AI agents using natural language commands. page-git-created-date: "2025-09-08" page-git-modified-date: "2026-02-18" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta - [Redpanda Cloud Management MCP Server](overview/) Let AI agents securely operate your Redpanda Cloud clusters, topics, and users through natural language commands. - [Redpanda Cloud Management MCP Server Quickstart](quickstart/) Connect your Claude AI agent to your Redpanda Cloud account and clusters using the Redpanda Cloud Management MCP Server. - [Configure the Redpanda Cloud Management MCP Server](configuration/) Learn how to configure the Redpanda Cloud Management MCP Server, including auto and manual client setup, enabling deletes, and security considerations. --- # Page 31: Configure the Redpanda Cloud Management MCP Server **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/local/configuration.md --- # Configure the Redpanda Cloud Management MCP Server --- title: Configure the Redpanda Cloud Management MCP Server page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. 
latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: mcp/local/configuration page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: mcp/local/configuration.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/local/configuration.adoc # Beta release status page-beta: "true" description: Learn how to configure the Redpanda Cloud Management MCP Server, including auto and manual client setup, enabling deletes, and security considerations. page-topic-type: how-to personas: agent_developer, platform_admin learning-objective-1: Configure MCP clients learning-objective-2: Enable delete operations safely learning-objective-3: Troubleshoot common configuration issues page-git-created-date: "2025-09-08" page-git-modified-date: "2026-02-18" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta After installing the Redpanda Cloud Management MCP Server, you can configure it for different AI clients, customize security settings, and troubleshoot common issues. After reading this page, you will be able to: - Configure MCP clients - Enable delete operations safely - Troubleshoot common configuration issues ## [](#prerequisites)Prerequisites - At least version 25.2.3 of [`rpk` installed on your local machine](../../../../manage/rpk/rpk-install/) - Access to a Redpanda Cloud account - An MCP-compatible AI client such as Claude, Claude Code, or another tool that supports MCP > 💡 **TIP** > > The MCP server exposes Redpanda Cloud API endpoints for both the [Control Plane](https://docs.redpanda.com/api/doc/cloud-controlplane/) and the [Data Plane](https://docs.redpanda.com/api/doc/cloud-dataplane/). 
Available endpoints depend on your `rpk` version. Keep `rpk` updated to access new Redpanda Cloud features through the MCP server. New MCP endpoints are documented in Redpanda [release notes](https://github.com/redpanda-data/redpanda/releases).

## [](#install-the-integration-for-claude-or-claude-code)Install the integration for Claude or Claude Code

For some supported clients, you can install and configure the MCP integration using the [`rpk cloud mcp install` command](../../../../reference/rpk/rpk-cloud/rpk-cloud-mcp-install/).

For Claude and Claude Code, run one of these commands:

```bash
# Choose one
rpk cloud mcp install --client claude
rpk cloud mcp install --client claude-code
```

If you need to update the integration, re-run the install command for your client.

## [](#configure-other-mcp-clients-manually)Configure other MCP clients manually

If you’re using another MCP-compatible client, manually configure it to use the Redpanda Cloud Management MCP Server: add an MCP server entry to your client’s configuration (example shown in JSON) and adjust paths for your system.

```json
"mcpServers": {
  "redpandaCloud": {
    "command": "rpk",
    "args": [
      "--config",
      "", (1)
      "cloud",
      "mcp",
      "stdio"
    ]
  }
}
```

| 1 | Optional: The `--config` flag lets you target a specific `rpk.yaml`, which contains the configuration for connecting to your cluster. Always use the same configuration path as you used for `rpk cloud login` to ensure it has your token. Default paths vary by operating system. See the `rpk cloud login` reference for the default paths. |
| --- | --- |

You can also [start the server manually in a terminal to observe logs and troubleshoot](#verify-your-installation).

## [](#enable-delete-operations)Enable delete operations

The server disables destructive operations by default. To allow delete operations, add `--allow-delete` to the MCP server invocation.

> ⚠️ **CAUTION**
>
> Enabling delete operations permits actions like **deleting topics or clusters**.
Restrict access to your AI client and double-check prompts.

### Auto-configured clients

```bash
# Choose one
rpk cloud mcp install --client claude --allow-delete
rpk cloud mcp install --client claude-code --allow-delete
```

### Manual configuration example

```json
"mcpServers": {
  "redpandaCloud": {
    "command": "rpk",
    "args": [
      "cloud",
      "mcp",
      "stdio",
      "--allow-delete"
    ]
  }
}
```

## [](#specify-configuration-file-paths)Specify configuration file paths

All `rpk` commands accept a `--config` flag, which lets you specify the exact `rpk.yaml` configuration file to use for connecting to your Redpanda cluster. This flag overrides the default search path and ensures that the command uses the credentials and settings from the file you provide.

Always use the same configuration path for both `rpk cloud login` and any MCP server setup or install commands to avoid authentication issues. By default, `rpk` searches for config files in standard locations depending on your operating system. See the [reference documentation](../../../../reference/rpk/rpk-cloud/rpk-cloud-login/) for details. Use an absolute path and make sure your user has read and write permissions.

> ⚠️ **CAUTION**
>
> The `rpk` configuration file contains your Redpanda Cloud token. Keep the file secure and never share it.

For example, if you want to use a custom config path, specify it for both login and the MCP install command:

```bash
rpk cloud login --config /Users//my-rpk-config.yaml
rpk cloud mcp install --client claude --config /Users//my-rpk-config.yaml
```

Or for Claude Code:

```bash
rpk cloud login --config /Users//my-rpk-config.yaml
rpk cloud mcp install --client claude-code --config /Users//my-rpk-config.yaml
```

## [](#remove-the-mcp-server)Remove the MCP server

To remove the MCP server, delete or disable the `mcpServers.redpandaCloud` entry in your client’s config (steps vary by client).

## [](#security-considerations)Security considerations

- Avoid enabling `--allow-delete` unless required.
- For most local use cases, such as with Claude or Claude Code, log in with your personal Redpanda Cloud user account for better security and easier management.
- If you are deploying the MCP server as part of an application or shared environment, consider using a [service account](../../../../security/cloud-authentication/#authenticate-to-the-cloud-api) with tailored roles. To log in as a service account, use:

  ```bash
  rpk cloud login --client-id --client-secret --save
  ```

- Regularly review and rotate your credentials.

## [](#troubleshooting)Troubleshooting

### [](#verify-your-installation)Verify your installation

1. Make sure you are using at least version 25.2.3 of `rpk`.
2. If you see authentication errors, run `rpk cloud login` again.
3. Ensure you installed for the right client:

   ```bash
   rpk cloud mcp install --client claude
   # or
   rpk cloud mcp install --client claude-code
   ```

4. If using another MCP client, verify your `mcpServers.redpandaCloud` entry (paths, JSON syntax, and args order).
5. Start the server manually using the [`rpk cloud mcp stdio` command](../../../../reference/rpk/rpk-cloud/rpk-cloud-mcp-stdio/) (one-time login required) to verify connectivity to Redpanda Cloud endpoints:

   ```bash
   rpk cloud login
   rpk cloud mcp stdio
   ```

   1. Send the following newline-delimited JSON-RPC messages (each on its own line):

      ```json
      {"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{"roots":{},"sampling":{},"elicitation":{}},"clientInfo":{"name":"ManualTest","version":"0.1.0"}}}
      {"jsonrpc":"2.0","method":"notifications/initialized"}
      {"jsonrpc":"2.0","id":2,"method":"tools/list"}
      ```

      Expected response shapes (examples):

      ```json
      {"jsonrpc":"2.0","id":1,"result":{"capabilities":{...}}}
      {"jsonrpc":"2.0","id":2,"result":{"tools":[{"name":"...","description":"..."}, ...]}}
      ```

   2. Stop the server with `Ctrl+C`.
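The handshake from step 5 can also be generated programmatically, which avoids copy-paste errors such as a message accidentally wrapped across two lines. The following is a minimal sketch: the message contents are copied from the steps above, while the helper names `handshake_lines` and `tool_names` are hypothetical and not part of `rpk`.

```python
import json


def handshake_lines(client_name: str = "ManualTest",
                    client_version: str = "0.1.0") -> list[str]:
    """Return the three JSON-RPC messages, each serialized onto one line."""
    messages = [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2025-06-18",
                "capabilities": {"roots": {}, "sampling": {}, "elicitation": {}},
                "clientInfo": {"name": client_name, "version": client_version},
            },
        },
        # Notifications carry no "id" because they expect no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]
    # Compact separators keep each message on a single line with no padding.
    return [json.dumps(message, separators=(",", ":")) for message in messages]


def tool_names(response_line: str) -> list[str]:
    """Extract tool names from one `tools/list` response line."""
    result = json.loads(response_line).get("result", {})
    return [tool["name"] for tool in result.get("tools", [])]


if __name__ == "__main__":
    print("\n".join(handshake_lines()))
```

You could paste the printed lines into the terminal where `rpk cloud mcp stdio` is running, then pass the response with `"id":2` to `tool_names` to see which tools the server exposes.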
### [](#client-cant-find-the-mcp-server)Client can’t find the MCP server - Re-run the install for your MCP client. - Confirm the path in `--config /path/to/rpk.yaml` exists and is readable. - Double-check your client’s configuration format and syntax. ### [](#unauthorized-errors-or-token-errors)Unauthorized errors or token errors Your capabilities depend on your Redpanda Cloud account permissions. If an operation fails with a permissions error, contact your account admin. - Run `rpk cloud login` to refresh the token. - Ensure your account has the necessary permissions for the requested operation. ### [](#deletes-not-working)Deletes not working - The server disables delete operations by default. Add `--allow-delete` to the server invocation (auto or manual configuration) and restart the client. - For auto-configured clients, you may need to edit the generated config or re-run the install command and adjust the entry. --- # Page 32: Redpanda Cloud Management MCP Server **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/local/overview.md --- # Redpanda Cloud Management MCP Server --- title: Redpanda Cloud Management MCP Server page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: mcp/local/overview page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: mcp/local/overview.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/local/overview.adoc # Beta release status page-beta: "true" description: Let AI agents securely operate your Redpanda Cloud clusters, topics, and users through natural language commands. 
page-topic-type: overview personas: evaluator, agent_developer, platform_admin learning-objective-1: Explain what the Redpanda Cloud Management MCP Server does learning-objective-2: Identify what operations are available through MCP learning-objective-3: Identify security considerations for MCP authentication page-git-created-date: "2025-09-08" page-git-modified-date: "2026-02-18" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta The Redpanda Cloud Management MCP Server lets AI agents securely access and operate your Redpanda Cloud account and clusters through natural language commands. After reading this page, you will be able to: - Explain what the Redpanda Cloud Management MCP Server does - Identify what operations are available through MCP - Identify security considerations for MCP authentication ![A terminal window showing Claude Code invoking the Redpanda Cloud Management MCP Server to list topics in a cluster.](../../../../shared/_images/cloud-mcp.gif) ## [](#what-you-can-do)What you can do MCP provides controlled access to: - [Control Plane](https://docs.redpanda.com/api/doc/cloud-controlplane/) APIs, such as creating a Redpanda Cloud cluster or listing clusters. - [Data Plane](https://docs.redpanda.com/api/doc/cloud-dataplane/) APIs, such as creating topics or listing topics. The MCP server runs on your computer and authenticates to Redpanda Cloud using a Redpanda Cloud token. You can do anything that’s available in the Control Plane or Data Plane APIs. Typical requests you can make to your assistant once connected include: - Create a Redpanda Cloud cluster named `dev-mcp`. - List topics in `dev-mcp`. - Create a topic `orders-raw` with 6 partitions. > 📝 **NOTE** > > The MCP server does **not** expose delete endpoints by default. 
You can enable delete endpoints when you create the server if you intentionally want to allow delete operations. ## [](#use-cases)Use cases - Test automation: Create short-lived clusters, create topics, and validate pipelines quickly. - Operational assistance: Inspect a cluster’s health or list topics during incidents. - Onboarding and demos: Let team members issue high-level requests without memorizing every CLI flag. ## [](#how-it-works)How it works 1. Authenticate to Redpanda Cloud and receive a token using the [`rpk cloud login` command](../../../../reference/rpk/rpk-cloud/rpk-cloud-login/). 2. Configure your MCP client using the [`rpk cloud mcp install`](../../../../reference/rpk/rpk-cloud/rpk-cloud-mcp-install/) command. Your client then starts the server on-demand using [`rpk cloud mcp stdio`](../../../../reference/rpk/rpk-cloud/rpk-cloud-mcp-stdio/), authenticating with the Redpanda Cloud token from `rpk cloud login`. 3. Prompt your assistant to perform Redpanda operations. The MCP server executes them in your Redpanda Cloud account using your Redpanda Cloud token. ### [](#components)Components The Redpanda Cloud Management MCP Server requires these components: - AI client (Claude, Claude Code, or any other MCP client) that connects to the MCP server. - Redpanda CLI (`rpk`) for obtaining a token and starting the MCP server. - Redpanda Cloud account that the MCP server can connect to and issue API requests. ## [](#security-considerations)Security considerations MCP servers authenticate to Redpanda Cloud using your personal or service account credentials. However, there is **no auditing or access control** that distinguishes between actions performed by MCP servers versus direct API calls: - All API actions appear in Redpanda Cloud’s internal logs as coming from the authenticated user account, not the specific MCP server. - You cannot audit which MCP server performed which operations, as Redpanda Cloud logs are not accessible to users. 
- You cannot restrict specific MCP servers to only certain API endpoints or resources. ## [](#next-steps)Next steps - [Redpanda Cloud Management MCP Server Quickstart](../quickstart/) - [Configure the Redpanda Cloud Management MCP Server](../configuration/) > 💡 **TIP** > > The Redpanda documentation site has a read-only MCP server that provides access to Redpanda docs and examples. This server has no access to your Redpanda Cloud account or clusters. See [MCP Server for Redpanda Documentation](../../../../../home/mcp-setup/). --- # Page 33: Redpanda Cloud Management MCP Server Quickstart **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/local/quickstart.md --- # Redpanda Cloud Management MCP Server Quickstart --- title: Redpanda Cloud Management MCP Server Quickstart page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: mcp/local/quickstart page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: mcp/local/quickstart.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/local/quickstart.adoc # Beta release status page-beta: "true" description: Connect your Claude AI agent to your Redpanda Cloud account and clusters using the Redpanda Cloud Management MCP Server. page-topic-type: tutorial personas: agent_developer, platform_admin learning-objective-1: Authenticate to Redpanda Cloud with rpk learning-objective-2: Install the MCP integration for Claude learning-objective-3: Issue natural language commands to manage clusters page-git-created-date: "2025-09-08" page-git-modified-date: "2026-02-18" release-status: beta - This is a beta feature. 
Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta In this quickstart, you’ll get your Claude AI agent talking to Redpanda Cloud using the [Redpanda Cloud Management MCP Server](../overview/). If you’re trying to deploy your own MCP server as a managed service inside your cluster, see [Remote MCP Server Quickstart](../../remote/quickstart/). After completing this quickstart, you will be able to: - Authenticate to Redpanda Cloud with rpk - Install the MCP integration for Claude - Issue natural language commands to manage clusters ## [](#prerequisites)Prerequisites - At least version 25.2.3 of [`rpk` installed on your computer](../../../../manage/rpk/rpk-install/) - Access to a Redpanda Cloud account - [Claude](https://support.anthropic.com/en/articles/10065433-installing-claude-desktop) or [Claude Code](https://docs.anthropic.com/en/docs/claude-code/setup) installed > 💡 **TIP** > > For other clients, see [Configure the Redpanda Cloud Management MCP Server](../configuration/). ## [](#set-up-the-mcp-server)Set up the MCP server 1. Verify your `rpk` version. ```bash rpk version ``` Ensure the version is at least 25.2.3. 2. Log in to Redpanda Cloud. ```bash rpk cloud login ``` A browser window opens. Sign in to grant access. After you sign in, `rpk` stores a token locally. This token is not shared with your AI agent. It is used by the MCP server to authenticate requests to your Redpanda Cloud account. 3. Install the MCP integration. Choose one client: ```bash # Claude desktop rpk cloud mcp install --client claude # Claude Code (IDE) rpk cloud mcp install --client claude-code ``` This command configures the MCP server for your client. If you need to update the integration, re-run the install command for your client. 
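Step 1 asks you to confirm that `rpk` is at least version 25.2.3. If you script this check, compare versions numerically rather than as strings, because a plain string comparison would rank `25.10.0` below `25.2.3`. The sketch below assumes only that the version text contains a `major.minor.patch` number somewhere; the exact output format of `rpk version` is not assumed, and the helper names are hypothetical.

```python
import re

MIN_RPK_VERSION = (25, 2, 3)


def parse_version(text: str) -> tuple[int, int, int]:
    """Return the first major.minor.patch number found in `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def meets_minimum(text: str, minimum: tuple[int, int, int] = MIN_RPK_VERSION) -> bool:
    """Tuples compare element by element, so 25.10.0 ranks above 25.2.3."""
    return parse_version(text) >= minimum
```

For example, `meets_minimum("v25.2.3")` is true and `meets_minimum("25.2.2")` is false; you would pass in whatever text your `rpk version` command prints.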
## [](#start-prompting)Start prompting Launch Claude or Claude Code and try one of these prompts: - “Create a Redpanda Cloud cluster named `dev-mcp`.” - “List topics in `dev-mcp`.” - “Create a topic `orders-raw` with 6 partitions.” > 📝 **NOTE: Delete operations are opt-in** > > The MCP server does **not** expose API endpoints that result in delete operations by default. Use `--allow-delete` only if you intentionally want to enable delete operations. See [Enable delete operations](../configuration/#enable-delete-operations). ## [](#next-steps)Next steps - [Configure the Redpanda Cloud Management MCP Server](../configuration/) --- # Page 34: MCP Servers for Redpanda Cloud Overview **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/overview.md --- # MCP Servers for Redpanda Cloud Overview --- title: MCP Servers for Redpanda Cloud Overview latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: mcp/overview page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: mcp/overview.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/overview.adoc description: Connect AI agents to your databases, queues, CRMs, and other business systems without writing glue code, using Redpanda's proven connectors. page-topic-type: overview personas: evaluator, ai_agent_developer learning-objective-1: Describe what MCP enables for AI agents learning-objective-2: Distinguish between Redpanda Cloud Management MCP Server and Remote MCP learning-objective-3: Choose the right MCP option for your use case page-git-created-date: "2025-10-21" page-git-modified-date: "2026-02-18" --- This page introduces MCP in Redpanda Cloud and helps you choose the right option for your use case.

After reading this page, you will be able to:

- Describe what MCP enables for AI agents
- Distinguish between Redpanda Cloud Management MCP Server and Remote MCP
- Choose the right MCP option for your use case

## [](#what-is-mcp)What is MCP?

MCP (Model Context Protocol) is an open standard that translates AI agent intent into real connections to databases, queues, CRMs, HRIS, accounting software, and other business systems. Instead of writing custom glue code for every integration, you define your tools once using MCP, and any MCP-compatible AI client can discover and use them.

Without MCP, connecting AI to your business systems requires custom API code, authentication handling, and response formatting for each AI platform. With MCP, you describe what a tool does and what inputs it needs, and the protocol handles the rest.

Redpanda’s MCP servers are built on the same proven connectors that power the world’s largest e-commerce, electric vehicle, energy, and AI companies today.

## [](#mcp-options-in-redpanda-cloud)MCP options in Redpanda Cloud

Redpanda Cloud offers two complementary MCP options:

- **Redpanda Cloud Management MCP Server**: A pre-built server that gives AI agents access to Redpanda Cloud APIs. It runs on your computer and lets you manage clusters, topics, and other resources through natural language. Example: "Create a cluster called `dev-analytics` with 3 brokers."
- **Remote MCP**: Your own MCP server built with Redpanda Connect and hosted inside your Redpanda Cloud cluster. You define custom tools that access your data, call external APIs, or trigger workflows. Example: "Analyze the last 100 orders and show me the top product categories."

### [](#comparison)Comparison

|  | Redpanda Cloud Management MCP Server | Remote MCP |
| --- | --- | --- |
| Purpose | Operate your Redpanda Cloud account | Build custom tools for your data and workflows |
| Where it runs | Your computer | Redpanda Cloud (managed) |
| Who builds it | Redpanda | You, using Redpanda Connect |
| Access to | Redpanda Cloud APIs (clusters, topics, users, ACLs) | Anything you configure (databases, APIs, Redpanda topics, LLMs) |
| Best for | Platform operations, quick admin tasks, incident response | Data analysis, workflow automation, team-shared tools |

## [](#which-should-i-use)Which should I use?

Choose based on what you want to accomplish:

**Use the Redpanda Cloud Management MCP Server** when you want to:

- Manage Redpanda Cloud resources without memorizing CLI commands
- Quickly create test clusters, topics, or users
- Inspect cluster health during incidents
- Onboard team members who prefer natural language over APIs

**Use Remote MCP** when you want to:

- Build tools that access your business data in Redpanda topics
- Create reusable tools shared across your team
- Connect AI agents to external systems (databases, APIs, LLMs)
- Run tools close to your data with managed infrastructure

You can use both options together. For example, use the Redpanda Cloud Management MCP Server to create a cluster, then deploy Remote MCP tools to analyze data in that cluster.

## [](#get-started)Get started

- [Redpanda Cloud Management MCP Server Quickstart](../local/quickstart/): Connect Claude to your Redpanda Cloud account
- [Remote MCP Server Quickstart](../remote/quickstart/): Build and deploy custom MCP tools

## [](#suggested-reading)Suggested reading

- [MCP Server for Redpanda Documentation](../../../../home/mcp-setup/): Access Redpanda documentation through AI agents (read-only, no Cloud access required)

---

# Page 35: Remote MCP Servers for Redpanda Cloud

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote.md

---

# Remote MCP Servers for Redpanda Cloud

---
title: Remote MCP Servers for Redpanda Cloud
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: mcp/remote/index
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: mcp/remote/index.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/index.adoc
description: Build MCP tools that connect AI agents to databases, queues, CRMs, and other business systems using Redpanda's proven connectors.
page-git-created-date: "2025-10-21"
page-git-modified-date: "2026-02-18"
---

- [Remote MCP Server Overview](overview/) Build and host MCP tools that connect AI agents to your business systems without writing glue code, using Redpanda's proven connectors.
- [MCP Tool Execution and Components](concepts/) Understand how MCP tools execute requests, choose the right Redpanda Connect component type, and use traces for observability.
- [Remote MCP Server Quickstart](quickstart/) Build and deploy your first MCP tools to connect AI agents to your Redpanda data without writing custom integration code.
- [Create an MCP Tool](create-tool/) Create an MCP tool with the correct YAML structure, metadata, and parameter mapping.
- [MCP Tool Design](best-practices/) Design effective MCP tool interfaces with clear names, descriptions, and input properties.
- [MCP Tool Patterns](tool-patterns/) Catalog of patterns for MCP server tools in Redpanda Cloud.
- [Troubleshoot Remote MCP Servers](troubleshooting/) Diagnose and fix common issues when building and running Remote MCP servers in Redpanda Cloud.
- [Manage Remote MCP Servers](manage-servers/) Learn how to edit, stop, start, and delete MCP servers in Redpanda Cloud.
- [Monitor MCP Server Activity](monitor-mcp-servers/) Consume traces, track tool invocations, measure performance, and debug failures in MCP servers.
- [Scale Remote MCP Server Resources](scale-resources/) Learn how to scale MCP server resources up or down to match workload demands and optimize costs.

---

# Page 36: MCP Tool Design

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote/best-practices.md

---

# MCP Tool Design

---
title: MCP Tool Design
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: mcp/remote/best-practices
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: mcp/remote/best-practices.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/best-practices.adoc
description: Design effective MCP tool interfaces with clear names, descriptions, and input properties.
page-topic-type: best-practices
personas: agent_developer
learning-objective-1: Write tool names and descriptions that help AI clients select the right tool
learning-objective-2: Define input properties with appropriate types and constraints
learning-objective-3: Design focused tools that complete quickly
page-git-created-date: "2026-01-13"
page-git-modified-date: "2026-02-18"
---

After [creating your first tool](../create-tool/), apply these design guidelines so AI clients can discover, understand, and invoke your tools correctly.

After reading this page, you will be able to:

- Write tool names and descriptions that help AI clients select the right tool
- Define input properties with appropriate types and constraints
- Design focused tools that complete quickly

## [](#tool-discovery)Tool discovery

AI clients use tool names and descriptions to decide which tool to call. Good metadata helps AI select the right tool and provide correct inputs.

The `label` field defines the tool name. The `meta.mcp` block defines the description and input properties.

```yaml
label: get-weather
meta:
  tags: [ weather, api, example ]
  mcp:
    enabled: true
    description: "Get current weather for a city. Returns temperature, humidity, and conditions."
    properties:
      - name: city
        type: string
        description: "City name (e.g., 'London', 'Tokyo', 'New York')"
        required: true
```

For the complete field reference, see [MCP metadata fields](../create-tool/#mcp-metadata).

### [](#choose-a-clear-tool-name)Choose a clear tool name

AI clients see the `label` value when selecting tools, so make it descriptive and consistent. The name should reflect the tool’s primary action and target resource. Focus on clarity over brevity. For example, use `get-weather-forecast` instead of just `get-data`.

### [](#write-effective-descriptions)Write effective descriptions

Tool descriptions help AI clients decide when to use your tool. Start with an action verb and explain what the tool returns.

```yaml
description: "Get current weather for a city. Returns temperature in Celsius and Fahrenheit, humidity percentage, and weather description."
```

Include any limitations or requirements in the description:

```yaml
description: "Search product catalog by name or category. Returns up to 10 results. Requires at least one search term."
```

### [](#define-input-properties-clearly)Define input properties clearly

Each property needs a name, type, description, and required status. Use `string`, `number`, or `boolean` types. Include example values in descriptions:

```yaml
properties:
  - name: city
    type: string
    description: "City name (e.g., 'London', 'New York', 'Tokyo')"
    required: true
  - name: units
    type: string
    description: "Temperature units: 'celsius' or 'fahrenheit'. Defaults to 'celsius'."
    required: false
  - name: limit
    type: number
    description: "Maximum results to return (1-100). Defaults to 10."
    required: false
  - name: include_forecast
    type: boolean
    description: "If true, include 5-day forecast. Defaults to false."
    required: false
```

Mark properties as required only if the tool cannot function without them:

```yaml
properties:
  - name: city
    type: string
    description: "City name to get weather for"
    required: true
```

Optional properties should have sensible defaults:

```yaml
- name: units
  type: string
  description: "Temperature units. Defaults to 'celsius'."
  required: false
```

For patterns that apply defaults and validate values at runtime, see [input validation patterns](../tool-patterns/#input-validation).

## [](#tool-execution)Tool execution

### [](#keep-tools-focused)Keep tools focused

Each tool should do one thing well. If you find yourself adding multiple unrelated operations, split them into separate tools.

Tools that do too much or have vague purposes cause problems because AI clients rely on descriptions to choose tools. Vague descriptions lead to wrong tool selection. Also, tools that do too much are harder to test and debug.
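
As a hedged illustration of splitting, the sketch below shows two single-purpose tools that might replace one catch-all `manage-orders` tool. The labels, descriptions, property names, and placeholder `mutation` logic are all assumptions made for this example. The `---` separator marks two separate tool configurations, since each tool is defined on its own.

```yaml
# Tool 1 (hypothetical): look up a single order
label: get-order-status
processors:
  - mutation: |
      # Placeholder logic: echo the requested ID with a fixed status
      root = { "order_id": this.order_id, "status": "shipped" }
meta:
  mcp:
    enabled: true
    description: "Get the current status of one order by order ID. Returns the order ID and status."
    properties:
      - name: order_id
        type: string
        description: "Order ID (e.g., 'ord_001')"
        required: true
---
# Tool 2 (hypothetical): list orders for one customer
label: list-customer-orders
processors:
  - mutation: |
      # Placeholder logic: return an empty order list for the customer
      root = { "customer_id": this.customer_id, "orders": [] }
meta:
  mcp:
    enabled: true
    description: "List orders for one customer by customer ID. Returns an array of order summaries."
    properties:
      - name: customer_id
        type: string
        description: "Customer ID (e.g., 'cust_12345')"
        required: true
```

Each description names one action and one return value, which gives an AI client less room to pick the wrong tool than a single description covering both operations.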

Write descriptions that clearly state what the tool does, what input it needs, and what it returns. If a tool is doing multiple things, split it into focused tools.

### [](#design-for-quick-completion)Design for quick completion

MCP tools should complete quickly. AI clients wait for responses, and long-running tools cause poor user experiences.

Tools that wait indefinitely, poll continuously, or never return cause problems because MCP tools use a request/response model. A tool that never completes will time out and fail, and resources remain allocated while waiting.

Follow these guidelines:

- Set explicit timeouts on all external calls. For timeout options, see [`http` processor](../../../../develop/connect/components/processors/http/).
- Avoid unbounded reads. Read N messages, not all messages.
- Consider pagination for large datasets.
- Return partial results if full processing takes too long.

For patterns that handle timeout failures gracefully, see [error handling patterns](../tool-patterns/#error-handling).

## [](#complete-example)Complete example

This example combines all the best practices:

```yaml
label: search-customer-orders
processors:
  - mutation: |
      let customer_id = this.customer_id | ""
      let status = this.status | ""
      let limit = (this.limit | 10).number()
      root = {
        "orders": [
          {
            "order_id": "ord_001",
            "customer_id": $customer_id,
            "status": if $status != "" { $status } else { "delivered" },
            "total": 125.99
          }
        ],
        "count": 1,
        "filters_applied": {
          "customer_id": $customer_id,
          "status": $status,
          "limit": $limit
        }
      }
meta:
  tags: [ orders, database, production ]
  mcp:
    enabled: true
    description: "Search customer orders by customer ID, date range, or status. Returns order summaries with totals. Maximum 50 results per query."
    properties:
      - name: customer_id
        type: string
        description: "Customer ID (e.g., 'cust_12345'). Required if no other filters provided."
        required: false
      - name: status
        type: string
        description: "Order status filter: 'pending', 'shipped', 'delivered', or 'cancelled'."
        required: false
      - name: start_date
        type: string
        description: "Start date for date range filter (ISO 8601, e.g., '2024-01-01')."
        required: false
      - name: end_date
        type: string
        description: "End date for date range filter (ISO 8601, e.g., '2024-12-31')."
        required: false
      - name: limit
        type: number
        description: "Maximum results to return (1-50). Defaults to 10."
        required: false
```

## [](#next-steps)Next steps

- [Use secrets for credentials](../create-tool/#secrets)
- [MCP Tool Patterns](../tool-patterns/)
- [Troubleshoot Remote MCP Servers](../troubleshooting/)

---

# Page 37: MCP Tool Execution and Components

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote/concepts.md

---

# MCP Tool Execution and Components

---
title: MCP Tool Execution and Components
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: mcp/remote/concepts
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: mcp/remote/concepts.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/concepts.adoc
description: Understand how MCP tools execute requests, choose the right Redpanda Connect component type, and use traces for observability.
page-topic-type: concepts
personas: agent_developer, streaming_developer
learning-objective-1: Describe the request/response execution model
learning-objective-2: Choose the right component type for a use case
learning-objective-3: Interpret MCP server traces for debugging and monitoring
page-git-created-date: "2026-01-13"
page-git-modified-date: "2026-02-18"
---

This page explains how MCP tools execute and how to choose the right component type for your use case.

After reading this page, you will be able to:

- Describe the request/response execution model
- Choose the right component type for a use case
- Interpret MCP server traces for debugging and monitoring

## [](#how-components-map-to-mcp-tools)How components map to MCP tools

Each MCP tool is implemented as a single Redpanda Connect component. The component type determines what the tool can do.

The following table shows which component types are available and their purposes:

| Component Type | Purpose as an MCP Tool |
| --- | --- |
| Processor | Transforms, validates, or computes data. Calls external APIs. Returns results to the AI client. |
| Output | Writes data to external systems (Redpanda topics, databases, APIs). Can include processors for transformation before writing. |
| Input | Reads data from external systems. Returns the read data to the AI client. |
| Cache | Stores and retrieves data for use by other tools. |

Most MCP tools are processors. Use outputs when you need to write data. Use inputs when you need to read from external data sources.

## [](#execution-model)The MCP execution model

When an AI client calls an MCP tool, the MCP server handles the request in a specific sequence. The execution follows these steps:

1. The AI client sends a JSON request to the MCP server with the tool name and parameters.
2. The MCP server finds the corresponding component configuration.
3. The MCP server executes the component with the input data.
4. The component runs to completion and returns a result.
5. The MCP server sends the result back to the AI client.
6. The component instance is torn down.

This execution model has several important characteristics:

- Stateless execution: Each tool invocation is independent. Tools do not maintain state between calls. If you need state, use an external store such as a [cache](../../../../develop/connect/components/caches/about/), database, or Redpanda topic.
- Synchronous by default: Tools run synchronously from the AI client’s perspective. The client waits for the response before continuing.
- Timeout boundaries: Tools should complete quickly. Long-running operations should be avoided or handled asynchronously. Set explicit timeouts on external calls.
- No continuous processing: Unlike a traditional Redpanda Connect pipeline, MCP tools do not poll for messages or maintain connections between invocations. They start, execute, and stop.

MCP tools use an agent-initiated execution model where agents invoke tools on demand. Redpanda also supports pipeline-initiated integration where pipelines call agents using the `a2a_message` processor. For guidance on choosing between these patterns, see [Integration Patterns Overview](../../../agents/integration-overview/).

## [](#component-selection)Choose the right component type

Every MCP tool is implemented as a single component. Choosing the right component type is a critical design decision that affects what your tool can do and how it behaves.

### [](#decision-framework)Decision framework

To choose the right component type, ask what the tool’s primary purpose is. Use the following table to match your tool’s intent to a component type:

| Question | Component Type |
| --- | --- |
| Does the tool compute or transform data and return results? | Processor |
| Does the tool call external APIs and return the response? | Processor |
| Does the tool write data to an external system (database, topic, API)? | Output |
| Does the tool read data from an external source and return it? | Input |
| Does the tool store and retrieve temporary data for other tools? | Cache |

The core principle is to choose the component type that matches the tool’s primary intent.

### [](#processor-tools)Processor tools

Processor tools transform, validate, compute, or fetch data and return results to the AI client. This is the most common tool type.
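
A minimal sketch of a processor tool, assuming a hypothetical `validate-customer-record` tool. The label, field names, and validation rules are illustrative assumptions. The tool examines its input with a `mutation` processor and returns a structured result without writing data anywhere.

```yaml
label: validate-customer-record
processors:
  - mutation: |
      # Fall back to empty strings so missing fields are reported, not raised as errors
      let name = this.name | ""
      let email = this.email | ""
      root = {
        "name_ok": $name != "",
        "email_ok": $email.contains("@"),
        "valid": $name != "" && $email.contains("@")
      }
meta:
  mcp:
    enabled: true
    description: "Validate a customer record. Returns whether the name and email fields pass basic checks."
    properties:
      - name: name
        type: string
        description: "Customer name (e.g., 'Ada Lovelace')"
        required: false
      - name: email
        type: string
        description: "Customer email address (e.g., 'ada@example.com')"
        required: false
```

Both properties are optional here so that the tool can report a missing field in its result. Marking them required would be an equally reasonable design if you prefer to reject incomplete calls.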

See the [processors reference](../../../../develop/connect/components/processors/about/) for available processors.

#### [](#when-to-choose-a-processor-tool)When to choose a processor tool

Choose a processor tool when the tool’s purpose is to compute or transform data, call an external API and return the response, or validate inputs and return errors or results.

#### [](#use-case-fetch-and-transform-external-data)Use case: Fetch and transform external data

Consider a scenario where an AI agent needs current weather data to answer a user’s question about whether to bring an umbrella. The following prompts should trigger this type of tool:

- "What’s the weather in Berlin?"
- "Is it raining in Tokyo right now?"
- "Get me the current temperature for Seattle."

A processor is the right choice because the tool fetches data from an API, transforms it into a useful format, and returns it.

#### [](#use-case-validate-and-normalize-data)Use case: Validate and normalize data

Consider a scenario where an AI agent needs to validate user-submitted data and return structured feedback about any issues. The following prompts should trigger this type of tool:

- "Validate this customer record before saving."
- "Check if this order has all required fields."
- "Normalize this JSON and tell me what’s missing."

A processor is the right choice because the tool examines data, applies validation rules, and returns results. No data is written anywhere.

### [](#output-tools)Output tools

Output tools write data to external systems. Use them when the primary purpose is to create a side effect such as persisting data, publishing an event, or triggering an action.

See the [outputs reference](../../../../develop/connect/components/outputs/about/) for available outputs.

#### [](#when-to-choose-an-output-tool)When to choose an output tool

Choose an output tool when the tool’s purpose is to write data to Redpanda, a database, or an external API.
The side effect (writing) should be the primary intent, not incidental. You can use `processors:` within the output to transform data before writing. Output tools are appropriate when you want the AI to trigger real-world actions.

#### [](#understanding-tool-response-vs-side-effect)Understanding tool response vs. side effect

Output tools have two outcomes: the side effect (data is written to the destination) and the tool response (the AI client receives confirmation that the write succeeded).

The AI client does not receive the written data back. It receives status information. If you need to return the written data, consider using a processor tool instead.

#### [](#use-case-publish-events-to-redpanda)Use case: Publish events to Redpanda

Consider a scenario where an AI agent needs to publish order events to Redpanda for downstream processing. The following prompts should trigger this type of tool:

- "Publish this order to Redpanda."
- "Send the order event to the orders topic."
- "Record this new order for processing."

An output is the right choice because the purpose is to write data to Redpanda. The AI needs to create a persistent record, not just compute something.

#### [](#use-case-transform-and-publish)Use case: Transform and publish

Output components can include a `processors:` section that transforms data before writing to the destination. This is a single output component, not a combination of component types.

Consider a scenario where an AI agent asks an LLM to summarize a document, then stores both the original and summary in Redpanda. The following prompts should trigger this type of tool:

- "Summarize this document and save it."
- "Process this feedback with GPT and store the analysis."
- "Analyze this text and publish the results."

An output with processors is the right choice because the primary intent is to store data. The processors provide pre-processing before writing.

The execution flow for this pattern is as follows:

1. AI client calls the tool with input data.
2. The `processors` section transforms the data.
3. The output component writes the transformed data to the destination.
4. The tool returns a response to the AI client.

For implementation examples, see [outputs with processors](../tool-patterns/#outputs-with-processors) in the tool patterns guide.

### [](#input-tools)Input tools

Input tools read data from external sources and return it to the AI client. They’re useful when you need to query or fetch existing data.

See the [inputs reference](../../../../develop/connect/components/inputs/about/) for available inputs.

#### [](#when-to-choose-an-input-tool)When to choose an input tool

Choose an input tool when the tool’s purpose is to read and return data from an external source, consume messages from a Redpanda topic, or build a query-style tool that retrieves existing data.

#### [](#bounded-vs-unbounded-reads)Bounded vs. unbounded reads

Input tools must return a finite result. Use bounded reads that fetch a specific number of messages or read until a condition is met. For example, "get me the latest N events" or "read messages from the last hour".

Unbounded reads that poll continuously are not appropriate for MCP tools because the tool would never return a response to the AI client.

#### [](#latency-and-scope-considerations)Latency and scope considerations

Keep these factors in mind when building input tools:

- Input tools may have variable latency depending on the data source.
- Scope your reads appropriately. Don’t try to read entire topics.
- Consider consumer group behavior: with a consumer group, each invocation advances through the stream. Without one, each invocation may read the same data.

#### [](#use-case-query-recent-events)Use case: Query recent events

Consider a scenario where an AI agent needs to retrieve recent user activity events to understand user behavior. The following prompts should trigger this type of tool:

- "Show me recent user events."
- "Get the last 10 login events."
- "What events happened in the user-events topic recently?"

An input is the right choice because the tool reads from an existing data source (topic) and returns what it finds.

### [](#cache-tools)Cache tools

Cache tools store and retrieve temporary data that other tools can access. They’re useful for sharing state between tool calls or storing frequently accessed data.

See the [caches reference](../../../../develop/connect/components/caches/about/) for available caches.

#### [](#when-to-choose-a-cache-tool)When to choose a cache tool

Choose a cache tool when the tool’s purpose is to store temporary data that expires after a set time, share state between multiple tool calls in a conversation, or reduce repeated calls to slow external APIs by caching results.

#### [](#use-case-session-state-management)Use case: Session state management

Consider a scenario where an AI agent needs to remember user preferences across multiple tool calls within a conversation. The following prompts should trigger this type of tool:

- "Remember that I prefer metric units."
- "Store my timezone as America/New\_York."
- "Save this search filter for later."

A cache is the right choice because the data is temporary, session-scoped, and needs to be accessible by other tools during the conversation.

#### [](#use-case-api-response-caching)Use case: API response caching

Consider a scenario where an AI agent frequently looks up the same reference data (like exchange rates or product catalogs) and you want to avoid repeated API calls. The following prompts should trigger cache usage:

- "Get the current exchange rate" (cached for 5 minutes)
- "Look up product details" (cached for 1 hour)
- "Check inventory levels" (cached briefly to reduce load)

A cache is the right choice because you want to store API responses temporarily and serve them on subsequent requests without hitting the external API again.
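
The exchange-rate case above could be sketched as a cache tool like the following. This is a minimal illustration, assuming an in-memory cache is acceptable for your workload. The label and description are hypothetical, and the TTL is chosen to match the 5-minute figure in the prompt list.

```yaml
label: exchange-rate-cache
memory:
  # 300 seconds = 5 minutes; entries expire after this interval
  default_ttl: 300s
meta:
  mcp:
    enabled: true
    description: "Cache exchange rate lookups for 5 minutes to reduce repeated calls to the external rates API."
```

A longer TTL (for example, one hour for product details) trades freshness for fewer upstream calls, so pick the interval based on how quickly the underlying data changes.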

### [](#component-selection-summary)Component selection summary

The following table summarizes when to use each component type:

| Component | Primary Intent | Example Tools | Returns |
| --- | --- | --- | --- |
| Processor | Compute, transform, validate, fetch | Weather lookup, data validation, API calls | Computed result |
| Output | Write data with side effects | Publish events, store records, trigger webhooks | Write confirmation |
| Output + processors | Transform then write | Summarize and store, enrich and publish | Write confirmation |
| Input | Read and return data | Query recent events, search logs | Retrieved data |
| Cache | Store and retrieve temporary data | Session state, API response caching | Cached value or confirmation |

For implementation examples and common patterns, see [MCP Tool Patterns](../tool-patterns/).

## [](#observability)Observability

MCP servers automatically emit OpenTelemetry traces for monitoring and debugging. For detailed information about traces, spans, and the trace structure, see [Transcripts and AI Observability](../../../observability/concepts/).

To monitor MCP server activity, consume traces, and debug failures, see [Monitor MCP Server Activity](../monitor-mcp-servers/).

## [](#service-account-authorization)Service account authorization

When you create an MCP server or AI agent, Redpanda Cloud automatically creates a service account to authenticate requests to your cluster.

The service account is created with the following:

**Name**: Prepopulated as `cluster-<cluster-id>-<resource-type>-<resource-name>-sa`, where `sa` stands for service account. For example:

- MCP server: `cluster-d5tp5kntujt599ksadgg-mcp-my-test-server-sa`
- AI agent: `cluster-d5tp5kntujt599ksadgg-agent-my-agent-sa`

You can customize this name during creation.

**Role binding**: Cluster scope with Writer role for the cluster where you created the resource. This allows the resource to read and write data, manage topics, and access cluster resources.

### [](#manage-service-accounts)Manage service accounts

You can view and manage service accounts created for MCP servers and AI agents at the organization level in **Organization IAM** > **Service account**. This page shows additional details not visible during creation:

| Field | Description |
| --- | --- |
| Client ID | Unique identifier for OAuth2 authentication |
| Description | Optional description of the service account |
| Created at | Timestamp when the service account was created |
| Updated at | Timestamp of the last modification |

From this page you can:

- Edit the service account name or description
- View and manage role bindings
- Rotate credentials
- Delete the service account

> 📝 **NOTE**
>
> Deleting a service account removes authentication for the associated MCP server or AI agent. The resource can no longer access cluster data.

### [](#customize-role-bindings)Customize role bindings

The default Writer role provides broad access suitable for most use cases. If you need more restrictive permissions:

1. Exit the cluster and navigate to **Organization IAM** > **Service account**.
2. Find the service account for your resource.
3. Edit the role bindings to use a more restrictive role or scope.

For more information about roles and permissions, see [Role-based access control](../../../../security/authorization/rbac/rbac/) or [Group-based access control](../../../../security/authorization/gbac/gbac/).

## [](#next-steps)Next steps

- [Create an MCP Tool](../create-tool/)
- [MCP Tool Design](../best-practices/)
- [MCP Tool Patterns](../tool-patterns/)
- [Troubleshoot Remote MCP Servers](../troubleshooting/)

---

# Page 38: Create an MCP Tool

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote/create-tool.md

---

# Create an MCP Tool

---
title: Create an MCP Tool
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: mcp/remote/create-tool
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: mcp/remote/create-tool.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/create-tool.adoc
description: Create an MCP tool with the correct YAML structure, metadata, and parameter mapping.
page-topic-type: how-to
personas: agent_developer, streaming_developer, data_engineer
learning-objective-1: Create a tool with the correct structure and MCP metadata
learning-objective-2: Map MCP parameters to component configuration fields using Bloblang
learning-objective-3: Test tools using the MCP Inspector
page-git-created-date: "2026-01-13"
page-git-modified-date: "2026-02-18"
---

After [deploying your first MCP server](../quickstart/), create custom tools that AI clients can discover and invoke. This guide walks you through the process using any Redpanda Connect component.

After reading this page, you will be able to:

- Create a tool with the correct structure and MCP metadata
- Map MCP parameters to component configuration fields using Bloblang
- Test tools using the MCP Inspector

## [](#prerequisites)Prerequisites

- A Redpanda Cloud cluster with **Remote MCP** enabled.
- You can describe the MCP execution model (see [The MCP execution model](../concepts/#execution-model))
- You have chosen the right component type for your use case (see [Choose the right component type](../concepts/#component-selection))

## [](#create-the-tool)Create the tool

In Redpanda Cloud, you create tools directly in the Cloud Console or using the Data Plane API.

### Cloud Console

1. Log in to the [Redpanda Cloud Console](https://cloud.redpanda.com/).
2. Navigate to **Agentic AI** > **Remote MCP** and either create a new MCP server or edit an existing one.
3. In the **Tools** section, click **Add Tool**.
4. Enter the YAML configuration for your tool.
5. Click **Lint** to validate your configuration.
6. Click **Save** to deploy the tool.

### Data Plane API

Use the [Create MCP Server](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_createmcpserver) or [Update MCP Server](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_updatemcpserver) endpoints to add tools programmatically.

## [](#yaml-structure)Add the tool structure

An MCP tool wraps a [Redpanda Connect component](../../../../develop/connect/components/about/) and exposes it to AI clients. Each tool has three parts:

- **Label**: The tool name AI clients see
- **Component configuration**: A Redpanda Connect component (processor, input, output, or cache) that does the work
- **MCP metadata**: Describes the tool’s purpose and parameters for AI clients

Here’s an example using the [`sql_select` processor](../../../../develop/connect/components/processors/sql_select/):

```yaml
label: lookup-customer # (1)
sql_select: # (2)
  driver: postgres
  dsn: "${secrets.DATABASE_URL}"
  table: customers
  columns: ["id", "name", "email", "plan"]
  where: id = ?
  args_mapping: '[this.customer_id]'
meta: # (3)
  mcp:
    enabled: true
    description: "Look up a customer by ID and return their profile."
    properties:
      - name: customer_id
        type: string
        description: "The customer's unique identifier"
        required: true
```

1. Label: Becomes the tool name.
2. Component: The `sql_select` processor configured to query a database.
3. MCP metadata: Tells AI clients what this tool does and what parameters it accepts.

Each tool configuration must contain exactly one component. The component type is inferred from the type you select when creating or editing the MCP server. The component can be a processor, input, output, or cache.

The following sections show how to structure tools for each component type.

### [](#label-naming-rules)Label naming rules

The `label` field (tool name) must follow these rules:

- Lowercase letters, numbers, underscores, and hyphens only (`a-z`, `0-9`, `_`, `-`)
- Cannot start with an underscore
- No spaces or special characters

Valid examples: `get-weather`, `lookup_customer`, `send-notification-v2`

### [](#component-types)Component types

[Processors](../../../../develop/connect/components/processors/about/) transform, filter, or enrich data. Use a `processors:` array with one or more processors:

Processor tool

```yaml
label: enrich-order
processors:
  - http:
      url: "https://api.example.com/lookup"
      verb: GET
meta:
  mcp:
    enabled: true
    description: "Enrich order with customer data"
```

[Inputs](../../../../develop/connect/components/inputs/about/) read data from sources, [outputs](../../../../develop/connect/components/outputs/about/) write data to destinations, and [caches](../../../../develop/connect/components/caches/about/) store and retrieve data.
Define these components directly at the top level:

Input tool

```yaml
label: read-events
redpanda: # (1)
  seed_brokers: ["${REDPANDA_BROKERS}"]
  topics: ["events"]
  consumer_group: "mcp-reader"
  tls:
    enabled: true
  sasl:
    - mechanism: SCRAM-SHA-256
      username: "${secrets.MCP_USERNAME}"
      password: "${secrets.MCP_PASSWORD}"
meta:
  mcp:
    enabled: true
    description: "Read events from Redpanda"
```

| 1 | The component name (redpanda) is at the top level, not wrapped in input:. |
| --- | --- |

Output tool

```yaml
label: publish-event
redpanda:
  seed_brokers: ["${REDPANDA_BROKERS}"]
  topic: "processed-events"
  tls:
    enabled: true
  sasl:
    - mechanism: SCRAM-SHA-256
      username: "${secrets.MCP_USERNAME}"
      password: "${secrets.MCP_PASSWORD}"
meta:
  mcp:
    enabled: true
    description: "Publish event to Redpanda"
```

Cache tool

```yaml
label: session-cache
memory:
  default_ttl: 300s
meta:
  mcp:
    enabled: true
    description: "In-memory cache for session data"
```

Outputs can include a `processors:` section to transform data before publishing:

Output tool with processors

```yaml
label: publish-with-timestamp
processors:
  - mutation: |
      root = this
      root.published_at = now()
redpanda:
  seed_brokers: ["${REDPANDA_BROKERS}"]
  topic: "processed-events"
  tls:
    enabled: true
  sasl:
    - mechanism: SCRAM-SHA-256
      username: "${secrets.MCP_USERNAME}"
      password: "${secrets.MCP_PASSWORD}"
meta:
  mcp:
    enabled: true
    description: "Add timestamp and publish to Redpanda"
```

See [outputs with processors](../tool-patterns/#outputs-with-processors) for more examples.

Do not wrap components in `input:`, `output:`, or `cache:` blocks. This syntax is for pipelines, not MCP tools.

### [](#mcp-metadata)MCP metadata fields

The `meta.mcp` block defines how AI clients discover and interact with your tool. These fields control tool visibility, naming, and input parameters.

| Field | Required | Description |
| --- | --- | --- |
| enabled | Yes | Set to true to expose this component as an MCP tool. Set to false to disable without deleting the configuration. |
| description | Yes | Explains what the tool does and what it returns. AI clients use this to decide when to call the tool. |
| properties | No | Array of input parameters the tool accepts. See [Property fields](#mcp-property-fields) for the fields in each property. |
| tags | No | Array of strings for categorizing tools. |

#### [](#mcp-property-fields)Property fields

Each entry in the `properties` array defines an input parameter:

| Field | Required | Description |
| --- | --- | --- |
| name | Yes | Parameter name. |
| type | Yes | Data type. Must be one of: string, number, or boolean. |
| description | Yes | Explains what the parameter is for. Include example values and any constraints. |
| required | Yes | Set to true if the tool cannot function without this parameter. |

#### [](#property-restrictions)Property restrictions by component type

Different component types have different property capabilities when exposed as MCP tools:

| Component Type | Property Support | Details |
| --- | --- | --- |
| input | Only supports the count property | AI clients can specify how many messages to read, but you cannot define custom properties. |
| cache | No custom properties | Properties are hardcoded to key and value for cache operations. |
| output | Custom properties supported | AI sees properties as an array for batch operations: [{prop1, prop2}, {prop1, prop2}]. |
| processor | Custom properties supported | You can define any properties needed for data processing operations. |

## [](#parameter-mapping)Map parameters to component fields

When an AI client calls your tool, the `arguments` object becomes the message body. You can access these arguments using [Bloblang](../../../../develop/connect/guides/bloblang/about/), but the syntax depends on where you’re using it:

- **Inside Bloblang contexts** (mutation, mapping, args\_mapping): Use `this.field_name`
- **Inside string fields** (URLs, topics, headers): Use interpolation `${! json("field_name") }`

### [](#in-bloblang-contexts)In Bloblang contexts

Use `this` to access message fields directly in processors like `mutation`, `mapping`, or in `args_mapping` fields:

```yaml
mutation: |
  root.search_query = this.query.lowercase()
  root.max_results = this.limit.or(10)
```

```yaml
sql_select:
  table: orders
  where: customer_id = ? AND status = ?
  args_mapping: '[this.customer_id, this.status.or("active")]'
```

### [](#in-string-fields-interpolation)In string fields (interpolation)

Use `${! … }` interpolation to embed Bloblang expressions inside string values like URLs or topic names:

```yaml
http:
  url: 'https://api.weather.com/v1/current?city=${! json("city") }&units=${! json("units").or("metric") }'
```

```yaml
redpanda:
  seed_brokers: ["${REDPANDA_BROKERS}"] # (1)
  topic: '${! json("topic_name") }' # (2)
```

| 1 | ${VAR} without ! is environment variable substitution, not Bloblang. |
| --- | --- |
| 2 | ${! … } with ! is Bloblang interpolation that accesses message data. |

> 💡 **TIP**
>
> For more on Bloblang syntax, see [Bloblang](../../../../develop/connect/guides/bloblang/about/). For interpolation details, see [Interpolation](../../../../develop/connect/configuration/interpolation/).
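To make the mapping concrete, the following Python sketch walks through which part of an MCP `tools/call` request ends up as the message body and what the two access styles would resolve to. It is an illustration only, not how Redpanda Connect evaluates Bloblang: the helper names `message_body` and `field_or` are hypothetical, and the tool name and argument values are made up.

```python
import json

# A JSON-RPC 2.0 `tools/call` request in the shape used by the MCP protocol.
# The tool name and arguments are sample values for illustration.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get-weather",
        "arguments": {"city": "London"},
    },
}


def message_body(call: dict) -> dict:
    """Return the `arguments` object, which the tool sees as its message body."""
    return call["params"]["arguments"]


def field_or(body: dict, name: str, default):
    """Rough analogue of Bloblang's `this.<name>.or(default)` (hypothetical helper):
    use the default when the field is missing or null."""
    value = body.get(name)
    return default if value is None else value


body = message_body(request)

# Comparable to `this.city` in a mapping or `${! json("city") }` in a string field.
city = field_or(body, "city", None)
# Comparable to `this.units.or("metric")`: the caller omitted `units`.
units = field_or(body, "units", "metric")

url = f"https://api.weather.com/v1/current?city={city}&units={units}"
print(json.dumps(body))
print(url)
```

The point of the sketch is that only `params.arguments` reaches the component; the tool name and the JSON-RPC envelope are not part of the message body.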
### [](#provide-defaults-for-optional-parameters)Provide defaults for optional parameters

Use `.or(default)` to handle missing optional parameters:

```yaml
mutation: |
  root.city = this.city # Required - will error if missing
  root.units = this.units.or("metric") # Optional with default
  root.limit = this.limit.or(10).number() # Optional, converted to number
```

Declare which parameters are required in your `meta.mcp.properties`:

```yaml
properties:
  - name: city
    type: string
    description: "City name to look up"
    required: true
  - name: units
    type: string
    description: "Temperature units: 'metric' or 'imperial' (default: metric)"
    required: false
  - name: limit
    type: number
    description: "Max results (default: 10)"
    required: false
```

## [](#secrets)Use the Secrets Store

Never hardcode credentials, API keys, or connection strings in your tool configurations. Use the [Secrets Store](../../../../develop/connect/configuration/secret-management/) to securely manage sensitive values.

Reference secrets using `${secrets.SECRET_NAME}` syntax:

```yaml
http:
  url: "https://api.example.com/data"
  headers:
    Authorization: "Bearer ${secrets.API_TOKEN}"

sql_select:
  driver: postgres
  dsn: "${secrets.DATABASE_URL}"
  table: customers
```

When you add secret references to your tool configuration, the Cloud Console automatically detects them and provides an interface to create the required secrets.

### [](#secrets-best-practices)Secrets best practices

- Use uppercase snake\_case for secret names (for example, `DATAPLANE_TOKEN`, `API_KEY`).
- Rotate secrets periodically.
- Follow the principle of least privilege. Only request the scopes and roles your tool actually needs.

See [Secrets management](../best-practices/#secrets) for more guidance.

## [](#test-the-tool)Test the tool

1. Click **Lint** to validate your configuration.
2. Deploy the MCP server.
3. Use the **MCP Inspector** tab to test tool calls:

   - Select the tool from the list
   - Enter test parameter values
   - Click **Run Tool** to execute
   - Review the response

4. Connect an AI client and verify the tool appears:

   ```bash
   rpk cloud mcp proxy \
     --cluster-id <cluster-id> \
     --mcp-server-id <mcp-server-id> \
     --install --client claude-code
   ```

5. Test end-to-end with realistic prompts to verify the AI client uses your tool correctly.

## [](#complete-example)Complete example

Here’s a complete tool that wraps the `http` processor to fetch weather data:

```yaml
label: get-weather
processors:
  # Validate and sanitize input
  - label: validate_city
    mutation: |
      root.city = if this.city.or("").trim() == "" {
        throw("city is required")
      } else {
        this.city.trim().lowercase().re_replace_all("[^a-z\\s\\-]", "")
      }
      root.units = this.units.or("metric")

  # Fetch weather data
  - label: fetch_weather
    try:
      - http:
          url: 'https://wttr.in/${! json("city") }?format=j1'
          verb: GET
          timeout: 10s
      - mutation: |
          root.weather = {
            "location": this.nearest_area.0.areaName.0.value,
            "country": this.nearest_area.0.country.0.value,
            "temperature_c": this.current_condition.0.temp_C,
            "temperature_f": this.current_condition.0.temp_F,
            "condition": this.current_condition.0.weatherDesc.0.value,
            "humidity": this.current_condition.0.humidity,
            "wind_kph": this.current_condition.0.windspeedKmph
          }

  # Handle errors gracefully
  - label: handle_errors
    catch:
      - mutation: |
          root.error = true
          root.message = "Failed to fetch weather: " + error()

meta:
  mcp:
    enabled: true
    description: "Get current weather for a city. Returns temperature, conditions, humidity, and wind speed."
    properties:
      - name: city
        type: string
        description: "City name (for example, 'London', 'New York', 'Tokyo')"
        required: true
      - name: units
        type: string
        description: "Temperature units: 'metric' or 'imperial' (default: metric)"
        required: false
```

## [](#next-steps)Next steps

- [AI Agent Quickstart](../../../agents/quickstart/)
- [MCP Tool Design](../best-practices/)
- [MCP Tool Patterns](../tool-patterns/)
- [Troubleshoot Remote MCP Servers](../troubleshooting/)
- [Components Catalog](../../../../develop/connect/components/about/)

---

# Page 39: Manage Remote MCP Servers

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote/manage-servers.md

---

# Manage Remote MCP Servers

---
title: Manage Remote MCP Servers
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: mcp/remote/manage-servers
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: mcp/remote/manage-servers.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/manage-servers.adoc
description: Learn how to edit, stop, start, and delete MCP servers in Redpanda Cloud.
page-topic-type: how-to
personas: platform_admin, agent_developer
learning-objective-1: Edit MCP server configurations
learning-objective-2: Stop and start MCP servers
learning-objective-3: Delete MCP servers safely
page-git-created-date: "2025-12-17"
page-git-modified-date: "2026-02-18"
---

After creating an MCP server, you can manage its lifecycle, including editing configurations, pausing to save costs, and permanent deletion.

After reading this page, you will be able to:

- Edit MCP server configurations
- Stop and start MCP servers
- Delete MCP servers safely

## [](#prerequisites)Prerequisites

You must have an existing MCP server. If you do not have one, see [Remote MCP Server Quickstart](../quickstart/).
## [](#edit-an-mcp-server)Edit an MCP server

You can update the configuration, resources, or metadata of an MCP server at any time.

### Cloud Console

1. In the Redpanda Cloud Console, navigate to **Agentic AI** > **Remote MCP**.
2. Find the MCP server you want to edit and click its name.
3. Click **Edit configuration**.
4. Make your changes.
5. Click **Save** to apply changes.

> 📝 **NOTE**
>
> Editing a running MCP server may cause a brief interruption. Review changes before deploying to production.

### Data Plane API

1. [Authenticate and get the base URL](/api/doc/cloud-dataplane/topic/topic-quickstart) for the Data Plane API.
2. Make a request to [`GET /v1/redpanda-connect/mcp-servers/{mcp_server_id}`](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_getmcpserver) to retrieve the current configuration.
3. Make a request to [`PATCH /v1/redpanda-connect/mcp-servers/{mcp_server_id}`](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_updatemcpserver) to update the configuration:

   ```bash
   curl -X PATCH "https://<dataplane-api-url>/v1/redpanda-connect/mcp-servers/<mcp-server-id>?update_mask=display_name,description" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
       "mcp_server": {
         "display_name": "updated-name",
         "description": "Updated description"
       }
     }'
   ```

## [](#stop-an-mcp-server)Stop an MCP server

Stopping a server pauses all tool execution and releases compute resources, but preserves configuration and state. This is useful for temporarily disabling a server to save costs while retaining the ability to restart it later.

### Cloud Console

1. In the Redpanda Cloud Console, navigate to **Agentic AI** > **Remote MCP**.
2. Find the server you want to stop.
3. Click the three dots and select **Stop**.
4. Confirm the action.

### Data Plane API

1. [Authenticate and get the base URL](/api/doc/cloud-dataplane/topic/topic-quickstart) for the Data Plane API.
2. Make a request to [`POST /v1/redpanda-connect/mcp-servers/{mcp_server_id}:stop`](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_stopmcpserver):

   ```bash
   curl -X POST "https://<dataplane-api-url>/v1/redpanda-connect/mcp-servers/<mcp-server-id>:stop" \
     -H "Authorization: Bearer <token>"
   ```

While stopped, the server cannot respond to MCP requests. Start it to restore service.

## [](#start-a-stopped-mcp-server)Start a stopped MCP server

Resume a stopped server to restore its functionality.

### Cloud Console

1. In the Redpanda Cloud Console, navigate to **Agentic AI** > **Remote MCP**.
2. Find the stopped server.
3. Click the three dots and select **Start**.
4. Wait for the status to show **Running** before reconnecting clients.

### Data Plane API

1. [Authenticate and get the base URL](/api/doc/cloud-dataplane/topic/topic-quickstart) for the Data Plane API.
2. Make a request to [`POST /v1/redpanda-connect/mcp-servers/{mcp_server_id}:start`](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_startmcpserver):

   ```bash
   curl -X POST "https://<dataplane-api-url>/v1/redpanda-connect/mcp-servers/<mcp-server-id>:start" \
     -H "Authorization: Bearer <token>"
   ```

3. Wait for the status to show **Running** before reconnecting clients.

## [](#delete-an-mcp-server)Delete an MCP server

Deleting a server permanently removes it. You cannot undo this action. Redpanda removes all configuration, tools, and associated resources.

### Cloud Console

1. In the Redpanda Cloud Console, navigate to **Agentic AI** > **Remote MCP**.
2. Find the server you want to delete.
3. Click the three dots and select **Delete**.
4. Confirm the deletion when prompted.

### Data Plane API

1. [Authenticate and get the base URL](/api/doc/cloud-dataplane/topic/topic-quickstart) for the Data Plane API.
2. Make a request to [`DELETE /v1/redpanda-connect/mcp-servers/{mcp_server_id}`](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_deletemcpserver):

   ```bash
   curl -X DELETE "https://<dataplane-api-url>/v1/redpanda-connect/mcp-servers/<mcp-server-id>" \
     -H "Authorization: Bearer <token>"
   ```

> ⚠️ **WARNING**
>
> Deletion is immediate and permanent. Make sure you have backed up any important configuration before deleting an MCP server.

## [](#next-steps)Next steps

- [Scale Remote MCP Server Resources](../scale-resources/)
- [Monitor MCP Server Activity](../monitor-mcp-servers/)
- [MCP Tool Design](../best-practices/)

---

# Page 40: Monitor MCP Server Activity

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote/monitor-mcp-servers.md

---

# Monitor MCP Server Activity

---
title: Monitor MCP Server Activity
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: mcp/remote/monitor-mcp-servers
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: mcp/remote/monitor-mcp-servers.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/monitor-mcp-servers.adoc
description: Consume traces, track tool invocations, measure performance, and debug failures in MCP servers.
page-topic-type: how-to
personas: platform_admin, agent_developer, data_engineer
learning-objective-1: Consume traces from the redpanda.otel_traces topic
learning-objective-2: Track tool invocations and measure performance
learning-objective-3: Debug tool failures using trace data
page-git-created-date: "2026-02-18"
page-git-modified-date: "2026-02-18"
---

Monitor MCP server activity using OpenTelemetry traces emitted to the `redpanda.otel_traces` [topic](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#topic).
After reading this page, you will be able to:

- Consume traces from the redpanda.otel\_traces topic
- Track tool invocations and measure performance
- Debug tool failures using trace data

For conceptual background on traces, spans, and the trace data structure, see [Transcripts and AI Observability](../../../observability/concepts/).

## [](#prerequisites)Prerequisites

You must have an existing MCP server. If you do not have one, see [Remote MCP Server Quickstart](../quickstart/).

## [](#view-transcripts-in-the-cloud-console)View transcripts in the Cloud Console

### [](#navigate-the-transcripts-view)Navigate the transcripts view

1. Click **Transcripts**.
2. Select a recent transcript from your MCP server tool invocations.

The transcripts view displays:

- **Timeline**: Visual history of recent executions with success/error indicators
- **Trace list**: Hierarchical view of traces and spans
- **Summary panel**: Detailed metrics when you select a transcript

#### [](#timeline-visualization)Timeline visualization

The timeline shows execution patterns over time:

- Green bars: Successful executions
- Red bars: Failed executions with errors
- Gray bars: Incomplete traces or traces still loading
- Time range: Displays the last few hours by default

Use the timeline to spot patterns like error clusters, performance degradation over time, or gaps indicating downtime.

#### [](#trace-hierarchy)Trace hierarchy

The trace list shows nested operations with visual duration bars indicating how long each operation took. Click the expand arrows (▶) to drill into nested spans and see the complete execution flow.

For details on span types, see [MCP server trace hierarchy](../../../observability/concepts/#mcp-server-trace-hierarchy).
#### [](#summary-panel)Summary panel

When you select a transcript, the summary panel shows:

- Duration: Total execution time for this request
- Total Spans: Number of operations in the trace
- Service: The MCP server identifier

## [](#analyze-traces-programmatically)Analyze traces programmatically

MCP servers emit OpenTelemetry traces to the `redpanda.otel_traces` topic. Consume these traces to build custom monitoring, track tool usage, and analyze performance.

### [](#consume-traces)Consume traces

#### Cloud Console

1. In the Redpanda Cloud Console, navigate to **Topics**.
2. Select `redpanda.otel_traces`.
3. Click **Messages** to view recent traces.
4. Use filters to search for specific trace IDs, span names, or time ranges.

#### rpk

Consume the most recent traces:

```bash
rpk topic consume redpanda.otel_traces --offset end -n 10
```

Filter for specific MCP server activity by examining the span attributes.

#### Data Plane API

Use the [Data Plane API](/api/doc/cloud-dataplane/) to programmatically consume traces and integrate with your monitoring pipeline.

### [](#track-tool-invocations)Track tool invocations

Monitor which tools are being called and how often by filtering spans where `instrumentationScope.name` is `rpcn-mcp`. The `name` field shows which tool was invoked.

Example: Find all invocations of a specific tool:

```bash
rpk topic consume redpanda.otel_traces --offset start \
  | jq '.value | select(.instrumentationScope.name == "rpcn-mcp" and .name == "weather")'
```

### [](#measure-performance)Measure performance

Calculate tool execution time using span timestamps:

```text
Duration (ms) = (endTimeUnixNano - startTimeUnixNano) / 1000000
```

Track percentiles (p50, p95, p99) to identify performance issues and set alerts for durations exceeding acceptable thresholds.
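As a minimal sketch of the duration formula and percentile tracking described above, the following Python example computes per-span durations and nearest-rank percentiles. It assumes spans have already been decoded into dictionaries with the `instrumentationScope.name`, `startTimeUnixNano`, and `endTimeUnixNano` fields mentioned on this page. The sample spans, the helper names, and the 100 ms alert threshold are all made up for illustration.

```python
import math


def duration_ms(span: dict) -> float:
    """Duration (ms) = (endTimeUnixNano - startTimeUnixNano) / 1000000."""
    # Timestamps may arrive as strings in JSON-encoded spans, so convert to int first.
    start = int(span["startTimeUnixNano"])
    end = int(span["endTimeUnixNano"])
    return (end - start) / 1_000_000


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of values at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def make_span(scope: str, name: str, millis: int) -> dict:
    """Build a sample span (hypothetical data) that lasts `millis` milliseconds."""
    start = 1_700_000_000_000_000_000
    return {
        "name": name,
        "instrumentationScope": {"name": scope},
        "startTimeUnixNano": str(start),
        "endTimeUnixNano": str(start + millis * 1_000_000),
    }


spans = [
    make_span("rpcn-mcp", "weather", 10),
    make_span("rpcn-mcp", "weather", 20),
    make_span("rpcn-mcp", "weather", 30),
    make_span("rpcn-mcp", "weather", 40),
    make_span("rpcn-mcp", "weather", 500),
    make_span("other-scope", "unrelated", 9000),  # dropped by the scope filter below
]

# Keep only tool invocation spans, then compute their durations.
tool_spans = [s for s in spans if s["instrumentationScope"]["name"] == "rpcn-mcp"]
durations = [duration_ms(s) for s in tool_spans]

p50, p95, p99 = (percentile(durations, p) for p in (50, 95, 99))
THRESHOLD_MS = 100  # assumed alert threshold; choose one that fits your workload
slow = [d for d in durations if d > THRESHOLD_MS]

print(f"p50={p50} ms, p95={p95} ms, p99={p99} ms, slow={slow}")
```

Nearest-rank is used here because it is easy to reason about on small samples. A monitoring system that interpolates between values would report slightly different percentiles for the same data.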
### [](#debug-failures)Debug failures

Filter for error spans where `status.code` is `2`:

```bash
rpk topic consume redpanda.otel_traces --offset start \
  | jq '.value | select(.status.code == 2)'
```

Check `status.message` for error details and the `events` array for error events with timestamps. Use `traceId` to correlate related spans across the distributed system.

## [](#next-steps)Next steps

- [Transcripts and AI Observability](../../../observability/concepts/)
- [Troubleshoot Remote MCP Servers](../troubleshooting/)
- [Manage Remote MCP Servers](../manage-servers/)

---

# Page 41: Remote MCP Server Overview

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote/overview.md

---

# Remote MCP Server Overview

---
title: Remote MCP Server Overview
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: mcp/remote/overview
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: mcp/remote/overview.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/overview.adoc
description: Build and host MCP tools that connect AI agents to your business systems without writing glue code, using Redpanda's proven connectors.
page-topic-type: overview
personas: evaluator, agent_developer
learning-objective-1: Explain what a Remote MCP server is and how tools differ from pipelines
learning-objective-2: Identify use cases where Remote MCP provides business value
learning-objective-3: Describe how MCP tools expose Redpanda Connect components to AI
page-git-created-date: "2025-10-21"
page-git-modified-date: "2026-02-18"
---

Remote MCP lets you give AI agents access to your databases, queues, CRMs, and other systems of record without writing custom integration code. This page introduces Remote MCP servers and helps you decide if they’re right for your use case.
After reading this page, you will be able to:

- Explain what a Remote MCP server is and how tools differ from pipelines
- Identify use cases where Remote MCP provides business value
- Describe how MCP tools expose Redpanda Connect components to AI

## [](#what-is-mcp)What is MCP?

MCP (Model Context Protocol) is an open standard that lets AI agents use tools. Think of it like a universal adapter: instead of building custom integrations for every AI system, you define your tools once using MCP, and any MCP-compatible AI client can discover and use them.

Without MCP, connecting AI to your business systems requires custom API code, authentication handling, and response formatting for each AI platform. With MCP, you describe what a tool does and what inputs it needs, and the protocol handles the rest.

## [](#what-is-remote-mcp)What is Remote MCP?

Remote MCP lets you build and host MCP servers in your Redpanda Cloud clusters. Your tools run next to your data, managed by Redpanda, so you get:

- **Always-on availability:** No local process to run. Your tools are hosted and managed by Redpanda Cloud.
- **Proximity to data:** Tools execute next to your cluster for lower latency and simpler networking.
- **Secure secrets management:** Use the [Secrets Store](../../../../develop/connect/configuration/secret-management/) instead of hardcoding credentials.
- **Fast iteration:** Define tools as YAML, deploy, and your AI agents can use them immediately.

## [](#mcp-tools-are-not-pipelines)MCP tools are not pipelines

If you already use Redpanda Connect, you might wonder how MCP tools differ from pipelines.

A pipeline is a continuous data flow: data streams from an input, through processors, to an output. The pipeline runs indefinitely, processing many messages over time.

An MCP tool is different. It’s a single component that executes on demand when called by an AI client. The tool starts, runs, and completes for each invocation. There is no persistent state between calls.
Think of it like calling a function rather than running a service.

This request/response pattern is what makes MCP tools useful for AI agents: the agent asks a question, the tool runs, and it returns an answer.

## [](#use-cases)Use cases

| Category | Example prompts |
| --- | --- |
| Operational monitoring | Check partition lag for customer-events topic<br>Show me the top 10 producers by message volume today<br>Get schema registry health status |
| Data enrichment and analysis | Fetch user profile data and recent orders for customer ID 12345<br>Get real-time stock prices for symbols in my portfolio topic<br>Analyze sentiment of latest product reviews |
| Team productivity | Deploy my microservice to the staging environment<br>Generate load test data for the payments service<br>Create a summary dashboard of this week’s incident reports |
| Business intelligence | What are the trending products in the last 24 hours?<br>Show revenue impact of the latest feature deployment<br>Get customer satisfaction scores from support tickets |

## [](#how-it-works)How it works

Remote MCP servers sit between AI clients and your data:

1. Your AI agent connects to your MCP server using `rpk cloud mcp proxy` or direct authentication.
2. A user asks their AI agent something like "What’s the weather in London?"
3. The server finds the matching tool and runs your Redpanda Connect configuration.
4. Your configuration fetches data, transforms it, and returns a structured response.
5. The AI agent gets the data and can use it to answer the user.

### [](#what-a-tool-looks-like)What a tool looks like

A tool is a YAML configuration with two parts: the logic (what the tool does) and the metadata (how AI understands it).

Here’s a minimal example that returns weather data:

```yaml
http:
  url: "https://wttr.in/${! this.city }?format=j1"
  verb: GET
meta:
  mcp:
    enabled: true
    name: get_weather
    description: "Get current weather for a city"
    properties:
      - name: city
        type: string
        description: "City name"
        required: true
```

When an AI client asks about weather, it calls this tool with the city name. The tool fetches data from the weather API and returns it.

## [](#mcp-specification-support)MCP specification support

MCP servers implement the open MCP protocol for tool exposure. Only the tool concept from the MCP server specification is supported. Features such as MCP resources and prompts are not yet available.

For full details, see the [official MCP server specification](https://modelcontextprotocol.io/specification/2025-06-18/server).

## [](#next-steps)Next steps

- [Remote MCP Server Quickstart](../quickstart/)
- [AI Agents Overview](../../../agents/overview/)
- [MCP Tool Execution and Components](../concepts/)
- [Create an MCP Tool](../create-tool/)
- [Model Context Protocol documentation](https://modelcontextprotocol.io/)

---

# Page 42: Remote MCP Server Quickstart

**URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote/quickstart.md

---

# Remote MCP Server Quickstart

---
title: Remote MCP Server Quickstart
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: mcp/remote/quickstart
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: mcp/remote/quickstart.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/quickstart.adoc
description: Build and deploy your first MCP tools to connect AI agents to your Redpanda data without writing custom integration code.
page-topic-type: tutorial personas: agent_developer, streaming_developer, evaluator learning-objective-1: Create an MCP server in Redpanda Cloud learning-objective-2: Define tools that generate and publish data learning-objective-3: Connect Claude Code to your MCP server and invoke tools page-git-created-date: "2025-10-21" page-git-modified-date: "2026-02-18" --- This quickstart builds an MCP server in Redpanda Cloud that exposes tools for generating and publishing event data. You’ll create two tools, then ask Claude Code to use them through natural language: > Generate 10 user events and publish them to the events topic. By completing this quickstart, you will be able to: - Create an MCP server in Redpanda Cloud - Define tools that generate and publish data - Connect Claude Code to your MCP server and invoke tools > 💡 **TIP** > > For background on how MCP tools work and when to use each component type, see [MCP Tool Execution and Components](../concepts/). ## [](#prerequisites)Prerequisites - A Redpanda Cloud cluster with **Remote MCP** enabled. - Access to the [Secrets Store](../../../../develop/connect/configuration/secret-management/) for storing credentials. - At least version 25.2.5 of the [Redpanda CLI (`rpk`)](../../../../manage/rpk/rpk-install/) installed on your computer. - [Claude Code](https://docs.anthropic.com/en/docs/claude-code/setup) installed. ## [](#prepare-your-cluster)Prepare your cluster Before creating the MCP server, you need to set up a topic for event publishing. ### rpk 1. Log in to your Redpanda Cloud account: ```bash rpk cloud login ``` This opens a browser window to authenticate. The token is saved locally inside your `rpk` configuration file. It is valid for 4 hours. You can refresh it by running `rpk cloud login` again. 2. Create a topic called `events` for storing user event data: ```bash rpk topic create events --partitions 3 --replicas 3 ``` 3. 
Create a user called `mcp` with a strong password:

   ```bash
   rpk acl user create mcp --password <password>
   ```

   Save the password securely. You need it later when configuring the MCP server.

4. Grant the `mcp` user permissions to produce and consume from the `events` topic:

   ```bash
   rpk acl create --allow-principal User:mcp --operation all --topic events
   ```

### Data Plane API

1. [Authenticate to the Control Plane API](/api/doc/cloud-controlplane/authentication#topic-request-an-access-token) to get an access token.

   > 📝 **NOTE**
   >
   > Access tokens expire after 1 hour. To refresh, make the same authentication request again with your service account credentials. The same token works for both Control Plane and Data Plane API requests.

2. Get the Data Plane API URL for your cluster:

   - For BYOC or Dedicated clusters, make a request to the [`GET /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_getcluster) endpoint:

     ```bash
     curl "https://api.redpanda.com/v1/clusters/<cluster-id>" \
       -H "Authorization: Bearer <token>"
     ```

   - For Serverless clusters, make a request to the [`GET /v1/serverless/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-serverlessclusterservice_getserverlesscluster) endpoint:

     ```bash
     curl "https://api.redpanda.com/v1/serverless/clusters/<cluster-id>" \
       -H "Authorization: Bearer <token>"
     ```

   The response includes a `dataplane_api.url` value:

       "id": "....",
       "name": "my-cluster",
       ....
       "dataplane_api": {
         "url": "https://api-xyz.abc.fmc.ppd.cloud.redpanda.com"
       },
       ...

   > 📝 **NOTE**
   >
   > The `dataplane_api.url` field might not be immediately available when a cluster reaches `STATE_READY`. If the field is missing or null, wait a few minutes and make the request again. The Data Plane API URL is typically available within 5-10 minutes after cluster creation completes.

3. Make a request to [`POST /v1/topics`](/api/doc/cloud-dataplane/operation/operation-topicservice_createtopic) to create the topic:

   ```bash
   curl -X POST "https://<dataplane-api-url>/v1/topics" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
       "name": "events",
       "partition_count": 3,
       "replication_factor": 3
     }'
   ```

4. Make a request to [`POST /v1/users`](/api/doc/cloud-dataplane/operation/operation-userservice_createuser) to create a user called `mcp`:

   ```bash
   curl -X POST "https://<dataplane-api-url>/v1/users" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
       "name": "mcp",
       "password": "<password>",
       "mechanism": "SASL_MECHANISM_SCRAM_SHA_256"
     }'
   ```

   Save the password securely. You need it later when configuring the MCP server.

5. Make a request to [`POST /v1/acls`](/api/doc/cloud-dataplane/operation/operation-aclservice_createacl) to grant the `mcp` user permissions to produce and consume from the `events` topic:

   ```bash
   curl -X POST "https://<dataplane-api-url>/v1/acls" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
       "resource_type": "RESOURCE_TYPE_TOPIC",
       "resource_name": "events",
       "resource_pattern_type": "RESOURCE_PATTERN_TYPE_LITERAL",
       "principal": "User:mcp",
       "host": "*",
       "operation": "OPERATION_ALL",
       "permission_type": "PERMISSION_TYPE_ALLOW"
     }'
   ```

## [](#create-an-mcp-server-in-redpanda-cloud)Create an MCP Server in Redpanda Cloud

### Cloud Console

1. Log in to the [Redpanda Cloud Console](https://cloud.redpanda.com/).

2. Navigate to **Agentic AI** > **Remote MCP**.

   This page shows a list of existing servers.

3. Click **Create new MCP Server**. In **Server Metadata**, configure the basic information and resources:

   - **Display Name**: A human-friendly name such as `event-data-generator`. This name is shown in the Redpanda Cloud Console. It is not the name of the MCP server itself.
   - **Description**: Explain what the server does. For example, `Generates fake user event data and publishes it to Redpanda topics`.
   - **Tags**: Add key/value tags such as `owner=platform` or `env=demo`. The tag names `service_account_id` and `secret_id` are reserved and cannot be used.
   - **Resources**: Choose a size (XSmall / Small / Medium / Large / XLarge).
     Larger sizes allow more concurrent requests and faster processing, but cost more. You can change this later.
   - **Service Account**: A service account is automatically created for authenticating the MCP server to your cluster. The name is pre-filled but you can customize it. For details about default permissions and how to manage service accounts, see [Service account authorization](../concepts/#service-account-authorization).

4. Click **Next** to define tools.

   Tools define the actions your MCP server can perform. In this example, you create two tools: one for generating user event data and another for publishing that data to Redpanda.

5. From the **Template** dropdown, select **Generate Input**.

   The template populates the configuration with YAML for the tool definition.

6. Click **Add Tool** to create a second tool.

7. From the **Template** dropdown, select **Redpanda Output**.

   The template populates the configuration for publishing to Redpanda, and a section for adding the required secrets is displayed.

8. Enter the values for the `mcp` user’s credentials in the **Add Required Secrets** section.

9. Click **Lint** to check the configuration. You should see no errors.

10. Click **Create MCP Server** to deploy the server.

    It may take a few seconds to start. The status changes from **Starting** to **Running** when it’s ready.

11. Open the **MCP Inspector** tab to test the tools:

    - For the `generate_input` tool, click **Run Tool** to generate sample event data.
    - For the `redpanda_output` tool, enter some sample event data such as `user_id=user123`, `event_type=login`, and `timestamp=2025-10-21T10:30:00Z`, then click **Run Tool** to publish it to the `events` topic.

### Data Plane API

1. 
Create a secret for the username:

   ```bash
   curl -X POST "https://<dataplane-api-url>/v1/secrets" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
       "id": "MCP_USERNAME",
       "scopes": ["SCOPE_MCP_SERVER"],
       "secret_data": "bWNw"
     }'
   ```

   The `secret_data` value `bWNw` is the base64-encoded string `mcp`.

   Create a secret for the password:

   ```bash
   curl -X POST "https://<dataplane-api-url>/v1/secrets" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
       "id": "MCP_PASSWORD",
       "scopes": ["SCOPE_MCP_SERVER"],
       "secret_data": "<base64-encoded-password>"
     }'
   ```

   Replace `<base64-encoded-password>` with your password encoded in base64. You can encode it with: `echo -n '<password>' | base64`.

2. Using the Data Plane API URL from the previous section, make a request to [`POST /v1/redpanda-connect/mcp-servers`](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_createmcpserver) to create the MCP server:

   ```bash
   curl -X POST "https://<dataplane-api-url>/v1/redpanda-connect/mcp-servers" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
       "display_name": "event-data-generator",
       "description": "Generates fake user event data and publishes it to Redpanda topics",
       "tags": {
         "owner": "platform",
         "env": "demo"
       },
       "resources": {
         "memory_shares": "400M",
         "cpu_shares": "100m"
       },
       "tools": {
         "generate_input": {
           "component_type": "COMPONENT_TYPE_INPUT",
           "config_yaml": "generate:\n  interval: 1s\n  mapping: |\n    root.user_id = \"user\" + random_int(min: 1, max: 1000).string()\n    root.event_type = [\"login\", \"logout\", \"purchase\", \"view\"].index(random_int(max: 3))\n    root.timestamp = now().ts_format(\"2006-01-02T15:04:05Z07:00\")"
         },
         "redpanda_output": {
           "component_type": "COMPONENT_TYPE_OUTPUT",
           "config_yaml": "redpanda:\n  seed_brokers: [ \"${REDPANDA_BROKERS}\" ]\n  topic: events\n  tls:\n    enabled: true\n  sasl:\n    - mechanism: SCRAM-SHA-256\n      username: \"${secrets.MCP_USERNAME}\"\n      password: \"${secrets.MCP_PASSWORD}\"\n"
         }
       }
     }'
   ```

   The response includes the MCP server ID.
   Wait for the status to show **Running** before testing the tools.

3. Use the [`GET /v1/redpanda-connect/mcp-servers/{mcp_server_id}`](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_getmcpserver) endpoint to verify that the server is running before you test the tools.

## [](#connect-an-ai-client)Connect an AI client

Now that your MCP server is running with two tools available, connect Claude Code so it can discover and use them.

When you connect Claude Code:

1. Claude automatically discovers your `generate_input` and `redpanda_output` tools.
2. You can ask Claude in natural language to perform tasks using these tools.
3. Claude decides which tools to call and in what order based on your request.
4. The Redpanda CLI acts as a secure proxy, forwarding Claude’s tool requests to your MCP server in the cloud.

This example uses Claude Code, but the same pattern works with any MCP-compatible client.

1. Log in to your Redpanda Cloud account:

   ```bash
   rpk cloud login
   ```

   This opens a browser window to authenticate. The token is saved locally inside your `rpk` configuration file. It is valid for 4 hours. You can refresh it by running `rpk cloud login` again.

2. Open the **Connection** tab in Redpanda Cloud to get connection details and run the `rpk` command for Claude Code.

   For BYOC and Dedicated clusters, use:

   ```bash
   rpk cloud mcp proxy \
     --cluster-id <cluster-id> \
     --mcp-server-id <mcp-server-id> \
     --install --client claude-code
   ```

   For Serverless clusters, use:

   ```bash
   rpk cloud mcp proxy \
     --serverless-cluster-id <cluster-id> \
     --mcp-server-id <mcp-server-id> \
     --install --client claude-code
   ```

3. Restart Claude Code:

   ```bash
   claude
   ```

4. Ask Claude Code to use your tools. Try these example requests:

   - "Generate 10 user events and then publish them to the events topic."
   - "Create sample login events for users user001, user002, and user003, then publish them to Redpanda."
   - "Generate purchase events with metadata and publish them to the events topic."

   Watch what happens:

   - Claude analyzes your natural language request
   - Claude identifies which tools to use (`generate_input` to create data, `redpanda_output` to publish)
   - Claude calls your tools via the MCP server running in Redpanda Cloud
   - You see the tool execution results in your Claude Code session

   You may need to respond to prompts to grant Claude permission to call the tools.

5. Verify the events were published by consuming from the `events` topic:

   ```bash
   rpk topic consume events --num 10
   ```

   You should see the generated event data in JSON format, confirming that Claude successfully used your custom tools to generate data and publish it to Redpanda.

## [](#troubleshoot)Troubleshoot

If you encounter issues during this quickstart:

- **MCP server not starting**: Check the **Logs** tab and verify your YAML syntax by clicking **Lint**.
- **Connection issues**: Verify you’re logged in with `rpk cloud login` and that your server status shows **Running**.
- **Publishing failures**: Verify the `events` topic exists with `rpk topic list`.

For detailed solutions, see [Troubleshoot Remote MCP Servers](../troubleshooting/).

## [](#next-steps)Next steps

You’ve deployed an MCP server and connected Claude Code to your Redpanda cluster.
Here’s where to go next: - [AI Agent Quickstart](../../../agents/quickstart/) - [MCP Tool Execution and Components](../concepts/) - [Create an MCP Tool](../create-tool/) - [MCP Tool Design](../best-practices/) - [MCP Tool Patterns](../tool-patterns/) - [Troubleshoot Remote MCP Servers](../troubleshooting/) - [Manage Remote MCP Servers](../manage-servers/) --- # Page 43: Scale Remote MCP Server Resources **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote/scale-resources.md --- # Scale Remote MCP Server Resources --- title: Scale Remote MCP Server Resources latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: mcp/remote/scale-resources page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: mcp/remote/scale-resources.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/scale-resources.adoc description: Learn how to scale MCP server resources up or down to match workload demands and optimize costs. page-topic-type: how-to personas: platform_admin learning-objective-1: Scale MCP server resources up or down learning-objective-2: Choose appropriate resource sizes for workloads learning-objective-3: Optimize costs through resource management page-git-created-date: "2025-12-17" page-git-modified-date: "2026-02-18" --- After creating an MCP server, you can scale its resources up or down to match your workload needs. Resource allocation affects your [billing costs](../../../../billing/billing/#remote-mcp-billing-metrics), which are charged per compute unit hour. After reading this page, you will be able to: - Scale MCP server resources up or down - Choose appropriate resource sizes for workloads - Optimize costs through resource management ## [](#prerequisites)Prerequisites You must have an existing MCP server. 
If you do not have one, see [Remote MCP Server Quickstart](../quickstart/).

## [](#scale-resources)Scale resources

### Cloud Console

1. In the Redpanda Cloud Console, navigate to **Agentic AI** > **Remote MCP**.

2. Find the MCP server you want to scale and click its name.

3. Click **Edit configuration**.

4. Under **Resources**, select a new size:

   - **XSmall**: Lowest cost, suitable for development or light workloads
   - **Small**: Light production workloads
   - **Medium**: Standard production workloads
   - **Large**: High-throughput workloads
   - **XLarge**: Highest performance for demanding workloads

5. Click **Save** to apply the new resource allocation.

Redpanda makes the specified resources available immediately.

### Data Plane API

1. [Authenticate and get the base URL](/api/doc/cloud-dataplane/topic/topic-quickstart) for the Data Plane API.

2. Make a request to [`GET /v1/redpanda-connect/mcp-servers/{mcp_server_id}`](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_getmcpserver) to retrieve the current configuration.

3. Make a request to [`PATCH /v1/redpanda-connect/mcp-servers/{mcp_server_id}`](/api/doc/cloud-dataplane/operation/operation-mcpserverservice_updatemcpserver) to update the resources:

   ```bash
   curl -X PATCH "https://<dataplane-api-url>/v1/redpanda-connect/mcp-servers/<mcp-server-id>?update_mask=resources" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
       "mcp_server": {
         "resources": {
           "memory_shares": "2Gi",
           "cpu_shares": "1000m"
         }
       }
     }'
   ```

Redpanda makes the updated resources available immediately.

> 💡 **TIP**
>
> Monitor your MCP server’s performance and adjust resources as needed. You can scale up during peak usage periods and scale down during quieter times to optimize costs. For compute unit definitions and pricing, see [MCP billing metrics](../../../../billing/billing/#remote-mcp-billing-metrics).
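If you scale servers often, the PATCH request can be wrapped in a small script. The following is a minimal sketch, not an official tool: the function names and the `DATAPLANE_API_URL`, `TOKEN`, and `MCP_SERVER_ID` environment variables are assumptions chosen for illustration, and `DATAPLANE_API_URL` is assumed to include the `https://` scheme. The payload shape and the `update_mask=resources` query parameter mirror the Data Plane API request on this page.

```bash
#!/usr/bin/env bash
# Sketch only: DATAPLANE_API_URL, TOKEN, and MCP_SERVER_ID are assumed
# environment variables, not names defined by Redpanda.

# Build the JSON body for the PATCH request from a memory and a CPU value.
build_resources_patch() {
  local memory="$1" cpu="$2"
  printf '{"mcp_server":{"resources":{"memory_shares":"%s","cpu_shares":"%s"}}}' \
    "$memory" "$cpu"
}

# Send the PATCH request, using the same update_mask as the example request.
scale_mcp_server() {
  local memory="$1" cpu="$2"
  curl -sS -X PATCH \
    "${DATAPLANE_API_URL}/v1/redpanda-connect/mcp-servers/${MCP_SERVER_ID}?update_mask=resources" \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(build_resources_patch "$memory" "$cpu")"
}

# Example call (not executed here): scale_mcp_server 2Gi 1000m
```

Keeping the payload builder separate from the request makes it possible to inspect the JSON body, for example with `build_resources_patch 2Gi 1000m`, before sending anything to the API.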
--- # Page 44: MCP Tool Patterns **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote/tool-patterns.md --- # MCP Tool Patterns --- title: MCP Tool Patterns latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: mcp/remote/tool-patterns page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: mcp/remote/tool-patterns.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/tool-patterns.adoc description: Catalog of patterns for MCP server tools in Redpanda Cloud. page-topic-type: cookbook personas: agent_developer, data_engineer learning-objective-1: Find reusable patterns for common MCP tool scenarios learning-objective-2: Apply validation and error handling patterns for production robustness learning-objective-3: Format responses consistently for AI client consumption page-git-created-date: "2026-01-13" page-git-modified-date: "2026-02-18" --- When building tools, use these patterns as starting points for common scenarios. For step-by-step instructions, see [Create an MCP Tool](../create-tool/). For design guidelines, see [MCP Tool Design](../best-practices/). After reading this page, you will be able to: - Find reusable patterns for common MCP tool scenarios - Apply validation and error handling patterns for production robustness - Format responses consistently for AI client consumption ## [](#read-data)Read data Use [inputs](../../../../develop/connect/components/inputs/about/) to create tools that read from data sources or generate sample data. ### [](#data-generators)Generate test data **When to use:** Development and testing environments where you need synthetic data, load testing scenarios, or demonstrating data flows without real data sources. 
**Example use cases:** Mock user events, test order data, synthetic sensor readings, demo data for presentations. ```yaml label: generate_input generate: mapping: | let event_type = ["login", "logout", "purchase", "view_page", "click_button"].index(random_int(max:4)) root = { "id": uuid_v4(), "timestamp": now().ts_format("2006-01-02T15:04:05.000Z"), "user_id": random_int(min:1, max:10000), "event_type": $event_type, "data": { "session_id": ksuid(), "ip_address": "192.168.%v.%v".format(random_int(max:255), random_int(min:1, max:254)), "user_agent": ["Chrome", "Firefox", "Safari", "Edge"].index(random_int(max:3)), "amount": if $event_type == "purchase" { random_int(min:10, max:500) } else { null } } } meta: mcp: enabled: true description: "Generate an example user event message with realistic data" properties: [] ``` See also: [`generate` input component](../../../../develop/connect/components/inputs/generate/) ### [](#consume-from-redpanda)Consume from Redpanda topics **When to use:** Processing events from Redpanda topics, building event-driven AI agents, consuming audit logs, or subscribing to data change streams. **Example use cases:** Monitor order events, process user activity streams, consume IoT sensor data, react to system notifications. ```yaml redpanda: seed_brokers: [ "${REDPANDA_BROKERS}" ] topics: [ "user-events" ] consumer_group: "mcp-event-processor" start_from_oldest: true tls: enabled: true sasl: - mechanism: "${REDPANDA_SASL_MECHANISM}" username: "${REDPANDA_SASL_USERNAME}" password: "${REDPANDA_SASL_PASSWORD}" ``` See also: [`redpanda` input](../../../../develop/connect/components/inputs/redpanda/) ### [](#stream-processing)Process streaming data **When to use:** Real-time analytics, windowed aggregations, computing metrics over time, or building streaming dashboards. **Example use cases:** Calculate rolling averages, count events per time window, detect anomalies in streams, aggregate metrics. 
```yaml redpanda: seed_brokers: [ "${REDPANDA_BROKERS}" ] topics: [ "sensor-readings" ] consumer_group: "analytics-processor" tls: enabled: true sasl: - mechanism: "${REDPANDA_SASL_MECHANISM}" username: "${REDPANDA_SASL_USERNAME}" password: "${REDPANDA_SASL_PASSWORD}" processors: - mapping: | root.sensor_id = this.sensor_id root.avg_temperature = this.readings.map_each(r -> r.temperature).sum() / this.readings.length() root.max_temperature = this.readings.map_each(r -> r.temperature).max() root.reading_count = this.readings.length() root.window_end = now() ``` See also: [`redpanda` input](../../../../develop/connect/components/inputs/redpanda/) ## [](#call-external-services)Call external services Use [processors](../../../../develop/connect/components/processors/about/) to fetch data from external APIs, databases, or AI services. ### [](#external-api-calls)Call REST APIs **When to use:** Integrating with third-party services, fetching real-time data, calling internal microservices, or enriching event data with external information. **Example use cases:** Fetch user profile from CRM, get product pricing from inventory API, validate addresses with geocoding service, retrieve weather data. ```yaml label: fetch-weather processors: - label: prepare_parameters mutation: | meta city_name = this.city_name - label: fetch_weather http: url: 'https://wttr.in/${! 
@city_name }?format=j1' verb: GET headers: Accept: "application/json" User-Agent: "redpanda-mcp-server/1.0" - label: format_response mutation: | root = { "city": @city_name, "temperature": this.current_condition.0.temp_C.number(), "feels_like": this.current_condition.0.FeelsLikeC.number(), "humidity": this.current_condition.0.humidity.number(), "pressure": this.current_condition.0.pressure.number(), "description": this.current_condition.0.weatherDesc.0.value, "wind_speed": this.current_condition.0.windspeedKmph.number(), "metadata": { "source": "wttr.in", "fetched_at": now().ts_format("2006-01-02T15:04:05.000Z") } } meta: mcp: enabled: true description: "Fetch current weather information for a specified city" properties: - name: city_name type: string description: "Name of the city to get weather information for" required: true ``` See also: [`http` processor](../../../../develop/connect/components/processors/http/), [`mutation` processor](../../../../develop/connect/components/processors/mutation/) ### [](#database-queries)Query databases **When to use:** Retrieving customer records, querying analytics data, looking up configuration values, or joining streaming data with dimensional data from data warehouses. **Example use cases:** Fetch customer details from PostgreSQL, query sales data from BigQuery, retrieve product catalog from MongoDB, look up reference data. ```yaml label: gcp_bigquery_select_processor processors: - label: prepare_parameters mutation: | meta customer_id = this.customer_id.string().catch("12345") meta limit = this.limit.number().catch(10) - label: query_bigquery gcp_bigquery_select: project: my-gcp-project credentials_json: | ${secrets.BIGQUERY_CREDENTIALS} table: my_dataset.customer_orders columns: - "order_id" - "customer_id" - "order_date" - "total_amount" - "status" where: customer_id = ? AND order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) suffix: "ORDER BY order_date DESC LIMIT ?" 
args_mapping: root = [ @customer_id, @limit ] - label: format_response mutation: | root = { "orders": this, "metadata": { "source": "BigQuery", "customer_id": @customer_id, "fetched_at": now().ts_format("2006-01-02T15:04:05.000Z") } } meta: mcp: enabled: true description: "Query customer orders from BigQuery" properties: - name: customer_id type: string description: "Customer ID to filter orders" required: true - name: limit type: number description: "Maximum number of orders to return" required: false ``` See also: [`gcp_bigquery_select` processor](../../../../develop/connect/components/processors/gcp_bigquery_select/), [`sql_select` processor](../../../../develop/connect/components/processors/sql_select/) ### [](#jira-queries)Query Jira issues **When to use:** Fetching tickets by status, checking assignments, finding recent issues, or building AI agents that interact with project management data. **Example use cases:** Get open bugs for a sprint, find issues assigned to a user, list recently updated tickets, search by custom fields. > 📝 **NOTE** > > The `jira` processor is available on Dedicated and BYOC clusters. ```yaml label: search-jira processors: - generate: count: 1 mapping: | root.jql = this.jql root.maxResults = this.max_results.or(50) root.fields = ["key", "summary", "status", "assignee", "priority"] - jira: base_url: "${secrets.JIRA_BASE_URL}" username: "${secrets.JIRA_USERNAME}" api_token: "${secrets.JIRA_API_TOKEN}" meta: mcp: enabled: true description: "Search Jira issues using JQL. Returns matching issues with key, summary, status, assignee, and priority." 
properties: - name: jql type: string description: "JQL query (for example, 'project = DOC AND status = Open')" required: true - name: max_results type: number description: "Maximum issues to return (default: 50)" required: false
```

For more patterns including pagination, custom fields, and creating issues via the HTTP processor, see [Work with Jira Issues](../../../../develop/connect/cookbooks/jira/).

### [](#ai-llm-integration)Integrate with AI/LLM services

**When to use:** Generating embeddings for semantic search, calling LLM APIs for text generation, building RAG pipelines, or analyzing sentiment.

**Example use cases:** Generate embeddings for documents, classify customer feedback, summarize long text, extract entities, answer questions with context.

#### [](#openai-chat-completion)OpenAI chat completion

```yaml
openai_chat_completion:
  api_key: "${secrets.OPENAI_API_KEY}"
  model: "gpt-5.2"
  prompt: |
    Analyze this customer feedback and provide:
    1. Sentiment (positive/negative/neutral)
    2. Key themes
    3. Actionable insights

    Feedback: ${! json("feedback_text") }
  max_tokens: 500
```

See also: [`openai_chat_completion`](../../../../develop/connect/components/processors/openai_chat_completion/), [`openai_embeddings`](../../../../develop/connect/components/processors/openai_embeddings/)

#### [](#generate-embeddings)Generate embeddings

```yaml
openai_embeddings:
  api_key: "${secrets.OPENAI_API_KEY}"
  model: "text-embedding-3-small"
  text: ${! json("content") }
```

See also: [`cohere_embeddings`](../../../../develop/connect/components/processors/cohere_embeddings/), [`gcp_vertex_ai_embeddings`](../../../../develop/connect/components/processors/gcp_vertex_ai_embeddings/)

## [](#write-data)Write data

Use [outputs](../../../../develop/connect/components/outputs/about/) to write data to Redpanda topics or cache stores.
### [](#publish-to-redpanda)Publish to Redpanda topics **When to use:** Publishing events to Redpanda for consumption by other services, creating event sourcing patterns, building audit trails, or triggering downstream workflows. **Example use cases:** Publish order confirmations, emit audit events, trigger notifications, create event-driven workflows. ```yaml label: redpanda_output redpanda: seed_brokers: - ${REDPANDA_BROKERS} topic: ${! this.topic_name.string().catch("default-topic") } timeout: 30s tls: enabled: true sasl: - mechanism: SCRAM-SHA-256 username: ${secrets.REDPANDA_USERNAME} password: ${secrets.REDPANDA_PASSWORD} meta: mcp: enabled: true description: Publishes a message to a specified Redpanda topic properties: - name: message type: string description: The message content to publish required: true - name: topic_name type: string description: The Redpanda topic to publish to required: true ``` See also: [`redpanda` output](../../../../develop/connect/components/outputs/redpanda/) #### [](#outputs-with-processors)Outputs with processors Output tools can include processors to transform data before publishing. This pattern is useful when you need to process data and save the result to a destination in a single tool. **When to use:** Processing user input with an LLM and saving the response, transforming data before publishing to a topic, enriching events before writing to external systems. ```yaml label: summarize_and_publish processors: - openai_chat_completion: api_key: "${secrets.OPENAI_API_KEY}" model: "gpt-5.2" prompt: ${! 
json("question") } - mapping: | root.question = this.question root.answer = this.content root.timestamp = now().ts_format("2006-01-02T15:04:05Z07:00") redpanda: seed_brokers: [ "${REDPANDA_BROKERS}" ] topic: "llm-responses" tls: enabled: true sasl: - mechanism: SCRAM-SHA-256 username: "${secrets.MCP_USERNAME}" password: "${secrets.MCP_PASSWORD}" meta: mcp: enabled: true description: "Process a question through an LLM and publish the response to Redpanda" properties: - name: question type: string description: "The question to send to the LLM" required: true ``` ### [](#caching)Cache data **When to use:** Reducing repeated API calls, storing lookup tables, caching database query results, or maintaining session state across tool invocations. **Example use cases:** Cache user profiles, store API rate limit counters, maintain configuration values, cache product catalogs. Redpanda-backed cache ```yaml label: redpanda_cache redpanda: seed_brokers: ["${REDPANDA_BROKERS}"] topic: "mcp-cache-topic" tls: enabled: true sasl: - mechanism: "SCRAM-SHA-512" username: "${secrets.MCP_REDPANDA_CREDENTIALS.username}" password: "${secrets.MCP_REDPANDA_CREDENTIALS.password}" meta: mcp: enabled: true description: "Redpanda-backed distributed cache using Kafka topics for persistence" ``` In-memory cache ```yaml label: memory_cache memory: default_ttl: "5m" init_values: "user:1001": '{"name": "Alice", "role": "admin"}' "user:1002": '{"name": "Bob", "role": "user"}' "config:theme": "dark" "config:language": "en" shards: 4 meta: mcp: enabled: true description: "In-memory cache for storing user data, configuration, and temporary values" ``` See also: [`memory` cache](../../../../develop/connect/components/caches/memory/), [`redpanda` output](../../../../develop/connect/components/outputs/redpanda/) ## [](#transform-data)Transform data Use Bloblang and processors to transform, validate, and route data. 
### [](#data-transformation)Transform and validate **When to use:** Converting data formats, validating schemas, filtering events, enriching messages with computed fields, or normalizing data structures. **Example use cases:** Parse JSON payloads, validate required fields, add timestamps, convert units, mask sensitive data, aggregate nested objects. ```yaml - mapping: | # Parse and validate incoming data root.user_id = this.user_id.or(throw("user_id is required")) root.timestamp = now().ts_format("2006-01-02T15:04:05Z07:00") # Transform and enrich root.email_domain = this.email.split("@").index(1) root.is_premium = this.subscription_tier == "premium" # Filter sensitive data root.profile = this.profile.or({}).without("ssn", "credit_card") ``` See also: [`mapping` processor](../../../../develop/connect/components/processors/mapping/), [Bloblang guide](../../../../develop/connect/guides/bloblang/about/) ### [](#event-driven-workflows)Build event-driven workflows **When to use:** Orchestrating multi-step processes, responding to business events, implementing saga patterns, or coordinating microservices. **Example use cases:** Order fulfillment workflows, approval processes, notification cascades, data pipeline orchestration. ```yaml redpanda: seed_brokers: [ "${REDPANDA_BROKERS}" ] topics: [ "order-events" ] consumer_group: "workflow-orchestrator" tls: enabled: true sasl: - mechanism: "${REDPANDA_SASL_MECHANISM}" username: "${REDPANDA_SASL_USERNAME}" password: "${REDPANDA_SASL_PASSWORD}" processors: - switch: - check: this.event_type == "order_created" processors: - http: url: "${secrets.INVENTORY_API}/reserve" verb: POST headers: Content-Type: application/json body: '{"order_id": "${! this.order_id }", "items": ${! json("items") }}' - check: this.event_type == "payment_confirmed" processors: - http: url: "${secrets.FULFILLMENT_API}/ship" verb: POST headers: Content-Type: application/json body: '{"order_id": "${! 
this.order_id }"}'
```

See also: [`redpanda` input](../../../../develop/connect/components/inputs/redpanda/)

## [](#production-readiness)Production readiness

Build production-ready tools with proper input validation, error handling, and response formatting.

### [](#input-validation)Validate input

AI clients may send unexpected or malformed input. Always validate inputs before processing so that the tool returns a clear error message instead of a cryptic failure from a downstream component.

The following example shows a basic validation pattern:

```yaml
- label: validate_input
  mutation: |
    let city = this.city.or("").trim()
    root = if $city == "" {
      {"error": "City name is required"}
    } else {
      {"city": $city}
    }
```

This validation does three things:

1. `.or("")` provides an empty string default if the `city` field is missing, which prevents null errors.
2. `.trim()` removes whitespace so `" "` doesn’t pass as a valid city.
3. The `if` expression returns either an error object or the validated data.

The AI client receives clear feedback either way.

#### [](#essential-validation-methods)Essential validation methods

Use these [Bloblang methods](../../../../develop/connect/guides/bloblang/methods/) for input validation:

| Method | Purpose | Example |
| --- | --- | --- |
| `.or(default)` | Provide fallback for missing fields | `this.city.or("unknown")` |
| `.trim()` | Remove leading/trailing whitespace | `this.name.trim()` |
| `.exists("field")` | Check if a field is present | `this.exists("email")` |
| `.type()` | Get the type of a value | `this.count.type() == "number"` |
| `.length()` | Check string or array length | `this.items.length() > 0` |
| `.re_match(pattern)` | Validate against regex | `this.email.re_match("^[^@]+@[^@]+$")` |
| `.number()` | Convert and validate as number | `this.quantity.number()` |

#### [](#sanitize-string-inputs)Sanitize string inputs

Remove potentially dangerous characters from user inputs.
This is especially important when inputs will be used in URLs, database queries, or shell commands:

```yaml
- label: sanitize_input
  mutation: |
    let clean_city = this.city.or("").trim().re_replace_all("[^a-zA-Z\\s-]", "")
    root = if $clean_city == "" {
      {"error": "City name contains only invalid characters"}
    } else {
      {"city": $clean_city}
    }
```

The regex `[^a-zA-Z\\s-]` matches any character that is not a letter, whitespace, or hyphen, and `re_replace_all` removes all matches. An input like `"New York!@#$"` becomes `"New York"`. The `let` statement stores the sanitized value in a variable (`$clean_city`), so nothing is written to the message body until the empty-string check passes. For regex replacement syntax, see [`re_replace_all`](../../../../develop/connect/guides/bloblang/methods/#re_replace_all).

#### [](#validate-numeric-ranges)Validate numeric ranges

Check that numeric inputs fall within acceptable bounds:

```yaml
- label: validate_quantity
  mutation: |
    let qty = this.quantity.or(0).number()
    root = if $qty < 1 {
      {"error": "Quantity must be at least 1", "received": $qty}
    } else if $qty > 1000 {
      {"error": "Quantity cannot exceed 1000", "received": $qty}
    } else {
      {"quantity": $qty, "valid": true}
    }
```

This example chains `.or(0)` with `.number()` to handle both missing values and type conversion. The chained `if`/`else if` checks both lower and upper bounds. Including the received value in error responses helps AI clients understand what went wrong and correct their input.

#### [](#validate-multiple-fields)Validate multiple fields

For forms or complex inputs, collect all errors before returning.
This gives AI clients a complete list of problems to fix rather than failing on the first error: ```yaml - label: validate_order mutation: | let errors = [] let errors = if !this.exists("order_id") || this.order_id == "" { $errors.append("order_id is required") } else { $errors } let errors = if !this.exists("items") || this.items.length() == 0 { $errors.append("at least one item is required") } else { $errors } let errors = if this.exists("email") && !this.email.contains("@") { $errors.append("invalid email format") } else { $errors } root = if $errors.length() > 0 { {"valid": false, "errors": $errors} } else { {"valid": true, "order_id": this.order_id} } ``` The pattern uses variable reassignment (`let errors = …​`) to accumulate errors into an array. Each check appends to the array if validation fails, or returns the unchanged array if it passes. At the end, if any errors were collected, the response includes all of them. Notice that the email validation only runs if the field exists. This allows optional fields that, when provided, must be valid. #### [](#validate-enum-values)Validate enum values Restrict inputs to a set of allowed values. This prevents invalid states and provides helpful feedback when the input doesn’t match: ```yaml - label: validate_status mutation: | let allowed = ["pending", "approved", "rejected"] let status = this.status.or("").lowercase() root = if $status == "" { {"error": "status is required", "allowed": $allowed} } else if !$allowed.contains($status) { {"error": "invalid status", "received": $status, "allowed": $allowed} } else { {"status": $status, "valid": true} } ``` The `lowercase()` call normalizes the input so `"PENDING"`, `"Pending"`, and `"pending"` all match. When validation fails, the error response includes the list of allowed values. This helps AI clients self-correct without needing to look up valid options. 
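
As a variation on the enum check above, the following sketch treats the field as optional and falls back to a default when it is missing. The `priority` field, its allowed values, and the `validate_priority` label are hypothetical and chosen only for illustration; the sketch assumes the same mutation style as the preceding examples:

```yaml
- label: validate_priority
  mutation: |
    # Hypothetical allowed values, for illustration only
    let allowed = ["low", "normal", "high"]
    # A missing or null field falls back to "normal" instead of failing
    let priority = this.priority.or("normal").string().trim().lowercase()
    root = if !$allowed.contains($priority) {
      {"error": "invalid priority", "received": $priority, "allowed": $allowed}
    } else {
      # Keep the original input and store the normalized value
      this.merge({"priority": $priority})
    }
```

Defaulting is a design choice: it suits fields where a sensible fallback exists, while required fields are better rejected with an explicit error as shown earlier.
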
For more details, see [`contains`](../../../../develop/connect/guides/bloblang/methods/#contains) and [`lowercase`](../../../../develop/connect/guides/bloblang/methods/#lowercase). #### [](#use-throw-for-validation-failures)Use throw for validation failures Use [`throw()`](../../../../develop/connect/guides/bloblang/functions/#throw) to stop processing with an error message. This is useful when validation failure should stop the entire tool execution: ```yaml - label: require_auth mutation: | root = if !this.exists("api_key") || this.api_key == "" { throw("API key is required for this operation") } else { this } ``` Unlike returning an error object, `throw()` immediately stops the processor chain and triggers any `catch` block that follows. Use `throw()` for critical validation failures where continuing would be pointless or dangerous. The `else` branch returns `this` unchanged, passing all input fields to the next processor. ### [](#error-handling)Handle errors External services fail. Databases go down. APIs return unexpected responses. Wrap risky operations in error handling so your tool returns useful error messages instead of crashing. Wrap operations that can fail in `try`/`catch` blocks. This ensures the tool returns useful errors instead of failing silently. ```yaml processors: - try: - http: url: "https://api.example.com/data" verb: GET - catch: - mutation: | root.error = true root.message = "Request failed: " + error() ``` For full configuration options, see [`try` processor](../../../../develop/connect/components/processors/try/) and [`catch` processor](../../../../develop/connect/components/processors/catch/). #### [](#return-error-details)Return error details The [`error()`](../../../../develop/connect/guides/bloblang/functions/#error) function returns the error message from the most recent failure. 
Use it in `catch` blocks to capture what went wrong: ```yaml - label: handle_errors catch: - mutation: | root = { "success": false, "error_message": error(), "timestamp": now().format_timestamp("2006-01-02T15:04:05Z07:00") } ``` #### [](#set-timeouts)Set timeouts Always set explicit timeouts on external calls to prevent tools from hanging indefinitely: ```yaml - label: fetch_with_timeout try: - http: url: "https://httpbin.org/get" verb: GET timeout: "10s" # Fail after 10 seconds retries: 2 # Retry twice before failing retry_period: "1s" ``` For all timeout and retry options, see [`http` processor](../../../../develop/connect/components/processors/http/). #### [](#handle-specific-error-types)Handle specific error types Create different responses based on error type: ```yaml - catch: - mutation: | let err = error() root.error = true root.error_type = if $err.contains("timeout") { "TIMEOUT" } else if $err.contains("connection refused") { "CONNECTION_ERROR" } else if $err.contains("404") { "NOT_FOUND" } else { "UNKNOWN" } root.message = $err root.retry_suggested = root.error_type == "TIMEOUT" || root.error_type == "CONNECTION_ERROR" ``` #### [](#log-errors-for-debugging)Log errors for debugging Add logging inside `catch` blocks to aid troubleshooting: ```yaml - catch: - log: message: "Tool failed: ${! error() }" level: ERROR fields_mapping: | root.input = this root.error = error() - mutation: | root.error = true root.message = error() ``` For log level options, see [`log` processor](../../../../develop/connect/components/processors/log/). 
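
The error-handling patterns in this section can be combined in a single processor chain. The following is a minimal sketch, assuming a placeholder URL and hypothetical labels (`fetch_data`, `handle_failure`); it is not a complete tool definition:

```yaml
processors:
  - label: fetch_data
    try:
      - http:
          url: "https://api.example.com/data"  # Placeholder URL
          verb: GET
          timeout: "10s"       # Fail instead of hanging indefinitely
          retries: 2
          retry_period: "1s"
  - label: handle_failure
    catch:
      # Record the failure before replacing the message body
      - log:
          level: ERROR
          message: "Tool failed: ${! error() }"
      - mutation: |
          let err = error()
          root = {
            "error": true,
            "message": $err,
            # Matching on message text is a heuristic, not a stable contract
            "retry_suggested": $err.contains("timeout") || $err.contains("connection refused")
          }
```

The `catch` block only runs for messages that failed in an earlier processor, so successful responses pass through unchanged.
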
#### [](#preserve-input-context-in-errors)Preserve input context in errors Include original input data in error responses to help AI clients retry with corrections: ```yaml - label: validate_and_fetch try: - mutation: | meta original_input = this - mutation: | root = throw("User not found in database") - label: handle_errors catch: - mutation: | root = { "error": "Failed to fetch user", "details": error(), "input_received": @original_input, "suggestion": "Verify the user_id exists" } ``` ### [](#response-formatting)Format responses AI clients work best with clean, predictable response structures. Transform raw component output into consistent formats. Structure responses consistently so AI clients can interpret them reliably. The following example takes a raw weather API response and transforms it into a clean, predictable format: ```yaml - label: format_response mapping: | root = { "city": this.location.name, "temperature_c": this.current.temp_c.number(), "description": this.current.condition.text, "timestamp": now().ts_format("2006-01-02T15:04:05Z") } ``` This mapping does four things: - Extracts the city name from a nested `location.name` field - Converts `temp_c` to a number type (APIs sometimes return numbers as strings) - Pulls out the weather description text - Adds a timestamp so the AI client knows when the data was fetched The result is a flat JSON object with predictable field names and types, rather than the raw API response which might have deeply nested structures or inconsistent formatting. 
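
Upstream responses do not always contain every field. The following sketch is a defensive variant of the `format_response` mapping above, assuming the same hypothetical weather API field names; the fallback values (`"unknown"`, an empty string, and `null`) are illustrative choices rather than recommendations from the source:

```yaml
- label: format_response_defensive
  mapping: |
    root = {
      # .or() supplies a fallback when the path is missing or null
      "city": this.location.name.or("unknown"),
      # .catch() supplies a fallback when the conversion fails
      "temperature_c": this.current.temp_c.number().catch(null),
      "description": this.current.condition.text.or(""),
      "timestamp": now().ts_format("2006-01-02T15:04:05Z07:00")
    }
```

With this approach the response keeps the same set of fields even when the upstream data is incomplete, which makes it easier for an AI client to interpret.
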
#### [](#type-coercion-methods)Type coercion methods Use [type coercion methods](../../../../develop/connect/guides/bloblang/methods/) to ensure fields have the correct data types: | Method | Purpose | Example | | --- | --- | --- | | .string() | Convert to string | this.id.string() becomes "123" | | .number() | Convert to number | this.price.number() becomes 19.99 | | .bool() | Convert to boolean | this.active.bool() becomes true | | .int64() | Convert to 64-bit integer | this.count.int64() becomes 42 | | .float64() | Convert to 64-bit float | this.ratio.float64() becomes 0.75 | #### [](#format-timestamps)Format timestamps Use [`now()`](../../../../develop/connect/guides/bloblang/functions/#now) with [`format_timestamp()`](../../../../develop/connect/guides/bloblang/methods/#format_timestamp) for consistent time formatting: ```yaml - label: add_timestamps mutation: | root = this root.timestamp = now().format_timestamp("2006-01-02T15:04:05Z07:00") root.date = now().format_timestamp("2006-01-02") root.time = now().format_timestamp("15:04:05") ``` This example preserves all existing fields (`root = this`) and adds three timestamp fields. The `now()` function returns the current time, and `format_timestamp()` converts it to a string. Each field uses a different format: full ISO 8601 timestamp, date only, and time only. The format string uses Go’s reference time layout. Common patterns: | Format | Output example | | --- | --- | | "2006-01-02T15:04:05Z07:00" | 2024-03-15T14:30:00-07:00 (ISO 8601) | | "2006-01-02" | 2024-03-15 | | "15:04:05" | 14:30:00 | | "Mon, 02 Jan 2006" | Fri, 15 Mar 2024 | #### [](#extract-nested-fields)Extract nested fields API responses often have deeply nested structures. 
Extract only the fields your AI client needs and flatten them into a simple object: ```yaml - label: extract_user_data mapping: | root = { "user_id": this.data.user.id.string(), "name": this.data.user.profile.display_name, "email": this.data.user.contact.email, "is_verified": this.data.user.status.verified.bool(), "created_at": this.data.user.metadata.created_at } ``` This mapping navigates a nested structure like `{"data": {"user": {"id": 123, "profile": {"display_name": "…​"}}}}` and creates a flat response. The dot notation (`this.data.user.id`) drills down through nested objects. Type coercion (`.string()`, `.bool()`) ensures consistent output types. For navigating nested structures, see [dot notation](../../../../develop/connect/guides/bloblang/about/#dot-notation). #### [](#handle-arrays)Handle arrays When your data contains arrays, you can transform each element, extract specific items, or compute aggregates: ```yaml - label: format_items mapping: | root = { "total_items": this.items.length(), "items": this.items.map_each(item -> { "id": item.id.string(), "name": item.name, "price": item.price.number() }), "first_item": this.items.index(0).name, "item_names": this.items.map_each(i -> i.name) } ``` This example demonstrates four array operations: - `length()` returns the array size for the `total_items` count - `map_each()` transforms each item into a new object with only the fields you need - `index(0)` accesses the first element (zero-indexed) to get the first item’s name - A second `map_each()` extracts just the names into a simple string array For array operations, see [`map_each()`](../../../../develop/connect/guides/bloblang/methods/#map_each) and [`index()`](../../../../develop/connect/guides/bloblang/methods/#index). #### [](#include-fields-conditionally)Include fields conditionally Sometimes you want to include fields only when they have meaningful values. 
This avoids returning `null` or empty fields that clutter the response:

```yaml
- mutation: |
    root.id = this.id
    root.name = this.name
    root.email = if this.exists("email") && this.email != "" { this.email } else { deleted() }
    root.phone = if this.exists("phone") { this.phone } else { deleted() }
```

This mapping starts with required fields (`id`, `name`), then conditionally adds optional fields. The `exists()` check prevents errors when accessing missing fields. When the condition is false, [`deleted()`](../../../../develop/connect/guides/bloblang/functions/#deleted) removes the field entirely from the output. The AI client won’t see `"email": null`. The field simply won’t exist.

#### [](#filter-sensitive-data)Filter sensitive data

When your data source contains sensitive fields, strip them before returning responses to the AI client:

```yaml
- mutation: |
    root = this.without("password", "ssn", "api_key", "internal_notes")
```

The `without()` method creates a copy of the object with the specified fields removed. Because new fields added to the source data are included automatically, you only need to maintain the exclusion list, but you must update that list whenever a new sensitive field is introduced. Use this when returning database records or API responses that might contain credentials or personal information.

For field removal, see [`without()`](../../../../develop/connect/guides/bloblang/methods/#without).

#### [](#wrap-responses-in-a-success-envelope)Wrap responses in a success envelope

When AI clients call multiple tools, they need a predictable way to check if the call succeeded.
Wrapping responses in a consistent envelope structure makes this easy:

```yaml
- label: format_success
  mapping: |
    root = {
      "success": true,
      "data": {
        "user_id": this.id,
        "name": this.name
      },
      "timestamp": now().format_timestamp("2006-01-02T15:04:05Z07:00")
    }
```

Error responses should share the same top-level structure as success responses: a `success` boolean, a payload field (`data` or `error`), and a `timestamp`. The AI client can check `success` first, then access the appropriate field. To build the error response, use the `catch` processor to handle failures and the `error()` function to capture the error message.

## [](#advanced-workflows)Advanced workflows

Build multi-step workflows with dynamic configuration, conditional logic, and observability.

### [](#dynamic-configuration)Dynamic configuration

Build tools that adapt their behavior based on input parameters:

```yaml
processors:
  - label: dynamic_config
    mutation: |
      # Choose data source based on environment
      meta env = this.environment | "production"
      meta table_name = match @env {
        "dev" => "dev_orders",
        "staging" => "staging_orders",
        "production" => "prod_orders",
        _ => "dev_orders"
      }
      # Adjust the selected columns based on the requested level of detail
      meta columns = if this.detailed.bool().catch(false) {
        ["order_id", "customer_id", "total", "items", "shipping_address"]
      } else {
        ["order_id", "customer_id", "total"]
      }
```

### [](#conditional-processing)Conditional processing

Build tools that branch based on input or data characteristics:

```yaml
processors:
  - label: conditional_processing
    switch:
      - check: this.data_type == "json"
        processors:
          - mapping: |
              root.parsed_data = this.content.parse_json()
              root.format = "json"
      - check: this.data_type == "csv"
        processors:
          - mapping: |
              root.parsed_data = this.content.parse_csv()
              root.format = "csv"
      - processors:
          - mapping: |
              root.error = "Unsupported data type"
              root.supported_types = ["json", "csv"]
```

### [](#secrets)Secrets and credentials

Securely handle multiple credentials and API keys.
Here is an example of using an API key secret.

1. Create a secret in the [Secrets Store](../../../../develop/connect/configuration/secret-management/) with name `EXTERNAL_API_KEY` and your API key as the value.
2. Reference the secret in your YAML configuration:

    ```yaml
    processors:
      - label: call_external_api
        http:
          url: "https://api.example.com/data"
          verb: GET
          headers:
            Authorization: "Bearer ${secrets.EXTERNAL_API_KEY}"
            Accept: "application/json"
    ```

    The secret is injected at runtime, so the actual secret value never appears in your configuration files or logs. Never store the actual API key in your YAML configuration.

### [](#monitoring-debugging-and-observability)Monitoring, debugging, and observability

Use structured logging, request tracing, and performance metrics to gain insights into tool execution. ```yaml label: observable_tool processors: - label: init_tracing mutation: | # Generate correlation ID for request tracing meta req_id = uuid_v7() meta start_time = now() # Log request start with structured data root.trace = { "request_id": @req_id, "timestamp": @start_time.ts_format("2006-01-02T15:04:05.000Z"), "tool": "observable_tool", "version": "1.0.0" } - label: log_request_start log: message: "MCP tool request started" fields: request_id: "${! @req_id }" tool_name: "observable_tool" input_params: "${! this.without(\"trace\") }" user_agent: "${! meta(\"User-Agent\").catch(\"unknown\") }" level: "INFO" - label: finalize_response mutation: | # Calculate total execution time meta duration = (now().ts_unix_nano() - @start_time.ts_unix_nano()) / 1000000 # Add trace information to response root.metadata = { "request_id": @req_id, "execution_time_ms": @duration, "timestamp": now().ts_format("2006-01-02T15:04:05.000Z"), "tool": "observable_tool", "success": !this.exists("error") } - label: log_completion log: message: "MCP tool request completed" fields: request_id: "${! @req_id }" duration_ms: "${!
this.metadata.execution_time_ms }" success: "${! this.metadata.success }" result_size: "${! content().length() }" level: "INFO" meta: tags: [ example ] mcp: enabled: true description: "Example tool with comprehensive observability and error handling" properties: - name: user_id type: string description: "User ID to fetch data for" required: true ``` Observability features: - **Correlation IDs**: Use `uuid_v7()` to generate unique request identifiers for tracing - **Execution timing**: Track how long your tools take to execute using nanosecond precision - **Structured logging**: Include consistent fields like `request_id`, `duration_ms`, `tool_name` - **Request/response metadata**: Log input parameters and response characteristics - **Success tracking**: Monitor whether operations complete successfully You can test this pattern by invoking the tool with valid and invalid parameters, and observe the structured logs for tracing execution flow. For example, with a user ID of 1, you might see logs like: ```json { "metadata": { "execution_time_ms": 0.158977, "request_id": "019951ab-d07d-703f-aaae-7e1c9a5afa95", "success": true, "timestamp": "2025-09-16T08:37:18.589Z", "tool": "observable_tool" }, "trace": { "request_id": "019951ab-d07d-703f-aaae-7e1c9a5afa95", "timestamp": "2025-09-16T08:37:18.589Z", "tool": "observable_tool", "version": "1.0.0" }, "user_id": "1" } ``` See also: [`log` processor](../../../../develop/connect/components/processors/log/), [`try` processor](../../../../develop/connect/components/processors/try/), [Bloblang functions](../../../../develop/connect/guides/bloblang/functions/) (for timing and ID generation) ### [](#multi-step-data-enrichment)Multi-step data enrichment Build tools that combine data from multiple sources. This workflow fetches customer data from a SQL database, enriches it with recent order history, and computes summary metrics. 
```yaml
label: customer_enrichment
processors:
  - label: fetch_customer_base
    branch:
      processors:
        - sql_select:
            driver: "postgres"
            dsn: "${POSTGRES_DSN}"
            table: "customers"
            columns: [ "*" ]
            where: "customer_id = ?"
            args_mapping: 'root = [this.customer_id]'
      result_map: 'root.customers = this'
  - label: enrich_with_orders
    branch:
      processors:
        - sql_select:
            driver: "postgres"
            dsn: "${POSTGRES_DSN}"
            table: "orders"
            columns: [ "*" ]
            where: "customer_id = ? AND created_at >= NOW() - INTERVAL '30 days'"
            args_mapping: 'root = [this.customer_id]'
      result_map: 'root.orders = this'
  - label: combine_data
    mutation: |
      let order_totals = this.orders.map_each(o -> o.total)
      root = {
        "customer": this.customers.index(0),
        "recent_orders": this.orders,
        "metrics": {
          "total_orders": this.orders.length(),
          "total_spent": $order_totals.sum(),
          "avg_order_value": if $order_totals.length() > 0 { $order_totals.sum() / $order_totals.length() } else { 0 }
        }
      }
meta:
  tags: [ example ]
  mcp:
    enabled: true
    description: "Get comprehensive customer profile with recent order history and metrics"
    properties:
      - name: customer_id
        type: string
        description: "Customer ID to analyze"
        required: true
```

See also: [`sql_select` processor](../../../../develop/connect/components/processors/sql_select/), [Bloblang functions](../../../../develop/connect/guides/bloblang/about/) (for data manipulation and aggregations)

### [](#workflow-orchestration)Workflow orchestration

Coordinate complex workflows with multiple steps and conditional logic. This workflow simulates a complete order processing pipeline with mock data for inventory and processing tiers, so you can test the full logic without needing real external systems.
```yaml label: order_workflow processors: - label: validate_order mutation: | # Validation logic root = if this.total <= 0 { throw("Invalid order total") } else { this } - label: mock_inventory_check mutation: | # Mock inventory data for testing let inventory = { "widget-001": {"quantity": 100, "name": "Standard Widget"}, "widget-premium": {"quantity": 25, "name": "Premium Widget"}, "widget-limited": {"quantity": 2, "name": "Limited Edition Widget"} } let product = $inventory.get(this.product_id) root = if $product == null { throw("Product not found: " + this.product_id) } else if $product.quantity < this.quantity { throw("Insufficient inventory. Available: " + $product.quantity.string()) } else { this.merge({ "inventory_check": "passed", "available_quantity": $product.quantity, "product_name": $product.name }) } - label: route_by_priority switch: - check: 'this.total > 1000' processors: - label: mock_high_value_processing mutation: | # Mock premium processing root = this.merge({ "processing_tier": "premium", "processing_time_estimate": "2-4 hours", "assigned_rep": "premium-team@company.com", "priority_score": 95 }) - check: 'this.customer_tier == "vip"' processors: - label: mock_vip_processing mutation: | # Mock VIP processing root = this.merge({ "processing_tier": "vip", "processing_time_estimate": "1-2 hours", "assigned_rep": "vip-team@company.com", "priority_score": 90, "perks": ["expedited_shipping", "white_glove_service"] }) - processors: - label: mock_standard_processing mutation: | # Mock standard processing root = this.merge({ "processing_tier": "standard", "processing_time_estimate": "24-48 hours", "assigned_rep": "support@company.com", "priority_score": 50 }) - label: finalize_order mutation: | # Add final processing metadata # Calculate estimated fulfillment by parsing processing time let max_hours = this.processing_time_estimate.split("-").index(1).split(" ").index(0).number() root = this.merge({ "order_status": "processed", "processed_at": 
now().ts_format("2006-01-02T15:04:05.000Z"), "estimated_fulfillment": "TBD - calculated based on processing tier", "processing_time_hours": $max_hours }) meta: tags: [ example ] mcp: enabled: true description: "Process orders with validation, inventory check, and tiered routing (with mocks for testing)" properties: - name: order_id type: string description: "Unique order identifier" required: true - name: product_id type: string description: "Product ID (try: widget-001, widget-premium, widget-limited)" required: true - name: quantity type: number description: "Quantity to order" required: true - name: total type: number description: "Order total in dollars" required: true - name: customer_tier type: string description: "Customer tier (optional: vip, standard)" required: false ``` For the input `{"order_id": "ORD001", "product_id": "widget-001", "quantity": 5, "total": 250, "customer_tier": "vip"}`, the workflow produces: ```json { "assigned_rep": "vip-team@company.com", "available_quantity": 100, "customer_tier": "vip", "estimated_fulfillment": "TBD - calculated based on processing tier", "inventory_check": "passed", "order_id": "ORD001", "order_status": "processed", "perks": [ "expedited_shipping", "white_glove_service" ], "priority_score": 90, "processed_at": "2025-09-16T09:05:29.138Z", "processing_tier": "vip", "processing_time_estimate": "1-2 hours", "processing_time_hours": 2, "product_id": "widget-001", "product_name": "Standard Widget", "quantity": 5, "total": 250 } ``` Notice how the workflow: 1. Preserves original input: `order_id`, `product_id`, `quantity`, `total`, and `customer_tier` pass through unchanged. 2. Adds inventory data: `available_quantity`, `product_name`, and `inventory_check` status from the mock lookup. 3. Routes by customer tier: Since `customer_tier` is `vip`, it gets VIP processing with special `perks` and priority. 4. Enriches with processing metadata: `assigned_rep`, `priority_score`, `processing_tier`, and time estimates. 5. 
Finalizes with timestamps: `order_status`, `processed_at`, and calculated `processing_time_hours`. ## [](#next-steps)Next steps - [Integration Patterns Overview](../../../agents/integration-overview/) - [Create an MCP Tool](../create-tool/) - [MCP Tool Design](../best-practices/) - [Troubleshoot Remote MCP Servers](../troubleshooting/) --- # Page 45: Troubleshoot Remote MCP Servers **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/mcp/remote/troubleshooting.md --- # Troubleshoot Remote MCP Servers --- title: Troubleshoot Remote MCP Servers latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: mcp/remote/troubleshooting page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: mcp/remote/troubleshooting.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/mcp/remote/troubleshooting.adoc description: Diagnose and fix common issues when building and running Remote MCP servers in Redpanda Cloud. page-topic-type: troubleshooting personas: agent_developer, streaming_developer, platform_admin learning-objective-1: Diagnose and fix lint and YAML configuration errors learning-objective-2: Resolve runtime issues when tools don't appear or return unexpected results learning-objective-3: Debug client connection problems page-git-created-date: "2026-01-13" page-git-modified-date: "2026-02-18" --- This page helps you diagnose and fix common issues when building and running Remote MCP servers. Use this page to: - Diagnose and fix lint and YAML configuration errors - Resolve runtime issues when tools don’t appear or return unexpected results - Debug client connection problems ## [](#lint-errors)Lint errors Always lint your configuration before deploying. The Cloud Console provides a **Lint** button that validates your YAML. 
### [](#common-lint-errors)Common lint errors - `unable to infer component type`: Your file contains multiple component types or uses wrapper blocks. Each YAML file must contain only a single component type and should not be wrapped in an `input:` or `output:` block. See [Unable to infer component type](#fix-unable-to-infer). - `unknown field`: A configuration field is misspelled. Check the field name against the component documentation. - `missing required field`: A required field is missing from your configuration. Add the missing field. ### [](#fix-unable-to-infer)Unable to infer component type If you see errors like the following, your YAML file contains more than one component type or uses a wrapper: ```none resources/inputs/redpanda-consume.yaml(1,1) unable to infer component type: [input processors cache_resources meta] resources/outputs/redpanda-publish.yaml(1,1) unable to infer component type: [processors output meta] ``` To fix this, split out each component type into its own file. Incorrect: Multiple component types ```yaml label: incorrect-example input: redpanda: { ... } processors: - mutation: { ... } output: redpanda: { ... } ``` Correct: Single component type ```yaml label: event-reader redpanda: seed_brokers: [ "${REDPANDA_BROKERS}" ] topics: [ "events" ] consumer_group: "mcp-reader" meta: mcp: enabled: true description: "Consume events from Redpanda" ``` ### [](#json-schema-errors)JSON schema errors JSON schema errors indicate that you’re using an outdated version of Redpanda Connect with an incompatible JSON schema format: ```json { "type": "error", "error": { "type": "invalid_request_error", "message": "tools.17.custom.input_schema: JSON schema is invalid..." } } ``` Contact Redpanda support if you see this error in Redpanda Cloud. ## [](#runtime-issues)Runtime issues ### [](#tool-not-appearing-in-mcp-client)Tool not appearing in MCP client If your tool doesn’t appear in the MCP client’s tool list: 1. 
Verify that `meta.mcp.enabled: true` is set in your YAML configuration.

2. Check that the MCP server started successfully:
    - Verify the MCP server status shows **Running** in the Cloud Console.
    - Check the **Logs** tab for any startup errors.
3. Verify that the tool's `meta` block is structured correctly and has the correct tag:

    Example correct structure

    ```yaml
    label: my-tool
    # ... component configuration ...
    meta:
      tags: [ my-tag ]  # Must match --tag argument
      mcp:
        enabled: true   # Required for exposure
        description: Tool description
    ```

### [](#tool-returns-unexpected-results)Tool returns unexpected results

If your tool runs but returns unexpected data:

1. Check input validation. Add logging to see what inputs the tool receives:

    ```yaml
    - log:
        message: "Received input: ${! json() }"
        level: DEBUG
    ```

2. Verify data transformations. Log intermediate results between processors.
3. Check external API responses. The API may return different data than expected.
4. Review the **Logs** tab in the Cloud Console for error messages.

## [](#connection-issues)Connection issues

### [](#mcp-client-cant-connect-to-server)MCP client can’t connect to server

If your MCP client can’t connect to your Remote MCP server:

1. Verify authentication:
    - Run `rpk cloud login` to refresh your authentication token.
    - Tokens expire after 1 hour.
2. Check the MCP proxy configuration:
    - Verify the cluster ID and MCP server ID are correct.
    - Run `rpk cloud mcp proxy --help` to see available options.
3. Verify the MCP server is running:
    - Check the server status in the Cloud Console.
    - Review the **Logs** tab for startup errors.

### [](#connection-drops-or-times-out)Connection drops or times out

If connections are unstable:

1. Check network connectivity between the client and server.
2. Verify no firewall rules are blocking the connection.
3. Check if the MCP server is being restarted due to resource limits. Consider scaling up resources.

## [](#debugging)Debugging techniques

Use these techniques to systematically isolate and fix issues with your MCP tools.
### [](#add-temporary-logging)Add temporary logging Insert [`log` processors](../../../../develop/connect/components/processors/log/) to debug data flow: ```yaml processors: - log: message: "Input received: ${! json() }" level: DEBUG - # ... your processing logic ... - log: message: "Output produced: ${! json() }" level: DEBUG ``` The `${! json() }` syntax uses [Bloblang interpolation](../../../../develop/connect/guides/bloblang/functions/#json) to insert the current message content. Remove debug processors before deploying to production. ### [](#test-your-tools)Test your tools Build confidence by testing at each stage: 1. Lint your configuration using the **Lint** button in the Cloud Console. 2. Test tool logic using the **MCP Inspector**. 3. Connect to your AI client using `rpk cloud mcp proxy`. 4. Test end-to-end with realistic prompts. ### [](#isolate-the-problem)Isolate the problem When debugging complex tools: 1. Test each processor individually by commenting out others. 2. Use static test data instead of live API calls. 3. Check if the issue is in input validation, processing logic, or output formatting. 4. Compare working tools with broken ones to identify differences. ## [](#next-steps)Next steps If you’re still experiencing issues: - [Create an MCP Tool](../create-tool/) - [MCP Tool Design](../best-practices/) - [MCP Tool Execution and Components](../concepts/) For protocol-level troubleshooting, see the [MCP documentation](https://modelcontextprotocol.io/). 
--- # Page 46: Transcripts **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/observability.md --- # Transcripts --- title: Transcripts latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: observability/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: observability/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/observability/index.adoc description: Govern agentic AI with complete execution transcripts built on Redpanda's immutable distributed log. page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Govern agentic AI with complete execution transcripts built on Redpanda’s immutable distributed log. - [Transcripts and AI Observability](concepts/) Understand how Redpanda captures end-to-end execution transcripts on an immutable distributed log for agent governance and observability. - [View Transcripts](transcripts/) Filter and navigate the Transcripts interface to investigate end-to-end agent execution records stored on Redpanda's immutable log. - [Ingest OpenTelemetry Traces from Custom Agents](ingest-custom-traces/) Configure a Redpanda Connect pipeline to ingest OpenTelemetry traces from custom agents into Redpanda's immutable log for unified governance and observability. 
--- # Page 47: Transcripts and AI Observability **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/observability/concepts.md --- # Transcripts and AI Observability --- title: Transcripts and AI Observability latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: observability/concepts page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: observability/concepts.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/observability/concepts.adoc description: Understand how Redpanda captures end-to-end execution transcripts on an immutable distributed log for agent governance and observability. page-topic-type: concepts personas: evaluator, agent_developer, platform_admin, data_engineer learning-objective-1: Explain how transcripts and spans capture execution flow learning-objective-2: Interpret transcript structure for debugging and monitoring learning-objective-3: Distinguish between transcripts and audit logs page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Redpanda provides complete observability and governance for AI agents through automated [transcript](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#transcript) capture. Every agent execution, from simple tool calls to complex multi-agent, multi-turn workflows, generates a permanent, write-once record stored on Redpanda’s [log](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#log). This captures all agent reasoning, tool invocations, model interactions, and data flows with 100% sampling and no gaps. 
With transcripts, organizations gain the ability to debug agent behavior, identify performance bottlenecks, meet regulatory compliance requirements, and maintain accountability for AI-driven decisions. Transcripts use OpenTelemetry standards and [Raft](https://raft.github.io/)-based consensus for correctness, establishing a trustworthy foundation for agent governance.

After reading this page, you will be able to:

- Explain how transcripts and spans capture execution flow
- Interpret transcript structure for debugging and monitoring
- Distinguish between transcripts and audit logs

## [](#what-are-transcripts)What are transcripts

A transcript records the complete execution of an agentic behavior from start to finish. It captures every step, across multiple agents, tools, models, and services, in a single, traceable record. The AI Gateway and every [agent](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#ai-agent) and [MCP server](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#mcp-server) in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a [topic](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#topic) called `redpanda.otel_traces`. Redpanda’s immutable distributed log stores these traces.

Transcripts capture:

- Tool invocations and results
- Agent reasoning steps
- Data processing operations
- External API calls
- Error conditions
- Performance metrics

With 100% sampling, every operation is captured with no gaps. The underlying storage uses a distributed log built on Raft consensus (with TLA+ proven correctness), which makes transcripts a trustworthy, immutable record for governance, debugging, and performance analysis.
## [](#traces-and-spans)Traces and spans [OpenTelemetry](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#opentelemetry) traces provide a complete picture of how a request flows through your system: - A _trace_ represents the entire lifecycle of a request (for example, a tool invocation from start to finish). - A _span_ represents a single unit of work within that trace (such as a data processing operation or an external API call). - A trace contains one or more spans organized hierarchically, showing how operations relate to each other. ## [](#agent-transcript-hierarchy)Agent transcript hierarchy Agent executions create a hierarchy of spans that reflect how agents process requests. Understanding this hierarchy helps you interpret agent behavior and identify where issues occur. ### [](#agent-span-types)Agent span types Agent transcripts contain these span types: | Span Type | Description | Use To | | --- | --- | --- | | ai-agent | Top-level span representing the entire agent invocation from start to finish. Includes all processing time, from receiving the request through executing the reasoning loop, calling tools, and returning the final response. | Measure total request duration and identify slow agent invocations. | | agent | Internal agent processing that represents reasoning and decision-making. Shows time spent in the LLM reasoning loop, including context processing, tool selection, and response generation. Multiple agent spans may appear when the agent iterates through its reasoning loop. | Track reasoning time and identify iteration patterns. | | invoke_agent | Agent and sub-agent invocation in multi-agent architectures, following the OpenTelemetry agent invocation semantic conventions. Represents one agent calling another via the A2A protocol. | Trace calls between root agents and sub-agents, measure cross-agent latency, and identify which sub-agent was invoked. 
| | openai, anthropic, or other LLM providers | LLM provider API call showing calls to the language model. The span name matches the provider, and attributes typically include the model name (like gpt-5.2 or claude-sonnet-4-5). | Identify which model was called, measure LLM response time, and debug LLM API errors. | | rpcn-mcp | MCP tool invocation representing calls to Remote MCP servers. Shows tool execution time, including network latency and tool processing. Child spans with instrumentationScope.name set to redpanda-connect represent internal Redpanda Connect processing. | Measure tool execution time and identify slow MCP tool calls. | ### [](#typical-agent-execution-flow)Typical agent execution flow A simple agent request creates this hierarchy: ai-agent (6.65 seconds) ├── agent (6.41 seconds) │ ├── invoke\_agent: customer-support-agent (6.39 seconds) │ │ └── openai: chat gpt-5.2 (6.2 seconds) This hierarchy shows that the LLM API call (6.2 seconds) accounts for most of the total agent invocation time (6.65 seconds), revealing the bottleneck in this execution flow. ## [](#mcp-server-transcript-hierarchy)MCP server transcript hierarchy MCP server tool invocations produce a different span hierarchy focused on tool execution and internal processing. This structure reveals performance bottlenecks and helps debug tool-specific issues. ### [](#mcp-server-span-types)MCP server span types MCP server transcripts contain these span types: | Span Type | Description | Use To | | --- | --- | --- | | mcp-{server-id} | Top-level span representing the entire MCP server invocation. The server ID uniquely identifies the MCP server instance. This span encompasses all tool execution from request receipt to response completion. | Measure total MCP server response time and identify slow tool invocations. | | service | Internal service processing span that appears at multiple levels in the hierarchy. 
Represents Redpanda Connect service operations including routing, processing, and component execution. | Track internal processing overhead and identify where time is spent in the service layer. | | Tool name (for example, get_order_status, get_customer_history) | The specific MCP tool being invoked. This span name matches the tool name defined in the MCP server configuration. | Identify which tool was called and measure tool-specific execution time. | | processors | Processor pipeline execution span showing the collection of processors that process the tool’s data. Appears as a child of the tool invocation span. | Measure total processor pipeline execution time. | | Processor name (for example, mapping, http, branch) | Individual processor execution span representing a single Redpanda Connect processor. The span name matches the processor type. | Identify slow processors and debug processing logic. | ### [](#typical-mcp-server-execution-flow)Typical MCP server execution flow An MCP tool invocation creates this hierarchy: mcp-d5mnvn251oos73 (4.00 seconds) ├── service > get\_order\_status (4.07 seconds) │ └── service > processors (43 microseconds) │ └── service > mapping (18 microseconds) This shows: 1. Total MCP server invocation: 4.00 seconds 2. Tool execution (get\_order\_status): 4.07 seconds 3. Processor pipeline: 43 microseconds 4. Mapping processor: 18 microseconds (data transformation) The majority of time (4+ seconds) is spent in tool execution, while internal processing (mapping) takes only microseconds. This indicates the tool itself (likely making external API calls or database queries) is the bottleneck, not Redpanda Connect’s internal processing. ## [](#transcript-layers-and-scope)Transcript layers and scope Transcripts contain multiple layers of instrumentation, from HTTP transport through application logic to external service calls. The `scope.name` field in each span identifies which instrumentation layer created that span. 
### [](#instrumentation-layers)Instrumentation layers A complete agent transcript includes these layers: | Layer | Scope Name | Purpose | | --- | --- | --- | | HTTP Server | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp | HTTP transport layer receiving requests. Shows request/response sizes, status codes, client addresses, and network details. | | AI SDK (Agent) | github.com/redpanda-data/ai-sdk-go/plugins/otel | Agent application logic. Shows agent invocations, LLM calls, tool executions, conversation IDs, token usage, and model details. Includes gen_ai.* semantic convention attributes. | | HTTP Client | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp | Outbound HTTP calls from agent to MCP servers. Shows target URLs, request methods, and response codes. | | MCP Server | rpcn-mcp | MCP server tool execution. Shows tool name, input parameters, result size, and execution time. Appears as a separate service.name in resource attributes. | | Redpanda Connect | redpanda-connect | Internal Redpanda Connect component execution within MCP tools. Shows pipeline and individual component spans. | ### [](#how-layers-connect)How layers connect Layers connect through parent-child relationships in a single transcript: ai-agent-http-server (HTTP Server layer) └── invoke\_agent customer-support-agent (AI SDK layer) ├── chat gpt-5-nano (AI SDK layer, LLM call 1) ├── execute\_tool get\_order\_status (AI SDK layer) │ └── HTTP POST (HTTP Client layer) │ └── get\_order\_status (MCP Server layer, different service) │ └── processors (Redpanda Connect layer) └── chat gpt-5-nano (AI SDK layer, LLM call 2) The request flow demonstrates: 1. HTTP request arrives at agent 2. Agent invokes sub-agent 3. Agent makes first LLM call to decide what to do 4. Agent executes tool, making HTTP call to MCP server 5. MCP server processes tool through its pipeline 6. Agent makes second LLM call with tool results 7. 
Response returns through HTTP layer ### [](#cross-service-transcripts)Cross-service transcripts When agents call MCP tools, the transcript spans multiple services. Each service has a different `service.name` in the resource attributes: - Agent spans: `"service.name": "ai-agent"` - MCP server spans: `"service.name": "mcp-{server-id}"` Both use the same `traceId`, allowing you to follow a request across service boundaries. ### [](#key-attributes-by-layer)Key attributes by layer Different layers expose different attributes: HTTP Server/Client layer (following [OpenTelemetry semantic conventions for HTTP](https://opentelemetry.io/docs/specs/semconv/http/http-spans/)): - `http.request.method`, `http.response.status_code` - `server.address`, `url.path`, `url.full` - `network.peer.address`, `network.peer.port` - `http.request.body.size`, `http.response.body.size` AI SDK layer (following [OpenTelemetry semantic conventions for generative AI](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/)): - `gen_ai.operation.name`: Operation type (`invoke_agent`, `chat`, `execute_tool`) - `gen_ai.conversation.id`: Links spans to the same conversation session. A conversation may include multiple agent invocations (one per user request). Each invocation creates a separate trace that shares the same conversation ID. 
- `gen_ai.agent.name`: Sub-agent name for multi-agent systems
- `gen_ai.provider.name`, `gen_ai.request.model`: LLM provider and model
- `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`: Token consumption
- `gen_ai.tool.name`, `gen_ai.tool.call.arguments`: Tool execution details
- `gen_ai.input.messages`, `gen_ai.output.messages`: Full LLM conversation context

MCP Server layer:

- Tool-specific attributes like `order_id`, `customer_id`
- `result_prefix`, `result_length`: Tool result metadata

Redpanda Connect layer:

- Component-specific attributes from your tool configuration

The `scope.name` field identifies which instrumentation layer created each span.

## [](#understand-the-transcript-structure)Understand the transcript structure

Each span captures a unit of work. Here’s what a typical MCP tool invocation looks like:

```json
{
  "traceId": "71cad555b35602fbb35f035d6114db54",
  "spanId": "43ad6bc31a826afd",
  "name": "http_processor",
  "attributes": [
    {"key": "city_name", "value": {"stringValue": "london"}},
    {"key": "result_length", "value": {"intValue": "198"}}
  ],
  "startTimeUnixNano": "1765198415253280028",
  "endTimeUnixNano": "1765198424660663434",
  "instrumentationScope": {"name": "rpcn-mcp"},
  "status": {"code": 0, "message": ""}
}
```

- `traceId` links all spans in the same request across services
- `spanId` uniquely identifies this span
- `name` identifies the operation or tool
- `instrumentationScope.name` identifies which layer created the span (`rpcn-mcp` for MCP tools, `redpanda-connect` for internal processing)
- `attributes` contain operation-specific metadata
- `status.code` indicates the outcome: 0 (unset, the default when no error was recorded), 1 (OK), or 2 (error)

### [](#parent-child-relationships)Parent-child relationships

Transcripts show how operations relate.
A tool invocation (parent) may trigger internal operations (children): ```json { "traceId": "71cad555b35602fbb35f035d6114db54", "spanId": "ed45544a7d7b08d4", "parentSpanId": "43ad6bc31a826afd", "name": "http", "instrumentationScope": {"name": "redpanda-connect"}, "status": {"code": 0, "message": ""} } ``` The `parentSpanId` links this child span to the parent tool invocation. Both share the same `traceId` so you can reconstruct the complete operation. ## [](#error-events-in-transcripts)Error events in transcripts When something goes wrong, transcripts capture error details: ```json { "traceId": "71cad555b35602fbb35f035d6114db54", "spanId": "ba332199f3af6d7f", "parentSpanId": "43ad6bc31a826afd", "name": "http_request", "events": [ { "name": "event", "timeUnixNano": "1765198420254169629", "attributes": [{"key": "error", "value": {"stringValue": "type"}}] } ], "status": {"code": 0, "message": ""} } ``` The `events` array captures what happened and when. Use `timeUnixNano` to see exactly when the error occurred within the operation. ## [](#opentelemetry-traces-topic)How Redpanda stores trace data The `redpanda.otel_traces` topic stores OpenTelemetry spans using Redpanda’s [Schema Registry](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#schema-registry) wire format, with a custom Protobuf schema named `redpanda.otel_traces-value` that follows the [OpenTelemetry Protocol (OTLP)](https://opentelemetry.io/docs/specs/otel/protocol/) specification. Spans include attributes following OpenTelemetry [semantic conventions for generative AI](https://opentelemetry.io/docs/specs/semconv/gen-ai/), such as `gen_ai.operation.name` and `gen_ai.conversation.id`. The schema is automatically registered in the Schema Registry with the topic, so Kafka clients can consume and deserialize trace data correctly. Redpanda manages both the `redpanda.otel_traces` topic and its schema automatically. If you delete either the topic or the schema, they are recreated automatically. 
However, deleting the topic permanently deletes all trace data, and the topic comes back empty. Do not produce your own data to this topic. It is reserved for OpenTelemetry traces.

### [](#topic-configuration-and-lifecycle)Topic configuration and lifecycle

The `redpanda.otel_traces` topic has a predefined retention policy. Configuration changes to this topic are not supported. If you modify settings, Redpanda reverts them to the default values.

The topic persists in your cluster even after all agents and MCP servers are deleted, allowing you to retain historical trace data for analysis.

Transcripts may contain sensitive information from your tool inputs and outputs. Consider implementing appropriate [ACLs](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#access-control-list-acl) for the `redpanda.otel_traces` topic, and review the data in transcripts before sharing or exporting to external systems.

## [](#transcripts-compared-to-audit-logs)Transcripts compared to audit logs

Transcripts and audit logs serve different but complementary purposes.

Transcripts provide:

- A complete, immutable record of every execution step, stored on Redpanda’s distributed log with no gaps
- A hierarchical view of request flow through your system (parent-child span relationships)
- Detailed timing information for performance analysis
- The ability to reconstruct execution paths and identify bottlenecks

Transcripts are optimized for execution-level observability and governance. For user-level accountability tracking ("who initiated what"), use the session and task topics for agents, which provide records of agent conversations and task execution.
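
As a rough illustration of how the `spanId`, `parentSpanId`, and timestamp fields described on this page let you reconstruct an execution path, the following Python sketch builds a parent-child tree from span records and prints each span's duration. It assumes the spans have already been consumed from `redpanda.otel_traces` and decoded into dictionaries shaped like the JSON examples above. The helper names (`span_duration_ms`, `build_span_tree`, `render_tree`) and the sample values are hypothetical and are not part of any Redpanda API.

```python
from collections import defaultdict


def span_duration_ms(span):
    """Return the span duration in milliseconds.

    The timestamps are nanoseconds encoded as strings, as in the JSON examples.
    """
    return (int(span["endTimeUnixNano"]) - int(span["startTimeUnixNano"])) / 1e6


def build_span_tree(spans):
    """Group spans under their parents.

    A span is treated as a root when it has no parentSpanId or when its
    parent is not present in the given set of spans.
    """
    known_ids = {s["spanId"] for s in spans}
    children = defaultdict(list)
    roots = []
    for span in spans:
        parent = span.get("parentSpanId")
        if parent and parent in known_ids:
            children[parent].append(span)
        else:
            roots.append(span)
    return roots, children


def render_tree(spans):
    """Return indented lines showing the hierarchy, ordered by start time."""
    roots, children = build_span_tree(spans)
    lines = []

    def walk(span, depth):
        lines.append(f"{'  ' * depth}{span['name']} ({span_duration_ms(span):.1f} ms)")
        for child in sorted(children[span["spanId"]],
                            key=lambda s: int(s["startTimeUnixNano"])):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Illustrative data only: two spans from one hypothetical trace.
sample_spans = [
    {"spanId": "a1", "name": "get_order_status",
     "startTimeUnixNano": "1000000000", "endTimeUnixNano": "1250000000"},
    {"spanId": "b2", "parentSpanId": "a1", "name": "http",
     "startTimeUnixNano": "1010000000", "endTimeUnixNano": "1240000000"},
]

for line in render_tree(sample_spans):
    print(line)
```

Comparing a parent's duration with the durations of its children shows where the time goes, which is the same reasoning applied to the hierarchies earlier on this page.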
## [](#next-steps)Next steps - [View Transcripts](../transcripts/) - [Monitor Agent Activity](../../agents/monitor-agents/) - [Monitor MCP Server Activity](../../mcp/remote/monitor-mcp-servers/) --- # Page 48: Ingest OpenTelemetry Traces from Custom Agents **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/observability/ingest-custom-traces.md --- # Ingest OpenTelemetry Traces from Custom Agents --- title: Ingest OpenTelemetry Traces from Custom Agents latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: observability/ingest-custom-traces page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: observability/ingest-custom-traces.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/observability/ingest-custom-traces.adoc description: Configure a Redpanda Connect pipeline to ingest OpenTelemetry traces from custom agents into Redpanda's immutable log for unified governance and observability. page-topic-type: how-to learning-objective-1: Configure and deploy a Redpanda Connect pipeline to receive OpenTelemetry traces from custom agents via HTTP and publish them to `redpanda.otel_traces` learning-objective-2: Validate trace data format and compatibility with existing MCP server traces learning-objective-3: Secure the ingestion endpoint using authentication mechanisms page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). You can extend Redpanda’s transcript observability to custom agents built with frameworks like LangChain or instrumented with OpenTelemetry SDKs. 
By ingesting traces from external applications into the `redpanda.otel_traces` topic, you gain unified visibility across all agent executions, from Redpanda’s declarative agents and Remote MCP servers to your own custom implementations.

After reading this page, you will be able to:

- Configure and deploy a Redpanda Connect pipeline to receive OpenTelemetry traces from custom agents via HTTP and publish them to `redpanda.otel_traces`
- Validate trace data format and compatibility with existing MCP server traces
- Secure the ingestion endpoint using authentication mechanisms

## [](#prerequisites)Prerequisites

- A BYOC cluster
- Ability to manage secrets in Redpanda Cloud
- The latest version of [`rpk`](../../../manage/rpk/rpk-install/) installed
- A custom agent or application instrumented with an OpenTelemetry SDK
- Basic understanding of the [OpenTelemetry span format](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/) and [OpenTelemetry Protocol (OTLP)](https://opentelemetry.io/docs/specs/otlp/)

## [](#quickstart-for-langchain-users)Quickstart for LangChain users

If you’re using LangChain with OpenTelemetry tracing, you can send traces to Redpanda’s `redpanda.otel_traces` [topic](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#topic) to view them in the Transcripts view.

1. Configure LangChain’s OpenTelemetry integration by following the [LangChain documentation](https://docs.langchain.com/langsmith/trace-with-opentelemetry).
2. Deploy a Redpanda Connect pipeline using the `otlp_http` input to receive OTLP traces over HTTP. Create the pipeline in the **Connect** page of your cluster, or see the [Configure the ingestion pipeline](#configure-the-ingestion-pipeline) section below for a sample configuration.
3.
Configure your OTEL exporter to send traces to your Redpanda Connect pipeline using environment variables: ```bash # Configure LangChain OTEL integration export LANGSMITH_OTEL_ENABLED=true export LANGSMITH_TRACING=true # Send traces to Redpanda Connect pipeline (use your pipeline URL) export OTEL_EXPORTER_OTLP_ENDPOINT="https://.pipelines..clusters.rdpa.co" export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer " ``` By default, traces are sent to both LangSmith and your Redpanda Connect pipeline. If you want to send traces only to Redpanda (not LangSmith), set: ```bash export LANGSMITH_OTEL_ONLY="true" ``` Your LangChain application will send traces to the `redpanda.otel_traces` topic, making them visible in the Transcripts view in your cluster alongside Remote MCP server and declarative agent traces. For non-LangChain applications or custom instrumentation, continue with the sections below. ## [](#about-custom-trace-ingestion)About custom trace ingestion Custom agents are applications with OpenTelemetry instrumentation that operate independently of Redpanda’s Remote MCP servers or declarative agents (such as LangChain, CrewAI, or manually instrumented applications). When these agents send traces to `redpanda.otel_traces`, you gain unified observability alongside Remote MCP server and declarative agent traces. See [Cross-service transcripts](../concepts/#cross-service-transcripts) for details on how traces correlate across services. ### [](#trace-format-requirements)Trace format requirements Custom agents must emit traces in OTLP format. The [`otlp_http`](../../../develop/connect/components/inputs/otlp_http/) input accepts both OTLP Protobuf (`application/x-protobuf`) and JSON (`application/json`) payloads. For [gRPC transport](#use-grpc), use the [`otlp_grpc`](../../../develop/connect/components/inputs/otlp_grpc/) input. 
Each span in a trace must follow the OTLP specification and include these required fields:

| Field | Description |
| --- | --- |
| traceId | Hex-encoded unique identifier for the entire trace |
| spanId | Hex-encoded unique identifier for this span |
| name | Descriptive operation name |
| startTimeUnixNano and endTimeUnixNano | Timing information in nanoseconds |
| instrumentationScope | Identifies the library that created the span |
| status | Operation status with code (0 = UNSET, 1 = OK, 2 = ERROR) |

Optional but recommended fields:

- `parentSpanId` for hierarchical traces
- `attributes` for contextual information

For complete trace structure details, see [Understand the transcript structure](../concepts/#understand-the-transcript-structure).

## [](#configure-the-ingestion-pipeline)Configure the ingestion pipeline

Create a Redpanda Connect pipeline that receives OTLP traces and publishes them to the `redpanda.otel_traces` topic. Choose HTTP or gRPC transport based on your agent’s requirements.

### [](#create-the-pipeline-configuration)Create the pipeline configuration

Create a pipeline configuration file that defines the OTLP ingestion endpoint.

#### HTTP

The `otlp_http` input component:

- Exposes an OpenTelemetry Collector HTTP receiver
- Accepts traces at the standard `/v1/traces` endpoint
- Converts incoming OTLP data into individual Redpanda OTEL v1 Protobuf messages

The following example shows a minimal pipeline configuration. Redpanda Cloud automatically injects authentication handling, so you don’t need to configure `auth_token` in the input.
```yaml input: otlp_http: {} output: redpanda: seed_brokers: - "${PRIVATE_REDPANDA_BROKERS}" tls: enabled: ${PRIVATE_REDPANDA_TLS_ENABLED} sasl: - mechanism: "REDPANDA_CLOUD_SERVICE_ACCOUNT" topic: "redpanda.otel_traces" ``` #### gRPC The `otlp_grpc` input component: - Exposes an OpenTelemetry Collector gRPC receiver - Accepts traces via the OTLP gRPC protocol - Converts incoming OTLP data into individual Redpanda OTEL v1 Protobuf messages The following example shows a minimal pipeline configuration. Redpanda Cloud automatically injects authentication handling. ```yaml input: otlp_grpc: {} output: redpanda: seed_brokers: - "${PRIVATE_REDPANDA_BROKERS}" tls: enabled: ${PRIVATE_REDPANDA_TLS_ENABLED} sasl: - mechanism: "REDPANDA_CLOUD_SERVICE_ACCOUNT" topic: "redpanda.otel_traces" ``` > 📝 **NOTE** > > Clients must include the authentication token in gRPC metadata as `authorization: Bearer `. The OTLP input automatically handles format conversion, so no processors are needed for basic trace ingestion. Each span becomes a separate message in the `redpanda.otel_traces` topic. ### [](#deploy-the-pipeline-in-redpanda-cloud)Deploy the pipeline in Redpanda Cloud 1. In the **Connect** page of your Redpanda Cloud cluster, click **Create Pipeline**. 2. For the input, select the **otlp\_http** (or **otlp\_grpc**) component. 3. Skip to **Add a topic** and select `redpanda.otel_traces` from the list of existing topics. Leave the default advanced settings. 4. In the **Add permissions** step, create a service account with write access to the `redpanda.otel_traces` topic. 5. In the **Create pipeline** step, enter a name for your pipeline and paste the configuration. Redpanda Cloud automatically handles authentication for incoming requests. ## [](#send-traces-from-your-custom-agent)Send traces from your custom agent Configure your custom agent to send OpenTelemetry traces to the pipeline endpoint. 
After deploying the pipeline, you can find its URL in the Redpanda Cloud UI on the pipeline details page. | Transport | URL Format | | --- | --- | | HTTP | https://.pipelines..clusters.rdpa.co/v1/traces | | gRPC | .pipelines..clusters.rdpa.co:443 | ### [](#authenticate-to-the-pipeline)Authenticate to the pipeline The OTLP pipeline uses the same authentication mechanism as the Redpanda Cloud API. Obtain an access token using your service account credentials as described in [Authenticate to the Cloud API](../../../security/cloud-authentication/#authenticate-to-the-cloud-api). Include the token in your requests: - HTTP: Set the `Authorization` header to `Bearer ` - gRPC: Set the `authorization` metadata field to `Bearer ` ### [](#configure-your-otel-exporter)Configure your OTEL exporter Install the OpenTelemetry SDK for your language and configure the OTLP exporter to target your Redpanda Connect pipeline endpoint. The exporter configuration requires: - Endpoint: Your pipeline’s URL (the SDK adds `/v1/traces` automatically for HTTP) - Headers: Authorization header with your bearer token - Protocol: HTTP to match the `otlp_http` input (or gRPC for `otlp_grpc`) #### HTTP View Python example ```python from opentelemetry import trace from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import BatchSpanProcessor from opentelemetry.sdk.resources import Resource # Configure resource attributes to identify your agent resource = Resource(attributes={ "service.name": "my-custom-agent", "service.version": "1.0.0" }) # Configure the OTLP HTTP exporter exporter = OTLPSpanExporter( endpoint="https://.pipelines..clusters.rdpa.co/v1/traces", headers={"Authorization": "Bearer YOUR_TOKEN"} ) # Set up tracing with batch processing provider = TracerProvider(resource=resource) processor = BatchSpanProcessor(exporter) provider.add_span_processor(processor) 
trace.set_tracer_provider(provider)

# Use the tracer with GenAI semantic conventions
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span(
    "invoke_agent my-assistant",
    kind=trace.SpanKind.INTERNAL
) as span:
    # Set GenAI semantic convention attributes
    span.set_attribute("gen_ai.operation.name", "invoke_agent")
    span.set_attribute("gen_ai.agent.name", "my-assistant")
    span.set_attribute("gen_ai.provider.name", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4")

    # Your agent logic here
    result = process_request()

    # Set token usage if available
    span.set_attribute("gen_ai.usage.input_tokens", 150)
    span.set_attribute("gen_ai.usage.output_tokens", 75)
```

View Node.js example

```javascript
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { Resource } = require('@opentelemetry/resources');
const { trace, SpanKind } = require('@opentelemetry/api');

// Configure resource
const resource = new Resource({
  'service.name': 'my-custom-agent',
  'service.version': '1.0.0'
});

// Configure OTLP HTTP exporter
const exporter = new OTLPTraceExporter({
  url: 'https://.pipelines..clusters.rdpa.co/v1/traces',
  headers: {
    'Authorization': 'Bearer YOUR_TOKEN'
  }
});

// Set up provider
const provider = new NodeTracerProvider({ resource });
provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

// Use the tracer with GenAI semantic conventions
const tracer = trace.getTracer('my-agent');
const span = tracer.startSpan('invoke_agent my-assistant', {
  kind: SpanKind.INTERNAL
});

// Set GenAI semantic convention attributes
span.setAttribute('gen_ai.operation.name', 'invoke_agent');
span.setAttribute('gen_ai.agent.name', 'my-assistant');
span.setAttribute('gen_ai.provider.name', 'openai');
span.setAttribute('gen_ai.request.model', 'gpt-4');

// Your agent logic
processRequest()
  .then(result => {
    // Set token usage if available
    span.setAttribute('gen_ai.usage.input_tokens', 150);
    span.setAttribute('gen_ai.usage.output_tokens', 75);
  })
  // End the span whether the request succeeds or fails
  .finally(() => span.end());
```

View Go example

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
	"go.opentelemetry.io/otel/trace"
)

func main() {
	ctx := context.Background()

	// Configure OTLP HTTP exporter
	exporter, err := otlptracehttp.New(ctx,
		otlptracehttp.WithEndpoint(".pipelines..clusters.rdpa.co"),
		otlptracehttp.WithHeaders(map[string]string{
			"Authorization": "Bearer YOUR_TOKEN",
		}),
	)
	if err != nil {
		log.Fatalf("Failed to create exporter: %v", err)
	}

	// Configure resource
	res, _ := resource.New(ctx,
		resource.WithAttributes(
			semconv.ServiceName("my-custom-agent"),
			semconv.ServiceVersion("1.0.0"),
		),
	)

	// Set up tracer provider
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(res),
	)
	defer tp.Shutdown(ctx)
	otel.SetTracerProvider(tp)

	tracer := tp.Tracer("my-agent")

	// Create span with GenAI semantic conventions
	_, span := tracer.Start(ctx, "invoke_agent my-assistant",
		trace.WithSpanKind(trace.SpanKindInternal),
	)
	span.SetAttributes(
		attribute.String("gen_ai.operation.name", "invoke_agent"),
		attribute.String("gen_ai.agent.name", "my-assistant"),
		attribute.String("gen_ai.provider.name", "openai"),
		attribute.String("gen_ai.request.model", "gpt-4"),
		attribute.Int("gen_ai.usage.input_tokens", 150),
		attribute.Int("gen_ai.usage.output_tokens", 75),
	)
	span.End()

	tp.ForceFlush(ctx)
}
```

#### gRPC

View Python example

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource

resource = Resource(attributes={
    "service.name": "my-custom-agent",
    "service.version": "1.0.0"
})

# gRPC endpoint without https:// prefix
exporter = OTLPSpanExporter(
    endpoint=".pipelines..clusters.rdpa.co:443",
    headers={"authorization": "Bearer YOUR_TOKEN"}
)

provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```

View Node.js example

```javascript
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { Resource } = require('@opentelemetry/resources');

const resource = new Resource({
  'service.name': 'my-custom-agent',
  'service.version': '1.0.0'
});

// gRPC exporter with TLS
const exporter = new OTLPTraceExporter({
  url: 'https://.pipelines..clusters.rdpa.co:443',
  headers: {
    'authorization': 'Bearer YOUR_TOKEN'
  }
});

const provider = new NodeTracerProvider({ resource });
provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();
```

View Go example

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// otlptracegrpc.New returns the shared *otlptrace.Exporter type.
func createGRPCExporter(ctx context.Context) (*otlptrace.Exporter, error) {
	return otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint(".pipelines..clusters.rdpa.co:443"),
		otlptracegrpc.WithDialOption(grpc.WithTransportCredentials(credentials.NewTLS(nil))),
		otlptracegrpc.WithHeaders(map[string]string{
			"authorization": "Bearer YOUR_TOKEN",
		}),
	)
}
```

> 💡 **TIP**
>
> Use environment variables for the endpoint URL and authentication token to keep credentials out of your code.
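As a minimal sketch of the tip above, the helper below builds exporter arguments from environment variables instead of hard-coded values. The variable names `REDPANDA_OTLP_ENDPOINT` and `REDPANDA_OTLP_TOKEN` and the function `otlp_exporter_settings` are hypothetical, chosen only for illustration; they are not defined by Redpanda or OpenTelemetry.

```python
import os
from typing import Mapping


def otlp_exporter_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Build exporter keyword arguments from environment variables.

    The variable names are assumptions for this sketch, not official names.
    """
    endpoint = environ.get("REDPANDA_OTLP_ENDPOINT")
    token = environ.get("REDPANDA_OTLP_TOKEN")
    if not endpoint or not token:
        raise RuntimeError(
            "Set REDPANDA_OTLP_ENDPOINT and REDPANDA_OTLP_TOKEN before starting the agent"
        )
    # A lowercase header key works for both HTTP headers and gRPC metadata.
    return {"endpoint": endpoint, "headers": {"authorization": f"Bearer {token}"}}


# Demonstration with an explicit mapping so the sketch runs without real credentials.
settings = otlp_exporter_settings({
    "REDPANDA_OTLP_ENDPOINT": "example.invalid:443",
    "REDPANDA_OTLP_TOKEN": "example-token",
})
print(settings["endpoint"])
```

In an agent, the returned dictionary could be unpacked into the Python exporter shown earlier, for example `OTLPSpanExporter(**otlp_exporter_settings())`. The OpenTelemetry SDKs also read standard variables such as `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_HEADERS`, which may make a helper like this unnecessary.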
### [](#use-recommended-semantic-conventions)Use recommended semantic conventions The Transcripts view recognizes [OpenTelemetry semantic conventions for GenAI operations](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/). Following these conventions ensures your traces display correctly with proper attribution, token usage, and operation identification. #### [](#required-attributes-for-agent-operations)Required attributes for agent operations Following the OpenTelemetry semantic conventions, agent spans should include these attributes: - Operation identification: - `gen_ai.operation.name` - Set to `"invoke_agent"` for agent execution spans - `gen_ai.agent.name` - Human-readable name of your agent (displayed in Transcripts view) - LLM provider details: - `gen_ai.provider.name` - LLM provider identifier (e.g., `"openai"`, `"anthropic"`, `"gcp.vertex_ai"`) - `gen_ai.request.model` - Model name (e.g., `"gpt-4"`, `"claude-sonnet-4"`) - Token usage (for cost tracking): - `gen_ai.usage.input_tokens` - Number of input tokens consumed - `gen_ai.usage.output_tokens` - Number of output tokens generated - Session correlation: - `gen_ai.conversation.id` - Identifier linking related agent invocations in the same conversation #### [](#required-attributes-for-proper-display)Required attributes for proper display Set these attributes on your spans for proper display and filtering in the Transcripts view: | Attribute | Purpose | | --- | --- | | gen_ai.operation.name | Set to "invoke_agent" for agent execution spans | | gen_ai.agent.name | Human-readable name displayed in Transcripts view | | gen_ai.provider.name | LLM provider (e.g., "openai", "anthropic") | | gen_ai.request.model | Model name (e.g., "gpt-4", "claude-sonnet-4") | | gen_ai.usage.input_tokens / gen_ai.usage.output_tokens | Token counts for cost tracking | | gen_ai.conversation.id | Links related agent invocations in the same conversation | See the code examples earlier in this page for how to 
set these attributes in Python, Node.js, or Go. ### [](#validate-trace-format)Validate trace format Before deploying to production, verify your traces match the expected format: 1. Run your agent locally and enable debug logging in your OpenTelemetry SDK to inspect outgoing spans. 2. Verify required fields are present: - `traceId`, `spanId`, `name` - `startTimeUnixNano`, `endTimeUnixNano` - `instrumentationScope` with a `name` field - `status` with a `code` field (1 for success, 2 for error) 3. Check that `service.name` is set in the resource attributes to identify your agent in the Transcripts view. 4. Verify GenAI semantic convention attributes if you want proper display in the Transcripts view: - `gen_ai.operation.name` set to `"invoke_agent"` for agent spans - `gen_ai.agent.name` for agent identification - Token usage attributes if tracking costs ## [](#verify-trace-ingestion)Verify trace ingestion After deploying your pipeline and configuring your custom agent, verify traces are flowing correctly. ### [](#consume-traces-from-the-topic)Consume traces from the topic Check that traces are being published to the `redpanda.otel_traces` topic: ```bash rpk topic consume redpanda.otel_traces --offset end -n 10 ``` You can also view the `redpanda.otel_traces` topic in the **Topics** page of Redpanda Cloud UI. Look for spans with your custom `instrumentationScope.name` to identify traces from your agent. ### [](#view-traces-in-transcripts)View traces in Transcripts After your custom agent sends traces through the pipeline, they appear in your cluster’s **Agentic AI > Transcripts** view alongside traces from Remote MCP servers, declarative agents, and AI Gateway. #### [](#identify-custom-agent-transcripts)Identify custom agent transcripts Custom agent transcripts are identified by the `service.name` resource attribute, which differs from Redpanda’s built-in services (`ai-agent` for declarative agents, `mcp-{server-id}` for MCP servers). 
See [Cross-service transcripts](../concepts/#cross-service-transcripts) to understand how the `service.name` attribute identifies transcript sources. Your custom agent transcripts display with: - **Service name** in the service filter dropdown (from your `service.name` resource attribute) - **Agent name** in span details (from the `gen_ai.agent.name` attribute) - **Operation names** like `"invoke_agent my-assistant"` indicating agent executions For detailed instructions on filtering, searching, and navigating transcripts in the UI, see [View Transcripts](../transcripts/). #### [](#token-usage-tracking)Token usage tracking If your spans include the recommended token usage attributes (`gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens`), they display in the summary panel’s token usage section. This enables cost tracking alongside Remote MCP server and declarative agent transcripts. ## [](#troubleshooting)Troubleshooting If traces from your custom agent aren’t appearing in the Transcripts view, use these diagnostic steps to identify and resolve common ingestion issues. ### [](#pipeline-not-receiving-requests)Pipeline not receiving requests If your custom agent cannot reach the ingestion endpoint: 1. Verify the endpoint URL format: - HTTP: `https://.pipelines..clusters.rdpa.co/v1/traces` - gRPC: `.pipelines..clusters.rdpa.co:443` (no `https://` prefix for gRPC clients) 2. Check network connectivity and firewall rules. 3. Ensure authentication tokens are valid and properly formatted in the `Authorization: Bearer ` header (HTTP) or `authorization` metadata field (gRPC). 4. Verify the Content-Type header matches your data format (`application/x-protobuf` or `application/json`). 5. Review pipeline logs for connection errors or authentication failures. ### [](#traces-not-appearing-in-topic)Traces not appearing in topic If requests succeed but traces do not appear in `redpanda.otel_traces`: 1. Check pipeline output configuration. 2. Verify topic permissions. 3. 
Validate that the trace format matches the OTLP specification. ## [](#limitations)Limitations - The `otlp_http` and `otlp_grpc` inputs accept only traces, logs, and metrics, not profiles. - Only traces are published to the `redpanda.otel_traces` topic. - Requests that exceed rate limits return HTTP 429 (HTTP) or a `ResourceExhausted` status (gRPC). ## [](#next-steps)Next steps - [View Transcripts](../transcripts/) - [Observability for declarative agents](../../agents/monitor-agents/) - [OTLP HTTP input reference](../../../develop/connect/components/inputs/otlp_http/) - Complete configuration options for the `otlp_http` component - [OTLP gRPC input reference](../../../develop/connect/components/inputs/otlp_grpc/) - Alternative gRPC-based trace ingestion --- # Page 49: View Transcripts **URL**: https://docs.redpanda.com/redpanda-cloud/ai-agents/observability/transcripts.md --- # View Transcripts --- title: View Transcripts latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: observability/transcripts page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: observability/transcripts.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/ai-agents/pages/observability/transcripts.adoc description: Filter and navigate the Transcripts interface to investigate end-to-end agent execution records stored on Redpanda's immutable log. 
page-topic-type: how-to personas: agent_developer, platform_admin learning-objective-1: Filter transcripts to find specific execution traces learning-objective-2: Use the timeline interactively to navigate to specific time periods learning-objective-3: Navigate between detail views to inspect span information at different levels page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-19" --- > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). Use the Transcripts view to filter, inspect, and debug agent execution records. Filter by operation type, time range, or service to isolate specific executions, then drill into span hierarchies to trace request flow and identify where failures or performance bottlenecks occur. For conceptual background on spans and trace structure, see [Transcripts and AI Observability](../concepts/). After reading this page, you will be able to: - Filter transcripts to find specific execution traces - Use the timeline interactively to navigate to specific time periods - Navigate between detail views to inspect span information at different levels ## [](#prerequisites)Prerequisites - [Running agent](../../agents/create-agent/) or [MCP server](../../mcp/remote/quickstart/) with at least one execution - Access to the Transcripts view (requires appropriate permissions to read the `redpanda.otel_traces` [topic](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#topic)) ## [](#navigate-the-transcripts-interface)Navigate the Transcripts interface ### [](#filter-transcripts)Filter transcripts Use filters to narrow down transcripts and quickly locate specific executions. When you use any of the filters, the transcript table updates to show only matching results. 
The Transcripts view provides several quick-filter buttons: - **Service**: Isolate operations from a particular component in your agentic data plane (agents, MCP servers, or AI Gateway) - **LLM Calls**: Inspect large language model (LLM) invocations, including chat completions and embeddings - **Tool Calls**: View tool executions by agents - **Agent Spans**: Inspect agent invocation and reasoning - **Errors Only**: Filter for failed operations or errors - **Slow (>5s)**: Isolate operations that exceeded five seconds in duration, useful for performance investigation You can combine multiple filters to narrow results further. For example, use **Tool Calls** and **Errors Only** together to investigate failed tool executions. Toggle **Full traces** on to see the complete execution context, in grayed-out text, for the filtered transcripts in the table. #### [](#filter-by-attribute)Filter by attribute Click the **Attribute** button to query exact matches on specific span metadata such as the following: - Agent names - LLM model names, for example, `gemini-3-flash-preview` - Tool names - Span and trace IDs You can add multiple attribute filters to refine results. ### [](#use-the-interactive-timeline)Use the interactive timeline Use the timeline visualization to quickly identify when errors began or patterns changed, and navigate directly to transcripts from specific time windows when investigating issues that occurred at known times. Click on any bar in the timeline to zoom into transcripts from that specific time period. The transcript table automatically scrolls to show operations from the time bucket in view. > 📝 **NOTE** > > When viewing time ranges with many transcripts (hundreds or thousands), the table displays a subset of the data to maintain performance and usability. The timeline bar indicates the actual time range of data currently loaded into view, which may be narrower than your selected time range. 
> > Refer to the timeline header to check the exact range and count of visible transcripts, for example, "Showing 100 of 299 transcripts from 13:17 to 15:16". ## [](#inspect-span-details)Inspect span details The transcript table shows: - **Time**: When the [span](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#span) started (sortable) - **Span**: Span type and name with hierarchical tree structure - **Duration**: Total time or relative duration shown as visual bars To view nested operations, expand any parent span. To learn more about span hierarchies and cross-service traces, see [Transcripts and AI Observability](../concepts/). Click any span to view details in the panel: - **Summary tab**: High-level overview with token usage, operation counts, and conversation history. - **Attributes tab**: Structured metadata for debugging (see [standard attributes by layer](../concepts/#key-attributes-by-layer)). - **Raw data tab**: Complete [OpenTelemetry](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#opentelemetry) span in JSON format. You can also view raw transcript data in the `redpanda.otel_traces` topic. > 📝 **NOTE** > > Rows labeled "awaiting root — waiting for parent span" indicate incomplete [traces](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#trace). This occurs when child spans arrive before parent spans due to network latency or service failures. Consistent "awaiting root" entries suggest instrumentation issues. ## [](#common-investigation-tasks)Common investigation tasks The following patterns demonstrate how to use the Transcripts view for understanding and troubleshooting your agentic systems. ### [](#debug-errors)Debug errors 1. Use **Errors Only** to filter for failed operations, or review the timeline to identify and zoom in to when errors began occurring. 2. Expand error spans to examine the failure context. 3. Check preceding tool call arguments and LLM responses for root cause. 
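The error triage steps above can also be approximated offline. The sketch below assumes spans have already been decoded into Python dictionaries that carry an OTLP-style `status.code` field, where 2 marks an error; decoding records from the `redpanda.otel_traces` topic is not shown, and the helper name `error_spans` and the sample span names are hypothetical.

```python
STATUS_ERROR = 2  # OTLP status code for a failed span (1 is success)


def error_spans(spans):
    """Return only the spans whose status code marks them as failed."""
    return [s for s in spans if s.get("status", {}).get("code") == STATUS_ERROR]


# Hypothetical sample data; real spans contain many more fields.
spans = [
    {"name": "invoke_agent my-assistant", "status": {"code": 1}},
    {"name": "execute_tool lookup", "status": {"code": 2}},
    {"name": "chat gpt-4", "status": {}},
]

for span in error_spans(spans):
    print(span["name"])
```

Spans without a status are treated as non-errors here, which is a design choice for the sketch rather than documented Transcripts behavior.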
### [](#investigate-performance-issues)Investigate performance issues 1. Use the **Slow (>5s)** filter to identify operations with high latency. 2. Expand slow spans to identify bottlenecks in the execution tree. 3. Compare duration bars across similar operations to spot anomalies. ### [](#analyze-tool-usage)Analyze tool usage 1. Apply the **Tool Calls** filter and optionally use the **Attribute** filter to focus on a specific tool. 2. Review tool execution frequency in the timeline. 3. Click individual tool call spans to inspect arguments and responses. 1. Check the Description field to understand tool invocation context. 2. Use the Arguments field to verify correct parameter passing. ### [](#monitor-llm-interactions)Monitor LLM interactions 1. Click **LLM Calls** to focus on model invocations and optionally filter by model name and provider using the **Attribute** filter. 2. Review token usage patterns across different time periods. 3. Examine conversation history to understand model behavior. 4. Spot unexpected model calls or token consumption spikes. ### [](#trace-multi-service-operations)Trace multi-service operations 1. Locate the parent agent or gateway span in the transcript table. 2. Use the **Attribute** filter to follow the trace ID through agent and MCP server boundaries. 3. Expand the transcript tree to reveal child spans across services. 4. Review durations to understand where latency occurs in distributed calls. 
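To review durations outside the UI, a span's duration can be derived from its `startTimeUnixNano` and `endTimeUnixNano` fields. The following is a sketch under the assumption that the spans of one trace are already available as dictionaries with those two fields; the helper names and sample values are hypothetical.

```python
NANOS_PER_SECOND = 1_000_000_000


def duration_seconds(span):
    """Compute a span's duration in seconds from its nanosecond timestamps."""
    # OTLP JSON may encode 64-bit timestamps as strings, so normalize with int().
    start = int(span["startTimeUnixNano"])
    end = int(span["endTimeUnixNano"])
    return (end - start) / NANOS_PER_SECOND


def slowest_first(spans):
    """Order spans from longest to shortest duration."""
    return sorted(spans, key=duration_seconds, reverse=True)


# Hypothetical spans from a single trace.
trace_spans = [
    {"name": "invoke_agent my-assistant", "startTimeUnixNano": "0", "endTimeUnixNano": "7000000000"},
    {"name": "chat gpt-4", "startTimeUnixNano": "500000000", "endTimeUnixNano": "2000000000"},
    {"name": "execute_tool lookup", "startTimeUnixNano": 2000000000, "endTimeUnixNano": 6500000000},
]

for span in slowest_first(trace_spans):
    print(f"{span['name']}: {duration_seconds(span):.1f}s")
```

Sorting by duration puts the parent span first, since it encloses its children; the child spans that follow show which service accounts for most of the elapsed time.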
## [](#next-steps)Next steps - [Monitor Agent Activity](../../agents/monitor-agents/) - [Monitor MCP Server Activity](../../mcp/remote/monitor-mcp-servers/) - [Troubleshoot AI Agents](../../agents/troubleshooting/) --- # Page 50: Manage Billing **URL**: https://docs.redpanda.com/redpanda-cloud/billing.md --- # Manage Billing --- title: Manage Billing latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/billing/pages/index.adoc description: Learn about the metrics Redpanda uses to measure consumption and about subscriptions with committed use. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-08-01" --- - [Billing and Support](billing/) Learn about the metrics Redpanda uses to measure consumption in Redpanda Cloud. - [Manage Billing Notifications](billing-notifications/) Manage billing notifications in Redpanda Cloud: what alerts you receive, who receives them, and how to configure your notification preferences. - [Use AWS Commitments](aws-commit/) Subscribe to Redpanda in AWS Marketplace with committed use. - [Use Azure Commitments](azure-commit/) Subscribe to Redpanda in Azure Marketplace with committed use. - [Use GCP Commitments](gcp-commit/) Subscribe to Redpanda in Google Cloud Marketplace with committed use. - [Use AWS Pay As You Go](aws-pay-as-you-go/) Subscribe to Redpanda in AWS Marketplace with pay-as-you-go billing, and cancel anytime. 
--- # Page 51: Use AWS Commitments **URL**: https://docs.redpanda.com/redpanda-cloud/billing/aws-commit.md --- # Use AWS Commitments --- title: Use AWS Commitments latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: aws-commit page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: aws-commit.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/billing/pages/aws-commit.adoc description: Subscribe to Redpanda in AWS Marketplace with committed use. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-10-17" --- You can subscribe to Redpanda Cloud through AWS Marketplace and use your existing marketplace billing and credits to quickly provision clusters. View your bills and manage your subscription directly in the marketplace. With a usage-based billing commitment, you sign up for a minimum spend amount. Commitments are minimums: - If you use less than your committed amount, you still pay the minimum. Any unused amount on a monthly commitment rolls over to the next month until the end of your term. - If you use more than your committed amount, you can continue using Redpanda Cloud without interruption. You’re charged for any additional usage until the end of your term. > ❗ **IMPORTANT** > > When you subscribe to Redpanda Cloud through AWS Marketplace, you can only create clusters on AWS. ## [](#sign-up-in-aws-marketplace)Sign up in AWS Marketplace 1. Contact [Redpanda Sales](https://redpanda.com/contact) to request a private offer with possible discounts. 2. You will receive a private offer on AWS Marketplace. Review the policy and required terms, and click **Accept**. > 📝 **NOTE** > > If you don’t have a billing account associated with your project, you’re prompted to enable billing to link the subscription with a billing account. 
You are taken to the Redpanda sign-up page. 3. On the Redpanda sign-up page: - For **Email**, enter your email address to register with Redpanda. - For **Organization name**, enter a name for your new organization connected through AWS Marketplace. Redpanda organizations contain all resources, including clusters and networks. - Click **Sign up and create organization**. You will receive an email sent to the address you entered. 4. In the email, click **Verify email address**. This completes the registration and associates the email with a Redpanda account. 5. On the **Accept your invitation to sign up** page, click **Sign up** or **Log in**. You can now create resource groups, clusters, and networks in your organization. ## [](#next-steps)Next steps - [Create a Serverless cluster](../../get-started/cluster-types/serverless/#create-a-serverless-cluster) - [Create a BYOC cluster](../../get-started/cluster-types/byoc/) - [Create a Dedicated cluster](../../get-started/cluster-types/create-dedicated-cloud-cluster/#create-a-dedicated-cluster) --- # Page 52: Use AWS Pay As You Go **URL**: https://docs.redpanda.com/redpanda-cloud/billing/aws-pay-as-you-go.md --- # Use AWS Pay As You Go --- title: Use AWS Pay As You Go latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: aws-pay-as-you-go page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: aws-pay-as-you-go.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/billing/pages/aws-pay-as-you-go.adoc description: Subscribe to Redpanda in AWS Marketplace with pay-as-you-go billing, and cancel anytime. page-git-created-date: "2024-09-19" page-git-modified-date: "2025-03-25" --- Subscribe to Redpanda Cloud through AWS Marketplace to quickly provision Serverless and Dedicated clusters. 
With a usage-based pay-as-you-go subscription, you only pay for what you use and can cancel anytime. > ❗ **IMPORTANT** > > When you sign up for Redpanda Cloud through AWS Marketplace, you can only create clusters on AWS. ## [](#sign-up-in-aws-marketplace)Sign up in AWS Marketplace 1. In the AWS Marketplace, select [**Redpanda Cloud - The proven Apache Kafka alternative (Pay as You Go)**](https://aws.amazon.com/marketplace/pp/prodview-ecbu7wwsfh644?applicationId=AWSMPContessa&ref_=beagle&sr=0-3). 2. On the **Redpanda Cloud - Pay as You Go** overview page, click **View purchase options**, then click **Subscribe**. > 📝 **NOTE** > > If you don’t have a billing account associated with your project, you’re prompted to link the subscription with a billing account. 3. On the **Subscribe to Redpanda Cloud** page, click **Set up your account**. You’re taken to the Redpanda sign-up page. 4. On the Redpanda sign-up page: - For **Email**, enter your email address to register with Redpanda. - For **Organization name**, enter a name for your new organization connected through AWS Marketplace. > 💡 **TIP** > > This process creates a new organization, even for existing Redpanda customers. Organizations contain all resources, including clusters and networks. - Click **Sign up and create organization**. You will receive an email sent to the address you entered. 5. In the email, click **Verify email address**. This associates the email with a Redpanda account. 6. On the **Accept your invitation to sign up** page, enter the credentials you want to use for Redpanda Cloud. You can now create resource groups, networks, and clusters in your organization. 
## [](#next-steps)Next steps - [Create a Serverless cluster](../../get-started/cluster-types/serverless/#create-a-serverless-cluster) - [Create a Dedicated cluster](../../get-started/cluster-types/create-dedicated-cloud-cluster/#create-a-dedicated-cluster) --- # Page 53: Use Azure Commitments **URL**: https://docs.redpanda.com/redpanda-cloud/billing/azure-commit.md --- # Use Azure Commitments --- title: Use Azure Commitments latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: azure-commit page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: azure-commit.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/billing/pages/azure-commit.adoc description: Subscribe to Redpanda in Azure Marketplace with committed use. page-git-created-date: "2024-10-30" page-git-modified-date: "2025-10-17" --- You can subscribe to Redpanda Cloud through Azure Marketplace and use your existing marketplace billing and credits to quickly provision clusters. View your bills and manage your subscription directly in the marketplace. With a usage-based billing commitment, you sign up for a monthly or an annual minimum spend amount. Commitments are minimums: - If you use less than your committed amount, you still pay the minimum. Any unused amount on a monthly commitment rolls over to the next month until the end of your term. - If you use more than your committed amount, you can continue using Redpanda Cloud without interruption. You’re charged for any additional usage until the end of your term. > ❗ **IMPORTANT** > > When you subscribe to Redpanda Cloud through Azure Marketplace, you can only create clusters on Azure. ## [](#sign-up-in-azure-marketplace)Sign up in Azure Marketplace 1. Contact [Redpanda sales](https://redpanda.com/contact) to request a private offer with possible discounts. 
You will receive a private offer on Azure Marketplace. This offer is associated with an Azure user account that has access to the Azure subscription used for billing. 2. In Azure Marketplace, review the policy and required terms, and click **Accept**. You are taken to the Redpanda sign-up page. 3. On the Redpanda sign-up page: - For **Email**, enter your email address to register with Redpanda. - For **Organization name**, enter a name for your new organization connected through Azure Marketplace. Redpanda organizations contain all resources, including clusters and networks. - Click **Sign up and create organization**. You will receive an email sent to the address you entered. 4. In the email, click **Verify email address**. This completes the registration and associates the email with a Redpanda account. 5. On the **Accept your invitation to sign up** page, click **Sign up** or **Log in**. You can now create resource groups, clusters, and networks in your organization. ## [](#next-steps)Next steps - [Create a BYOC cluster](../../get-started/cluster-types/byoc/) - [Create a Dedicated cluster](../../get-started/cluster-types/create-dedicated-cloud-cluster/) --- # Page 54: Manage Billing Notifications **URL**: https://docs.redpanda.com/redpanda-cloud/billing/billing-notifications.md --- # Manage Billing Notifications --- title: Manage Billing Notifications latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: billing-notifications page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: billing-notifications.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/billing/pages/billing-notifications.adoc description: "Manage billing notifications in Redpanda Cloud: what alerts you receive, who receives them, and how to configure your notification preferences." 
page-topic-type: how-to personas: platform_admin, evaluator learning-objective-1: Identify the billing notifications Redpanda Cloud sends and their thresholds learning-objective-2: Configure which users in your organization receive billing notifications learning-objective-3: Opt out of billing notification emails for yourself or your organization page-git-created-date: "2026-03-27" page-git-modified-date: "2026-04-07" --- Redpanda Cloud sends email notifications to help you monitor your billing balance. Organization admins receive alerts when credit or commit balances reach spending thresholds. In this guide, you will: - Identify the billing notifications Redpanda Cloud sends and their thresholds - Configure which users in your organization receive billing notifications - Opt out of billing notification emails for yourself or your organization ## [](#what-notifications-you-receive)What notifications you receive Redpanda Cloud monitors your balance and sends a notification when it crosses each threshold. Each threshold triggers one notification. If your balance crosses the same threshold again after adding credits, you may receive another notification at that level. | Notification | Description | Thresholds | | --- | --- | --- | | Low credit balance | Sent when your pre-paid credit balance is running low. Credits are drawn down by usage, similar to a prepaid account. | 50%, 30%, 10%, 0% remaining | | Low commit balance | Sent when your contractual commit balance is running low. Commits represent a minimum spend over a contract period. | 50%, 30%, 10%, 0% remaining | Notifications are sent to email only. The subject line follows this format: `Action Required: Your Redpanda Cloud is % remaining` ## [](#who-receives-notifications)Who receives notifications All users with the **Admin** role in your organization receive billing notifications by default. To change who receives notifications, update role assignments on the **Organization IAM** page. 
See [Role-Based Access Control](../../security/authorization/rbac/rbac/) or [Group-Based Access Control](../../security/authorization/gbac/gbac/). ## [](#opt-out-of-notifications)Opt out of notifications ### [](#individual-opt-out)Individual opt-out To stop receiving billing notification emails: - Open any billing notification email. - Click the **Unsubscribe** or **Manage notification preferences** link at the bottom of the email. No support ticket is needed. The change takes effect within 24-48 hours. ### [](#organization-wide-opt-out)Organization-wide opt-out To disable billing notifications for all admins in your organization, contact [Redpanda support](https://support.redpanda.com/hc/en-us/requests/new). > 📝 **NOTE** > > If billing notifications are enabled for the organization, individual admins who have not unsubscribed will continue to receive notifications. ## [](#common-questions)Common questions - I didn’t sign up for these emails. Why am I receiving them? Billing notifications are sent automatically to all organization admins. If you don’t want to receive them, click the **Unsubscribe** link at the bottom of the email. - I got an alert but I already added credits. Why? Notifications are triggered when your balance crosses a threshold. If you added credits after the threshold was crossed, the notification was already queued. If your balance later crosses the same threshold again (for example, after adding credits and then using them), you may receive another notification. - Who else in my organization is getting these? All users with the Admin role receive billing notifications. To see who has the Admin role, check the **Organization IAM** > **Users** page in Redpanda Cloud. - I unsubscribed but still received a notification. What happened? Unsubscribe requests take 24-48 hours to process. If you receive a notification during that window, it was sent before your request was fully applied. - What should I do when I get an alert? 
Review your current balance on the **Billing** page. You can add credits or contact your Redpanda account team to discuss your usage and plan options. - Do trial accounts get notifications? Only if the trial has promotional credits. Standard trial accounts without a credit balance do not receive billing notifications. --- # Page 55: Billing and Support **URL**: https://docs.redpanda.com/redpanda-cloud/billing/billing.md --- # Billing and Support --- title: Billing and Support latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: billing page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: billing.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/billing/pages/billing.adoc description: Learn about the metrics Redpanda uses to measure consumption in Redpanda Cloud. page-git-created-date: "2024-06-06" page-git-modified-date: "2026-03-27" --- Redpanda Cloud uses various metrics to measure the consumption of resources. - All pricing is set in US dollars (USD). - All billing computations are conducted in Coordinated Universal Time (UTC). Billing accrues at hourly intervals. Any usage that is less than an hour is billed for the full hour. - The **Billing** page shows detailed billing activity for each cluster and lets you manage payment methods. Redpanda charges the credit card marked as the default. The most recently added credit card becomes the default payment method, unless you select a different one. - To download a CSV summary of your monthly charges per resource, click the download icon on the **Billing** page. You can use the exported file for record keeping or to import into your billing system. > 💡 **TIP** > > Redpanda Cloud can notify you when your credit or commit balance is running low. See [Manage Billing Notifications](../billing-notifications/). 
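As a rough illustration of the hourly accrual rule stated above, the sketch below rounds a usage duration up to whole hours. It assumes that any partial hour is rounded up to the next full hour; the exact metering granularity is not specified here, and `billable_hours` is a hypothetical helper, not part of any Redpanda API.

```python
import math


def billable_hours(usage_minutes: int) -> int:
    """Round usage up to whole hours (assumed rounding rule for this sketch)."""
    if usage_minutes < 0:
        raise ValueError("usage_minutes must be non-negative")
    return math.ceil(usage_minutes / 60)


for minutes in (0, 20, 60, 61):
    print(minutes, "minutes ->", billable_hours(minutes), "billable hours")
```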
> 📝 **NOTE**
>
> Pricing information is available on [redpanda.com](https://www.redpanda.com/price-estimator). For questions about billing, contact [billing@redpanda.com](mailto:billing@redpanda.com).

## [](#usage-based-billing-metrics)Usage-based billing metrics

### Serverless

Pricing for Serverless clusters depends on the data in, data out, data stored, partitions (virtual streams), and the time the instance is up. The cost for each Serverless metric varies based on the region you select for your cluster.

| Metric | Description |
| --- | --- |
| Uptime | Tracks the number of hours the instance is running. NOTE: Uptime is not charged if partitions = 0 and storage = 0. This condition is met when all topics are deleted. |
| Ingress | Tracks the data written into Redpanda (in GB). All Kafka protocol requests (except message headers) are counted as ingress as soon as they are read by Redpanda’s proxy process. |
| Egress | Tracks the data read out of Redpanda (in GB). All Kafka protocol responses generated by the cluster (except message headers) are counted as egress as soon as the cluster processes the request, even if the client drops the connection before they are delivered. |
| Partitions | Tracks the number of partitions used per hour. |
| Storage | Tracks the data in object storage per hour (in GB). |

See also: [Serverless limits](../../get-started/cluster-types/serverless/#serverless-usage-limits)

### Dedicated

Pricing for Dedicated clusters depends on the time the instance is up, the data in, data out, and data stored.

| Metric | Description |
| --- | --- |
| Uptime | Tracks the number of hours the instance is running. The cost varies based on the region and tier you select for your cluster. |
| Ingress | Tracks the data written into Redpanda (in GB). All Kafka protocol requests (including message headers) are counted as ingress as soon as they are read by Redpanda’s proxy process. The cost varies based on the region you select for your cluster. |
| Egress | Tracks the data read out of Redpanda (in GB). All Kafka protocol responses generated by the cluster (including message headers) are counted as egress as soon as the cluster processes the request, even if the client drops the connection before they are delivered. The cost varies based on the number of availability zones (AZ) you select for your cluster. |
| Storage | Tracks the usage of object storage on an hourly basis during the billing period (in GB-hours). Replication to object storage is implemented with Tiered Storage. All topics have a fixed replication factor of 3, but Redpanda counts each byte only once. |

### BYOC

Pricing for BYOC clusters depends on compute, data in, data out, and data stored. The rate decreases as usage increases.

| Metric | Description |
| --- | --- |
| Compute | Tracks the server resources (vCPU and memory) a cluster uses on an hourly basis in Redpanda units (RPUs), where 1 RPU = 2 vCPU + 8 GB memory. |
| Ingress | Tracks the data written into Redpanda (in GB). All Kafka protocol requests (including message headers) are counted as ingress as soon as they are read by Redpanda’s proxy process. |
| Ingress to Iceberg topics | Tracks the data written to Iceberg tables per hour (in GB). NOTE: This metric applies only if you write to Iceberg topics. This charge is in addition to the standard ingress charge. |
| Egress | Tracks the data read out of Redpanda (in GB). All Kafka protocol responses generated by the cluster (including message headers) are counted as egress as soon as the cluster processes the request, even if the client drops the connection before they are delivered. The cost varies based on the number of availability zones (AZ) you select for your cluster. |
| Storage | Tracks the usage of object storage on an hourly basis during the billing period (in GB-hours). Replication to object storage is implemented with Tiered Storage. All topics have a fixed replication factor of 3, but Redpanda counts each byte only once. |

## [](#redpanda-connect-billing-metrics)Redpanda Connect billing metrics

Pricing per pipeline depends on the compute units you allocate. The cost of a compute unit can vary based on the cloud provider and region you select for your cluster.

| Metric | Description |
| --- | --- |
| Compute | Tracks the server resources (vCPU and memory) a pipeline uses in compute units per hour, where 1 compute unit = 0.1 CPU + 400 MB memory. |

## [](#remote-mcp-billing-metrics)Remote MCP billing metrics

Remote MCP usage appears as a separate line item on your invoice and uses the same pricing structure as Redpanda Connect. Pricing per MCP server depends on the compute units you allocate. The cost of a compute unit can vary based on the cloud provider and region you select for your cluster.

| Metric | Description |
| --- | --- |
| Compute | Tracks the server resources (vCPU and memory) an MCP server uses in compute units per hour, where 1 compute unit = 0.1 CPU + 400 MB memory. |

> 📝 **NOTE**
>
> Compute units for Remote MCP use the same definition and rates as those for Redpanda Connect.

MCP servers automatically emit OpenTelemetry traces to the [`redpanda.otel_traces` topic](../../ai-agents/observability/concepts/#opentelemetry-traces-topic). For Serverless clusters, usage of this system-managed traces topic is not billed. You will not incur ingress, egress, storage, or partition charges for trace data. For Dedicated and BYOC clusters, standard billing metrics apply to the traces topic.
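The compute unit definition above (1 compute unit = 0.1 CPU + 400 MB memory) lends itself to some simple sizing arithmetic. The following Python sketch estimates how many compute units a pipeline or MCP server would need and how many compute-unit hours it accrues. The helpers `compute_units_required` and `compute_unit_hours` are hypothetical names, and the rule that a workload needs enough units to cover both its CPU and its memory is an assumption made for illustration, not a documented allocation policy. No prices are included because rates vary by cloud provider and region.

```python
# Resources bundled into one compute unit, taken from the definition above.
# CPU is expressed in millicores (0.1 CPU = 100 millicores) so that the
# arithmetic stays in integers and avoids floating-point rounding.
CPU_MILLICORES_PER_UNIT = 100
MEMORY_MB_PER_UNIT = 400


def _ceil_div(numerator: int, denominator: int) -> int:
    """Integer division that rounds up."""
    return -(-numerator // denominator)


def compute_units_required(cpu_millicores: int, memory_mb: int) -> int:
    """Smallest number of units that covers both CPU and memory (assumed rule)."""
    cpu_units = _ceil_div(cpu_millicores, CPU_MILLICORES_PER_UNIT)
    memory_units = _ceil_div(memory_mb, MEMORY_MB_PER_UNIT)
    return max(cpu_units, memory_units)


def compute_unit_hours(units: int, hours: int) -> int:
    """Compute-unit hours accrued by running `units` for `hours` full hours."""
    return units * hours


# A workload needing 0.25 CPU and 2000 MB: CPU needs 3 units, memory needs 5.
units = compute_units_required(cpu_millicores=250, memory_mb=2000)
print(units)                          # prints 5
print(compute_unit_hours(units, 24))  # prints 120
```

In this example memory is the limiting resource, so the allocation is driven by the 400 MB side of the definition rather than the CPU side.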
## [](#support-plans)Support plans

All organizations in Redpanda require one of the following support plans:

| Support plan | Features |
| --- | --- |
| Basic | Designed for non-production environments. Provides minimal support: priority 3 tickets have a target response time of 8 business hours, and priority 4 tickets have no target response time. Support is available 8:00 AM to 5:00 PM Pacific Time, Monday through Friday, excluding US federal holidays. |
| Enterprise | Designed for production environments needing continuous availability. P1/P2 tickets may be submitted. Support is available 24/7, including holidays. |
| Premium | Designed for mission-critical workloads. Provides 30-minute response times for production outages. Includes a named Customer Success Manager to support planning and coordination, and 10 hours per month of consulting from a Solutions Architect. Required for deployments with BYOVPC/BYOVnet clusters. |

## [](#next-steps)Next steps

- [Use AWS Commitments](../aws-commit/)
- [Use Azure Commitments](../azure-commit/)
- [Use GCP Commitments](../gcp-commit/)
- [Create a Serverless cluster](../../get-started/cluster-types/serverless/#create-a-serverless-cluster)
- [Create a Dedicated cluster](../../get-started/cluster-types/create-dedicated-cloud-cluster/#create-a-dedicated-cluster)
- [Create a BYOC cluster](../../get-started/cluster-types/byoc/)

---

# Page 56: Use GCP Commitments

**URL**: https://docs.redpanda.com/redpanda-cloud/billing/gcp-commit.md

---

# Use GCP Commitments

--- title: Use GCP Commitments latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: gcp-commit page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: gcp-commit.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/billing/pages/gcp-commit.adoc description: Subscribe to Redpanda in Google Cloud Marketplace with committed
use. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-10-17" ---

You can subscribe to Redpanda Cloud through Google Cloud Marketplace and use your existing marketplace billing and credits to quickly provision clusters. View your bills and manage your subscription directly in the marketplace.

With a usage-based billing commitment, you sign up for a monthly or an annual minimum spend amount. Commitments are minimums:

- If you use less than your committed amount, you still pay the minimum. Any unused amount on a monthly commitment rolls over to the next month until the end of your term.
- If you use more than your committed amount, you can continue using Redpanda Cloud without interruption. You’re charged for any additional usage until the end of your term.

> ❗ **IMPORTANT**
>
> When you subscribe to Redpanda Cloud through Google Cloud Marketplace, you can only create clusters on GCP.

## [](#sign-up-in-google-cloud-marketplace)Sign up in Google Cloud Marketplace

1. Contact [Redpanda sales](https://redpanda.com/contact) to request a private offer with possible discounts.

2. You will receive a private offer on Google Cloud Marketplace. Review the policy and required terms, and click **Accept**.

   > 📝 **NOTE**
   >
   > If you don’t have a billing account associated with your project, you’re prompted to enable billing to link the subscription with a billing account.

   You are taken to the Redpanda sign-up page.

3. On the Redpanda sign-up page:

   - For **Email**, enter your email address to register with Redpanda.
   - For **Organization name**, enter a name for your new organization connected through Google Cloud Marketplace. Redpanda organizations contain all resources, including clusters and networks.
   - Click **Sign up and create organization**.

   You will receive an email sent to the address you entered.

4. In the email, click **Verify email address**. This completes the registration and associates the email with a Redpanda account.

5. On the **Accept your invitation to sign up** page, click **Sign up** or **Log in**.

You can now create resource groups, clusters, and networks in your organization.

## [](#next-steps)Next steps

- [Create a BYOC cluster](../../get-started/cluster-types/byoc/)
- [Create a Dedicated cluster](../../get-started/cluster-types/create-dedicated-cloud-cluster/)

---

# Page 57: Develop

**URL**: https://docs.redpanda.com/redpanda-cloud/develop.md

---

# Develop

--- title: Develop latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/index.adoc description: Develop doc topics. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-06-07" ---

- [Kafka Compatibility](kafka-clients/) Kafka clients, version 0.11 or later, are compatible with Redpanda. Validations and exceptions are listed.
- [Topics](topics/) Overview of standard topics in Redpanda Cloud.
- [Produce Data](produce-data/) Learn how to configure producers and idempotent producers.
- [Consume Data](consume-data/) Learn about consumer offsets and follower fetching.
- [Use Redpanda with the HTTP Proxy API](http-proxy/) HTTP Proxy exposes a REST API to list topics, produce events, and subscribe to events from topics using consumer groups.
- [Data Transforms](data-transforms/) Learn about WebAssembly data transforms within Redpanda Cloud.
- [Transactions](transactions/) Learn how to use transactions; for example, you can fetch messages starting from the last consumed offset and transactionally process them one by one, updating the last consumed offset and producing events at the same time.
- [Kafka Connect](managed-connectors/) Use Kafka Connect to stream data into and out of Redpanda.
---

# Page 58: Redpanda Connect in Redpanda Cloud

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/about.md

---

# Redpanda Connect in Redpanda Cloud

--- title: Redpanda Connect in Redpanda Cloud latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/about.adoc description: Learn about Redpanda Connect in Redpanda Cloud and its wide range of connectors. page-git-created-date: "2024-09-09" page-git-modified-date: "2025-08-20" ---

Redpanda Connect in Redpanda Cloud lets you quickly build and deploy streaming data pipelines on your clusters from a fully integrated UI or using the [Data Plane API](/api/doc/cloud-dataplane/group/endpoint-redpanda-connect-pipeline).

Choose from a [wide range of connectors](../components/about/) to suit your use case, including connectors to:

- Integrate data sources ([inputs](../components/inputs/about/))
- Write to data sinks ([outputs](../components/outputs/about/))
- Transform data ([processors](../components/processors/about/))

Comprehensive data pipeline metrics are also available to help you [monitor your data pipelines](../configuration/monitor-connect/), and you can configure [per-pipeline scaling](../configuration/resource-management/). Try this [quickstart](../connect-quickstart/).

> 💡 **TIP**
>
> If you’re new to Redpanda Connect, try [building and testing data pipelines locally](../../../../redpanda-connect/get-started/quickstarts/rpk/) before deploying to the Cloud.
---

# Page 59: Components Catalog

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/about.md

---

# Components Catalog

--- title: Components Catalog latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/about.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2025-08-08" ---

Use the following table to search for available inputs, outputs, and processors.

| Name | Connector Type |
| --- | --- |
| a2a_message | Processor |
| amqp_0_9 | Input, Output |
| archive | Processor |
| avro | Processor, Scanner |
| aws_bedrock_chat | Processor |
| aws_bedrock_embeddings | Processor |
| aws_cloudwatch_logs | Input |
| aws_dynamodb | Cache, Output |
| aws_dynamodb_cdc | Input |
| aws_dynamodb_partiql | Processor |
| aws_kinesis | Input, Output |
| aws_kinesis_firehose | Output |
| aws_lambda | Processor |
| aws_s3 | Cache, Input, Output |
| aws_sns | Output |
| aws_sqs | Input, Output |
| azure_blob_storage | Input, Output |
| azure_cosmosdb | Input, Output, Processor |
| azure_data_lake_gen2 | Output |
| azure_queue_storage | Input, Output |
| azure_table_storage | Input, Output |
| batched | Input |
| benchmark | Processor |
| bloblang | Processor |
| bounds_check | Processor |
| branch | Processor |
| broker | Input, Output |
| cache | Output, Processor |
| cached | Processor |
| catch | Processor |
| chunker | Scanner |
| cohere_chat | Processor |
| cohere_embeddings | Processor |
| cohere_rerank | Processor |
| compress | Processor |
| csv | Scanner |
| cyborgdb | Output |
| decompress | Processor, Scanner |
| dedupe | Processor |
| drop | Output |
| drop_on | Output |
| elasticsearch_v8 | Output |
| fallback | Output |
| for_each | Processor |
| gateway | Input |
| gcp_bigquery | Output |
| gcp_bigquery_select | Input, Processor |
| gcp_cloud_storage | Cache, Input, Output |
| gcp_cloudtrace | Tracer |
| gcp_pubsub | Input, Output |
| gcp_spanner_cdc | Input |
| gcp_vertex_ai_chat | Processor |
| gcp_vertex_ai_embeddings | Processor |
| generate | Input |
| git | Input |
| google_drive_download | Processor |
| google_drive_list_labels | Processor |
| google_drive_search | Processor |
| group_by | Processor |
| group_by_value | Processor |
| http | Processor |
| http_client | Input, Output |
| http_server | Input |
| iceberg | Output |
| inproc | Input, Output |
| insert_part | Processor |
| jira | Processor |
| jmespath | Processor |
| jq | Processor |
| json_array | Scanner |
| json_documents | Scanner |
| json_schema | Processor |
| kafka | Input, Output |
| kafka_franz | Input, Output |
| lines | Scanner |
| local | Rate_limit |
| log | Processor |
| lru | Cache |
| mapping | Processor |
| memcached | Cache |
| memory | Buffer, Cache |
| metric | Processor |
| microsoft_sql_server_cdc | Input |
| mongodb | Cache, Input, Output, Processor |
| mongodb_cdc | Input |
| mqtt | Input, Output |
| multilevel | Cache |
| mutation | Processor |
| mysql_cdc | Input |
| nats | Input, Output |
| nats_jetstream | Input, Output |
| nats_kv | Cache, Input, Output, Processor |
| nats_request_reply | Processor |
| none | Buffer, Metric, Tracer |
| noop | Cache, Processor |
| openai_chat_completion | Processor |
| openai_embeddings | Processor |
| openai_image_generation | Processor |
| openai_speech | Processor |
| openai_transcription | Processor |
| openai_translation | Processor |
| opensearch | Output |
| oracledb_cdc | Input |
| otlp_grpc | Input, Output |
| otlp_http | Input, Output |
| parallel | Processor |
| parquet_decode | Processor |
| parquet_encode | Processor |
| parse_log | Processor |
| pg_stream | |
| pinecone | Output |
| postgres_cdc | Input |
| processors | Processor |
| prometheus | Metric |
| qdrant | Output, Processor |
| questdb | Output |
| rate_limit | Processor |
| re_match | Scanner |
| read_until | Input |
| redis | Cache, Processor, Rate_limit |
| redis_hash | Output |
| redis_list | Input, Output |
| redis_pubsub | Input, Output |
| redis_scan | Input |
| redis_script | Processor |
| redis_streams | Input, Output |
| redpanda | Cache, Input, Output, Tracer |
| redpanda_common | Input, Output |
| redpanda_migrator | Input, Output |
| reject | Output |
| reject_errored | Output |
| resource | Input, Output, Processor |
| retry | Output, Processor |
| ristretto | Cache |
| schema_registry | Input, Output |
| schema_registry_decode | Processor |
| schema_registry_encode | Processor |
| select_parts | Processor |
| sequence | Input |
| sftp | Input, Output |
| skip_bom | Scanner |
| slack | Input |
| slack_post | Output |
| slack_reaction | Output |
| slack_thread | Processor |
| slack_users | Input |
| sleep | Processor |
| snowflake_put | Output |
| snowflake_streaming | Output |
| spicedb_watch | Input |
| split | Processor |
| splunk | Input |
| splunk_hec | Output |
| sql | Cache |
| sql_driver_clickhouse | |
| sql_driver_mysql | |
| sql_driver_oracle | |
| sql_driver_postgres | |
| sql_driver_sqlite | |
| sql_insert | Output, Processor |
| sql_raw | Input, Output, Processor |
| sql_select | Input, Processor |
| string_split | Processor |
| switch | Output, Processor, Scanner |
| sync_response | Output, Processor |
| system_window | Buffer |
| tar | Scanner |
| text_chunker | Processor |
| timeplus | Input, Output |
| to_the_end | Scanner |
| try | Processor |
| ttlru | Cache |
| unarchive | Processor |
| while | Processor |
| workflow | Processor |
| xml | Processor |

## [](#about-components)About Components

Every Redpanda Connect pipeline has at least one [input](../inputs/about/), an optional [buffer](../buffers/about/), an [output](../outputs/about/) and any number of [processors](../processors/about/):

```yaml
input:
  kafka:
    addresses: [ TODO ]
    topics: [ foo, bar ]
    consumer_group: foogroup

buffer:
  type: none

pipeline:
  processors:
    - mapping: |
        message = this
        meta.link_count = links.length()

output:
  aws_s3:
    bucket: TODO
    path: '${! meta("kafka_topic") }/${! json("message.id") }.json'
```

These are the main components within Redpanda Connect and they provide the majority of useful behavior.

## [](#observability-components)Observability components

There are also the observability components: [logger](../logger/about/), [metrics](../metrics/about/), and [tracing](../tracers/about/), which allow you to specify how Redpanda Connect exposes observability data.

```yaml
http:
  address: 0.0.0.0:4195
  enabled: true
  debug_endpoints: false

logger:
  format: json
  level: WARN

metrics:
  statsd:
    address: localhost:8125
    flush_period: 100ms

tracer:
  jaeger:
    agent_address: localhost:6831
```

## [](#resource-components)Resource components

Finally, there are [caches](../caches/about/) and [rate limits](../rate_limits/about/). These are components that are referenced by core components and can be shared.

```yaml
input:
  http_client: # This is an input
    url: TODO
    rate_limit: foo_ratelimit # This is a reference to a rate limit

pipeline:
  processors:
    - cache: # This is a processor
        resource: baz_cache # This is a reference to a cache
        operator: add
        key: '${! json("id") }'
        value: "x"
    - mapping: root = if errored() { deleted() }

rate_limit_resources:
  - label: foo_ratelimit
    local:
      count: 500
      interval: 1s

cache_resources:
  - label: baz_cache
    memcached:
      addresses: [ localhost:11211 ]
```

It’s also possible to configure inputs, outputs and processors as resources which allows them to be reused throughout a configuration with the [`resource` input](../inputs/resource/), [`resource` output](../outputs/resource/) and [`resource` processor](../processors/resource/) respectively.
For more information about any of these component types, check out their sections:

- [inputs](../inputs/about/)
- [processors](../processors/about/)
- [outputs](../outputs/about/)
- [buffers](../buffers/about/)
- [metrics](../metrics/about/)
- [tracers](../tracers/about/)
- [logger](../logger/about/)
- [caches](../caches/about/)
- [rate limits](../rate_limits/about/)

---

# Page 60: Buffers

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/buffers/about.md

---

# Buffers

--- title: Buffers latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/buffers/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/buffers/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/buffers/about.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" ---

Redpanda Connect uses a transaction model internally to guarantee delivery of messages. This means that a message from an input is not acknowledged (or its offset committed, and so on) until that message has been processed and either intentionally deleted or successfully delivered to all outputs. This transaction model makes Redpanda Connect safe to deploy in scenarios where data loss is unacceptable. However, sometimes it’s useful to customize the way in which messages are delivered, and this is where buffers come in.

A buffer is an optional component type that comes immediately after the input layer and can be used as a way of decoupling the transaction model from components downstream such as the processing layer and outputs.
This is considered an advanced component as most users will likely not benefit from a buffer, but buffers enable you to do things like group messages using window algorithms or intentionally weaken the delivery guarantees of the pipeline, depending on the buffer you choose.

Since buffers are able to modify (or disable) the transaction model within Redpanda Connect, it is important that when you choose a buffer you read its documentation to understand the implications it will have on delivery guarantees.

---

# Page 61: memory

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/buffers/memory.md

---

# memory

--- title: memory latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/buffers/memory page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/buffers/memory.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/buffers/memory.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" ---

**Type:** Buffer (also available as: [Cache](/redpanda-cloud/develop/connect/components/caches/memory/))

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/buffers/memory/ "View the Self-Managed version of this component")

Stores consumed messages in memory and acknowledges them at the input level. During shutdown Redpanda Connect will make a best attempt at flushing all remaining messages before exiting cleanly.

#### Common

```yml
buffer:
  memory:
    limit: 524288000
    batch_policy:
      enabled: false
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

#### Advanced

```yml
buffer:
  memory:
    limit: 524288000
    batch_policy:
      enabled: false
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

This buffer is appropriate when consuming messages from inputs that do not gracefully handle back pressure and where delivery guarantees aren’t critical.

This buffer has a configurable limit, where consumption will be stopped with back pressure upstream if the total size of messages in the buffer reaches this amount. Since this calculation is only an estimate, and the real size of messages in RAM is always higher, it is recommended to set the limit significantly below the amount of RAM available.

## [](#delivery-guarantees)Delivery guarantees

This buffer intentionally weakens the delivery guarantees of the pipeline and therefore should never be used in places where data loss is unacceptable.

## [](#batching)Batching

It is possible to batch up messages sent from this buffer using a [batch policy](../../../configuration/batching/#batch-policy).

## [](#fields)Fields

### [](#batch_policy)`batch_policy`

Optionally configure a policy to flush buffered messages in batches.

**Type**: `object`

### [](#batch_policy-byte_size)`batch_policy.byte_size`

The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching.

**Type**: `int`

**Default**: `0`

### [](#batch_policy-check)`batch_policy.check`

A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
check: this.type == "end_of_transaction"
```

### [](#batch_policy-count)`batch_policy.count`

The number of messages at which the batch is flushed. Set to `0` to disable count-based batching.

**Type**: `int`

**Default**: `0`

### [](#batch_policy-enabled)`batch_policy.enabled`

Whether to batch messages as they are flushed.

**Type**: `bool`

**Default**: `false`

### [](#batch_policy-period)`batch_policy.period`

A period after which an incomplete batch is flushed regardless of its size.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
period: 1s

# ---

period: 1m

# ---

period: 500ms
```

### [](#batch_policy-processors)`batch_policy.processors[]`

A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op.

**Type**: `processor`

```yaml
# Examples:
processors:
  - archive:
      format: concatenate

# ---

processors:
  - archive:
      format: lines

# ---

processors:
  - archive:
      format: json_array
```

### [](#limit)`limit`

The maximum buffer size (in bytes) to allow before applying backpressure upstream.

**Type**: `int`

**Default**: `524288000`

---

# Page 62: none

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/buffers/none.md

---

# none

--- title: none latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/buffers/none page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/buffers/none.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/buffers/none.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" ---

**Type:** Buffer (also available as: [Metric](/redpanda-cloud/develop/connect/components/metrics/none/), [Tracer](/redpanda-cloud/develop/connect/components/tracers/none/))

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/buffers/none/ "View the Self-Managed version of this component")

Do not buffer messages. This is the default and most resilient configuration.

```yml
# Config fields, showing default values
buffer:
  none: {}
```

Selecting no buffer means the output layer is directly coupled with the input layer. This is the safest and lowest latency option since acknowledgements from at-least-once protocols can be propagated all the way from the output protocol to the input protocol.

If the output layer is hit with back pressure it will propagate all the way to the input layer, and further up the data stream. If you need to relieve your pipeline of this back pressure consider using a more robust buffering solution such as Kafka before resorting to alternatives.
---

# Page 63: system_window

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/buffers/system_window.md

---

# system\_window

--- title: system_window latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/buffers/system_window page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/buffers/system_window.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/buffers/system_window.adoc categories: "[\"Windowing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" ---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/buffers/system_window/ "View the Self-Managed version of this component")

Chops a stream of messages into tumbling or sliding windows of fixed temporal size, following the system clock.

#### Common

```yml
buffer:
  system_window:
    timestamp_mapping: root = now()
    size: "" # No default (required)
    slide: ""
    offset: ""
    allowed_lateness: ""
```

#### Advanced

```yml
buffer:
  system_window:
    timestamp_mapping: root = now()
    size: "" # No default (required)
    slide: ""
    offset: ""
    allowed_lateness: ""
```

A window is a grouping of messages that fit within a discrete measure of time following the system clock. Messages are allocated to a window either by the processing time (the time at which they’re ingested) or by the event time, and this is controlled via the [`timestamp_mapping` field](#timestamp_mapping).

In tumbling mode (default) the beginning of a window immediately follows the end of a prior window. When the buffer is initialized the first window to be created and populated is aligned against the zeroth minute of the zeroth hour of the day by default, and may therefore be open for a shorter period than the specified size.
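To make the tumbling-window alignment described above more concrete, here is a minimal Python sketch that computes the bounds of the window containing a given timestamp. It is an illustration only, not the Redpanda Connect implementation: the function name `window_bounds` is hypothetical, and the sketch assumes windows are aligned to midnight UTC, that no `offset` or `slide` is configured, and that the window size divides evenly into a day.

```python
from datetime import datetime, timedelta, timezone


def window_bounds(ts: datetime, size: timedelta) -> tuple[datetime, datetime]:
    """Return (start, end) of the tumbling window that contains ts.

    Assumption: windows are laid end to end starting at midnight UTC
    (the zeroth minute of the zeroth hour of the day).
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # Number of whole windows that have elapsed since midnight.
    index = (ts - midnight) // size
    start = midnight + index * size
    return start, start + size


# A buffer initialized at 09:49:35 with a 1h window joins the window that
# started at 09:00, so its first window stays open for well under an hour.
ts = datetime(2021, 8, 7, 9, 49, 35, tzinfo=timezone.utc)
start, end = window_bounds(ts, timedelta(hours=1))
print(start.isoformat(), end.isoformat())
print(end - ts)  # time remaining before the first window's scheduled end
```

With these inputs the first window ends at 10:00:00 UTC, which is why it can be open for a shorter period than the configured `size`.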
A window is flushed only once the system clock surpasses its scheduled end. If an [`allowed_lateness`](#allowed_lateness) is specified then the window will not be flushed until the scheduled end plus that length of time. When a message is added to a window it has a metadata field `window_end_timestamp` added to it containing the timestamp of the end of the window as an RFC3339 string. ## [](#sliding-windows)Sliding windows Sliding windows begin from an offset of the prior windows' beginning rather than its end, and therefore messages may belong to multiple windows. In order to produce sliding windows specify a [`slide` duration](#slide). ## [](#back-pressure)Back pressure If back pressure is applied to this buffer either due to output services being unavailable or resources being saturated, windows older than the current and last according to the system clock will be dropped in order to prevent unbounded resource usage. This means you should ensure that under the worst case scenario you have enough system memory to store two windows' worth of data at a given time (plus extra for redundancy and other services). If messages could potentially arrive with event timestamps in the future (according to the system clock) then you should also factor in these extra messages in memory usage estimates. ## [](#delivery-guarantees)Delivery guarantees This buffer honours the transaction model within Redpanda Connect in order to ensure that messages are not acknowledged until they are either intentionally dropped or successfully delivered to outputs. However, since messages belonging to an expired window are intentionally dropped there are circumstances where not all messages entering the system will be delivered. When this buffer is configured with a slide duration it is possible for messages to belong to multiple windows, and therefore be delivered multiple times. 
In this case the first time the message is delivered it will be acked (or nacked) and subsequent deliveries of the same message will be a "best attempt". During graceful termination if the current window is partially populated with messages they will be nacked such that they are re-consumed the next time the service starts. ## [](#examples)Examples ### Counting Passengers at Traffic Given a stream of messages relating to cars passing through various traffic lights of the form: ```json { "traffic_light": "cbf2eafc-806e-4067-9211-97be7e42cee3", "created_at": "2021-08-07T09:49:35Z", "registration_plate": "AB1C DEF", "passengers": 3 } ``` We can use a window buffer in order to create periodic messages summarizing the traffic for a period of time of this form: ```json { "traffic_light": "cbf2eafc-806e-4067-9211-97be7e42cee3", "created_at": "2021-08-07T10:00:00Z", "total_cars": 15, "passengers": 43 } ``` With the following config: ```yaml buffer: system_window: timestamp_mapping: root = this.created_at size: 1h pipeline: processors: # Group messages of the window into batches of common traffic light IDs - group_by_value: value: '${! json("traffic_light") }' # Reduce each batch to a single message by deleting indexes > 0, and # aggregate the car and passenger counts. - mapping: | root = if batch_index() == 0 { { "traffic_light": this.traffic_light, "created_at": meta("window_end_timestamp"), "total_cars": json("registration_plate").from_all().unique().length(), "passengers": json("passengers").from_all().sum(), } } else { deleted() } ``` ## [](#fields)Fields ### [](#timestamp_mapping)`timestamp_mapping` A [Bloblang mapping](../../../guides/bloblang/about/) applied to each message during ingestion that provides the timestamp to use for allocating it a window. 
By default, the function `now()` generates a fresh timestamp at the time of ingestion (the processing time). Alternatively, this mapping can extract a timestamp from the message itself (the event time).

The timestamp value assigned to `root` must be either a numerical Unix time in seconds (with up to nanosecond precision via decimals) or a string in ISO 8601 format. If the mapping fails or provides an invalid result, the message is dropped and the problem is logged.

**Type**: `string`

**Default**: `"root = now()"`

```yml
# Examples

timestamp_mapping: root = this.created_at

timestamp_mapping: root = meta("kafka_timestamp_unix").number()
```

### [](#size)`size`

A duration string describing the size of each window. By default, windows are aligned to the zeroth minute and zeroth hour on the UTC clock, meaning windows of 1 hour duration match the turn of each hour in the day. This alignment can be adjusted with the `offset` field.

**Type**: `string`

```yml
# Examples

size: 30s

size: 10m
```

### [](#slide)`slide`

An optional duration string describing how far the beginning of each window is offset from the beginning of the previous one. Setting it creates sliding windows instead of tumbling windows. When specified, this duration must be smaller than the `size` of the window.

**Type**: `string`

**Default**: `""`

```yml
# Examples

slide: 30s

slide: 10m
```

### [](#offset)`offset`

An optional duration string to offset the beginning of each window by. When it is not set, windows are aligned to the zeroth minute and zeroth hour on the UTC clock. The offset must be smaller than the window size and, when one is set, the slide.

**Type**: `string`

**Default**: `""`

```yml
# Examples

offset: -6h

offset: 30m
```

### [](#allowed_lateness)`allowed_lateness`

An optional duration string describing the length of time to wait after a window has ended before flushing it, allowing late arrivals to be included.
Since this windowing buffer uses the system clock an allowed lateness can improve the matching of messages when using event time. **Type**: `string` **Default**: `""` ```yml # Examples allowed_lateness: 10s allowed_lateness: 1m ``` --- # Page 64: Caches **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/about.md --- # Caches --- title: Caches latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/about.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- A cache is a key/value store which can be used by certain components for applications such as deduplication or data joins. Caches are configured as a named resource: ```yaml cache_resources: - label: foobar memcached: addresses: - localhost:11211 default_ttl: 60s ``` > It’s possible to layer caches with read-through and write-through behavior using the [`multilevel` cache](../multilevel/). And then any components that use caches have a field `resource` that specifies the cache resource: ```yaml pipeline: processors: - cache: resource: foobar operator: add key: '${! json("message.id") }' value: "storeme" - mapping: root = if errored() { deleted() } ``` For the simple case where you wish to store messages in a cache as an output destination for your pipeline check out the [`cache` output](../../outputs/cache/). To see examples of more advanced uses of caches such as hydration and deduplication check out the [`cache` processor](../../processors/cache/). 
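
As a complement to the `add` example above, a lookup (hydration) can be sketched with the `get` operator inside a `branch` processor. This is a minimal sketch under stated assumptions: the resource label `lookup`, the input field `band`, and the output field `formed` are all hypothetical, and the cache is a `memory` cache pre-populated through `init_values`.

```yaml
pipeline:
  processors:
    - branch:
        processors:
          # Replace the branched copy of the message with the cached value.
          - cache:
              resource: lookup
              operator: get
              key: '${! json("band") }'
        # Copy the cached value back onto the original message.
        result_map: 'root.formed = content().string()'

cache_resources:
  - label: lookup
    memory:
      # Static lookup table (illustrative values).
      init_values:
        Nickelback: "1995"
        Spice Girls: "1994"
```

If the key is missing, the `get` is expected to fail and the original message should pass through without the extra field, so downstream steps may want to check `errored()`.
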
--- # Page 65: aws_dynamodb **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/aws_dynamodb.md --- # aws\_dynamodb --- title: aws_dynamodb latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/aws_dynamodb page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/aws_dynamodb.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/aws_dynamodb.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Cache ▼ [Cache](/redpanda-cloud/develop/connect/components/caches/aws_dynamodb/)[Output](/redpanda-cloud/develop/connect/components/outputs/aws_dynamodb/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/aws_dynamodb/ "View the Self-Managed version of this component") Stores key/value pairs as a single document in a DynamoDB table. The key is stored as a string value and used as the table hash key. The value is stored as a binary value using the `data_key` field name. 
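
A minimal sketch of a cache resource using this layout is shown here. It assumes a hypothetical table named `connect_cache` with a hash key column `id`, a value column `payload`, and a TTL column `expires_at`; none of these names come from the component itself, and the TTL settings only take effect if the backing table has TTL enabled.

```yaml
cache_resources:
  - label: dynamo_cache          # Hypothetical resource label
    aws_dynamodb:
      table: connect_cache       # Hypothetical table name
      hash_key: id               # Column holding the cache key (string)
      data_key: payload          # Column holding the cached value (binary)
      default_ttl: 24h           # Requires ttl_key to be set
      ttl_key: expires_at        # Column the TTL value is written to
      consistent_read: true      # Use strongly consistent reads
```
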
#### Common

```yml
caches:
  aws_dynamodb:
    table: "" # No default (required)
    hash_key: "" # No default (required)
    data_key: "" # No default (required)
```

#### Advanced

```yml
caches:
  aws_dynamodb:
    table: "" # No default (required)
    hash_key: "" # No default (required)
    data_key: "" # No default (required)
    consistent_read: false
    default_ttl: "" # No default (optional)
    ttl_key: "" # No default (optional)
    retries:
      initial_interval: 1s
      max_interval: 5s
      max_elapsed_time: 30s
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
```

A prefix can be specified to allow multiple cache types to share a single DynamoDB table. An optional TTL duration (`default_ttl`) and field (`ttl_key`) can be specified if the backing table has TTL enabled.

Strong read consistency can be enabled using the `consistent_read` configuration field.

## [](#fields)Fields

### [](#consistent_read)`consistent_read`

Whether to use strongly consistent reads on Get commands.

**Type**: `bool`

**Default**: `false`

### [](#credentials)`credentials`

Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/).

**Type**: `object`

### [](#credentials-from_ec2_role)`credentials.from_ec2_role`

Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html).

**Type**: `bool`

### [](#credentials-id)`credentials.id`

The ID of credentials to use.
**Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#data_key)`data_key` The key of the table column to store item values within. **Type**: `string` ### [](#default_ttl)`default_ttl` An optional default TTL to set for items, calculated from the moment the item is cached. A `ttl_key` must be specified in order to set item TTLs. **Type**: `string` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#hash_key)`hash_key` The key of the table column to store item keys within. **Type**: `string` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#retries)`retries` Determine time intervals and cut offs for retry attempts. **Type**: `object` ### [](#retries-initial_interval)`retries.initial_interval` The initial period to wait between retry attempts. **Type**: `string` **Default**: `1s` ```yaml # Examples: initial_interval: 50ms # --- initial_interval: 1s ``` ### [](#retries-max_elapsed_time)`retries.max_elapsed_time` The maximum overall period of time to spend on retry attempts before the request is aborted. 
**Type**: `string` **Default**: `30s` ```yaml # Examples: max_elapsed_time: 1m # --- max_elapsed_time: 1h ``` ### [](#retries-max_interval)`retries.max_interval` The maximum period to wait between retry attempts **Type**: `string` **Default**: `5s` ```yaml # Examples: max_interval: 5s # --- max_interval: 1m ``` ### [](#table)`table` The table to store items in. **Type**: `string` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#ttl_key)`ttl_key` The column key to place the TTL value within. 
**Type**: `string` --- # Page 66: aws_s3 **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/aws_s3.md --- # aws\_s3 --- title: aws_s3 latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/aws_s3 page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/aws_s3.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/aws_s3.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Cache ▼ [Cache](/redpanda-cloud/develop/connect/components/caches/aws_s3/)[Input](/redpanda-cloud/develop/connect/components/inputs/aws_s3/)[Output](/redpanda-cloud/develop/connect/components/outputs/aws_s3/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/aws_s3/ "View the Self-Managed version of this component") Stores each item in an S3 bucket as a file, where an item ID is the path of the item within the bucket. 
#### Common ```yml caches: aws_s3: bucket: "" # No default (required) content_type: application/octet-stream ``` #### Advanced ```yml caches: aws_s3: bucket: "" # No default (required) content_type: application/octet-stream force_path_style_urls: false retries: initial_interval: 1s max_interval: 5s max_elapsed_time: 30s region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) ``` It is not possible to atomically upload S3 objects exclusively when the target does not already exist, therefore this cache is not suitable for deduplication. ## [](#fields)Fields ### [](#bucket)`bucket` The S3 bucket to store items in. **Type**: `string` ### [](#content_type)`content_type` The content type to set for each item. **Type**: `string` **Default**: `application/octet-stream` ### [](#credentials)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. 
**Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#force_path_style_urls)`force_path_style_urls` Forces the client API to use path style URLs, which helps when connecting to custom endpoints. **Type**: `bool` **Default**: `false` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#retries)`retries` Determine time intervals and cut offs for retry attempts. **Type**: `object` ### [](#retries-initial_interval)`retries.initial_interval` The initial period to wait between retry attempts. **Type**: `string` **Default**: `1s` ```yaml # Examples: initial_interval: 50ms # --- initial_interval: 1s ``` ### [](#retries-max_elapsed_time)`retries.max_elapsed_time` The maximum overall period of time to spend on retry attempts before the request is aborted. **Type**: `string` **Default**: `30s` ```yaml # Examples: max_elapsed_time: 1m # --- max_elapsed_time: 1h ``` ### [](#retries-max_interval)`retries.max_interval` The maximum period to wait between retry attempts **Type**: `string` **Default**: `5s` ```yaml # Examples: max_interval: 5s # --- max_interval: 1m ``` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. 
Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` --- # Page 67: gcp_cloud_storage **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/gcp_cloud_storage.md --- # gcp\_cloud\_storage --- title: gcp_cloud_storage latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/gcp_cloud_storage page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/gcp_cloud_storage.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/gcp_cloud_storage.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Cache ▼ 
[Cache](/redpanda-cloud/develop/connect/components/caches/gcp_cloud_storage/)[Input](/redpanda-cloud/develop/connect/components/inputs/gcp_cloud_storage/)[Output](/redpanda-cloud/develop/connect/components/outputs/gcp_cloud_storage/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/gcp_cloud_storage/ "View the Self-Managed version of this component") Use a Google Cloud Storage bucket as a cache. ```yml caches: gcp_cloud_storage: bucket: "" # No default (required) content_type: "" # No default (optional) credentials_json: "" ``` It is not possible to atomically upload cloud storage objects exclusively when the target does not already exist, therefore this cache is not suitable for deduplication. ## [](#fields)Fields ### [](#bucket)`bucket` The Google Cloud Storage bucket to store items in. **Type**: `string` ### [](#content_type)`content_type` Optional field to explicitly set the Content-Type. **Type**: `string` ### [](#credentials_json)`credentials_json` An optional field to set Google Service Account Credentials json. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 

**Type**: `string`

**Default**: `""`

---

# Page 68: lru

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/lru.md

---

# lru

---
title: lru
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/caches/lru
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/caches/lru.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/lru.adoc
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/lru/ "View the Self-Managed version of this component")

Stores key/value pairs in an in-memory LRU cache. This cache is therefore reset every time the service restarts.

#### Common

```yml
caches:
  lru:
    cap: 1000
    init_values: {}
```

#### Advanced

```yml
caches:
  lru:
    cap: 1000
    init_values: {}
    algorithm: standard
    two_queues_recent_ratio: 0.25
    two_queues_ghost_ratio: 0.5
    optimistic: false
```

This cache is backed by the [`lru`](https://github.com/hashicorp/golang-lru/v2) package, which implements a fixed-size, thread-safe LRU cache.

The field `init_values` can be used to pre-populate the memory cache with any number of key/value pairs:

```yaml
cache_resources:
  - label: foocache
    lru:
      cap: 1024
      init_values:
        foo: bar
```

These values can be overridden during execution.

## [](#fields)Fields

### [](#algorithm)`algorithm`

The LRU cache implementation to use.

**Type**: `string`

**Default**: `standard`

| Option | Summary |
| --- | --- |
| arc | An adaptive replacement cache. It tracks recent evictions as well as recent usage in both the frequent and recent caches. Its computational overhead is comparable to two_queues, but the memory overhead is linear with the size of the cache. ARC has been patented by IBM. |
| standard | A simple LRU cache. It is based on the LRU implementation in groupcache. |
| two_queues | Tracks frequently used and recently used entries separately. This avoids a burst of accesses from taking out frequently used entries, at the cost of about 2x computational overhead and some extra bookkeeping. |

### [](#cap)`cap`

The maximum capacity of the cache (number of entries).

**Type**: `int`

**Default**: `1000`

### [](#init_values)`init_values`

A table of key/value pairs that should be present in the cache on initialization. This can be used to create static lookup tables.

**Type**: `string`

**Default**: `{}`

```yaml
# Examples:
init_values:
  Nickelback: "1995"
  Spice Girls: "1994"
  The Human League: "1977"
```

### [](#optimistic)`optimistic`

If true, the cache does not lock on read/write events. The lru package is thread-safe; however, the ADD operation is not atomic.

**Type**: `bool`

**Default**: `false`

### [](#two_queues_ghost_ratio)`two_queues_ghost_ratio`

The ratio of ghost entries kept to track entries recently evicted from the `two_queues` cache.

**Type**: `float`

**Default**: `0.5`

### [](#two_queues_recent_ratio)`two_queues_recent_ratio`

The ratio of the `two_queues` cache dedicated to recently added entries that have only been accessed once.
**Type**: `float` **Default**: `0.25` --- # Page 69: memcached **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/memcached.md --- # memcached --- title: memcached latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/memcached page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/memcached.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/memcached.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/memcached/ "View the Self-Managed version of this component") Connects to a cluster of memcached services, a prefix can be specified to allow multiple cache types to share a memcached cluster under different namespaces. #### Common ```yml caches: memcached: addresses: [] # No default (required) prefix: "" # No default (optional) default_ttl: 300s ``` #### Advanced ```yml caches: memcached: addresses: [] # No default (required) prefix: "" # No default (optional) default_ttl: 300s retries: initial_interval: 1s max_interval: 5s max_elapsed_time: 30s ``` ## [](#fields)Fields ### [](#addresses)`addresses[]` A list of addresses of memcached servers to use. **Type**: `array` ### [](#default_ttl)`default_ttl` A default TTL to set for items, calculated from the moment the item is cached. **Type**: `string` **Default**: `300s` ### [](#prefix)`prefix` An optional string to prefix item keys with in order to prevent collisions with similar services. **Type**: `string` ### [](#retries)`retries` Determine time intervals and cut offs for retry attempts. 
**Type**: `object` ### [](#retries-initial_interval)`retries.initial_interval` The initial period to wait between retry attempts. **Type**: `string` **Default**: `1s` ```yaml # Examples: initial_interval: 50ms # --- initial_interval: 1s ``` ### [](#retries-max_elapsed_time)`retries.max_elapsed_time` The maximum overall period of time to spend on retry attempts before the request is aborted. **Type**: `string` **Default**: `30s` ```yaml # Examples: max_elapsed_time: 1m # --- max_elapsed_time: 1h ``` ### [](#retries-max_interval)`retries.max_interval` The maximum period to wait between retry attempts **Type**: `string` **Default**: `5s` ```yaml # Examples: max_interval: 5s # --- max_interval: 1m ``` --- # Page 70: memory **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/memory.md --- # memory --- title: memory latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/memory page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/memory.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/memory.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Cache ▼ [Cache](/redpanda-cloud/develop/connect/components/caches/memory/)[Buffer](/redpanda-cloud/develop/connect/components/buffers/memory/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/memory/ "View the Self-Managed version of this component") Stores key/value pairs in a map held in memory. This cache is therefore reset every time the service restarts. Each item in the cache has a TTL set from the moment it was last edited, after which it will be removed during the next compaction. 
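
For example, the TTL can bound a deduplication window. The sketch below assumes messages carry a hypothetical `id` field and treats repeats seen within roughly ten minutes as duplicates; the resource label `dedupe` and the stored value `seen` are also illustrative. Because expired items are only removed at compaction, the effective window may be somewhat longer than `default_ttl`.

```yaml
pipeline:
  processors:
    - cache:
        resource: dedupe
        operator: add
        key: '${! json("id") }'
        value: "seen"
    # `add` fails when the key already exists, so errored messages are
    # treated as duplicates and dropped.
    - mapping: root = if errored() { deleted() }

cache_resources:
  - label: dedupe
    memory:
      # Keys become eligible for removal ten minutes after they are written.
      default_ttl: 10m
      # Expired keys are cleared at most once per minute, on writes.
      compaction_interval: 60s
```
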
#### Common ```yml caches: memory: default_ttl: 5m compaction_interval: 60s init_values: {} ``` #### Advanced ```yml caches: memory: default_ttl: 5m compaction_interval: 60s init_values: {} shards: 1 ``` The compaction interval determines how often the cache is cleared of expired items, and this process is only triggered on writes to the cache. Access to the cache is blocked during this process. Item expiry can be disabled entirely by setting the `compaction_interval` to an empty string. The field `init_values` can be used to prepopulate the memory cache with any number of key/value pairs which are exempt from TTLs: ```yaml cache_resources: - label: foocache memory: default_ttl: 60s init_values: foo: bar ``` These values can be overridden during execution, at which point the configured TTL is respected as usual. ## [](#fields)Fields ### [](#compaction_interval)`compaction_interval` The period of time to wait before each compaction, at which point expired items are removed. This field can be set to an empty string in order to disable compactions/expiry entirely. **Type**: `string` **Default**: `60s` ### [](#default_ttl)`default_ttl` The default TTL of each item. After this period an item will be eligible for removal during the next compaction. **Type**: `string` **Default**: `5m` ### [](#init_values)`init_values` A table of key/value pairs that should be present in the cache on initialization. This can be used to create static lookup tables. **Type**: `string` **Default**: `{}` ```yaml # Examples: init_values: Nickelback: "1995" Spice Girls: "1994" The Human League: "1977" ``` ### [](#shards)`shards` A number of logical shards to spread keys across, increasing the shards can have a performance benefit when processing a large number of keys. 
**Type**: `int` **Default**: `1` --- # Page 71: mongodb **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/mongodb.md --- # mongodb --- title: mongodb latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/mongodb page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/mongodb.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/mongodb.adoc page-git-created-date: "2025-06-25" page-git-modified-date: "2025-06-25" --- **Type:** Cache ▼ [Cache](/redpanda-cloud/develop/connect/components/caches/mongodb/)[Input](/redpanda-cloud/develop/connect/components/inputs/mongodb/)[Output](/redpanda-cloud/develop/connect/components/outputs/mongodb/)[Processor](/redpanda-cloud/develop/connect/components/processors/mongodb/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/mongodb/ "View the Self-Managed version of this component") Use a MongoDB instance as a cache. #### Common ```yml caches: mongodb: url: "" # No default (required) database: "" # No default (required) username: "" password: "" collection: "" # No default (required) key_field: "" # No default (required) value_field: "" # No default (required) ``` #### Advanced ```yml caches: mongodb: url: "" # No default (required) database: "" # No default (required) username: "" password: "" app_name: benthos collection: "" # No default (required) key_field: "" # No default (required) value_field: "" # No default (required) ``` ## [](#fields)Fields ### [](#app_name)`app_name` The client application name. **Type**: `string` **Default**: `benthos` ### [](#collection)`collection` The name of the target collection. **Type**: `string` ### [](#database)`database` The name of the target MongoDB database. 
**Type**: `string` ### [](#key_field)`key_field` The field in the document that is used as the key. **Type**: `string` ### [](#password)`password` The password to connect to the database. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#url)`url` The URL of the target MongoDB server. **Type**: `string` ```yaml # Examples: url: mongodb://localhost:27017 ``` ### [](#username)`username` The username to connect to the database. **Type**: `string` **Default**: `""` ### [](#value_field)`value_field` The field in the document that is used as the value. **Type**: `string` --- # Page 72: multilevel **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/multilevel.md --- # multilevel --- title: multilevel latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/multilevel page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/multilevel.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/multilevel.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/multilevel/ "View the Self-Managed version of this component") Combines multiple caches as levels, performing read-through and write-through operations across them. 
```yml caches: multilevel: - label: "" memory: default_ttl: 5m compaction_interval: 60s - label: "" redis: url: redis://localhost:6379 expiration: 24h ``` ## [](#examples)Examples ### [](#hot-and-cold-cache)Hot and cold cache The multilevel cache is useful for reducing traffic against a remote cache by routing it through a local cache. In the following example, requests only go through to the memcached server if the local memory cache is missing the key.

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - cache:
              resource: leveled
              operator: get
              key: ${! json("key") }
          - catch:
              - mapping: 'root = {"err":error()}'
        result_map: 'root.result = this'

cache_resources:
  - label: leveled
    multilevel: [ hot, cold ]

  - label: hot
    memory:
      default_ttl: 60s

  - label: cold
    memcached:
      addresses: [ TODO:11211 ]
      default_ttl: 60s
```

--- # Page 73: nats_kv **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/nats_kv.md --- # nats\_kv --- title: nats_kv latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/nats_kv page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/nats_kv.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/nats_kv.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Cache ▼ [Cache](/redpanda-cloud/develop/connect/components/caches/nats_kv/)[Input](/redpanda-cloud/develop/connect/components/inputs/nats_kv/)[Output](/redpanda-cloud/develop/connect/components/outputs/nats_kv/)[Processor](/redpanda-cloud/develop/connect/components/processors/nats_kv/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/nats_kv/ "View the Self-Managed version of this component") Cache 
key/value pairs in a NATS key-value bucket. #### Common ```yml caches: nats_kv: urls: [] # No default (required) bucket: "" # No default (required) ``` #### Advanced ```yml caches: nats_kv: urls: [] # No default (required) max_reconnects: "" # No default (optional) bucket: "" # No default (required) tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] tls_handshake_first: false auth: nkey_file: "" # No default (optional) nkey: "" # No default (optional) user_credentials_file: "" # No default (optional) user_jwt: "" # No default (optional) user_nkey_seed: "" # No default (optional) user: "" # No default (optional) password: "" # No default (optional) token: "" # No default (optional) ``` ## [](#connection-name)Connection name When monitoring and managing a production [NATS system](https://docs.nats.io/nats-concepts/overview), it is often useful to know which connection a message was sent or received from. To achieve this, set the connection name option when creating a NATS connection. Redpanda Connect can then automatically set the connection name to the NATS component label, so that monitoring tools between NATS and Redpanda Connect can stay in sync. ## [](#authentication)Authentication A number of Redpanda Connect components use NATS services. Each of these components supports optional, advanced authentication parameters for [NKeys](https://docs.nats.io/nats-server/configuration/securing_nats/auth_intro/nkey_auth) and [user credentials](https://docs.nats.io/using-nats/developer/connecting/creds). For an in-depth guide, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/nats_admin/security/jwt). ### [](#nkeys)NKeys NATS server can use NKeys in several ways for authentication. The simplest approach is to configure the server with a list of users’ public keys. 
The server can then generate a challenge for each connection request from a client, and the client must respond to the challenge by signing it with its private NKey, configured in the `nkey_file` or `nkey` field. For more details, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth). ### [](#user-credentials)User credentials NATS server also supports decentralized authentication based on JSON Web Tokens (JWTs). When a server is configured to use this authentication scheme, clients need a [user JWT](https://docs.nats.io/nats-server/configuration/securing_nats/jwt#json-web-tokens) and a corresponding [NKey secret](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth) to connect. You can use either of the following methods to supply the user JWT and NKey secret: - In the `user_credentials_file` field, enter the path to a file containing both the private key and the JWT. You can generate the file using the [nsc tool](https://docs.nats.io/nats-tools/nsc). - In the `user_jwt` field, enter a plain text JWT, and in the `user_nkey_seed` field, enter the plain text NKey seed or private key. For more details about authentication using JWTs, see the [NATS documentation](https://docs.nats.io/using-nats/developer/connecting/creds). ## [](#fields)Fields ### [](#auth)`auth` Optional configuration of NATS authentication parameters. **Type**: `object` ### [](#auth-nkey)`auth.nkey` The NKey seed. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: nkey: UDXU4RCSJNZOIQHZNWXHXORDPRTGNJAHAHFRGZNEEJCPQTT2M7NLCNF4 ``` ### [](#auth-nkey_file)`auth.nkey_file` An optional file containing a NKey seed. 
**Type**: `string` ```yaml # Examples: nkey_file: ./seed.nk ``` ### [](#auth-password)`auth.password` An optional plain text password (given along with the corresponding user name). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-token)`auth.token` An optional plain text token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user)`auth.user` An optional plain text user name (given along with the corresponding user password). **Type**: `string` ### [](#auth-user_credentials_file)`auth.user_credentials_file` An optional file containing user credentials, which consist of a user JWT and the corresponding NKey seed. **Type**: `string` ```yaml # Examples: user_credentials_file: ./user.creds ``` ### [](#auth-user_jwt)`auth.user_jwt` An optional plain text user JWT (given along with the corresponding user NKey Seed). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user_nkey_seed)`auth.user_nkey_seed` An optional plain text user NKey Seed (given along with the corresponding user JWT). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` ### [](#bucket)`bucket` The name of the KV bucket. **Type**: `string` ```yaml # Examples: bucket: my_kv_bucket ``` ### [](#max_reconnects)`max_reconnects` The maximum number of times to attempt to reconnect to the server. If negative, it will never stop trying to reconnect. **Type**: `int` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. 
**Type**: `bool` **Default**: `false` ### [](#tls_handshake_first)`tls_handshake_first` Whether to perform the initial TLS handshake before sending the NATS INFO protocol message. This is required when connecting to some NATS servers that expect TLS to be established immediately after connection, before any protocol negotiation. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. If an item in the list contains commas, it is expanded into multiple URLs. **Type**: `array` ```yaml # Examples: urls: - "nats://127.0.0.1:4222" # --- urls: - "nats://username:password@127.0.0.1:4222" ``` --- # Page 74: noop **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/noop.md --- # noop --- title: noop latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/noop page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/noop.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/noop.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Cache ▼ [Cache](/redpanda-cloud/develop/connect/components/caches/noop/)[Processor](/redpanda-cloud/develop/connect/components/processors/noop/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/noop/ "View the Self-Managed version of this component") Noop is a cache that stores nothing; every get returns not found. Why? Sometimes doing nothing is the braver option. Introduced in version 4.27.0. 
```yml caches: noop: {} ``` --- # Page 75: redis **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/redis.md --- # redis --- title: redis latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/redis page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/redis.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/redis.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Cache ▼ [Cache](/redpanda-cloud/develop/connect/components/caches/redis/)[Processor](/redpanda-cloud/develop/connect/components/processors/redis/)[Rate\_limit](/redpanda-cloud/develop/connect/components/rate_limits/redis/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/redis/ "View the Self-Managed version of this component") Use a Redis instance as a cache. The expiration can be set to zero or an empty string in order to set no expiration. #### Common ```yml caches: redis: url: "" # No default (required) prefix: "" # No default (optional) ``` #### Advanced ```yml caches: redis: url: "" # No default (required) kind: simple master: "" client_name: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] prefix: "" # No default (optional) default_ttl: "" # No default (optional) retries: initial_interval: 500ms max_interval: 1s max_elapsed_time: 5s ``` ## [](#fields)Fields ### [](#client_name)`client_name` Set the client name for the Redis connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#default_ttl)`default_ttl` An optional default TTL to set for items, calculated from the moment the item is cached. 
**Type**: `string` ### [](#kind)`kind` Specifies a simple, cluster-aware, or failover-aware Redis client. **Type**: `string` **Default**: `simple` **Options**: `simple`, `cluster`, `failover` ### [](#master)`master` The name of the Redis master when `kind` is `failover`. **Type**: `string` **Default**: `""` ```yaml # Examples: master: mymaster ``` ### [](#prefix)`prefix` An optional string to prefix item keys with in order to prevent collisions with similar services. **Type**: `string` ### [](#retries)`retries` Determines the time intervals and cutoffs for retry attempts. **Type**: `object` ### [](#retries-initial_interval)`retries.initial_interval` The initial period to wait between retry attempts. **Type**: `string` **Default**: `500ms` ```yaml # Examples: initial_interval: 50ms # --- initial_interval: 1s ``` ### [](#retries-max_elapsed_time)`retries.max_elapsed_time` The maximum overall period of time to spend on retry attempts before the request is aborted. **Type**: `string` **Default**: `5s` ```yaml # Examples: max_elapsed_time: 1m # --- max_elapsed_time: 1h ``` ### [](#retries-max_interval)`retries.max_interval` The maximum period to wait between retry attempts. **Type**: `string` **Default**: `1s` ```yaml # Examples: max_interval: 5s # --- max_interval: 1m ``` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Troubleshooting** Some cloud-hosted instances of Redis (such as Azure Cache) might need extra configuration to establish stable connections. TLS issues often surface as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems, consider setting `enable_renegotiation` to `true` and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. 
For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. 
Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target Redis server. Database is optional and is supplied as the URL path. 
**Type**: `string` ```yaml # Examples: url: redis://:6379 # --- url: redis://localhost:6379 # --- url: redis://foousername:foopassword@redisplace:6379 # --- url: redis://:foopassword@redisplace:6379 # --- url: redis://localhost:6379/1 # --- url: redis://localhost:6379/1,redis://localhost:6380/1 ``` --- # Page 76: redpanda **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/redpanda.md --- # redpanda --- title: redpanda latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/redpanda page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/redpanda.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/redpanda.adoc categories: "[Services]" description: A Kafka cache using the https://github.com/twmb/franz-go[Franz Kafka client library^]. page-git-created-date: "2025-07-08" page-git-modified-date: "2025-07-08" --- **Type:** Cache ▼ [Cache](/redpanda-cloud/develop/connect/components/caches/redpanda/)[Input](/redpanda-cloud/develop/connect/components/inputs/redpanda/)[Output](/redpanda-cloud/develop/connect/components/outputs/redpanda/)[Tracer](/redpanda-cloud/develop/connect/components/tracers/redpanda/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/redpanda/ "View the Self-Managed version of this component") A Kafka cache implemented using the [Franz Kafka client library](https://github.com/twmb/franz-go). 
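As a rough sketch of how this cache might be wired into a pipeline, the following configuration declares a `redpanda` cache resource and writes to it with the `cache` processor. The label `topic_cache`, the broker address `localhost:9092`, the topic name `connect_cache`, and the `id` field in the message payload are all hypothetical placeholders chosen for illustration, not values taken from this reference.

```yaml
cache_resources:
  - label: topic_cache                     # hypothetical resource label
    redpanda:
      seed_brokers: [ "localhost:9092" ]   # assumed broker address
      topic: connect_cache                 # assumed topic name

pipeline:
  processors:
    - cache:
        resource: topic_cache
        operator: set
        key: ${! json("id") }              # assumes each message has an "id" field
        value: ${! content() }             # stores the raw message payload
```

A `cache` processor with `operator: get` and the same key expression would read the stored value back.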
#### Common ```yaml caches: redpanda: seed_brokers: [] # No default (required) topic: "" # No default (required) ``` #### Advanced ```yaml caches: redpanda: seed_brokers: [] # No default (required) client_id: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] sasl: [] # No default (optional) metadata_max_age: 1m request_timeout_overhead: 10s conn_idle_timeout: 20s tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s topic: "" # No default (required) allow_auto_topic_creation: true ``` A cache that stores data in a Kafka topic. This cache is useful for data that is written frequently and queried infrequently. Reads from the cache require scanning the entire topic partition. If you expect frequent access, consider placing an in-memory caching layer in front of this one. Because only the latest values are needed, configure compaction for topics used as caches so that reads are less expensive when topics are rescanned. See [Compaction Settings](../../../../../../current/manage/cluster-maintenance/compaction-settings/). The cache does not have any TTL mechanisms. Use the Kafka topic retention policies to manage TTL. ## [](#fields)Fields ### [](#allow_auto_topic_creation)`allow_auto_topic_creation` Enables topics to be auto created if they do not exist when fetching their metadata. **Type**: `bool` **Default**: `true` ### [](#client_id)`client_id` An identifier for the client connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#conn_idle_timeout)`conn_idle_timeout` The amount of time that connections can remain idle before they are closed. **Type**: `string` **Default**: `20s` ### [](#metadata_max_age)`metadata_max_age` The maximum age of metadata before it is refreshed. This interval also controls how frequently regex topic patterns are re-evaluated to discover new matching topics. 
**Type**: `string` **Default**: `1m` ### [](#request_timeout_overhead)`request_timeout_overhead` Additional time to apply as overhead when calculating request deadlines. This buffer helps prevent premature timeouts, especially for requests that already define their own timeout values. **Type**: `string` **Default**: `10s` ### [](#sasl)`sasl[]` Specify one or more SASL authentication methods. Each method is tried in the order specified. If the broker supports the first mechanism, outgoing client connections use that mechanism. If the first mechanism fails, the client falls back to the first mechanism in the list that the broker supports. If the broker does not support any of the client’s mechanisms, connections fail. **Type**: `object` ```yaml # Examples: sasl: - mechanism: SCRAM-SHA-512 password: bar username: foo ``` ### [](#sasl-aws)`sasl[].aws` Contains AWS-specific fields for when [`sasl.mechanism`](#sasl-mechanism) is set to `AWS_MSK_IAM`. **Type**: `object` ### [](#sasl-aws-credentials)`sasl[].aws.credentials` Optional manual configuration of AWS credentials to use. For more information, see the [credentials for AWS](../../../guides/cloud/aws/) guide. **Type**: `object` ### [](#sasl-aws-credentials-from_ec2_role)`sasl[].aws.credentials.from_ec2_role` The credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#sasl-aws-credentials-id)`sasl[].aws.credentials.id` The ID of credentials to use. **Type**: `string` ### [](#sasl-aws-credentials-profile)`sasl[].aws.credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#sasl-aws-credentials-role)`sasl[].aws.credentials.role` The ARN of the role to assume. **Type**: `string` ### [](#sasl-aws-credentials-role_external_id)`sasl[].aws.credentials.role_external_id` An external ID to provide when assuming the specified role. 
**Type**: `string` ### [](#sasl-aws-credentials-secret)`sasl[].aws.credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#sasl-aws-credentials-token)`sasl[].aws.credentials.token` The token for the credentials being used. Required only when using short-term credentials. **Type**: `string` ### [](#sasl-aws-endpoint)`sasl[].aws.endpoint` A custom endpoint URL for AWS API requests. Use this to connect to AWS-compatible services or local testing environments instead of the standard AWS endpoints. **Type**: `string` ### [](#sasl-aws-region)`sasl[].aws.region` The AWS region to target. **Type**: `string` ### [](#sasl-aws-tcp)`sasl[].aws.tcp` TCP socket configuration. **Type**: `object` ### [](#sasl-aws-tcp-connect_timeout)`sasl[].aws.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-aws-tcp-keep_alive)`sasl[].aws.tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#sasl-aws-tcp-keep_alive-count)`sasl[].aws.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#sasl-aws-tcp-keep_alive-idle)`sasl[].aws.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-keep_alive-interval)`sasl[].aws.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. 
**Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-tcp_user_timeout)`sasl[].aws.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-extensions)`sasl[].extensions` Key/value pairs to add to OAUTHBEARER authentication requests. **Type**: `string` ### [](#sasl-mechanism)`sasl[].mechanism` The SASL mechanism to use for authentication. **Type**: `string` | Option | Summary | | --- | --- | | AWS_MSK_IAM | AWS IAM-based authentication as specified by the aws-msk-iam-auth Java library. | | OAUTHBEARER | OAuth Bearer authentication. | | PLAIN | PLAIN mechanism for plaintext password authentication. | | REDPANDA_CLOUD_SERVICE_ACCOUNT | Redpanda Cloud Service Account authentication when running in Redpanda Cloud. | | SCRAM-SHA-256 | SCRAM authentication as specified in RFC5802. | | SCRAM-SHA-512 | SCRAM authentication as specified in RFC5802. | | none | Disable SASL authentication. | ### [](#sasl-password)`sasl[].password` The password to use for PLAIN or SCRAM-\* authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#sasl-token)`sasl[].token` The token to use for a single session’s OAUTHBEARER authentication. **Type**: `string` **Default**: `""` ### [](#sasl-username)`sasl[].username` The username to use for PLAIN or SCRAM-\* authentication. **Type**: `string` **Default**: `""` ### [](#seed_brokers)`seed_brokers[]` A list of broker addresses to connect to. Items containing commas are expanded into multiple addresses. 
**Type**: `array` ```yaml # Examples: seed_brokers: - "localhost:9092" # --- seed_brokers: - "foo:9092" - "bar:9092" # --- seed_brokers: - "foo:9092,bar:9092" ``` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. 
When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` The plaintext certificate to use for TLS authentication. Must be paired with the corresponding private key in the `key` field when using inline PEM data for mTLS client certificates. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path to a file containing the certificate to use for TLS authentication. Must be paired with the corresponding private key file in the `key_file` field when using file-based configuration for mTLS client certificates. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` Private key for mTLS client certificate as inline PEM data. Must correspond to the client certificate specified in the `cert` field. Use this field together with `cert` when providing certificate data inline rather than through files. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` Path to private key file for mTLS client certificate in PEM format. Must correspond to the client certificate specified in the `cert_file` field. Use this field together with `cert_file` when loading certificate data from files. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` The password to use for the private key (specified in the `key` or `key_file` fields), if it is password-protected. The PKCS#1 and PKCS#8 formats are supported. Supports environment variable interpolation for secure password management. The `pbeWithMD5AndDES-CBC` algorithm is obsolete and not supported for the PKCS#8 format. This algorithm does not authenticate the ciphertext, making it vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to request renegotiation. 
Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#topic)`topic` The topic to store data in. 
**Type**: `string` --- # Page 77: ristretto **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/ristretto.md --- # ristretto --- title: ristretto latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/ristretto page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/ristretto.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/ristretto.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/ristretto/ "View the Self-Managed version of this component") Stores key/value pairs in a map held in the memory-bound [Ristretto cache](https://github.com/dgraph-io/ristretto). #### Common ```yml caches: ristretto: default_ttl: "" ``` #### Advanced ```yml caches: ristretto: default_ttl: "" get_retries: enabled: false initial_interval: 1s max_interval: 5s max_elapsed_time: 30s ``` This cache is more efficient and appropriate for high-volume use cases than the standard memory cache. However, the add command is non-atomic, and therefore this cache is not suitable for deduplication. ## [](#fields)Fields ### [](#default_ttl)`default_ttl` A default TTL to set for items, calculated from the moment the item is cached. Set to an empty string or zero duration to disable TTLs. **Type**: `string` **Default**: `""` ```yaml # Examples: default_ttl: 5m # --- default_ttl: 60s ``` ### [](#get_retries)`get_retries` Determines how and whether get attempts should be retried if the key is not found. Ristretto is a concurrent cache that does not immediately reflect writes, and so it can sometimes be useful to enable retries at the cost of speed in cases where the key is expected to exist. 
**Type**: `object` ### [](#get_retries-enabled)`get_retries.enabled` Whether retries should be enabled. **Type**: `bool` **Default**: `false` ### [](#get_retries-initial_interval)`get_retries.initial_interval` The initial period to wait between retry attempts. **Type**: `string` **Default**: `1s` ```yaml # Examples: initial_interval: 50ms # --- initial_interval: 1s ``` ### [](#get_retries-max_elapsed_time)`get_retries.max_elapsed_time` The maximum overall period of time to spend on retry attempts before the request is aborted. **Type**: `string` **Default**: `30s` ```yaml # Examples: max_elapsed_time: 1m # --- max_elapsed_time: 1h ``` ### [](#get_retries-max_interval)`get_retries.max_interval` The maximum period to wait between retry attempts **Type**: `string` **Default**: `5s` ```yaml # Examples: max_interval: 5s # --- max_interval: 1m ``` --- # Page 78: sql **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/sql.md --- # sql --- title: sql latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/sql page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/sql.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/sql.adoc categories: "[\"Services\"]" page-git-created-date: "2025-06-25" page-git-modified-date: "2025-06-25" --- **Type:** Cache ▼ [Cache](/redpanda-cloud/develop/connect/components/caches/sql/)[Output](/redpanda-connect/components/outputs/sql/)[Processor](/redpanda-connect/components/processors/sql/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/sql/ "View the Self-Managed version of this component") Uses an SQL database table as a destination for storing cache key/value items. 
#### Common ```yml caches: sql: driver: "" # No default (required) dsn: "" # No default (required) table: "" # No default (required) key_column: "" # No default (required) value_column: "" # No default (required) set_suffix: "" # No default (optional) ``` #### Advanced ```yml caches: sql: driver: "" # No default (required) dsn: "" # No default (required) table: "" # No default (required) key_column: "" # No default (required) value_column: "" # No default (required) set_suffix: "" # No default (optional) init_files: [] # No default (optional) init_statement: "" # No default (optional) conn_max_idle_time: "" # No default (optional) conn_max_life_time: "" # No default (optional) conn_max_idle: 2 conn_max_open: "" # No default (optional) ``` Each cache key/value pair exists as a row within the specified table. Currently only the key and value columns are set, so any other columns present in the target table must allow NULL values if this cache is going to be used for set and add operations. Cache operations are translated into SQL statements as follows: ## [](#get)Get All `get` operations are performed with a traditional `select` statement. ## [](#delete)Delete All `delete` operations are performed with a traditional `delete` statement. ## [](#set)Set The `set` operation is performed with a traditional `insert` statement. This behaves as an `add` operation by default, and so ideally needs to be adapted to provide updates instead of failing on collisions. Since different SQL engines implement upserts differently, it is necessary to specify a `set_suffix` that modifies an `insert` statement in order to perform updates on conflict. ## [](#add)Add The `add` operation is performed with a traditional `insert` statement. ## [](#fields)Fields ### [](#conn_max_idle)`conn_max_idle` An optional maximum number of connections in the idle connection pool.
If conn\_max\_open is greater than 0 but less than the new conn\_max\_idle, then the new conn\_max\_idle will be reduced to match the conn\_max\_open limit. If `value <= 0`, no idle connections are retained. The default maximum number of idle connections is currently 2. This may change in a future release. **Type**: `int` **Default**: `2` ### [](#conn_max_idle_time)`conn_max_idle_time` An optional maximum amount of time a connection may be idle. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection's idle time. **Type**: `string` ### [](#conn_max_life_time)`conn_max_life_time` An optional maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection's age. **Type**: `string` ### [](#conn_max_open)`conn_max_open` An optional maximum number of open connections to the database. If conn\_max\_idle is greater than 0 and the new conn\_max\_open is less than conn\_max\_idle, then conn\_max\_idle will be reduced to match the new conn\_max\_open limit. If `value <= 0`, then there is no limit on the number of open connections. The default is 0 (unlimited). **Type**: `int` ### [](#driver)`driver` A database [driver](#drivers) to use. **Type**: `string` **Options**: `mysql`, `postgres`, `pgx`, `clickhouse`, `mssql`, `sqlite`, `oracle`, `snowflake`, `trino`, `gocosmos`, `spanner`, `databricks` ### [](#dsn)`dsn` A Data Source Name to identify the target database.
#### [](#drivers)Drivers The following is a list of supported drivers and their respective DSN formats: | Driver | Data Source Name Format | | --- | --- | | clickhouse | clickhouse://[username[:password]@][netloc][:port]/dbname[?param1=value1&...&paramN=valueN] | | mysql | [username[:password]@][protocol[(address)]]/dbname[?param1=value1&...&paramN=valueN] | | postgres and pgx | postgres://[user[:password]@][netloc][:port][/dbname][?param1=value1&...] | | mssql | sqlserver://[user[:password]@][netloc][:port][?database=dbname&param1=value1&...] | | sqlite | file:/path/to/filename.db[?param&=value1&...] | | oracle | oracle://[username[:password]@][netloc][:port]/service_name?server=server2&server=server3 | | snowflake | username[:password]@account_identifier/dbname/schemaname[?param1=value&...&paramN=valueN] | | trino | http[s]://user[:pass]@host[:port][?parameters] | | gocosmos | AccountEndpoint=;AccountKey=[;TimeoutMs=][;Version=][;DefaultDb/Db=][;AutoId=][;InsecureSkipVerify=] | | spanner | projects/[PROJECT]/instances/[INSTANCE]/databases/[DATABASE] | | databricks | token:@:/ | The `postgres` and `pgx` drivers enforce SSL by default. You can override this with the parameter `sslmode=disable` if required. The `pgx` driver is an alternative to the standard `postgres` (pq) driver and comes with extra functionality such as support for array insertion. The `snowflake` driver supports multiple DSN formats. Please consult [the docs](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Connection_String) for more details.
For [key pair authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth.html#configuring-key-pair-authentication), the DSN has the following format: `@//?warehouse=&role=&authenticator=snowflake_jwt&privateKey=`, where the value for the `privateKey` parameter can be constructed from an unencrypted RSA private key file `rsa_key.p8` using `openssl enc -d -base64 -in rsa_key.p8 | basenc --base64url -w0` (you can use `gbasenc` instead of `basenc` on OSX if you install `coreutils` via Homebrew). If you have a password-encrypted private key, you can decrypt it using `openssl pkcs8 -in rsa_key_encrypted.p8 -out rsa_key.p8`. Also, make sure fields such as the username are URL-encoded. The [`gocosmos`](https://pkg.go.dev/github.com/microsoft/gocosmos) driver is still experimental, but it has support for [hierarchical partition keys](https://learn.microsoft.com/en-us/azure/cosmos-db/hierarchical-partition-keys) as well as [cross-partition queries](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-query-container#cross-partition-query). Please refer to the [SQL notes](https://github.com/microsoft/gocosmos/blob/main/SQL.md) for details. **Type**: `string` ```yaml # Examples: dsn: clickhouse://username:password@host1:9000,host2:9000/database?dial_timeout=200ms&max_execution_time=60 # --- dsn: foouser:foopassword@tcp(localhost:3306)/foodb # --- dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable # --- dsn: oracle://foouser:foopass@localhost:1521/service_name # --- dsn: token:dapi1234567890ab@dbc-a1b2345c-d6e7.cloud.databricks.com:443/sql/1.0/warehouses/abc123def456 ``` ### [](#init_files)`init_files[]` An optional list of file paths containing SQL statements to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Glob patterns are supported, including super globs (double star). 
Care should be taken to ensure that the statements are idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If a statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `array` ```yaml # Examples: init_files: - ./init/*.sql # --- init_files: - ./foo.sql - ./bar.sql ``` ### [](#init_statement)`init_statement` An optional SQL statement to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Care should be taken to ensure that the statement is idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If the statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `string` ```yaml # Examples: init_statement: |- CREATE TABLE IF NOT EXISTS some_table ( foo varchar(50) not null, bar integer, baz varchar(50), primary key (foo) ) WITHOUT ROWID; ``` ### [](#key_column)`key_column` The name of a column to be used for storing cache item keys. This column should support strings of arbitrary size. **Type**: `string` ```yaml # Examples: key_column: foo ``` ### [](#set_suffix)`set_suffix` An optional suffix to append to each insert query for a cache `set` operation. This should modify an insert statement into an upsert appropriate for the given SQL engine. **Type**: `string` ```yaml # Examples: set_suffix: ON DUPLICATE KEY UPDATE bar=VALUES(bar) # --- set_suffix: ON CONFLICT (foo) DO UPDATE SET bar=excluded.bar # --- set_suffix: ON CONFLICT (foo) DO NOTHING ``` ### [](#table)`table` The table to insert/read/delete cache items. 
**Type**: `string` ```yaml # Examples: table: foo ``` ### [](#value_column)`value_column` The name of a column to be used for storing cache item values. This column should support strings of arbitrary size. **Type**: `string` ```yaml # Examples: value_column: bar ``` --- # Page 79: ttlru **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/caches/ttlru.md --- # ttlru --- title: ttlru latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/caches/ttlru page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/caches/ttlru.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/caches/ttlru.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/caches/ttlru/ "View the Self-Managed version of this component") Stores key/value pairs in a ttlru in-memory cache. This cache is therefore reset every time the service restarts. #### Common ```yml caches: ttlru: cap: 1024 default_ttl: 5m0s init_values: {} ``` #### Advanced ```yml caches: ttlru: cap: 1024 default_ttl: 5m0s ttl: "" # No default (optional) init_values: {} optimistic: false ``` The cache ttlru provides a simple, goroutine safe, cache with a fixed number of entries. Each entry has a per-cache defined TTL. This TTL is reset on both modification and access of the value. As a result, if the cache is full, and no items have expired, when adding a new item, the item with the soonest expiration will be evicted. 
It uses the package [`expirable`](https://github.com/hashicorp/golang-lru/tree/main/expirable). The field init\_values can be used to pre-populate the memory cache with any number of key/value pairs: ```yaml cache_resources: - label: foocache ttlru: default_ttl: '5m' cap: 1024 init_values: foo: bar ``` These values can be overridden during execution. ## [](#fields)Fields ### [](#cap)`cap` The maximum capacity of the cache (number of entries). **Type**: `int` **Default**: `1024` ### [](#default_ttl)`default_ttl` The TTL of each cache element. **Type**: `string` **Default**: `5m0s` ### [](#init_values)`init_values` A table of key/value pairs that should be present in the cache on initialization. This can be used to create static lookup tables. **Type**: `string` **Default**: `{}` ```yaml # Examples: init_values: Nickelback: "1995" Spice Girls: "1994" The Human League: "1977" ``` ### [](#optimistic)`optimistic` If true, the cache does not lock on read/write events. The ttlru package is thread-safe; however, the ADD operation is not atomic. **Type**: `bool` **Default**: `false` ### [](#ttl)`ttl` Deprecated.
Please use the `default_ttl` field instead. **Type**: `string` --- # Page 80: Inputs **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/about.md --- # Inputs --- title: Inputs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/about.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- An input is a source of data piped through an array of optional [processors](../../processors/about/): ```yaml input: label: my_redis_input redis_streams: url: tcp://localhost:6379 streams: - benthos_stream body_key: body consumer_group: benthos_group # Optional list of processing steps processors: - mapping: | root.document = this.without("links") root.link_count = this.links.length() ``` Some inputs have a logical end. When this happens, the input gracefully terminates and Redpanda Connect shuts itself down once all messages have been fully processed. It’s also possible to specify a logical end for an input that otherwise doesn’t have one with the [`read_until` input](../read_until/), which checks a condition against each consumed message in order to determine whether it should be the last. ## [](#brokering)Brokering Only one input is configured at the root of a Redpanda Connect config.
However, the root input can be a [broker](../broker/), which combines multiple inputs and merges the streams: ```yaml input: broker: inputs: - kafka: addresses: [ TODO ] topics: [ foo, bar ] consumer_group: foogroup - redis_streams: url: tcp://localhost:6379 streams: - benthos_stream body_key: body consumer_group: benthos_group ``` ## [](#labels)Labels Inputs have an optional field `label` that can uniquely identify them in observability data such as metrics and logs. This can be useful when running configs with multiple inputs; otherwise, their metrics labels are generated based on their composition. For more information, check out the [metrics documentation](../../metrics/about/). ### [](#sequential-reads)Sequential reads Sometimes it’s useful to consume a sequence of inputs, where an input is only consumed once its predecessor is fully drained. You can achieve this with the [`sequence` input](../sequence/). ## [](#generating-messages)Generating messages It’s possible to generate data with Redpanda Connect using the [`generate` input](../generate/), which is also a convenient way to trigger scheduled pipelines.
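
As a minimal sketch of a scheduled trigger, the following config emits one small message per minute. It assumes the `generate` input's `mapping`, `interval`, and `count` fields behave as described on its reference page. The label and the message fields are hypothetical and only illustrate the shape of such a config:

```yaml
input:
  label: minute_ticker # hypothetical label
  generate:
    # Bloblang mapping that builds each generated message.
    mapping: |
      root.id = uuid_v4()
      root.triggered_at = now()
    # Emit one message every minute.
    interval: 1m
    # Assumption: 0 places no limit on the number of generated messages.
    count: 0
```

Downstream processors can then treat each generated message as a tick that starts the scheduled work.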
--- # Page 81: amqp_0_9 **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/amqp_0_9.md --- # amqp\_0\_9 --- title: amqp_0_9 latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/amqp_0_9 page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/amqp_0_9.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/amqp_0_9.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/amqp_0_9/)[Output](/redpanda-cloud/develop/connect/components/outputs/amqp_0_9/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/amqp_0_9/ "View the Self-Managed version of this component") Connects to an AMQP (0.91) queue. AMQP is a messaging protocol used by various message brokers, including RabbitMQ. #### Common ```yml inputs: label: "" amqp_0_9: urls: [] # No default (required) queue: "" # No default (required) consumer_tag: "" prefetch_count: 10 ``` #### Advanced ```yml inputs: label: "" amqp_0_9: urls: [] # No default (required) queue: "" # No default (required) queue_declare: enabled: false durable: true auto_delete: false arguments: "" # No default (optional) bindings_declare: [] # No default (optional) consumer_tag: "" auto_ack: false nack_reject_patterns: [] prefetch_count: 10 prefetch_size: 0 tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] ``` TLS is automatically enabled when connecting to an `amqps` URL. However, you can customize [TLS settings](#tls) if required. 
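
The following is a hedged sketch of an input that connects over an `amqps` URL and customizes the TLS settings with a private root CA. The host name, queue name, label, and file path are placeholders rather than values from this page:

```yaml
input:
  label: orders_amqp # hypothetical label
  amqp_0_9:
    urls:
      # The amqps scheme enables TLS automatically; the host is a placeholder.
      - "amqps://rabbitmq.example.com:5671/"
    queue: orders # hypothetical queue name
    prefetch_count: 10
    tls:
      enabled: true
      # Trust a private CA instead of disabling certificate verification.
      root_cas_file: ./root_cas.pem
```

Pointing `root_cas_file` at a trusted CA keeps certificate verification on, which is generally preferable to setting `skip_cert_verify: true`.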
## [](#metadata)Metadata This input adds the following metadata fields to each message: - `amqp_content_type` - `amqp_content_encoding` - `amqp_delivery_mode` - `amqp_priority` - `amqp_correlation_id` - `amqp_reply_to` - `amqp_expiration` - `amqp_message_id` - `amqp_timestamp` - `amqp_type` - `amqp_user_id` - `amqp_app_id` - `amqp_consumer_tag` - `amqp_delivery_tag` - `amqp_redelivered` - `amqp_exchange` - `amqp_routing_key` - All existing message headers, including nested headers prefixed with the key of their respective parent. You can access these metadata fields using [function interpolations](../../../configuration/interpolation/#bloblang-queries). ## [](#fields)Fields ### [](#auto_ack)`auto_ack` Set to `true` to automatically acknowledge messages as soon as they are consumed rather than waiting for acknowledgments from downstream. This can improve throughput and prevent the pipeline from becoming blocked, but delivery guarantees are lost. **Type**: `bool` **Default**: `false` ### [](#bindings_declare)`bindings_declare[]` Passively declares the bindings of the target queue to make sure they exist and are configured correctly. If the bindings exist, then the passive declaration verifies that fields specified in this object match them. **Type**: `object` ```yaml # Examples: bindings_declare: - exchange: foo key: bar ``` ### [](#bindings_declare-exchange)`bindings_declare[].exchange` The exchange of the declared binding. **Type**: `string` **Default**: `""` ### [](#bindings_declare-key)`bindings_declare[].key` The key of the declared binding. **Type**: `string` **Default**: `""` ### [](#consumer_tag)`consumer_tag` A consumer tag to uniquely identify the consumer. **Type**: `string` **Default**: `""` ### [](#nack_reject_patterns)`nack_reject_patterns[]` A list of regular expression patterns to match against errors in messages that Redpanda Connect fails to deliver. 
When a message has an error that matches a pattern, it is dropped or delivered to a dead-letter queue (if a queue has been configured). By default, failed messages are negatively acknowledged (nacked) and requeued. **Type**: `array` **Default**: `[]` ```yaml # Examples: nack_reject_patterns: - "^reject me please:.+$" ``` ### [](#prefetch_count)`prefetch_count` The maximum number of pending messages at a given time. **Type**: `int` **Default**: `10` ### [](#prefetch_size)`prefetch_size` The maximum size of pending messages (in bytes) at a given time. **Type**: `int` **Default**: `0` ### [](#queue)`queue` An AMQP queue to consume from. **Type**: `string` ### [](#queue_declare)`queue_declare` Passively declares the [target queue](#queue) to make sure a queue with the specified name exists and is configured correctly. If the queue exists, then the passive declaration verifies that the fields specified in this object match its properties. **Type**: `object` ### [](#queue_declare-arguments)`queue_declare.arguments` Arguments for server-specific implementations of the queue (optional). You can use arguments to configure additional parameters for queue types that require them. For more information about available arguments, see the [RabbitMQ Client Library](https://github.com/rabbitmq/amqp091-go/blob/b3d409fe92c34bea04d8123a136384c85e8dc431/types.go#L282-L362). | Argument | Description | Accepted values | | --- | --- | --- | | x-queue-type | Declares the type of queue. | Options: classic (default), quorum, stream, drop-head, reject-publish, and reject-publish-dlx. | | x-max-length | The maximum number of messages in the queue. | A non-negative integer. | | x-max-length-bytes | The maximum size of messages (in bytes) in the queue. | A non-negative integer. | | x-overflow | Sets the queue’s overflow behavior. | Options: drop-head (default), reject-publish, reject-publish-dlx.
| | x-message-ttl | The duration (in milliseconds) that messages remain in the queue before they expire and are discarded. | A string that represents the number of milliseconds. For example, 60000 retains messages for one minute. | | x-expires | The duration after which the queue automatically expires. | A positive integer. | | x-max-age | The duration (in configurable units) that streamed messages are retained on disk before they are discarded. | Options: Y, M, D, h, m, s. For example, 7D retains messages for a week. | | x-stream-max-segment-size-bytes | The maximum size (in bytes) of the segment files held on disk. | A positive integer. Default: 500000000 (approximately 500 MB). | | x-queue-version | The version of the classic queue to use. | Options: 1 or 2. | | x-consumer-timeout | The duration (in milliseconds) that a consumer can remain idle before it is automatically canceled. | A positive integer that represents the number of milliseconds. For example, 60000 sets a timeout duration of one minute. | | x-single-active-consumer | When set to true, a single consumer receives messages from the queue even when multiple consumers are subscribed to it. | A boolean. | **Type**: `object` ```yaml # Examples: arguments: x-max-length: 1000 x-max-length-bytes: 4096 x-queue-type: quorum ``` ### [](#queue_declare-auto_delete)`queue_declare.auto_delete` Whether the declared queue auto-deletes when there are no active consumers. **Type**: `bool` **Default**: `false` ### [](#queue_declare-durable)`queue_declare.durable` Whether the declared queue is durable. **Type**: `bool` **Default**: `true` ### [](#queue_declare-enabled)`queue_declare.enabled` Whether to enable queue declaration. **Type**: `bool` **Default**: `false` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. 
Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. 
Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. 
Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. This input attempts to connect to each URL in the list, in order, until a successful connection is established. It then continues to use that URL until the connection is closed. If an item in the list contains commas, it is split into multiple URLs. **Type**: `array` ```yaml # Examples: urls: - "amqp://guest:guest@127.0.0.1:5672/" # --- urls: - "amqp://127.0.0.1:5672/,amqp://127.0.0.2:5672/" # --- urls: - "amqp://127.0.0.1:5672/" - "amqp://127.0.0.2:5672/" ``` --- # Page 82: aws_cloudwatch_logs **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/aws_cloudwatch_logs.md --- # aws\_cloudwatch\_logs --- title: aws_cloudwatch_logs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/aws_cloudwatch_logs page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/aws_cloudwatch_logs.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/aws_cloudwatch_logs.adoc categories: "[Services, AWS]" description: Consumes log events from AWS CloudWatch Logs. 
page-git-created-date: "2026-03-13" page-git-modified-date: "2026-03-13" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/aws_cloudwatch_logs/ "View the Self-Managed version of this component") Consumes log events from AWS CloudWatch Logs. #### Common ```yml inputs: label: "" aws_cloudwatch_logs: log_group_name: "" # No default (required) log_stream_names: [] # No default (optional) log_stream_prefix: "" # No default (optional) filter_pattern: "" # No default (optional) start_time: "" # No default (optional) poll_interval: 5s auto_replay_nacks: true ``` #### Advanced ```yml inputs: label: "" aws_cloudwatch_logs: log_group_name: "" # No default (required) log_stream_names: [] # No default (optional) log_stream_prefix: "" # No default (optional) filter_pattern: "" # No default (optional) start_time: "" # No default (optional) poll_interval: 5s limit: 1000 structured_log: true api_timeout: 30s auto_replay_nacks: true region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) ``` Polls CloudWatch Log Groups for log events. Supports filtering by log streams, CloudWatch filter patterns, and configurable start times. Each log event becomes a separate message with metadata including the log group name, log stream name, timestamp, and ingestion time. > ❗ **IMPORTANT** > > This input provides at-least-once delivery. It tracks its position in memory only, so if the process restarts, it resumes from the configured `start_time` (or the beginning if not set). Duplicates can occur across restarts. For exactly-once outcomes, implement idempotent or deduplicated downstream processing. 
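
The deduplication suggested above can be sketched as follows. This is a minimal, non-authoritative example: it assumes that the `dedupe` processor and a `redis` cache resource are available in your environment, and it keys on the `cloudwatch_event_id` metadata field described under Metadata. The cache label `seen_events`, the Redis URL, the TTL, and the log group name are placeholders, not values from this reference.

```yaml
input:
  aws_cloudwatch_logs:
    log_group_name: my-app-logs          # placeholder log group
    start_time: 2024-01-01T00:00:00Z     # a fixed start time is replayed after every restart

pipeline:
  processors:
    # Drop any event whose ID is already recorded in the cache.
    - dedupe:
        cache: seen_events
        key: '${! metadata("cloudwatch_event_id") }'

cache_resources:
  - label: seen_events                   # hypothetical label
    redis:
      url: redis://localhost:6379        # placeholder; use a store that outlives restarts
      default_ttl: 24h                   # assumed retention window for seen IDs
```

An external cache is used here rather than an in-memory one because the input loses its position on restart, and an in-memory cache would lose the record of seen event IDs at the same moment.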
## [](#credentials)Credentials By default, Redpanda Connect uses a shared credentials file when connecting to AWS services. You can also set credentials explicitly at the component level to transfer data across accounts. You can find out more in [AWS credentials](../../../guides/cloud/aws/). ## [](#metadata)Metadata This input adds the following metadata fields to each message: - `cloudwatch_log_group`: The name of the log group. - `cloudwatch_log_stream`: The name of the log stream. - `cloudwatch_timestamp`: The timestamp of the log event (Unix milliseconds). - `cloudwatch_ingestion_time`: The ingestion timestamp (Unix milliseconds). - `cloudwatch_event_id`: The unique event ID. You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#fields)Fields ### [](#api_timeout)`api_timeout` The maximum time to wait for an API request to complete. **Type**: `string` **Default**: `30s` ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#credentials-2)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). 
**Type**: `bool`

### [](#credentials-id)`credentials.id`

The ID of credentials to use.

**Type**: `string`

### [](#credentials-profile)`credentials.profile`

A profile from `~/.aws/credentials` to use.

**Type**: `string`

### [](#credentials-role)`credentials.role`

A role ARN to assume.

**Type**: `string`

### [](#credentials-role_external_id)`credentials.role_external_id`

An external ID to provide when assuming a role.

**Type**: `string`

### [](#credentials-secret)`credentials.secret`

The secret for the credentials being used.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#credentials-token)`credentials.token`

The token for the credentials being used. Required when using short-term credentials.

**Type**: `string`

### [](#endpoint)`endpoint`

Allows you to specify a custom endpoint for the AWS API.

**Type**: `string`

### [](#filter_pattern)`filter_pattern`

An optional CloudWatch Logs filter pattern to apply when querying log events. For syntax details, see the [CloudWatch Logs filter and pattern syntax](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html) documentation. Quote patterns that begin with `[` so that YAML reads them as a string rather than a list.

**Type**: `string`

```yaml
# Examples:
filter_pattern: "[ERROR]"
```

### [](#limit)`limit`

The maximum number of log events to return in a single API call. Valid range: 1-10000.

**Type**: `int`

**Default**: `1000`

### [](#log_group_name)`log_group_name`

The name of the CloudWatch Log Group to consume from.

**Type**: `string`

```yaml
# Examples:
log_group_name: my-app-logs
```

### [](#log_stream_names)`log_stream_names[]`

An optional list of log stream names to consume from. If not set, events from all streams in the log group will be consumed.
**Type**: `array` ```yaml # Examples: log_stream_names: - stream-1 - stream-2 ``` ### [](#log_stream_prefix)`log_stream_prefix` An optional log stream name prefix to filter streams. Only streams starting with this prefix will be consumed. **Type**: `string` ```yaml # Examples: log_stream_prefix: prod- ``` ### [](#poll_interval)`poll_interval` The interval at which to poll for new log events. **Type**: `string` **Default**: `5s` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#start_time)`start_time` The time to start consuming log events from. Can be an RFC3339 timestamp (for example, `2024-01-01T00:00:00Z`) or the string `now` to start consuming from the current time. If not set, starts from the beginning of available logs. **Type**: `string` ```yaml # Examples: start_time: 2024-01-01T00:00:00Z # --- start_time: now ``` ### [](#structured_log)`structured_log` Whether to output log events as structured JSON objects with all metadata fields, or as plain text messages with metadata stored in Redpanda Connect message metadata. **Type**: `bool` **Default**: `true` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. 
**Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` --- # Page 83: aws_dynamodb_cdc **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/aws_dynamodb_cdc.md --- # aws\_dynamodb\_cdc --- title: aws_dynamodb_cdc latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/aws_dynamodb_cdc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/aws_dynamodb_cdc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/aws_dynamodb_cdc.adoc categories: "[Services]" description: Reads change data capture (CDC) events from DynamoDB Streams. page-topic-type: reference personas: data_engineer, streaming_developer, platform_operator learning-objective-1: Look up configuration options for DynamoDB CDC streaming learning-objective-2: Find metadata fields available for message processing learning-objective-3: Identify checkpointing and performance tuning settings page-git-created-date: "2026-03-04" page-git-modified-date: "2026-03-04" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/aws_dynamodb_cdc/ "View the Self-Managed version of this component") Stream item-level changes from DynamoDB tables using DynamoDB Streams. This input automatically manages shards, checkpoints progress for recovery, and processes multiple shards concurrently. 
Use this reference to: - Look up configuration options for DynamoDB CDC streaming - Find metadata fields available for message processing - Identify checkpointing and performance tuning settings #### Common ```yml inputs: label: "" aws_dynamodb_cdc: tables: [] checkpoint_table: redpanda_dynamodb_checkpoints start_from: trim_horizon snapshot_mode: none ``` #### Advanced ```yml inputs: label: "" aws_dynamodb_cdc: tables: [] table_discovery_mode: single table_tag_filter: "" table_discovery_interval: 5m checkpoint_table: redpanda_dynamodb_checkpoints batch_size: 1000 poll_interval: 1s start_from: trim_horizon checkpoint_limit: 1000 max_tracked_shards: 10000 throttle_backoff: 100ms snapshot_mode: none snapshot_segments: 1 snapshot_batch_size: 100 snapshot_throttle: 100ms snapshot_deduplicate: true snapshot_buffer_size: 100000 region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) ``` ## [](#prerequisites)Prerequisites The source DynamoDB table must have [DynamoDB Streams](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html) enabled. You can enable streams with one of these view types: - `KEYS_ONLY`: Only the key attributes of the modified item - `NEW_IMAGE`: The entire item as it appears after the modification - `OLD_IMAGE`: The entire item as it appeared before the modification - `NEW_AND_OLD_IMAGES`: Both the new and old item images ## [](#checkpointing)Checkpointing Checkpoints are stored in a separate DynamoDB table (configured via `checkpoint_table`). This table is created automatically if it does not exist. 
On restart, the input resumes from the last checkpointed position for each shard.

## [](#alternative-components)Alternative components

For better performance and longer retention (up to 1 year versus 24 hours), consider using Kinesis Data Streams for DynamoDB with the `aws_kinesis` input instead.

## [](#message-structure)Message structure

Each CDC event is delivered as a JSON message with the following structure. Reference these fields in your Bloblang mappings with the `this.` prefix (for example, `this.eventName`). The numbers in parentheses are callouts explained in the table that follows; they are not part of the message.

```json
{
  "eventID": "abc123-", (1)
  "eventName": "INSERT | MODIFY | REMOVE", (2)
  "eventSource": "aws:dynamodb",
  "awsRegion": "us-east-1",
  "tableName": "my-table", (3)
  "dynamodb": {
    "keys": { (4)
      "pk": "user#123",
      "sk": "profile"
    },
    "newImage": { (5)
      "pk": "user#123",
      "sk": "profile",
      "name": "Alice",
      "email": "alice@example.com"
    },
    "oldImage": { (6)
      "pk": "user#123",
      "sk": "profile",
      "name": "Alice Smith"
    },
    "sequenceNumber": "12345678901234567890", (7)
    "sizeBytes": 256,
    "streamViewType": "NEW_AND_OLD_IMAGES"
  }
}
```

| Callout | Description |
| --- | --- |
| 1 | Unique identifier for this change event. |
| 2 | Type of change: INSERT (new item), MODIFY (updated item), or REMOVE (deleted item). |
| 3 | Name of the source DynamoDB table. |
| 4 | Primary key attributes of the changed item. Always present. |
| 5 | Item state after the change. Present for INSERT and MODIFY events (requires NEW_IMAGE or NEW_AND_OLD_IMAGES stream view type). |
| 6 | Item state before the change. Present for MODIFY and REMOVE events (requires OLD_IMAGE or NEW_AND_OLD_IMAGES stream view type). |
| 7 | Position of this record in the shard, used for ordering and checkpointing. |

> 📝 **NOTE**
>
> DynamoDB attribute values are automatically unmarshalled from DynamoDB’s type format (`{"S": "value"}`) to plain values (`"value"`).
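
The `eventName` field in the message structure above can also drive filtering before events reach the output. The following is a minimal sketch, assuming you want to drop deletions and forward only inserts and updates. The output field names (`op`, `keys`, `item`) are illustrative and not part of the component.

```yaml
pipeline:
  processors:
    - mapping: |
        # Drop REMOVE events entirely; INSERT and MODIFY events pass through unchanged.
        root = if this.eventName == "REMOVE" { deleted() }
    - mapping: |
        # Reshape the remaining events into a compact document.
        # newImage is populated only for the NEW_IMAGE and NEW_AND_OLD_IMAGES view types.
        root.op = this.eventName
        root.keys = this.dynamodb.keys
        root.item = this.dynamodb.newImage
```

Splitting the filter and the reshaping into two processors keeps the second mapping from running on messages that were already deleted.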
### [](#example-mapping)Example mapping ```yaml pipeline: processors: - mapping: | root.event_type = this.eventName root.table = this.tableName root.keys = this.dynamodb.keys root.new_data = this.dynamodb.newImage root.old_data = this.dynamodb.oldImage ``` ## [](#metadata)Metadata This input adds the following metadata fields to each message: - `dynamodb_shard_id`: The shard ID from which the record was read - `dynamodb_sequence_number`: The sequence number of the record in the stream - `dynamodb_event_name`: The type of change: INSERT, MODIFY, or REMOVE - `dynamodb_table`: The name of the DynamoDB table ## [](#metrics)Metrics This input emits the following metrics: - `dynamodb_cdc_shards_tracked`: Total number of shards being tracked (gauge) - `dynamodb_cdc_shards_active`: Number of shards currently being read from (gauge) ## [](#fields)Fields ### [](#batch_size)`batch_size` Maximum number of records to read per shard in a single request. Valid range: 1-1000. **Type**: `int` **Default**: `1000` ### [](#checkpoint_limit)`checkpoint_limit` Maximum number of unacknowledged messages before forcing a checkpoint update. Lower values provide better recovery guarantees but increase write overhead. **Type**: `int` **Default**: `1000` ### [](#checkpoint_table)`checkpoint_table` DynamoDB table name for storing checkpoints. Will be created if it doesn’t exist. **Type**: `string` **Default**: `redpanda_dynamodb_checkpoints` ### [](#credentials)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. 
**Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#max_tracked_shards)`max_tracked_shards` Maximum number of shards to track simultaneously. Prevents memory issues with extremely large tables. **Type**: `int` **Default**: `10000` ### [](#poll_interval)`poll_interval` Time to wait between polling attempts when no records are available. **Type**: `string` **Default**: `1s` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#snapshot_batch_size)`snapshot_batch_size` Records per scan request during snapshot. Maximum 1000. Lower values provide better backpressure control but require more API calls. **Type**: `int` **Default**: `100` ### [](#snapshot_buffer_size)`snapshot_buffer_size` Maximum CDC events to buffer for deduplication (approximately 100 bytes per entry). If exceeded, deduplication is disabled and duplicates may be emitted. **Type**: `int` **Default**: `100000` ### [](#snapshot_deduplicate)`snapshot_deduplicate` Deduplicate records that appear in both snapshot and CDC stream. Requires buffering CDC events during snapshot. 
If buffer is exceeded, deduplication is disabled to prevent data loss. **Type**: `bool` **Default**: `true` ### [](#snapshot_mode)`snapshot_mode` `none`: Streams CDC events only (default). `snapshot_only`: Performs a one-time full table scan with no ongoing streaming. `snapshot_and_cdc`: Scans the entire table, then streams changes. **Type**: `string` **Default**: `none` **Options**: `none`, `snapshot_only`, `snapshot_and_cdc` ### [](#snapshot_segments)`snapshot_segments` Number of parallel scan segments (1-10). Higher parallelism scans faster but consumes more Read Capacity Units (RCUs). A lower value is safer to start with. **Type**: `int` **Default**: `1` ### [](#snapshot_throttle)`snapshot_throttle` Minimum time between scan requests per segment. Use this to limit Read Capacity Unit (RCU) consumption during snapshot. **Type**: `string` **Default**: `100ms` ### [](#start_from)`start_from` Where to start reading when no checkpoint exists. `trim_horizon` starts from the oldest available record, `latest` starts from new records. **Type**: `string` **Default**: `trim_horizon` **Options**: `trim_horizon`, `latest` ### [](#table_discovery_interval)`table_discovery_interval` Interval for rescanning and discovering new tables when using `tag` or `includelist` mode. Set to 0 to disable periodic rescanning. **Type**: `string` **Default**: `5m` ### [](#table_discovery_mode)`table_discovery_mode` `single`: Streams from tables specified in the `tables` list. `tag`: Auto-discovers tables by tags (ignores the `tables` field). `includelist`: Streams from tables in the `tables` list. Use `single` instead; `includelist` is kept for backward compatibility. **Type**: `string` **Default**: `single` **Options**: `single`, `tag`, `includelist` ### [](#table_tag_filter)`table_tag_filter` Multi-tag filter in the format `key1:v1,v2;key2:v3,v4`. Matches tables where (key1=v1 OR key1=v2) AND (key2=v3 OR key2=v4). Required when `table_discovery_mode` is `tag`. 
**Type**: `string` **Default**: `""` ### [](#tables)`tables[]` List of table names to stream from. For single table mode, provide one table. For multi-table mode, provide multiple tables. **Type**: `array` **Default**: `[]` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#throttle_backoff)`throttle_backoff` Time to wait when applying backpressure due to too many in-flight messages. **Type**: `string` **Default**: `100ms` ## [](#examples)Examples ### [](#consume-cdc-events)Consume CDC events Read change events from a DynamoDB table with streams enabled. ```yaml input: aws_dynamodb_cdc: tables: [my-table] region: us-east-1 ``` ### [](#start-from-latest)Start from latest Only process new changes, ignoring existing stream data. 
```yaml
input:
  aws_dynamodb_cdc:
    tables: [orders]
    start_from: latest
    region: us-west-2
```

### [](#snapshot-and-cdc)Snapshot and CDC

Scan all existing records, then stream ongoing changes.

```yaml
input:
  aws_dynamodb_cdc:
    tables: [products]
    snapshot_mode: snapshot_and_cdc
    snapshot_segments: 5
    region: us-east-1
```

### [](#auto-discover-tables-by-tag)Auto-discover tables by tag

Automatically discover and stream from all tables with a specific tag.

```yaml
input:
  aws_dynamodb_cdc:
    table_discovery_mode: tag
    table_tag_filter: "stream-enabled:true"
    table_discovery_interval: 5m
    region: us-east-1
```

### [](#auto-discover-tables-by-multiple-tags)Auto-discover tables by multiple tags

Discover tables matching multiple tag criteria, with OR logic between the values of one key and AND logic across keys.

```yaml
input:
  aws_dynamodb_cdc:
    table_discovery_mode: tag
    table_tag_filter: "environment:prod,staging;team:data,analytics"
    table_discovery_interval: 5m
    region: us-east-1
# Matches tables with: (environment=prod OR environment=staging) AND (team=data OR team=analytics)
```

### [](#stream-from-multiple-specific-tables)Stream from multiple specific tables

Stream from an explicit list of tables simultaneously. This example uses `single` mode, which reads the `tables` list; `includelist` behaves the same way but is kept only for backward compatibility.

```yaml
input:
  aws_dynamodb_cdc:
    table_discovery_mode: single
    tables:
      - orders
      - customers
      - products
    region: us-west-2
```

## [](#suggested-reading)Suggested reading

For common patterns including filtering events, routing to Kafka or S3, and detecting changed fields, see the [DynamoDB CDC Patterns](../../../cookbooks/dynamodb_cdc/) cookbook.
--- # Page 84: aws_kinesis **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/aws_kinesis.md --- # aws\_kinesis --- title: aws_kinesis latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/aws_kinesis page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/aws_kinesis.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/aws_kinesis.adoc categories: "[\"Services\",\"AWS\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/aws_kinesis/)[Output](/redpanda-cloud/develop/connect/components/outputs/aws_kinesis/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/aws_kinesis/ "View the Self-Managed version of this component") Receive messages from one or more Kinesis streams. 
#### Common

```yml
inputs:
  label: ""
  aws_kinesis:
    streams: [] # No default (required)
    dynamodb:
      table: ""
      create: false
      billing_mode: PAY_PER_REQUEST
      read_capacity_units: 0
      write_capacity_units: 0
      region: "" # No default (optional)
      endpoint: "" # No default (optional)
      tcp:
        connect_timeout: 0s
        keep_alive:
          idle: 15s
          interval: 15s
          count: 9
        tcp_user_timeout: 0s
      credentials:
        profile: "" # No default (optional)
        id: "" # No default (optional)
        secret: "" # No default (optional)
        token: "" # No default (optional)
        from_ec2_role: "" # No default (optional)
        role: "" # No default (optional)
        role_external_id: "" # No default (optional)
    checkpoint_limit: 1024
    auto_replay_nacks: true
    commit_period: 5s
    steal_grace_period: 2s
    start_from_oldest: true
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

#### Advanced

```yml
inputs:
  label: ""
  aws_kinesis:
    streams: [] # No default (required)
    dynamodb:
      table: ""
      create: false
      billing_mode: PAY_PER_REQUEST
      read_capacity_units: 0
      write_capacity_units: 0
      region: "" # No default (optional)
      endpoint: "" # No default (optional)
      tcp:
        connect_timeout: 0s
        keep_alive:
          idle: 15s
          interval: 15s
          count: 9
        tcp_user_timeout: 0s
      credentials:
        profile: "" # No default (optional)
        id: "" # No default (optional)
        secret: "" # No default (optional)
        token: "" # No default (optional)
        from_ec2_role: "" # No default (optional)
        role: "" # No default (optional)
        role_external_id: "" # No default (optional)
    checkpoint_limit: 1024
    auto_replay_nacks: true
    commit_period: 5s
    steal_grace_period: 2s
    rebalance_period: 30s
    lease_period: 30s
    start_from_oldest: true
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

Consumes messages from one or more Kinesis streams either by automatically balancing shards across other instances of this input, or by consuming shards listed explicitly. The latest message sequence consumed by this input is stored within a [DynamoDB table](#table-schema), which allows it to resume at the correct sequence of the shard during restarts. This table is also used for coordination across distributed inputs when shard balancing.

Redpanda Connect will not store a consumed sequence unless it is acknowledged at the output level, which ensures at-least-once delivery guarantees.

## [](#ordering)Ordering

By default messages of a shard can be processed in parallel, up to a limit determined by the field `checkpoint_limit`. However, if strict ordered processing is required then this value must be set to 1 in order to process shard messages in lock-step. When doing so it is recommended that you perform batching at this component for performance as it will not be possible to batch lock-stepped messages at the output level.

## [](#table-schema)Table schema

It’s possible to configure Redpanda Connect to create the DynamoDB table required for coordination if it does not already exist. However, if you wish to create this yourself (recommended) then create a table with a string HASH key `StreamID` and a string RANGE key `ShardID`.

## [](#batching)Batching

Use the `batching` fields to configure an optional [batching policy](../../../configuration/batching/#batch-policy). Each stream shard will be batched separately in order to ensure that acknowledgements aren’t contaminated.
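
The ordering and batching guidance above can be combined in a single configuration. The following is a minimal sketch that uses only fields listed on this page; the stream name `my-stream`, the table name `kinesis_checkpoints`, and the batch sizes are placeholders chosen for illustration.

```yaml
input:
  aws_kinesis:
    streams: [my-stream]            # placeholder stream name
    dynamodb:
      table: kinesis_checkpoints    # placeholder; pre-created with StreamID (HASH) and ShardID (RANGE) string keys
      create: false                 # the table is managed outside Redpanda Connect
    checkpoint_limit: 1             # process each shard's messages in lock-step for strict ordering
    start_from_oldest: true
    batching:
      count: 100                    # flush after 100 messages per shard...
      period: 1s                    # ...or after one second, whichever comes first
```

Batching is configured on the input here because, with `checkpoint_limit: 1`, lock-stepped messages cannot be batched at the output level.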
## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#batching-2)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. 
Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#checkpoint_limit)`checkpoint_limit` The maximum gap between the in flight sequence versus the latest acknowledged sequence at a given time. Increasing this limit enables parallel processing and batching at the output level to work on individual shards. Any given sequence will not be committed unless all messages under that offset are delivered in order to preserve at least once delivery guarantees. **Type**: `int` **Default**: `1024` ### [](#commit_period)`commit_period` The period of time between each update to the checkpoint table. **Type**: `string` **Default**: `5s` ### [](#credentials)`credentials` Manually configure the AWS credentials to use (optional). For more information, see the [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of the AWS credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` The profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` The role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to use when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the AWS credentials in use. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the AWS credentials in use. This is a required value for short-term credentials. **Type**: `string` ### [](#dynamodb)`dynamodb` Determines the table used for storing and accessing the latest consumed sequence for shards, and for coordinating balanced consumers of streams. **Type**: `object` ### [](#dynamodb-billing_mode)`dynamodb.billing_mode` When creating the table determines the billing mode. **Type**: `string` **Default**: `PAY_PER_REQUEST` **Options**: `PROVISIONED`, `PAY_PER_REQUEST` ### [](#dynamodb-create)`dynamodb.create` Whether, if the table does not exist, it should be created. **Type**: `bool` **Default**: `false` ### [](#dynamodb-credentials)`dynamodb.credentials` Manually configure the AWS credentials to use (optional). For more information, see the [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#dynamodb-credentials-from_ec2_role)`dynamodb.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#dynamodb-credentials-id)`dynamodb.credentials.id` The ID of the AWS credentials to use. **Type**: `string` ### [](#dynamodb-credentials-profile)`dynamodb.credentials.profile` The profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#dynamodb-credentials-role)`dynamodb.credentials.role` The role ARN to assume. **Type**: `string` ### [](#dynamodb-credentials-role_external_id)`dynamodb.credentials.role_external_id` An external ID to use when assuming a role. 
**Type**: `string` ### [](#dynamodb-credentials-secret)`dynamodb.credentials.secret` The secret for the AWS credentials in use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#dynamodb-credentials-token)`dynamodb.credentials.token` The token for the AWS credentials in use. This is a required value for short-term credentials. **Type**: `string` ### [](#dynamodb-endpoint)`dynamodb.endpoint` A custom endpoint URL for AWS API requests. Use this to connect to AWS-compatible services or local testing environments instead of the standard AWS endpoints. **Type**: `string` ### [](#dynamodb-read_capacity_units)`dynamodb.read_capacity_units` Set the provisioned read capacity when creating the table with a `billing_mode` of `PROVISIONED`. **Type**: `int` **Default**: `0` ### [](#dynamodb-region)`dynamodb.region` The AWS region to target. **Type**: `string` ### [](#dynamodb-table)`dynamodb.table` The name of the table to access. **Type**: `string` **Default**: `""` ### [](#dynamodb-tcp)`dynamodb.tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. 
**Type**: `object` ### [](#dynamodb-tcp-connect_timeout)`dynamodb.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#dynamodb-tcp-keep_alive)`dynamodb.tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#dynamodb-tcp-keep_alive-count)`dynamodb.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#dynamodb-tcp-keep_alive-idle)`dynamodb.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#dynamodb-tcp-keep_alive-interval)`dynamodb.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#dynamodb-tcp-tcp_user_timeout)`dynamodb.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#dynamodb-write_capacity_units)`dynamodb.write_capacity_units` Set the provisioned write capacity when creating the table with a `billing_mode` of `PROVISIONED`. **Type**: `int` **Default**: `0` ### [](#endpoint)`endpoint` A custom endpoint URL for AWS API requests. Use this to connect to AWS-compatible services or local testing environments instead of the standard AWS endpoints. **Type**: `string` ### [](#lease_period)`lease_period` The period of time after which a client that has failed to update a shard checkpoint is assumed to be inactive. **Type**: `string` **Default**: `30s` ### [](#rebalance_period)`rebalance_period` The period of time between each attempt to rebalance shards across clients. 
**Type**: `string` **Default**: `30s` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#start_from_oldest)`start_from_oldest` Whether to consume from the oldest message when a sequence does not yet exist for the stream. **Type**: `bool` **Default**: `true` ### [](#steal_grace_period)`steal_grace_period` Determines how long beyond the next commit period a client that is stealing a shard waits for the current owner to store a checkpoint. A longer value increases the time taken to balance shards but reduces the likelihood of processing duplicate messages. **Type**: `string` **Default**: `2s` ### [](#streams)`streams[]` One or more Kinesis data streams to consume from. Streams can either be specified by their name or full ARN. Multiple comma-separated streams can be listed in a single element. Shards are automatically distributed across consumers of a stream by coordinating through the provided DynamoDB table. Alternatively, it’s possible to specify an explicit shard to consume from with a colon after the stream name, e.g. `foo:0` would consume the shard `0` of the stream `foo`. **Type**: `array` ```yaml # Examples: streams: - foo - "arn:aws:kinesis:*:111122223333:stream/my-stream" ``` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values.
Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. 
**Type**: `string` **Default**: `0s` --- # Page 85: aws_s3 **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/aws_s3.md --- # aws\_s3 --- title: aws_s3 latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/aws_s3 page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/aws_s3.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/aws_s3.adoc categories: "[\"Services\",\"AWS\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/aws_s3/)[Cache](/redpanda-cloud/develop/connect/components/caches/aws_s3/)[Output](/redpanda-cloud/develop/connect/components/outputs/aws_s3/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/aws_s3/ "View the Self-Managed version of this component") Downloads objects within an Amazon S3 bucket, optionally filtered by a prefix, either by walking the items in the bucket or by streaming upload notifications in real time. 
#### Common ```yml inputs: label: "" aws_s3: bucket: "" prefix: "" scanner: to_the_end: {} sqs: url: "" endpoint: "" key_path: Records.*.s3.object.key bucket_path: Records.*.s3.bucket.name envelope_path: "" delay_period: "" max_messages: 10 wait_time_seconds: 0 nack_visibility_timeout: 0 ``` #### Advanced ```yml inputs: label: "" aws_s3: bucket: "" prefix: "" region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) force_path_style_urls: false delete_objects: false scanner: to_the_end: {} sqs: url: "" endpoint: "" key_path: Records.*.s3.object.key bucket_path: Records.*.s3.bucket.name envelope_path: "" delay_period: "" max_messages: 10 wait_time_seconds: 0 nack_visibility_timeout: 0 ``` ## [](#stream-objects-on-upload-with-sqs)Stream objects on upload with SQS A common pattern for consuming S3 objects is to emit upload notification events from the bucket either directly to an SQS queue, or to an SNS topic that is consumed by an SQS queue, and then have your consumer listen for events that prompt it to download the newly uploaded objects. More information about this pattern and how to set it up can be found in the [Amazon S3 docs](https://docs.aws.amazon.com/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html). Redpanda Connect is able to follow this pattern when you configure an `sqs.url`, where it consumes events from SQS and downloads only the object keys contained in those events. 
For this to work, Redpanda Connect needs to know where within the event the key and bucket names can be found, specified as [dot paths](../../../configuration/field_paths/) with the fields `sqs.key_path` and `sqs.bucket_path`. The default values for these fields should already be correct when following the guide above. If your notification events are being routed to SQS via an SNS topic, the events are enveloped by SNS, in which case you also need to specify the field `sqs.envelope_path`, which in the case of SNS to SQS will usually be `Message`. When using SQS, make sure you have sensible values for `sqs.max_messages` and also the visibility timeout of the queue itself. When Redpanda Connect consumes an S3 object the SQS message that triggered it is not deleted until the S3 object has been sent onwards. This ensures at-least-once crash resiliency, but also means that if the S3 object takes longer to process than the visibility timeout of your queue, then the same objects might be processed multiple times. ## [](#download-large-files)Download large files When downloading large files, process them in streamed parts to avoid loading the entire file into memory at once. To do this, specify a [`scanner`](#scanner) that determines how to break the input into smaller individual messages. ## [](#bucket-and-prefix)Bucket and prefix The `bucket` field accepts a bucket name only, not an ARN. For example, use `my-bucket`, not `arn:aws:s3:::my-bucket`. The `prefix` field accepts a single string. To consume from multiple prefixes in the same bucket, use multiple `aws_s3` inputs in a [`broker` input](../broker/): ```yaml input: broker: inputs: - aws_s3: bucket: my-bucket prefix: logs/app1/ - aws_s3: bucket: my-bucket prefix: logs/app2/ ``` ## [](#credentials)Credentials By default, Redpanda Connect uses a shared credentials file when connecting to AWS services. You can also set credentials explicitly at the component level to transfer data across accounts. 
You can find out more in [AWS credentials](../../../guides/cloud/aws/). ## [](#s3-compatible-storage)S3-compatible storage The `endpoint` and `force_path_style_urls` fields let you connect to S3-compatible storage services such as Cloudflare R2, MinIO, or DigitalOcean Spaces. For Cloudflare R2, set `endpoint` to your account endpoint URL and enable `force_path_style_urls`: ```yaml input: aws_s3: bucket: r2-bucket endpoint: https://.r2.cloudflarestorage.com force_path_style_urls: true region: auto credentials: id: secret: ``` Find your account ID in the Cloudflare dashboard under **R2 > Overview > Account Details**. Generate API credentials under **R2 > Manage R2 API Tokens**. ## [](#metadata)Metadata This input adds the following metadata fields to each message: - s3\_key - s3\_bucket - s3\_last\_modified\_unix - s3\_last\_modified (RFC3339) - s3\_content\_type - s3\_content\_encoding - s3\_version\_id - All user-defined metadata You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). User-defined metadata is case-insensitive in AWS, so keys are often received in capitalized form. To normalize them, map all metadata keys to lowercase or uppercase using a Bloblang mapping such as `meta = meta().map_each_key(key -> key.lowercase())`. ## [](#fields)Fields ### [](#bucket)`bucket` The bucket to consume from. If the field `sqs.url` is specified this field is optional. **Type**: `string` **Default**: `""` ### [](#credentials-2)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html).
**Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#delete_objects)`delete_objects` Whether to delete downloaded objects from the bucket once they are processed. **Type**: `bool` **Default**: `false` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#force_path_style_urls)`force_path_style_urls` Forces the client API to use path style URLs for downloading keys, which is often required when connecting to custom endpoints. **Type**: `bool` **Default**: `false` ### [](#prefix)`prefix` An optional path prefix, if set only objects with the prefix are consumed when walking a bucket. **Type**: `string` **Default**: `""` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#scanner)`scanner` The [scanner](../../scanners/about/) by which the stream of bytes consumed will be broken out into individual messages. Scanners are useful for processing large sources of data without holding the entirety of it within memory. 
For example, the `csv` scanner allows you to process individual CSV rows without loading the entire CSV file in memory at once. **Type**: `scanner` **Default**: ```yaml to_the_end: {} ``` ### [](#sqs)`sqs` Consume SQS messages in order to trigger key downloads. **Type**: `object` ### [](#sqs-bucket_path)`sqs.bucket_path` A [dot path](../../../configuration/field_paths/) whereby the bucket name can be found in SQS messages. **Type**: `string` **Default**: `Records.*.s3.bucket.name` ### [](#sqs-delay_period)`sqs.delay_period` An optional period of time to wait from when a notification was originally sent to when the target key download is attempted. **Type**: `string` **Default**: `""` ```yaml # Examples: delay_period: 10s # --- delay_period: 5m ``` ### [](#sqs-endpoint)`sqs.endpoint` A custom endpoint to use when connecting to SQS. **Type**: `string` **Default**: `""` ### [](#sqs-envelope_path)`sqs.envelope_path` A [dot path](../../../configuration/field_paths/) of a field to extract an enveloped JSON payload for further extracting the key and bucket from SQS messages. This is specifically useful when subscribing an SQS queue to an SNS topic that receives bucket events. **Type**: `string` **Default**: `""` ```yaml # Examples: envelope_path: Message ``` ### [](#sqs-key_path)`sqs.key_path` A [dot path](../../../configuration/field_paths/) whereby object keys are found in SQS messages. **Type**: `string` **Default**: `Records.*.s3.object.key` ### [](#sqs-max_messages)`sqs.max_messages` The maximum number of SQS messages to consume from each request. **Type**: `int` **Default**: `10` ### [](#sqs-nack_visibility_timeout)`sqs.nack_visibility_timeout` Custom SQS Nack Visibility timeout in seconds. Default is 0 **Type**: `int` **Default**: `0` ### [](#sqs-url)`sqs.url` An optional SQS URL to connect to. When specified this queue will control which objects are downloaded. 
**Type**: `string` **Default**: `""` ### [](#sqs-wait_time_seconds)`sqs.wait_time_seconds` Whether to set the wait time. Enabling this activates long-polling. Valid values: 0 to 20. **Type**: `int` **Default**: `0` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. 
**Type**: `string` **Default**: `0s` --- # Page 86: aws_sqs **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/aws_sqs.md --- # aws\_sqs --- title: aws_sqs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/aws_sqs page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/aws_sqs.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/aws_sqs.adoc categories: "[\"Services\",\"AWS\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/aws_sqs/)[Output](/redpanda-cloud/develop/connect/components/outputs/aws_sqs/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/aws_sqs/ "View the Self-Managed version of this component") Consume messages from an AWS SQS URL. 
#### Common ```yml inputs: label: "" aws_sqs: url: "" # No default (required) max_outstanding_messages: 1000 ``` #### Advanced ```yml inputs: label: "" aws_sqs: url: "" # No default (required) delete_message: true reset_visibility: true max_number_of_messages: 10 max_outstanding_messages: 1000 wait_time_seconds: 0 message_timeout: 30s region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) ``` ## [](#credentials)Credentials By default, Redpanda Connect uses a shared credentials file when connecting to AWS services. You can also set credentials explicitly at the component level, which allows you to transfer data across accounts. To find out more, see [Amazon Web Services](../../../guides/cloud/aws/). ## [](#metadata)Metadata This input adds the following metadata fields to each message: - sqs\_message\_id - sqs\_receipt\_handle - sqs\_approximate\_receive\_count - All message attributes You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#fields)Fields ### [](#credentials-2)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. 
**Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#delete_message)`delete_message` Whether to delete the consumed message when it’s acknowledged. Set to `false` to handle the deletion using a different mechanism. **Type**: `bool` **Default**: `true` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#max_number_of_messages)`max_number_of_messages` The maximum number of messages that Redpanda Connect can return each time it polls the SQS URL. Enter values from `1` to `10` only. **Type**: `int` **Default**: `10` ### [](#max_outstanding_messages)`max_outstanding_messages` The maximum number of pending messages that Redpanda Connect can have in flight at the same time. **Type**: `int` **Default**: `1000` ### [](#message_timeout)`message_timeout` The maximum time allowed to process a received message before Redpanda Connect refreshes the [receipt handle](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-queue-message-identifiers.html), and the message becomes visible in the queue again. 
Redpanda Connect attempts to refresh the receipt handle after half of the timeout has elapsed. **Type**: `string` **Default**: `30s` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#reset_visibility)`reset_visibility` Whether to set the visibility timeout of the consumed message to zero if Redpanda Connect receives a negative acknowledgement. Set to `false` to use the [queue’s visibility timeout](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html) for each message rather than releasing the message immediately for reprocessing. **Type**: `bool` **Default**: `true` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. 
Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#url)`url` The SQS URL to consume from. **Type**: `string` ### [](#wait_time_seconds)`wait_time_seconds` Whether to set a wait time (in seconds). Enter values from `1` to `20` to enable wait times and to activate [long polling](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-short-and-long-polling.html) for queued messages. **Type**: `int` **Default**: `0` --- # Page 87: azure_blob_storage **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/azure_blob_storage.md --- # azure\_blob\_storage --- title: azure_blob_storage latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/azure_blob_storage page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/azure_blob_storage.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/azure_blob_storage.adoc categories: "[\"Services\",\"Azure\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/azure_blob_storage/)[Output](/redpanda-cloud/develop/connect/components/outputs/azure_blob_storage/) **Available in:**
Cloud, [Self-Managed](/redpanda-connect/components/inputs/azure_blob_storage/ "View the Self-Managed version of this component") Downloads objects within an Azure Blob Storage container, optionally filtered by a prefix. #### Common ```yml inputs: label: "" azure_blob_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" container: "" # No default (required) prefix: "" scanner: to_the_end: {} targets_input: "" # No default (optional) ``` #### Advanced ```yml inputs: label: "" azure_blob_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" container: "" # No default (required) prefix: "" scanner: to_the_end: {} delete_objects: false targets_input: "" # No default (optional) ``` Supports multiple authentication methods but only one of the following is required: - `storage_connection_string` - `storage_account` and `storage_access_key` - `storage_account` and `storage_sas_token` - `storage_account` to access via [DefaultAzureCredential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#DefaultAzureCredential) If multiple are set then the `storage_connection_string` is given priority. If the `storage_connection_string` does not contain the `AccountName` parameter, please specify it in the `storage_account` field. ## [](#download-large-files)Download large files When downloading large files it’s often necessary to process it in streamed parts in order to avoid loading the entire file in memory at a given time. In order to do this a [`scanner`](#scanner) can be specified that determines how to break the input into smaller individual messages. ## [](#stream-new-files)Stream new files By default this input will consume all files found within the target container and will then gracefully terminate. This is referred to as a "batch" mode of operation. 
However, it’s possible to instead configure a container as [an Event Grid source](https://learn.microsoft.com/en-gb/azure/event-grid/event-schema-blob-storage) and then use this as a [`targets_input`](#targets_input), in which case new files are consumed as they’re uploaded and Redpanda Connect will continue listening for and downloading files as they arrive. This is referred to as a "streamed" mode of operation. ## [](#metadata)Metadata This input adds the following metadata fields to each message: - blob\_storage\_key - blob\_storage\_container - blob\_storage\_last\_modified - blob\_storage\_last\_modified\_unix - blob\_storage\_content\_type - blob\_storage\_content\_encoding - All user defined metadata You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#fields)Fields ### [](#container)`container` The name of the container from which to download blobs. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#delete_objects)`delete_objects` Whether to delete downloaded objects from the blob once they are processed. **Type**: `bool` **Default**: `false` ### [](#prefix)`prefix` An optional path prefix, if set only objects with the prefix are consumed. **Type**: `string` **Default**: `""` ### [](#scanner)`scanner` The [scanner](../../scanners/about/) by which the stream of bytes consumed will be broken out into individual messages. Scanners are useful for processing large sources of data without holding the entirety of it within memory. For example, the `csv` scanner allows you to process individual CSV rows without loading the entire CSV file in memory at once. **Type**: `scanner` **Default**: ```yaml to_the_end: {} ``` ### [](#storage_access_key)`storage_access_key` The storage account access key. This field is ignored if `storage_connection_string` is set. 
**Type**: `string` **Default**: `""` ### [](#storage_account)`storage_account` The storage account to access. This field is ignored if `storage_connection_string` is set. **Type**: `string` **Default**: `""` ### [](#storage_connection_string)`storage_connection_string` A storage account connection string. This field is required if `storage_account` and `storage_access_key` / `storage_sas_token` are not set. **Type**: `string` **Default**: `""` ### [](#storage_sas_token)`storage_sas_token` The storage account SAS token. This field is ignored if `storage_connection_string` or `storage_access_key` are set. **Type**: `string` **Default**: `""` ### [](#targets_input)`targets_input` > ⚠️ **CAUTION** > > This is an experimental field that provides an optional source of download targets, configured as a [regular Redpanda Connect input](../about/). Each message yielded by this input should be a single structured object containing a field `name`, which represents the blob to be downloaded. This requires setting up [Azure Blob Storage as an Event Grid source](https://learn.microsoft.com/en-gb/azure/event-grid/event-schema-blob-storage) and an associated event handler that a Redpanda Connect input can read from. 
For example, use either one of the following: - [Azure Event Hubs](https://learn.microsoft.com/en-gb/azure/event-grid/handler-event-hubs) using the `kafka` input - [Namespace topics](https://learn.microsoft.com/en-gb/azure/event-grid/handler-event-grid-namespace-topic) using the `mqtt` input **Type**: `input` ```yaml # Examples: targets_input: mqtt: topics: - some-topic urls: - example.westeurope-1.ts.eventgrid.azure.net:8883 processors: - unarchive: format: json_array - mapping: |- if this.eventType == "Microsoft.Storage.BlobCreated" { root.name = this.data.url.parse_url().path.trim_prefix("/foocontainer/") } else { root = deleted() } ``` --- # Page 88: azure_cosmosdb **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/azure_cosmosdb.md --- # azure\_cosmosdb --- title: azure_cosmosdb latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/azure_cosmosdb page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/azure_cosmosdb.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/azure_cosmosdb.adoc categories: "[\"Azure\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/azure_cosmosdb/)[Output](/redpanda-cloud/develop/connect/components/outputs/azure_cosmosdb/)[Processor](/redpanda-cloud/develop/connect/components/processors/azure_cosmosdb/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/azure_cosmosdb/ "View the Self-Managed version of this component") Executes a SQL query against [Azure CosmosDB](https://learn.microsoft.com/en-us/azure/cosmos-db/introduction) and creates a batch of messages from each page of items. 
#### Common ```yml inputs: label: "" azure_cosmosdb: endpoint: "" # No default (optional) account_key: "" # No default (optional) connection_string: "" # No default (optional) database: "" # No default (required) container: "" # No default (required) partition_keys_map: "" # No default (required) query: "" # No default (required) args_mapping: "" # No default (optional) auto_replay_nacks: true ``` #### Advanced ```yml inputs: label: "" azure_cosmosdb: endpoint: "" # No default (optional) account_key: "" # No default (optional) connection_string: "" # No default (optional) database: "" # No default (required) container: "" # No default (required) partition_keys_map: "" # No default (required) query: "" # No default (required) args_mapping: "" # No default (optional) batch_count: -1 auto_replay_nacks: true ``` ## [](#cross-partition-queries)Cross-partition queries Cross-partition queries are currently not supported by the underlying driver. For every query, the PartitionKey values must be known in advance and specified in the config. [See details](https://github.com/Azure/azure-sdk-for-go/issues/18578#issuecomment-1222510989). ## [](#credentials)Credentials You can use one of the following authentication mechanisms: - Set the `endpoint` field and the `account_key` field - Set only the `endpoint` field to use [DefaultAzureCredential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#DefaultAzureCredential) - Set the `connection_string` field ## [](#metadata)Metadata This component adds the following metadata fields to each message: ```none - activity_id - request_charge ``` You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#examples)Examples ### [](#query-container)Query container Execute a parametrized SQL query to select documents from a container. 
```yaml input: azure_cosmosdb: endpoint: http://localhost:8080 account_key: C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw== database: blobbase container: blobfish partition_keys_map: root = "AbyssalPlain" query: SELECT * FROM blobfish AS b WHERE b.species = @species args_mapping: | root = [ { "Name": "@species", "Value": "smooth-head" }, ] ``` ## [](#fields)Fields ### [](#account_key)`account_key` Account key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: account_key: C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw== ``` ### [](#args_mapping)`args_mapping` A [Bloblang mapping](../../../guides/bloblang/about/) that, for each message, creates a list of arguments to use with the query. **Type**: `string` ```yaml # Examples: args_mapping: |- root = [ { "Name": "@name", "Value": "benthos" }, ] ``` ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#batch_count)`batch_count` The maximum number of messages that should be accumulated into each batch. Use `-1` to specify a dynamic page size. **Type**: `int` **Default**: `-1` ### [](#connection_string)`connection_string` Connection string. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: connection_string: AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==; ``` ### [](#container)`container` Container. **Type**: `string` ```yaml # Examples: container: testcontainer ``` ### [](#database)`database` Database. **Type**: `string` ```yaml # Examples: database: testdb ``` ### [](#endpoint)`endpoint` CosmosDB endpoint. **Type**: `string` ```yaml # Examples: endpoint: https://localhost:8081 ``` ### [](#partition_keys_map)`partition_keys_map` A [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to a single partition key value or an array of partition key values of type string, integer or boolean. Currently, hierarchical partition keys are not supported so only one value may be provided. 
**Type**: `string` ```yaml # Examples: partition_keys_map: root = "blobfish" # --- partition_keys_map: root = 41 # --- partition_keys_map: root = true # --- partition_keys_map: root = null # --- partition_keys_map: root = now().ts_format("2006-01-02") ``` ### [](#query)`query` The query to execute. **Type**: `string` ```yaml # Examples: query: SELECT c.foo FROM testcontainer AS c WHERE c.bar = "baz" AND c.timestamp < @timestamp ``` ## [](#cosmosdb-emulator)CosmosDB emulator To run the [CosmosDB emulator](https://learn.microsoft.com/en-us/azure/cosmos-db/linux-emulator) locally, use the following Docker command: ```bash docker run --rm -it -p 8081:8081 --name=cosmosdb -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=10 -e AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=false mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator ``` Note: `AZURE_COSMOS_EMULATOR_PARTITION_COUNT` controls the number of partitions that the emulator supports. The larger the value, the longer the container takes to start. Additionally, instead of installing the container’s self-signed certificate, which is exposed at `https://localhost:8081/_explorer/emulator.pem`, you can run [mitmproxy](https://mitmproxy.org/) as follows: ```bash mitmproxy -k --mode "reverse:https://localhost:8081" ``` You can then access the CosmosDB UI at `http://localhost:8080/_explorer/index.html` and use `http://localhost:8080` as the CosmosDB endpoint. 
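Putting the emulator steps together, a minimal input configuration for local testing might look like the following sketch. It assumes mitmproxy is running as a reverse proxy in front of the emulator, as described above, and that a database named `testdb` with a container named `testcontainer` already exists in the emulator. Those names and the partition key value are placeholders, not resources the emulator creates for you. The account key is the emulator key used in the examples on this page.

```yaml
input:
  azure_cosmosdb:
    # mitmproxy reverse proxy in front of the emulator (plain HTTP on port 8080)
    endpoint: http://localhost:8080
    # Emulator key from the examples on this page; never use it with a real account
    account_key: C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==
    database: testdb            # placeholder database name
    container: testcontainer    # placeholder container name
    # A single partition key value must be supplied, because
    # cross-partition queries are not supported
    partition_keys_map: root = "blobfish"
    query: SELECT * FROM testcontainer AS c
```

Against a real CosmosDB account, replace the endpoint and load the key through secret management rather than writing it into the configuration.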
--- # Page 89: azure_queue_storage **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/azure_queue_storage.md --- # azure\_queue\_storage --- title: azure_queue_storage latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/azure_queue_storage page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/azure_queue_storage.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/azure_queue_storage.adoc categories: "[\"Services\",\"Azure\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/azure_queue_storage/)[Output](/redpanda-cloud/develop/connect/components/outputs/azure_queue_storage/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/azure_queue_storage/ "View the Self-Managed version of this component") Dequeue objects from an Azure Storage Queue. #### Common ```yml inputs: label: "" azure_queue_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" queue_name: "" # No default (required) ``` #### Advanced ```yml inputs: label: "" azure_queue_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" queue_name: "" # No default (required) dequeue_visibility_timeout: 30s max_in_flight: 10 track_properties: false ``` This input adds the following metadata fields to each message: ```none - queue_storage_insertion_time - queue_storage_queue_name - queue_storage_message_lag (if 'track_properties' set to true) - All user defined queue metadata ``` Only one authentication method is required, `storage_connection_string` or `storage_account` and `storage_access_key`. 
If both are set, `storage_connection_string` takes priority. ## [](#fields)Fields ### [](#dequeue_visibility_timeout)`dequeue_visibility_timeout` The duration after which a dequeued message becomes visible again. **Type**: `string` **Default**: `30s` ### [](#max_in_flight)`max_in_flight` The maximum number of unprocessed messages to fetch at a given time. **Type**: `int` **Default**: `10` ### [](#queue_name)`queue_name` The name of the source storage queue. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: queue_name: foo_queue # --- queue_name: ${! env("MESSAGE_TYPE").lowercase() } ``` ### [](#storage_access_key)`storage_access_key` The storage account access key. This field is ignored if `storage_connection_string` is set. **Type**: `string` **Default**: `""` ### [](#storage_account)`storage_account` The storage account to access. This field is ignored if `storage_connection_string` is set. **Type**: `string` **Default**: `""` ### [](#storage_connection_string)`storage_connection_string` A storage account connection string. This field is required if `storage_account` and `storage_access_key` / `storage_sas_token` are not set. **Type**: `string` **Default**: `""` ### [](#track_properties)`track_properties` If set to `true`, the queue is polled on each read request for information such as the queue message lag. These properties are added to consumed messages as metadata, but the extra polling has a negative performance impact. 
**Type**: `bool` **Default**: `false` --- # Page 90: azure_table_storage **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/azure_table_storage.md --- # azure\_table\_storage --- title: azure_table_storage latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/azure_table_storage page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/azure_table_storage.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/azure_table_storage.adoc categories: "[\"Services\",\"Azure\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/azure_table_storage/)[Output](/redpanda-cloud/develop/connect/components/outputs/azure_table_storage/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/azure_table_storage/ "View the Self-Managed version of this component") Queries an Azure Storage Account Table, optionally with multiple filters. #### Common ```yml inputs: label: "" azure_table_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" table_name: "" # No default (required) ``` #### Advanced ```yml inputs: label: "" azure_table_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" table_name: "" # No default (required) filter: "" select: "" page_size: 1000 ``` 
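As a sketch of how the advanced fields combine, the following configuration reads only selected columns from rows that match an OData filter, fetching up to 500 records per page. The storage account name, table name, and column names are placeholders, and the access key is shown as a literal placeholder only for brevity.

```yaml
input:
  azure_table_storage:
    storage_account: examplestorageacct   # placeholder account name
    storage_access_key: "<access-key>"    # placeholder; avoid hard-coding real keys
    table_name: Foo                       # placeholder table name
    # Return only rows in partition 'foo' whose RowKey sorts after '1000'
    filter: PartitionKey eq 'foo' and RowKey gt '1000'
    # Limit each record to these columns
    select: PartitionKey,RowKey,Foo,Bar,Timestamp
    page_size: 500
```

Leaving `filter` and `select` empty returns every row with all of its columns, so narrowing both is usually worthwhile on large tables.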
## [](#metadata)Metadata This input adds the following metadata fields to each message: - table\_storage\_name - row\_num You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#fields)Fields ### [](#filter)`filter` OData filter expression. If not set, all rows are returned. Valid operators are `eq`, `ne`, `gt`, `lt`, `ge`, and `le`. **Type**: `string` **Default**: `""` ```yaml # Examples: filter: PartitionKey eq 'foo' and RowKey gt '1000' ``` ### [](#page_size)`page_size` Maximum number of records to return on each page. **Type**: `int` **Default**: `1000` ### [](#select)`select` Select expression using OData notation. Limits the columns on each record to just those requested. **Type**: `string` **Default**: `""` ```yaml # Examples: select: PartitionKey,RowKey,Foo,Bar,Timestamp ``` ### [](#storage_access_key)`storage_access_key` The storage account access key. This field is ignored if `storage_connection_string` is set. **Type**: `string` **Default**: `""` ### [](#storage_account)`storage_account` The storage account to access. This field is ignored if `storage_connection_string` is set. **Type**: `string` **Default**: `""` ### [](#storage_connection_string)`storage_connection_string` A storage account connection string. This field is required if `storage_account` and `storage_access_key` / `storage_sas_token` are not set. **Type**: `string` **Default**: `""` ### [](#storage_sas_token)`storage_sas_token` The storage account SAS token. This field is ignored if `storage_connection_string` or `storage_access_key` are set. **Type**: `string` **Default**: `""` ### [](#table_name)`table_name` The table to read messages from. 
**Type**: `string` ```yaml # Examples: table_name: Foo ``` --- # Page 91: batched **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/batched.md --- # batched --- title: batched latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/batched page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/batched.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/batched.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/batched/ "View the Self-Managed version of this component") Consumes data from a child input and applies a batching policy to the stream. #### Common ```yml inputs: label: "" batched: child: "" # No default (required) policy: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml inputs: label: "" batched: child: "" # No default (required) policy: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` Batching at the input level is sometimes useful for processing across micro-batches, and can also sometimes be a useful performance trick. However, most inputs are fine without it so unless you have a specific plan for batching this component is not worth using. ## [](#fields)Fields ### [](#child)`child` The child input. **Type**: `input` ### [](#policy)`policy` Allows you to configure a [batching policy](../../../configuration/batching/). 
**Type**: `object` ```yaml # Examples: policy: byte_size: 5000 count: 0 period: 1s # --- policy: count: 10 period: 1s # --- policy: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#policy-byte_size)`policy.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. **Type**: `int` **Default**: `0` ### [](#policy-check)`policy.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#policy-count)`policy.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#policy-period)`policy.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#policy-processors)`policy.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. 
**Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` --- # Page 92: broker **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/broker.md --- # broker --- title: broker latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/broker page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/broker.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/broker.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/broker/)[Output](/redpanda-cloud/develop/connect/components/outputs/broker/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/broker/ "View the Self-Managed version of this component") Allows you to combine multiple inputs into a single stream of data, where each input will be read in parallel. #### Common ```yml inputs: label: "" broker: inputs: [] # No default (required) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml inputs: label: "" broker: copies: 1 inputs: [] # No default (required) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` A broker type is configured with its own list of input configurations and a field to specify how many copies of the list of inputs should be created. Adding more input types allows you to combine streams from multiple sources into one. 
For example, reading from both RabbitMQ and Kafka: ```yaml input: broker: copies: 1 inputs: - amqp_0_9: urls: - amqp://guest:guest@localhost:5672/ consumer_tag: benthos-consumer queue: benthos-queue # Optional list of input specific processing steps processors: - mapping: | root.message = this root.meta.link_count = this.links.length() root.user.age = this.user.age.number() - kafka: addresses: - localhost:9092 client_id: benthos_kafka_input consumer_group: benthos_consumer_group topics: [ benthos_stream:0 ] ``` If the number of copies is greater than zero the list will be copied that number of times. For example, if your inputs were of type foo and bar, with 'copies' set to '2', you would end up with two 'foo' inputs and two 'bar' inputs. ## [](#batching)Batching It’s possible to configure a [batch policy](../../../configuration/batching/#batch-policy) with a broker using the `batching` fields. When doing this the feeds from all child inputs are combined. Some inputs do not support broker based batching and specify this in their documentation. ## [](#processors)Processors It is possible to configure [processors](../../processors/about/) at the broker level, where they will be applied to _all_ child inputs, as well as on the individual child inputs. If you have processors at both the broker level _and_ on child inputs then the broker processors will be applied _after_ the child nodes processors. ## [](#fields)Fields ### [](#batching-2)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#copies)`copies` Whatever is specified within `inputs` will be created this many times. **Type**: `int` **Default**: `1` ### [](#inputs)`inputs[]` A list of inputs to create. 
**Type**: `input` --- # Page 93: gateway **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/gateway.md --- # gateway --- title: gateway latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/gateway page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/gateway.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/gateway.adoc page-git-created-date: "2025-06-25" page-git-modified-date: "2025-06-25" --- **Available in:** Cloud The `gateway` input is a Cloud-only component that receives messages over HTTP and injects them into a running Redpanda Connect pipeline. It’s ideal for: - Receiving webhook events from third-party services - Accepting real-time telemetry or sensor data over HTTP - Building lightweight ingest endpoints for client apps For on-premises or self-managed deployments, use the [`http_server`](../../../../../../redpanda-connect/components/inputs/http_server/) input instead. This component is fully managed and available in the following Redpanda Cloud deployment types: - **Serverless** - **Dedicated** - **Bring Your Own Cloud (BYOC)** When a pipeline with a `gateway` input is deployed, Redpanda Cloud provisions a secure URL that you can use to send HTTP requests. You can post raw payloads, JSON messages, or stream events in real time. Authentication and access control are handled through standard Redpanda Cloud API tokens. For more information, see [Cloud API Authentication](/api/doc/cloud-dataplane/authentication). Network access: - On **public clusters** (Serverless and Dedicated), the gateway URL is accessible over the public internet. - On **private clusters** (BYOC), the gateway is accessible only from within your configured VPC. 
#### Common ```yaml input: label: "" gateway: path: / rate_limit: "" ``` #### Advanced ```yaml input: label: "" gateway: path: / rate_limit: "" sync_response: status: "200" headers: Content-Type: application/octet-stream metadata_headers: include_prefixes: [] include_patterns: [] ``` The field `rate_limit` allows you to specify an optional [`rate_limit` resource](../../rate_limits/about/) that applies to all HTTP requests. When the rate limit is breached, HTTP requests return a 429 response with a Retry-After header. ## [](#responses)Responses You can also return a response for each message received using [synchronous responses](../../../guides/sync_responses/). When doing so, you can customize headers using the `sync_response.headers` field, which supports [function interpolation](../../../configuration/interpolation/#bloblang-queries) in the value based on the response message contents. ## [](#metadata)Metadata This input adds the following metadata fields to each message: - `http_server_user_agent` - `http_server_request_path` - `http_server_verb` - `http_server_remote_ip` - All headers (only first values are taken) - All query parameters - All path parameters - All cookies You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#fields)Fields ### [](#path)`path` The endpoint path to listen for data delivery requests. **Type**: `string` **Default**: `/` ### [](#rate_limit)`rate_limit` An optional [rate limit](../../rate_limits/about/) to throttle requests by. **Type**: `string` **Default**: `""` ### [](#sync_response)`sync_response` Customize messages returned using [synchronous responses](../../../guides/sync_responses/). **Type**: `object` ### [](#sync_response-headers)`sync_response.headers` Specify headers to return with synchronous responses. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). 
**Type**: `string` **Default**: ```yaml Content-Type: "application/octet-stream" ``` ### [](#sync_response-metadata_headers)`sync_response.metadata_headers` Specify criteria for which metadata values are added to the response as headers. **Type**: `object` ### [](#sync_response-metadata_headers-include_patterns)`sync_response.metadata_headers.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#sync_response-metadata_headers-include_prefixes)`sync_response.metadata_headers.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#sync_response-status)`sync_response.status` Specify the status code to return with synchronous responses. This is a string value, which allows you to customize it based on resulting payloads and their metadata. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `200` ```yaml # Examples: status: ${! json("status") } # --- status: ${! meta("status") } ``` ### [](#tcp)`tcp` Configure the TCP listener socket. **Type**: `object` ### [](#tcp-reuse_addr)`tcp.reuse_addr` Enable SO\_REUSEADDR, allowing binding to ports in TIME\_WAIT state. Useful for graceful restarts and config reloads where the server needs to rebind to the same port immediately after shutdown. **Type**: `bool` **Default**: `false` ### [](#tcp-reuse_port)`tcp.reuse_port` Enable SO\_REUSEPORT, allowing multiple sockets to bind to the same port for load balancing across multiple processes/threads.
**Type**: `bool` **Default**: `false` ## [](#examples)Examples ### [](#ingest-a-real-time-stream-of-sensor-data)Ingest a real-time stream of sensor data Use the `gateway` input to stream telemetry data from edge devices or browser clients that connect over HTTP. Suppose a client connects and sends JSON-encoded sensor readings like this: ```json { "sensor_id": "temp-001", "value": 22.5, "unit": "C" } { "sensor_id": "temp-001", "value": 22.8, "unit": "C" } { "sensor_id": "temp-001", "value": 23.1, "unit": "C" } ``` Redpanda Connect treats each line as an individual message. The following pipeline sets up a `gateway` input to handle these connections and logs each message: ```yaml input: label: sensor_stream gateway: path: /ws/sensors rate_limit: "" pipeline: processors: - log: level: INFO message: "Received reading from ${! json(\"sensor_id\") }: ${! json(\"value\") } ${! json(\"unit\") }" ``` This configuration: - Accepts HTTP connections on `/ws/sensors` - Receives a stream of messages over a single connection - Logs each message using Bloblang interpolation You can replace the `log` processor with any downstream output, such as Redpanda or Amazon S3, to persist or analyze the data in real time. 
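### Throttle requests and return a synchronous response

The `rate_limit` and `sync_response` fields described above can be combined in one pipeline. The following is a minimal sketch, not a definitive configuration: the path `/webhooks/orders`, the labels `webhook_ingest` and `webhook_limit`, and the limit of 100 requests per second are hypothetical values chosen for illustration. It also assumes that the `local` rate limit resource and the `sync_response` output are available in your Redpanda Cloud deployment.

```yaml
input:
  label: webhook_ingest
  gateway:
    path: /webhooks/orders        # hypothetical endpoint path
    rate_limit: webhook_limit     # refers to the rate limit resource below
    sync_response:
      status: "200"
      headers:
        Content-Type: application/json

# Assumed resource: allows up to 100 requests per second.
# Requests over the limit receive a 429 response.
rate_limit_resources:
  - label: webhook_limit
    local:
      count: 100
      interval: 1s

pipeline:
  processors:
    # Build the response body from metadata that the input adds.
    - mapping: |
        root.received = true
        root.path = metadata("http_server_request_path")

# Returns the processed message to the HTTP caller.
output:
  sync_response: {}
```

In this sketch the processed message is sent back to the caller as the HTTP response body. To persist the data as well, you would route messages to a second output in addition to `sync_response`.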
--- # Page 94: gcp_bigquery_select **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/gcp_bigquery_select.md --- # gcp\_bigquery\_select --- title: gcp_bigquery_select latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/gcp_bigquery_select page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/gcp_bigquery_select.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/gcp_bigquery_select.adoc categories: "[\"Services\",\"GCP\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/gcp_bigquery_select/)[Processor](/redpanda-cloud/develop/connect/components/processors/gcp_bigquery_select/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/gcp_bigquery_select/ "View the Self-Managed version of this component") Executes a `SELECT` query against BigQuery and creates a message for each row received. ```yml input: label: "" gcp_bigquery_select: project: "" # No default (required) credentials_json: "" table: "" # No default (required) columns: [] # No default (required) where: "" # No default (optional) auto_replay_nacks: true job_labels: {} priority: "" args_mapping: "" # No default (optional) prefix: "" # No default (optional) suffix: "" # No default (optional) ``` Once the rows from the query are exhausted, this input shuts down, allowing the pipeline to gracefully terminate (or the next input in a [sequence](../sequence/) to execute).
## [](#examples)Examples ### [](#word-counts)Word counts Here we query the public corpus of Shakespeare’s works to generate a stream of the top 10 words that are 3 or more characters long: ```yaml input: gcp_bigquery_select: project: sample-project table: bigquery-public-data.samples.shakespeare columns: - word - sum(word_count) as total_count where: length(word) >= ? suffix: | GROUP BY word ORDER BY total_count DESC LIMIT 10 args_mapping: | root = [ 3 ] ``` ## [](#fields)Fields ### [](#args_mapping)`args_mapping` An optional [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of values matching in size to the number of placeholder arguments in the field `where`. **Type**: `string` ```yaml # Examples: args_mapping: root = [ "article", now().ts_format("2006-01-02") ] ``` ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#columns)`columns[]` A list of columns to query. **Type**: `array` ### [](#credentials_json)`credentials_json` Base64-encoded Google Service Account credentials in JSON format (optional). Use this field to authenticate with Google Cloud services. For more information about creating service account credentials, see [Google’s service account documentation](https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#job_labels)`job_labels` A list of labels to add to the query job. **Type**: `string` **Default**: `{}` ### [](#prefix)`prefix` An optional prefix to prepend to the select query (before SELECT). **Type**: `string` ### [](#priority)`priority` The priority with which to schedule the query. **Type**: `string` **Default**: `""` ### [](#project)`project` GCP project where the query job will execute. **Type**: `string` ### [](#suffix)`suffix` An optional suffix to append to the select query. **Type**: `string` ### [](#table)`table` Fully-qualified BigQuery table name to query. **Type**: `string` ```yaml # Examples: table: bigquery-public-data.samples.shakespeare ``` ### [](#where)`where` An optional where clause to add. Placeholder arguments are populated with the `args_mapping` field. Placeholders should always be question marks (`?`). **Type**: `string` ```yaml # Examples: where: type = ? and created_at > ? # --- where: user_id = ? 
``` --- # Page 95: gcp_cloud_storage **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/gcp_cloud_storage.md --- # gcp\_cloud\_storage --- title: gcp_cloud_storage latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/gcp_cloud_storage page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/gcp_cloud_storage.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/gcp_cloud_storage.adoc categories: "[\"Services\",\"GCP\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/gcp_cloud_storage/)[Cache](/redpanda-cloud/develop/connect/components/caches/gcp_cloud_storage/)[Output](/redpanda-cloud/develop/connect/components/outputs/gcp_cloud_storage/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/gcp_cloud_storage/ "View the Self-Managed version of this component") Downloads objects within a Google Cloud Storage bucket, optionally filtered by a prefix. #### Common ```yml input: label: "" gcp_cloud_storage: bucket: "" # No default (required) prefix: "" credentials_json: "" scanner: to_the_end: {} ``` #### Advanced ```yml input: label: "" gcp_cloud_storage: bucket: "" # No default (required) prefix: "" credentials_json: "" scanner: to_the_end: {} delete_objects: false ``` ## [](#metadata)Metadata This input adds the following metadata fields to each message: ```none - gcs_key - gcs_bucket - gcs_last_modified - gcs_last_modified_unix - gcs_content_type - gcs_content_encoding - All user defined metadata ``` You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries).
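The metadata fields listed above can be read in a Bloblang mapping. The following is a minimal sketch, assuming a hypothetical bucket named `example-bucket` that holds line-delimited text files under a `logs/` prefix. It uses the `lines` scanner to emit one message per line and tags each message with the object it came from.

```yaml
input:
  gcp_cloud_storage:
    bucket: example-bucket   # hypothetical bucket name
    prefix: logs/            # hypothetical prefix
    scanner:
      lines: {}              # emit one message per line of each object

pipeline:
  processors:
    # Wrap each line in a JSON document that records its source object.
    - mapping: |
        root.line = content().string()
        root.source = "gs://%v/%v".format(metadata("gcs_bucket"), metadata("gcs_key"))
        root.last_modified = metadata("gcs_last_modified")
```

Keeping the object key in the payload makes it possible to trace a downstream record back to the file it originated from.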
### [](#credentials)Credentials By default Redpanda Connect will use a shared credentials file when connecting to GCP services. You can find out more in [Google Cloud Platform](../../../guides/cloud/gcp/). ## [](#fields)Fields ### [](#bucket)`bucket` The name of the bucket from which to download objects. **Type**: `string` ### [](#credentials_json)`credentials_json` Base64-encoded Google Service Account credentials in JSON format (optional). Use this field to authenticate with Google Cloud services. For more information about creating service account credentials, see [Google’s service account documentation](https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#delete_objects)`delete_objects` Whether to delete downloaded objects from the bucket once they are processed. **Type**: `bool` **Default**: `false` ### [](#prefix)`prefix` Optional path prefix, if set only objects with the prefix are consumed. **Type**: `string` **Default**: `""` ### [](#scanner)`scanner` The [scanner](../../scanners/about/) by which the stream of bytes consumed will be broken out into individual messages. Scanners are useful for processing large sources of data without holding the entirety of it within memory. For example, the `csv` scanner allows you to process individual CSV rows without loading the entire CSV file in memory at once. 
**Type**: `scanner` **Default**: ```yaml to_the_end: {} ``` --- # Page 96: gcp_pubsub **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/gcp_pubsub.md --- # gcp\_pubsub --- title: gcp_pubsub latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/gcp_pubsub page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/gcp_pubsub.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/gcp_pubsub.adoc categories: "[\"Services\",\"GCP\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/gcp_pubsub/)[Output](/redpanda-cloud/develop/connect/components/outputs/gcp_pubsub/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/gcp_pubsub/ "View the Self-Managed version of this component") Consumes messages from a GCP Cloud Pub/Sub subscription. #### Common ```yml input: label: "" gcp_pubsub: project: "" # No default (required) credentials_json: "" subscription: "" # No default (required) endpoint: "" sync: false max_outstanding_messages: 1000 max_outstanding_bytes: 1000000000 ``` #### Advanced ```yml input: label: "" gcp_pubsub: project: "" # No default (required) credentials_json: "" subscription: "" # No default (required) endpoint: "" sync: false max_outstanding_messages: 1000 max_outstanding_bytes: 1000000000 create_subscription: enabled: false topic: "" ``` For information on how to set up credentials see [this guide](https://cloud.google.com/docs/authentication/production).
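As a minimal sketch of the advanced configuration above, the following input asks Redpanda Connect to create the subscription if it does not already exist. The project `example-project`, subscription `orders-sub`, topic `orders`, and the reduced `max_outstanding_messages` value are hypothetical values chosen for illustration.

```yaml
input:
  label: orders_sub
  gcp_pubsub:
    project: example-project        # hypothetical project ID
    subscription: orders-sub        # hypothetical subscription ID
    max_outstanding_messages: 500   # lower than the default of 1000
    create_subscription:
      enabled: true
      topic: orders                 # topic to attach the new subscription to
```

Lowering `max_outstanding_messages` limits how many unacknowledged messages are held at once, which can help when downstream processing is slow.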
## [](#metadata)Metadata This input adds the following metadata fields to each message: - gcp\_pubsub\_publish\_time\_unix - The time at which the message was published to the topic. - gcp\_pubsub\_delivery\_attempt - When dead lettering is enabled, this is set to the number of times PubSub has attempted to deliver a message. - All message attributes You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#fields)Fields ### [](#create_subscription)`create_subscription` Allows you to configure the input subscription and creates it if it doesn’t exist. **Type**: `object` ### [](#create_subscription-enabled)`create_subscription.enabled` Whether to configure the subscription. **Type**: `bool` **Default**: `false` ### [](#create_subscription-topic)`create_subscription.topic` Defines the topic that the subscription should be attached to. **Type**: `string` **Default**: `""` ### [](#credentials_json)`credentials_json` Base64-encoded Google Service Account credentials in JSON format (optional). Use this field to authenticate with Google Cloud services. For more information about creating service account credentials, see [Google’s service account documentation](https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#endpoint)`endpoint` An optional endpoint to override the default of `pubsub.googleapis.com:443`. This can be used to connect to a region specific pubsub endpoint. For a list of valid values, see [this document](https://cloud.google.com/pubsub/docs/reference/service_apis_overview#list_of_regional_endpoints).
**Type**: `string` **Default**: `""` ```yaml # Examples: endpoint: us-central1-pubsub.googleapis.com:443 # --- endpoint: us-west3-pubsub.googleapis.com:443 ``` ### [](#max_outstanding_bytes)`max_outstanding_bytes` The maximum total size, in bytes, of outstanding pending messages to be consumed at a given time. **Type**: `int` **Default**: `1000000000` ### [](#max_outstanding_messages)`max_outstanding_messages` The maximum number of outstanding pending messages to be consumed at a given time. **Type**: `int` **Default**: `1000` ### [](#project)`project` The project ID of the target subscription. **Type**: `string` ### [](#subscription)`subscription` The target subscription ID. **Type**: `string` ### [](#sync)`sync` Enable synchronous pull mode. **Type**: `bool` **Default**: `false` --- # Page 97: gcp_spanner_cdc **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/gcp_spanner_cdc.md --- # gcp\_spanner\_cdc --- title: gcp_spanner_cdc latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/gcp_spanner_cdc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/gcp_spanner_cdc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/gcp_spanner_cdc.adoc categories: "[Services, GCP]" description: Creates an input that consumes from a spanner change stream. page-git-created-date: "2025-07-08" page-git-modified-date: "2025-07-08" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/gcp_spanner_cdc/ "View the Self-Managed version of this component") Creates an input that consumes from a Spanner change stream.
#### Common ```yaml input: label: "" gcp_spanner_cdc: credentials_json: "" project_id: "" # No default (required) instance_id: "" # No default (required) database_id: "" # No default (required) stream_id: "" # No default (required) start_timestamp: "" end_timestamp: "" batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) auto_replay_nacks: true ``` #### Advanced ```yaml input: label: "" gcp_spanner_cdc: credentials_json: "" project_id: "" # No default (required) instance_id: "" # No default (required) database_id: "" # No default (required) stream_id: "" # No default (required) start_timestamp: "" end_timestamp: "" heartbeat_interval: 10s metadata_table: "" min_watermark_cache_ttl: 5s allowed_mod_types: [] # No default (optional) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) auto_replay_nacks: true ``` Consumes change records from a Google Cloud Spanner change stream. This input allows you to track and process database changes in real-time, making it useful for data replication, event-driven architectures, and maintaining derived data stores. The input reads from a specified change stream within a Spanner database and converts each change record into a message. The message payload contains the change records in JSON format, and metadata is added with details about the Spanner instance, database, and stream. Change streams provide a way to track mutations to your Spanner database tables. For more information about Spanner change streams, refer to the [Google Cloud documentation](https://cloud.google.com/spanner/docs/change-streams). ## [](#fields)Fields ### [](#allowed_mod_types)`allowed_mod_types[]` List of modification types to process. If not specified, all modification types are processed.
Allowed values: INSERT, UPDATE, DELETE **Type**: `array` ```yaml # Examples: allowed_mod_types: - INSERT - UPDATE - DELETE ``` ### [](#auto_replay_nacks)`auto_replay_nacks` Whether to automatically replay messages that are rejected (nacked) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure. Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams, as the original shape of the data is discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The maximum total size (in bytes) that a batch can reach before it is passed on for processing or delivery (flushed). When the combined size of all messages in the batch exceeds this limit, the batch is immediately sent to the next stage (such as a processor or output). Set to `0` to disable size-based batching. When disabled, messages are flushed based on other conditions (such as `batching.count` or `batching.period`). **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages at which the batch should be flushed. Set the value to `0` to disable count-based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The length of time after which an incomplete batch should be flushed regardless of its size. Supported time units are `ns`, `us`, `ms`, `s`, `m`, and `h`. For example, `1s` flushes a batch after one second. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, so any attempt to split it into smaller batches with these processors will be ignored. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#credentials_json)`credentials_json` Base64-encoded JSON credentials file for authenticating to GCP with a service account. If not provided, Application Default Credentials (ADC) is used. For more information about how to create a service account and obtain the credentials JSON, see the [Google Cloud documentation](https://cloud.google.com/docs/authentication/getting-started). **Type**: `string` **Default**: `""` ### [](#database_id)`database_id` The ID of the Spanner database to read from. This is the name of the database as it appears in the Spanner console or API. For more information about how to create a Spanner database, see the [Google Cloud documentation](https://cloud.google.com/spanner/docs/create-manage-databases). **Type**: `string` ### [](#end_timestamp)`end_timestamp` The timestamp at which to stop reading change records from the change stream. This is an optional field that allows you to limit the range of change records processed by the input. The timestamp should be in RFC3339 format, such as `2023-10-01T00:00:00Z`. 
If not provided, the input reads all available change records up to the current time. **Type**: `string` **Default**: `""` ```yaml # Examples: end_timestamp: 2022-01-01T00:00:00Z ``` ### [](#heartbeat_interval)`heartbeat_interval` The interval at which to send heartbeat messages to the output. Heartbeat messages are sent to indicate that the input is still active and processing changes. This can help prevent timeouts in downstream systems. Supported time units are `ns`, `us`, `ms`, `s`, `m`, and `h`. For example, `1s` sends a heartbeat every second. **Type**: `string` **Default**: `10s` ### [](#instance_id)`instance_id` The ID of the Spanner instance to read from. This is the name of the instance as it appears in the Spanner console or API. For more information about how to create a Spanner instance, see the [Google Cloud documentation](https://cloud.google.com/spanner/docs/create-manage-instances). **Type**: `string` ### [](#metadata_table)`metadata_table` The table to store metadata in (default: `cdc_metadata_`). **Type**: `string` **Default**: `""` ### [](#min_watermark_cache_ttl)`min_watermark_cache_ttl` Sets how frequently to query Spanner for the minimum watermark. **Type**: `string` **Default**: `5s` ### [](#project_id)`project_id` The ID of the GCP project that contains the Spanner instance and database. This is the name of the project as it appears in the GCP console or API. For more information about how to create a GCP project, see the [Google Cloud documentation](https://cloud.google.com/resource-manager/docs/creating-managing-projects). **Type**: `string` ### [](#start_timestamp)`start_timestamp` The timestamp at which to start reading change records from the change stream. This is an optional field that allows you to limit the range of change records processed by the input. The timestamp should be in RFC3339 format, such as `2023-10-01T00:00:00Z` (default: current time). 
**Type**: `string` **Default**: `""` ```yaml # Examples: start_timestamp: 2022-01-01T00:00:00Z ``` ### [](#stream_id)`stream_id` The name of the change stream to track. The stream must exist in the Spanner database. To create a change stream, follow the [Google Cloud documentation](https://cloud.google.com/spanner/docs/change-streams/manage). **Type**: `string` --- # Page 98: generate **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/generate.md --- # generate --- title: generate latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/generate page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/generate.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/generate.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/generate/ "View the Self-Managed version of this component") Generates messages at a given interval using a [Bloblang](../../../guides/bloblang/about/) mapping executed without a context. This allows you to generate messages for testing your pipeline configs. #### Common ```yml input: label: "" generate: mapping: "" # No default (required) interval: 1s count: 0 batch_size: 1 auto_replay_nacks: true ``` #### Advanced ```yml input: label: "" generate: mapping: "" # No default (required) interval: 1s count: 0 batch_size: 1 auto_replay_nacks: true ``` ## [](#examples)Examples ### [](#cron-scheduled-processing)Cron Scheduled Processing A common use case for the generate input is to trigger processors on a schedule so that the processors themselves can behave similarly to an input.
The following configuration reads rows from a PostgreSQL table every 5 minutes. ```yaml input: generate: interval: '@every 5m' mapping: 'root = {}' processors: - sql_select: driver: postgres dsn: postgres://foouser:foopass@localhost:5432/testdb?sslmode=disable table: foo columns: [ "*" ] ``` ### [](#generate-100-rows)Generate 100 Rows The generate input can be used as a convenient way to generate test data. The following example generates 100 rows of structured data by setting an explicit count. The interval field is set to empty, which means data is generated as fast as the downstream components can consume it. ```yaml input: generate: count: 100 interval: "" mapping: | root = if random_int() % 2 == 0 { { "type": "foo", "foo": "is yummy" } } else { { "type": "bar", "bar": "is gross" } } ``` ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#batch_size)`batch_size` The number of generated messages that should be accumulated into each batch flushed at the specified interval. **Type**: `int` **Default**: `1` ### [](#count)`count` An optional number of messages to generate, if set above 0 the specified number of messages is generated and then the input will shut down. **Type**: `int` **Default**: `0` ### [](#interval)`interval` The time interval at which messages should be generated, expressed either as a duration string or as a cron expression. If set to an empty string messages will be generated as fast as downstream services can process them. 
Cron expressions can specify a timezone by prefixing the expression with `TZ=<location name>`, where the location name corresponds to a file within the IANA Time Zone database. **Type**: `string` **Default**: `1s` ```yaml # Examples: interval: 5s # --- interval: 1m # --- interval: 1h # --- interval: '@every 1s' # --- interval: 0,30 */2 * * * * # --- interval: TZ=Europe/London 30 3-6,20-23 * * * ``` ### [](#mapping)`mapping` A [Bloblang](../../../guides/bloblang/about/) mapping to use for generating messages. **Type**: `string` ```yaml # Examples: mapping: root = "hello world" # --- mapping: root = {"test":"message","id":uuid_v4()} ``` --- # Page 99: git **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/git.md --- # git --- title: git latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/git page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/git.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/git.adoc page-git-created-date: "2025-05-02" page-git-modified-date: "2025-05-02" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/git/ "View the Self-Managed version of this component") Clones a Git repository, reads its contents, then polls for new commits at a configurable interval. Any updates are emitted as new messages.
```yml input: label: "" git: repository_url: "" # No default (required) branch: main poll_interval: 10s include_patterns: [] exclude_patterns: [] max_file_size: 10485760 checkpoint_cache: "" # No default (optional) checkpoint_key: git_last_commit auth: basic: username: "" password: "" ssh_key: private_key_path: "" private_key: "" passphrase: "" token: value: "" auto_replay_nacks: true ``` ## [](#metadata)Metadata This input adds the following metadata fields to each message: - `git_file_path` - `git_file_size` - `git_file_mode` - `git_file_modified` - `git_commit` - `git_mime_type` - `git_is_binary` - `git_deleted` (when a source file is deleted) You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#fields)Fields ### [](#auth)`auth` Options for authenticating with your Git repository. **Type**: `object` ### [](#auth-basic)`auth.basic` Allows you to specify basic authentication. **Type**: `object` ### [](#auth-basic-password)`auth.basic.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#auth-basic-username)`auth.basic.username` The username to use for authentication. **Type**: `string` **Default**: `""` ### [](#auth-ssh_key)`auth.ssh_key` Allows you to specify SSH key authentication. **Type**: `object` ### [](#auth-ssh_key-passphrase)`auth.ssh_key.passphrase` The passphrase for your SSH private key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.
**Type**: `string` **Default**: `""` ### [](#auth-ssh_key-private_key)`auth.ssh_key.private_key` Your private SSH key. When using encrypted keys, you must also set a value for [`passphrase`](#auth-ssh_key-passphrase). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#auth-ssh_key-private_key_path)`auth.ssh_key.private_key_path` The path to your private SSH key file. When using encrypted keys, you must also set a value for [`passphrase`](#auth-ssh_key-passphrase). **Type**: `string` **Default**: `""` ### [](#auth-token)`auth.token` Allows you to specify token-based authentication. **Type**: `object` ### [](#auth-token-value)`auth.token.value` The token value to use for token-based authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#auto_replay_nacks)`auto_replay_nacks` Whether to automatically replay messages that are rejected (nacked) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure. Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams, as the original shape of the data is discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#branch)`branch` The repository branch to check out.
**Type**: `string` **Default**: `main` ### [](#checkpoint_cache)`checkpoint_cache` Specify a [`cache`](../../caches/about/) resource to store the last processed commit hash. After a restart, Redpanda Connect can then continue processing changes from where it left off, avoiding the need to reprocess all detected updates. **Type**: `string` ### [](#checkpoint_key)`checkpoint_key` The key to use when storing the last processed commit hash in the cache. **Type**: `string` **Default**: `git_last_commit` ### [](#exclude_patterns)`exclude_patterns[]` A list of file patterns to exclude. For example, you could choose not to read content from certain Git directories or image files: `'.git/**', '**/*.png'`. These patterns take precedence over `include_patterns`. The following patterns are supported: - Glob patterns: `*`, `/**/`, `?` - Character ranges: `[a-z]`. Escape any character with a special meaning using a backslash. **Type**: `array` **Default**: `[]` ### [](#include_patterns)`include_patterns[]` A list of file patterns to read from. For example, you could read content from only Markdown and YAML files: `'**/*.md', 'configs/*.yaml'`. The following patterns are supported: - Glob patterns: `*`, `/**/`, `?` - Character ranges: `[a-z]`. Escape any character with a special meaning using a backslash. If this field is left empty, all files are read. **Type**: `array` **Default**: `[]` ### [](#max_file_size)`max_file_size` The maximum size of files to read from (in bytes). Files that exceed this limit are skipped. Set to `0` for unlimited file sizes. **Type**: `int` **Default**: `10485760` ### [](#poll_interval)`poll_interval` How frequently this input polls the Git repository for changes. **Type**: `string` **Default**: `10s` ```yaml # Examples: poll_interval: 10s ``` ### [](#repository_url)`repository_url` The URL of the Git repository to clone.
**Type**: `string` ```yaml # Examples: repository_url: https://github.com/username/repo.git ``` --- # Page 100: http_client **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/http_client.md --- # http\_client --- title: http_client latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/http_client page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/http_client.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/http_client.adoc page-git-created-date: "2025-03-04" page-git-modified-date: "2025-03-04" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/http_client/)[Output](/redpanda-cloud/develop/connect/components/outputs/http_client/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/http_client/ "View the Self-Managed version of this component") Connects to a server and continuously requests single messages. 
#### Common ```yml inputs: label: "" http_client: url: "" # No default (required) verb: GET headers: {} rate_limit: "" # No default (optional) timeout: 5s payload: "" # No default (optional) stream: enabled: false reconnect: true scanner: lines: {} auto_replay_nacks: true ``` #### Advanced ```yml inputs: label: "" http_client: url: "" # No default (required) verb: GET headers: {} metadata: include_prefixes: [] include_patterns: [] dump_request_log_level: "" oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" oauth2: enabled: false client_key: "" client_secret: "" token_url: "" scopes: [] endpoint_params: {} basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] extract_headers: include_prefixes: [] include_patterns: [] rate_limit: "" # No default (optional) timeout: 5s retry_period: 1s max_retry_backoff: 300s retries: 3 follow_redirects: true backoff_on: - 429 drop_on: [] successful_on: [] proxy_url: "" # No default (optional) disable_http2: false payload: "" # No default (optional) drop_empty_bodies: true stream: enabled: false reconnect: true scanner: lines: {} auto_replay_nacks: true ``` ## [](#dynamic-url-and-header-settings)Dynamic URL and header settings You can set the [`url`](#url) and [`headers`](#headers) values dynamically using [function interpolations](../../../configuration/interpolation/#bloblang-queries). You can also add [function interpolations](../../../configuration/interpolation/#bloblang-queries) to the [`url`](#url) and [`headers`](#headers) fields to implement basic pagination, such as page numbers or tokens, where subsequent requests need to include data from previously-consumed responses. Example: ```yaml input: http_client: url: >- https://api.example.com/search?query=allmyfoos&start_time=${! 
( (timestamp_unix()-300).ts_format("2006-01-02T15:04:05Z","UTC").escape_url_query() ) }${! ("&next_token="+this.meta.next_token.not_null()) | "" } verb: GET rate_limit: schedule_searches oauth2: enabled: true token_url: https://api.example.com/oauth2/token client_key: "${EXAMPLE_KEY}" client_secret: "${EXAMPLE_SECRET}" rate_limit_resources: - label: schedule_searches local: count: 1 interval: 30s ``` > 💡 **TIP** > > If pagination requires more complex logic, consider using the [`http` processor](../../processors/http/) combined with a [`generate` input](../generate/), which allows you to schedule the processor. ## [](#streaming-messages)Streaming messages If you [enable streaming](#stream-enabled), Redpanda Connect consumes the body of the server response as a continuous stream of data, and breaks the stream down into smaller, logical messages using the [specified scanner](#stream-scanner). This functionality allows you to consume APIs that provide long-lived streamed data feeds, such as stock market feeds. ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether to automatically replay rejected messages (negative acknowledgements) at the output level. If the cause of rejections persists, leaving this option enabled can result in back pressure. Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data is discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#backoff_on)`backoff_on[]` A list of status codes that indicate a request failure, and trigger retries with an increasing backoff period between attempts. **Type**: `int` **Default**: ```yaml - 429 ``` ### [](#basic_auth)`basic_auth` Allows you to specify basic authentication. **Type**: `object` ### [](#basic_auth-enabled)`basic_auth.enabled` Whether to use basic authentication in requests. 
**Type**: `bool` **Default**: `false` ### [](#basic_auth-password)`basic_auth.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#basic_auth-username)`basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#disable_http2)`disable_http2` Whether to disable HTTP/2. By default, HTTP/2 is enabled. **Type**: `bool` **Default**: `false` ### [](#drop_empty_bodies)`drop_empty_bodies` Whether to drop empty payloads received from the target server. **Type**: `bool` **Default**: `true` ### [](#drop_on)`drop_on[]` A list of status codes that indicate a request failure, where the input should not attempt retries. This helps avoid unnecessary retries for requests that are unlikely to succeed. > 📝 **NOTE** > > In these cases, the _request_ is dropped, but the _message_ that triggered the request is retained. **Type**: `int` **Default**: `[]` ### [](#dump_request_log_level)`dump_request_log_level` EXPERIMENTAL: Set the logging level for the request and response payloads of each HTTP request. **Type**: `string` **Default**: `""` **Options**: `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR`, `FATAL`, `""` ### [](#extract_headers)`extract_headers` Specify which response headers to add to the resulting messages as metadata. Header keys are automatically converted to lowercase before matching, so make sure that your patterns target the lowercase versions of the expected header keys. **Type**: `object` ### [](#extract_headers-include_patterns)`extract_headers.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against.
**Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#extract_headers-include_prefixes)`extract_headers.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#follow_redirects)`follow_redirects` Whether to transparently follow redirects, that is, responses with 300-399 status codes. If disabled, the response message contains the body, status, and headers from the redirect response, and the input does not make a request to the URL set in the Location header of the response. **Type**: `bool` **Default**: `true` ### [](#headers)`headers` A map of headers to add to the request. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ```yaml # Examples: headers: Content-Type: application/octet-stream traceparent: ${! tracing_span().traceparent } ``` ### [](#jwt)`jwt` Beta Configure JSON Web Token (JWT) authentication. This feature is in beta and may change in future releases. JWT tokens provide secure, stateless authentication between services. **Type**: `object` ### [](#jwt-claims)`jwt.claims` A value used to identify the claims that issued the JWT. **Type**: `object` **Default**: `{}` ### [](#jwt-enabled)`jwt.enabled` Whether to use JWT authentication in requests. **Type**: `bool` **Default**: `false` ### [](#jwt-headers)`jwt.headers` Additional key-value pairs to include in the JWT header (optional). These headers provide extra metadata for JWT processing. **Type**: `object` **Default**: `{}` ### [](#jwt-private_key_file)`jwt.private_key_file` Path to a file containing the PEM-encoded private key using PKCS#1 or PKCS#8 format.
The private key must be compatible with the algorithm specified in the `signing_method` field. **Type**: `string` **Default**: `""` ### [](#jwt-signing_method)`jwt.signing_method` The cryptographic algorithm used to sign the JWT token. Supported algorithms include RS256, RS384, RS512, and EdDSA. This algorithm must be compatible with the private key specified in the `private_key_file` field. **Type**: `string` **Default**: `""` ### [](#max_retry_backoff)`max_retry_backoff` The maximum period to wait between failed requests. **Type**: `string` **Default**: `300s` ### [](#metadata)`metadata` Specify matching rules that determine which metadata keys to add to the HTTP request as headers (optional). **Type**: `object` ### [](#metadata-include_patterns)`metadata.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#metadata-include_prefixes)`metadata.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#oauth)`oauth` Configure OAuth version 1.0 authentication for secure API access. **Type**: `object` ### [](#oauth-access_token)`oauth.access_token` The value used to gain access to the protected resources on behalf of the user. **Type**: `string` **Default**: `""` ### [](#oauth-access_token_secret)`oauth.access_token_secret` The secret that establishes ownership of the `oauth.access_token` in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ### [](#oauth-consumer_key)`oauth.consumer_key` A value used to identify the client to the service provider. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_secret)`oauth.consumer_secret` The secret that establishes ownership of the consumer key in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-enabled)`oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#oauth2)`oauth2` Allows you to specify open authentication using OAuth version 2 and the client credentials token flow. **Type**: `object` ### [](#oauth2-client_key)`oauth2.client_key` A value used to identify the client to the token provider. **Type**: `string` **Default**: `""` ### [](#oauth2-client_secret)`oauth2.client_secret` The secret used to establish ownership of the client key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth2-enabled)`oauth2.enabled` Whether to use OAuth version 2 in requests. **Type**: `bool` **Default**: `false` ### [](#oauth2-endpoint_params)`oauth2.endpoint_params` A list of endpoint parameters specified as arrays of strings (optional). **Type**: `object` **Default**: `{}` ```yaml # Examples: endpoint_params: bar: - woof foo: - meow - quack ``` ### [](#oauth2-scopes)`oauth2.scopes[]` A list of requested permissions (optional). **Type**: `array` **Default**: `[]` ### [](#oauth2-token_url)`oauth2.token_url` The URL of the token provider. 
**Type**: `string` **Default**: `""` ### [](#payload)`payload` A payload to deliver for each request (optional). This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#proxy_url)`proxy_url` An HTTP proxy URL (optional). **Type**: `string` ### [](#rate_limit)`rate_limit` A [rate limit](../../rate_limits/about/) to throttle requests by (optional). **Type**: `string` ### [](#retries)`retries` The maximum number of retry attempts to make. **Type**: `int` **Default**: `3` ### [](#retry_period)`retry_period` The initial period to wait between failed requests before retrying. **Type**: `string` **Default**: `1s` ### [](#stream)`stream` Enables streaming mode, where the HTTP connection remains open and messages are processed line-by-line. **Type**: `object` ### [](#stream-enabled)`stream.enabled` Enables streaming mode. **Type**: `bool` **Default**: `false` ### [](#stream-reconnect)`stream.reconnect` Whether to automatically reestablish the HTTP connection if it is lost. **Type**: `bool` **Default**: `true` ### [](#stream-scanner)`stream.scanner` The [scanner](../../scanners/about/) used to split the stream of bytes into individual messages. Scanners are useful for processing large data sources efficiently without holding the entire data set in memory. For example, the `csv` scanner processes individual rows in a CSV file without loading the entire file in memory. **Type**: `scanner` **Default**: ```yaml lines: {} ``` ### [](#successful_on)`successful_on[]` A list of HTTP status codes that should be considered as successful, even if they are not 2XX codes. This is useful for handling cases where non-2XX codes indicate that the request was processed successfully, such as `303 See Other` or `409 Conflict`. By default, all 2XX codes are considered successful unless they are specified in `backoff_on` or `drop_on` fields.
**Type**: `int` **Default**: `[]` ### [](#timeout)`timeout` A static timeout to apply to requests. **Type**: `string` **Default**: `5s` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL to connect to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#verb)`verb` A verb to connect with. 
**Type**: `string` **Default**: `GET` ```yaml # Examples: verb: POST # --- verb: GET # --- verb: DELETE ``` --- # Page 101: http_server **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/http_server.md --- # http\_server --- title: http_server latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/http_server page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/http_server.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/http_server.adoc categories: "[\"Network\"]" page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-18" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/http_server/)[Output](/redpanda-connect/components/outputs/http_server/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/http_server/ "View the Self-Managed version of this component") Receive messages sent over HTTP using POST requests. HTTP 2.0 is supported when using TLS, which is enabled when key and cert files are specified. 
#### Common ```yml inputs: label: "" http_server: address: "" path: /post ws_path: /post/ws allowed_verbs: - "POST" timeout: 5s rate_limit: "" ``` #### Advanced ```yml inputs: label: "" http_server: address: "" path: /post ws_path: /post/ws ws_welcome_message: "" ws_rate_limit_message: "" allowed_verbs: - "POST" timeout: 5s rate_limit: "" cert_file: "" key_file: "" cors: enabled: false allowed_origins: [] sync_response: status: 200 headers: Content-Type: "application/octet-stream" metadata_headers: include_prefixes: [] include_patterns: [] tcp: reuse_addr: false reuse_port: false ``` The field `rate_limit` allows you to specify an optional [`rate_limit` resource](../../rate_limits/about/), which will be applied to each HTTP request made and each websocket payload received. When the rate limit is breached HTTP requests will have a 429 response returned with a Retry-After header. Websocket payloads will be dropped and an optional response payload will be sent as per `ws_rate_limit_message`. ## [](#responses)Responses It’s possible to return a response for each message received using [synchronous responses](../../../guides/sync_responses/). When doing so you can customize headers with the `sync_response` field `headers`, which can also use [function interpolation](../../../configuration/interpolation/#bloblang-queries) in the value based on the response message contents. ## [](#endpoints)Endpoints The following fields specify endpoints that are registered for sending messages, and support path parameters of the form `/{foo}`, which are added to ingested messages as metadata. A path ending in `/` will match against all extensions of that path: ### [](#path-defaults-to-post)`path` (defaults to `/post`) This endpoint expects POST requests where the entire request body is consumed as a single message. 
If the request contains a multipart `content-type` header as per [RFC1341](https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html) then the multiple parts are consumed as a batch of messages, where each body part is a message of the batch. ### [](#ws_path-defaults-to-postws)`ws_path` (defaults to `/post/ws`) Creates a websocket connection, where payloads received on the socket are passed through the pipeline as a batch of one message. > ⚠️ **CAUTION: Endpoint caveats** > > Components within a Redpanda Connect config will register their respective endpoints in a non-deterministic order. This means that establishing precedence of endpoints that are registered via multiple `http_server` inputs or outputs (either within brokers or from cohabiting streams) is not possible in a predictable way. > > This ambiguity makes it difficult to ensure that paths which are both a subset of a path registered by a separate component, and end in a slash (`/`) and will therefore match against all extensions of that path, do not prevent the more specific path from matching against requests. > > It is therefore recommended that you ensure paths of separate components do not collide unless they are explicitly non-competing. > > For example, if you were to deploy two separate `http_server` inputs, one with a path `/foo/` and the other with a path `/foo/bar`, it would not be possible to ensure that the path `/foo/` does not swallow requests made to `/foo/bar`. You may specify an optional `ws_welcome_message`, which is a static payload to be sent to all clients once a websocket connection is first established. It’s also possible to specify a `ws_rate_limit_message`, which is a static payload to be sent to clients that have triggered the server’s rate limit.
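The endpoint behavior described above can be sketched in a single configuration. This is a minimal example under stated assumptions, not a recommended production setup: the paths, the `customer_id` path parameter, the message payloads, and the `ingest_limit` rate limit label are all hypothetical. The mapping also assumes that request bodies are JSON and that the path parameter is exposed as a metadata key with the same name as the parameter.

```yaml
input:
  http_server:
    path: '/orders/{customer_id}'   # path parameter is added to messages as metadata
    ws_path: /orders/ws
    ws_welcome_message: '{"status":"connected"}'
    ws_rate_limit_message: '{"error":"rate limit exceeded"}'
    allowed_verbs:
      - POST
    rate_limit: ingest_limit        # hypothetical rate limit resource label

pipeline:
  processors:
    - mapping: |
        # Assumes a JSON request body; copies the path parameter into the payload.
        root = this
        root.customer_id = @customer_id

rate_limit_resources:
  - label: ingest_limit
    local:
      count: 100
      interval: 1s
```

Neither path in this sketch ends in `/`, and neither is a prefix of another registered path, which avoids the collision described in the caution above.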
## [](#metadata)Metadata This input adds the following metadata fields to each message: ```text - http_server_user_agent - http_server_request_path - http_server_verb - http_server_remote_ip - All headers (only first values are taken) - All query parameters - All path parameters - All cookies ``` If HTTPS is enabled, the following fields are added as well: ```text - http_server_tls_version - http_server_tls_subject - http_server_tls_cipher_suite ``` You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ### [](#headers)Headers Request headers are available as metadata and use the HTTP header name with no additional prefix as a key. During processing, Redpanda Connect changes the format of the header name, as in the following example: ```text x-api-key available as metadata("X-Api-Key") ``` ## [](#examples)Examples ### [](#path-switching)Path Switching This example shows an `http_server` input that captures all requests and processes them by switching on that path: ```yaml input: http_server: path: / allowed_verbs: [ GET, POST ] sync_response: headers: Content-Type: application/json processors: - switch: - check: '@http_server_request_path == "/foo"' processors: - mapping: | root.title = "You Got Fooed!" 
root.result = content().string().uppercase() - check: '@http_server_request_path == "/bar"' processors: - mapping: 'root.title = "Bar Is Slow"' - sleep: # Simulate a slow endpoint duration: 1s ``` ### [](#mock-oauth-2-0-server)Mock OAuth 2.0 Server This example shows an `http_server` input that mocks an OAuth 2.0 Client Credentials flow server at the endpoint `/oauth2_test`: ```yaml input: http_server: path: /oauth2_test allowed_verbs: [ GET, POST ] sync_response: headers: Content-Type: application/json processors: - log: message: "Received request" level: INFO fields_mapping: | root = @ root.body = content().string() - mapping: | root.access_token = "MTQ0NjJkZmQ5OTM2NDE1ZTZjNGZmZjI3" root.token_type = "Bearer" root.expires_in = 3600 - sync_response: {} - mapping: 'root = deleted()' ``` ## [](#fields)Fields ### [](#address)`address` An alternative address to host from. If left empty the service wide address is used. **Type**: `string` **Default**: `""` ### [](#allowed_verbs)`allowed_verbs[]` An array of verbs that are allowed for the `path` endpoint. **Type**: `array` **Default**: ```yaml - "POST" ``` ### [](#cert_file)`cert_file` Enable TLS by specifying a certificate and key file. Only valid with a custom `address`. **Type**: `string` **Default**: `""` ### [](#cors)`cors` Adds Cross-Origin Resource Sharing headers. Only valid with a custom `address`. **Type**: `object` ### [](#cors-allowed_origins)`cors.allowed_origins[]` An explicit list of origins that are allowed for CORS requests. **Type**: `array` **Default**: `[]` ### [](#cors-enabled)`cors.enabled` Whether to allow CORS requests. **Type**: `bool` **Default**: `false` ### [](#key_file)`key_file` Enable TLS by specifying a certificate and key file. Only valid with a custom `address`. **Type**: `string` **Default**: `""` ### [](#path)`path` The endpoint path to listen for POST requests. 
**Type**: `string` **Default**: `/post` ### [](#rate_limit)`rate_limit` An optional [rate limit](../../rate_limits/about/) to throttle requests by. **Type**: `string` **Default**: `""` ### [](#sync_response)`sync_response` Customize messages returned via [synchronous responses](../../../guides/sync_responses/). **Type**: `object` ### [](#sync_response-headers)`sync_response.headers` Specify headers to return with synchronous responses. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: ```yaml Content-Type: "application/octet-stream" ``` ### [](#sync_response-metadata_headers)`sync_response.metadata_headers` Specify criteria for which metadata values are added to the response as headers. **Type**: `object` ### [](#sync_response-metadata_headers-include_patterns)`sync_response.metadata_headers.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#sync_response-metadata_headers-include_prefixes)`sync_response.metadata_headers.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#sync_response-status)`sync_response.status` Specify the status code to return with synchronous responses. This is a string value, which allows you to customize it based on resulting payloads and their metadata. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `200` ```yaml # Examples: status: ${! json("status") } # --- status: ${! meta("status") } ``` ### [](#tcp)`tcp` TCP listener configuration for the HTTP server. 
Only valid with a custom `address`.

**Type**: `object`

### [](#tcp-reuse_addr)`tcp.reuse_addr`

Enable SO\_REUSEADDR, allowing binding to ports in TIME\_WAIT state. Useful for graceful restarts and config reloads where the server needs to rebind to the same port immediately after shutdown.

**Type**: `bool`

**Default**: `false`

### [](#tcp-reuse_port)`tcp.reuse_port`

Enable SO\_REUSEPORT, allowing multiple sockets to bind to the same port for load balancing across multiple processes/threads.

**Type**: `bool`

**Default**: `false`

### [](#timeout)`timeout`

Timeout for requests. If a consumed message takes longer than this to be delivered, the connection is closed, but the message may still be delivered.

**Type**: `string`

**Default**: `5s`

### [](#ws_path)`ws_path`

The endpoint path to create websocket connections from.

**Type**: `string`

**Default**: `/post/ws`

### [](#ws_rate_limit_message)`ws_rate_limit_message`

An optional message to deliver to websocket connections that are rate limited.

**Type**: `string`

**Default**: `""`

### [](#ws_welcome_message)`ws_welcome_message`

An optional message to deliver to fresh websocket connections.
**Type**: `string`

**Default**: `""`

---

# Page 102: inproc

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/inproc.md

---

# inproc

---
title: inproc
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/inproc
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/inproc.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/inproc.adoc
categories: "[\"Utility\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/inproc/)[Output](/redpanda-cloud/develop/connect/components/outputs/inproc/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/inproc/ "View the Self-Managed version of this component")

```yml
input:
  label: ""
  inproc: ""
```

Directly connect to an output within a Redpanda Connect process by referencing it by a chosen ID.

You can connect multiple inputs to the same inproc ID, in which case messages are dispatched to the connected inputs in a round-robin fashion. However, only one output can hold a given inproc ID; if a collision occurs, the new output replaces the existing one.
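As a minimal sketch of the pairing described above, the two fragments below show an output that claims an inproc ID and an input that reads from it. The ID `work_queue` is a hypothetical name chosen for illustration, and the fragments are meant to belong to different parts of the same Redpanda Connect process rather than to be pasted together as one pipeline.

```yaml
# Fragment 1: an output that claims the inproc ID "work_queue" (hypothetical ID).
# Only one output can hold this ID at a time.
output:
  inproc: work_queue

# Fragment 2: an input elsewhere in the same process that reads from that ID.
# If several inputs reference "work_queue", messages are shared between them
# in a round-robin fashion.
input:
  inproc: work_queue
```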
---

# Page 103: kafka_franz

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/kafka_franz.md

---

# kafka\_franz

---
title: kafka_franz
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/kafka_franz
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/kafka_franz.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/kafka_franz.adoc
categories: "[\"Services\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/kafka_franz/)[Output](/redpanda-cloud/develop/connect/components/outputs/kafka_franz/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/kafka_franz/ "View the Self-Managed version of this component")

> ⚠️ **WARNING: Deprecated in 4.68.0**
>
> This component is deprecated and will be removed in the next major version release. Consider moving to the unified [`redpanda` input](../redpanda/) and [`redpanda` output](../../outputs/redpanda/) components.

A Kafka input using the [Franz Kafka client library](https://github.com/twmb/franz-go).
#### Common

```yml
input:
  label: ""
  kafka_franz:
    seed_brokers: [] # No default (required)
    topics: [] # No default (optional)
    regexp_topics_include: [] # No default (optional)
    regexp_topics_exclude: [] # No default (optional)
    transaction_isolation_level: read_uncommitted
    consumer_group: "" # No default (optional)
    auto_replay_nacks: true
```

#### Advanced

```yml
input:
  label: ""
  kafka_franz:
    seed_brokers: [] # No default (required)
    client_id: redpanda-connect
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    sasl: [] # No default (optional)
    metadata_max_age: 1m
    request_timeout_overhead: 10s
    conn_idle_timeout: 20s
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    topics: [] # No default (optional)
    regexp_topics_include: [] # No default (optional)
    regexp_topics_exclude: [] # No default (optional)
    rack_id: ""
    instance_id: ""
    rebalance_timeout: 45s
    session_timeout: 1m
    heartbeat_interval: 3s
    start_offset: earliest
    fetch_max_bytes: 50MiB
    fetch_max_wait: 5s
    fetch_min_bytes: 1B
    fetch_max_partition_bytes: 1MiB
    transaction_isolation_level: read_uncommitted
    consumer_group: "" # No default (optional)
    checkpoint_limit: 1024
    commit_period: 5s
    multi_header: false
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
    topic_lag_refresh_period: 5s
    auto_replay_nacks: true
    timely_nacks_maximum_wait: "" # No default (optional)
```

When you specify a consumer group in your configuration, this input consumes one or more topics and automatically balances the topic partitions across any other connected clients with the same consumer group. Otherwise, topics are consumed in their entirety or with explicit partitions.

This input often outperforms the traditional `kafka` input and provides more useful logs and error messages.
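To make the consumer group behavior concrete, here is a minimal sketch of a `kafka_franz` input that joins a consumer group. The broker address, topic name, label, and group name are hypothetical placeholders; only the field names come from the configuration reference above.

```yaml
input:
  label: "orders_consumer"        # hypothetical label
  kafka_franz:
    seed_brokers:
      - "localhost:9092"          # hypothetical broker address
    topics:
      - orders                    # hypothetical topic
    # With a consumer group set, partitions of "orders" are balanced across
    # every client that uses the same group name, and offsets are committed
    # under that name.
    consumer_group: "orders-processor"
    # Used when the group has no committed offset for a partition.
    start_offset: earliest
```

Removing `consumer_group` and listing explicit partitions instead (for example `orders:0`) would switch the input to consuming those partitions directly, without group balancing.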
## [](#metadata)Metadata This input adds the following metadata fields to each message: ```text - kafka_key - kafka_topic - kafka_partition - kafka_offset - kafka_timestamp_ms - kafka_timestamp_unix - kafka_tombstone_message - All record headers ``` ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether to automatically replay rejected messages (negative acknowledgements) at the output level. If the cause of rejections persists, leaving this option enabled can result in back pressure. Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data is discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#batching)`batching` Configure a [batching policy](../../../configuration/batching/) that applies to individual topic partitions in order to batch messages together before flushing them for processing. Batching can be beneficial for performance as well as useful for windowed processing, and doing so this way preserves the ordering of topic partitions. **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#checkpoint_limit)`checkpoint_limit` The maximum number of messages that are processed in parallel inside the same partition before back pressure is applied. When a message with a specific offset is delivered to the output, the offset is only committed when all messages of previous offsets have also been delivered. This behavior ensures at-least-once delivery guarantees. However, in the event of crashes or server faults, it also increases the likelihood of duplicates. To decrease this risk, reduce the `checkpoint_limit` value. **Type**: `int` **Default**: `1024` ### [](#client_id)`client_id` An identifier for the client connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#commit_period)`commit_period` The period of time between each commit of the current partition offsets. Offsets are always committed during shutdown. **Type**: `string` **Default**: `5s` ### [](#conn_idle_timeout)`conn_idle_timeout` The maximum duration that connections can remain idle before they are automatically closed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. 
**Type**: `string` **Default**: `20s` ### [](#consumer_group)`consumer_group` An optional consumer group. When you specify this value: - The partitions of any topics, specified in the `topics` field, are automatically distributed across consumers sharing a consumer group - Partition offsets are automatically committed and resumed under this name Consumer groups are not supported when you specify explicit partitions to consume from in the `topics` field. **Type**: `string` ### [](#fetch_max_bytes)`fetch_max_bytes` The maximum size of a message batch (in bytes) that a broker tries to send during a client fetch. If individual records exceed the `fetch_max_bytes` value, brokers will still send them. **Type**: `string` **Default**: `50MiB` ### [](#fetch_max_partition_bytes)`fetch_max_partition_bytes` The maximum number of bytes that are consumed from a single partition in a fetch request. This field is equivalent to the Java setting `fetch.max.partition.bytes`. If a single batch is larger than the `fetch_max_partition_bytes` value, the batch is still sent so that the client can make progress. **Type**: `string` **Default**: `1MiB` ### [](#fetch_max_wait)`fetch_max_wait` The maximum period of time a broker can wait for a fetch response to reach the required minimum number of bytes (`fetch_min_bytes`). **Type**: `string` **Default**: `5s` ### [](#fetch_min_bytes)`fetch_min_bytes` The minimum number of bytes that a broker tries to send during a fetch. This field is equivalent to the Java setting `fetch.min.bytes`. **Type**: `string` **Default**: `1B` ### [](#heartbeat_interval)`heartbeat_interval` When you specify a `consumer_group`, `heartbeat_interval` sets how frequently a consumer group member should send heartbeats to Apache Kafka. Apache Kafka uses heartbeats to make sure that a group member’s session is active. You must set `heartbeat_interval` to less than one-third of `session_timeout`. 
This field is equivalent to the Java `heartbeat.interval.ms` setting and accepts Go duration format strings such as `10s` or `2m`. **Type**: `string` **Default**: `3s` ### [](#instance_id)`instance_id` When you specify a [`consumer_group`](#consumer_group), assign a unique value to `instance_id` to define the group’s static membership, which can prevent unnecessary rebalances during reconnections. When you assign an instance ID, the client does not automatically leave the consumer group when it disconnects. To remove the client, you must use an external admin command on behalf of the instance ID. **Type**: `string` **Default**: `""` ### [](#metadata_max_age)`metadata_max_age` The maximum period of time after which metadata is refreshed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. Lower values provide more responsive topic and partition discovery but may increase broker load. Higher values reduce broker queries but can delay detection of topology changes. **Type**: `string` **Default**: `1m` ### [](#multi_header)`multi_header` Decode headers into lists to allow the handling of multiple values with the same key. **Type**: `bool` **Default**: `false` ### [](#rack_id)`rack_id` A rack specifies where the client is physically located, and changes fetch requests to consume from the closest replica as opposed to the leader replica. **Type**: `string` **Default**: `""` ### [](#rebalance_timeout)`rebalance_timeout` When you specify a [`consumer_group`](#consumer_group), `rebalance_timeout` sets a time limit for all consumer group members to complete their work and commit offsets after a rebalance has begun. The timeout excludes the time taken to detect a failed or late heartbeat, which indicates a rebalance is required. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. 
**Type**: `string`

**Default**: `45s`

### [](#regexp_topics_exclude)`regexp_topics_exclude[]`

A list of regular expression patterns for excluding topics when regex mode is enabled (using `regexp_topics_include` or the deprecated `regexp_topics` boolean). Topics matching any of these patterns are excluded from consumption, even if they match include patterns.

Each pattern is a full regular expression evaluated against the complete topic name. Patterns are not anchored by default, so use `^` and `$` for exact matching. Exclude patterns are applied after include patterns, providing fine-grained control over topic selection.

Example: `regexp_topics_exclude: ["^_", ".*-temp$", ".*-test.*"]` excludes topics starting with an underscore, ending with `-temp`, or containing `-test`.

**Type**: `array`

### [](#regexp_topics_include)`regexp_topics_include[]`

A list of regular expression patterns for matching topics to consume from. When specified, the client periodically refreshes the list of matching topics based on the `metadata_max_age` interval.

Each pattern is a full regular expression evaluated against the complete topic name. Patterns are not anchored by default, so `logs_.*` matches both `my-logs_events` and `logs_errors`. Use `^logs_.*$` to match only topics starting with `logs_`.

This field enables regex mode (replacing the deprecated `regexp_topics` boolean) and cannot be used together with explicit `topics` lists. Use `regexp_topics_exclude` to filter out specific patterns from the matched topics.

Example: `regexp_topics_include: ["events_.*", "logs_.*"]` consumes from all topics whose names contain `events_` or `logs_`. Prefix each pattern with `^` to match only topics that start with those strings.

**Type**: `array`

```yaml
# Examples:
regexp_topics_include:
  - logs_.*
  - metrics_.*

# ---

regexp_topics_include:
  - "events_[0-9]+"
```

### [](#request_timeout_overhead)`request_timeout_overhead`

Grants an additional buffer or overhead to requests that have timeout fields defined.
This field is based on the behavior of Apache Kafka’s `request.timeout.ms` parameter. **Type**: `string` **Default**: `10s` ### [](#sasl)`sasl[]` Specify one or more methods or mechanisms of SASL authentication, which are attempted in order. If the broker supports the first SASL mechanism, all connections use it. If the first mechanism fails, the client picks the first supported mechanism. If the broker does not support any client mechanisms, all connections fail. **Type**: `object` ```yaml # Examples: sasl: - mechanism: SCRAM-SHA-512 password: bar username: foo ``` ### [](#sasl-aws)`sasl[].aws` Contains AWS specific fields for when the `mechanism` is set to `AWS_MSK_IAM`. **Type**: `object` ### [](#sasl-aws-credentials)`sasl[].aws.credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#sasl-aws-credentials-from_ec2_role)`sasl[].aws.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#sasl-aws-credentials-id)`sasl[].aws.credentials.id` The ID of credentials to use. **Type**: `string` ### [](#sasl-aws-credentials-profile)`sasl[].aws.credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#sasl-aws-credentials-role)`sasl[].aws.credentials.role` A role ARN to assume. **Type**: `string` ### [](#sasl-aws-credentials-role_external_id)`sasl[].aws.credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#sasl-aws-credentials-secret)`sasl[].aws.credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#sasl-aws-credentials-token)`sasl[].aws.credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#sasl-aws-endpoint)`sasl[].aws.endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#sasl-aws-region)`sasl[].aws.region` The AWS region to target. **Type**: `string` ### [](#sasl-aws-tcp)`sasl[].aws.tcp` TCP socket configuration. **Type**: `object` ### [](#sasl-aws-tcp-connect_timeout)`sasl[].aws.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-aws-tcp-keep_alive)`sasl[].aws.tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#sasl-aws-tcp-keep_alive-count)`sasl[].aws.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#sasl-aws-tcp-keep_alive-idle)`sasl[].aws.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-keep_alive-interval)`sasl[].aws.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-tcp_user_timeout)`sasl[].aws.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-extensions)`sasl[].extensions` Key/value pairs to add to OAUTHBEARER authentication requests. 
**Type**: `string` ### [](#sasl-mechanism)`sasl[].mechanism` The SASL mechanism to use. **Type**: `string` | Option | Summary | | --- | --- | | AWS_MSK_IAM | AWS IAM based authentication as specified by the 'aws-msk-iam-auth' java library. | | OAUTHBEARER | OAuth Bearer based authentication. | | PLAIN | Plain text authentication. | | REDPANDA_CLOUD_SERVICE_ACCOUNT | Redpanda Cloud Service Account authentication when running in Redpanda Cloud. | | SCRAM-SHA-256 | SCRAM based authentication as specified in RFC5802. | | SCRAM-SHA-512 | SCRAM based authentication as specified in RFC5802. | | none | Disable sasl authentication | ### [](#sasl-password)`sasl[].password` A password to provide for PLAIN or SCRAM-\* authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#sasl-token)`sasl[].token` The token to use for a single session’s OAUTHBEARER authentication. **Type**: `string` **Default**: `""` ### [](#sasl-username)`sasl[].username` A username to provide for PLAIN or SCRAM-\* authentication. **Type**: `string` **Default**: `""` ### [](#seed_brokers)`seed_brokers[]` A list of broker addresses to connect to in order. Use commas to separate multiple addresses in a single list item. **Type**: `array` ```yaml # Examples: seed_brokers: - "localhost:9092" # --- seed_brokers: - "foo:9092" - "bar:9092" # --- seed_brokers: - "foo:9092,bar:9092" ``` ### [](#session_timeout)`session_timeout` When you specify a `consumer_group`, `session_timeout` sets the maximum interval between heartbeats sent by a consumer group member to the broker. If a broker doesn’t receive a heartbeat from a group member before the timeout expires, it removes the member from the consumer group and initiates a rebalance. 
This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `1m` ### [](#start_offset)`start_offset` Specify the offset from which this input starts or restarts consuming messages. Restarts occur when the `OffsetOutOfRange` error is seen during a fetch. **Type**: `string` **Default**: `earliest` | Option | Summary | | --- | --- | | committed | Prevents consuming a partition in a group if the partition has no prior commits. Corresponds to Kafka’s auto.offset.reset=none option | | earliest | Start from the earliest offset. Corresponds to Kafka’s auto.offset.reset=earliest option. | | latest | Start from the latest offset. Corresponds to Kafka’s auto.offset.reset=latest option. | ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. 
**Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timely_nacks_maximum_wait)`timely_nacks_maximum_wait` EXPERIMENTAL: Specify a maximum period of time in which each message can be consumed and awaiting either acknowledgement or rejection before rejection is instead forced. This can be useful for avoiding situations where certain downstream components can result in blocked confirmation of delivery that exceeds SLAs. Accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. 
**Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. 
When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely.

**Type**: `bool`

**Default**: `false`

### [](#topic_lag_refresh_period)`topic_lag_refresh_period`

The interval between refresh cycles. During each cycle, this input queries the Redpanda Connect server to calculate the topic lag: the number of produced messages that remain to be read from each topic/partition pair by the specified consumer group. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

**Default**: `5s`

### [](#topics)`topics[]`

A list of topics to consume from. Use commas to separate multiple topics in a single element.

When a `consumer_group` is specified, partitions are automatically distributed across consumers of a topic. Otherwise, all partitions are consumed.

Alternatively, you can specify explicit partitions to consume by using a colon after the topic name. For example, `foo:0` would consume partition `0` of the topic `foo`. This syntax supports ranges. For example, `foo:0-10` would consume partitions `0` through `10` inclusive.

It is also possible to specify an explicit offset to consume from by adding another colon after the partition. For example, `foo:0:10` would consume partition `0` of the topic `foo` starting from the offset `10`. If the offset is not specified, the field `start_offset` determines which offset to start from.

**Type**: `array`

```yaml
# Examples:
topics:
  - foo
  - bar

# ---

topics:
  - things.*

# ---

topics:
  - "foo,bar"

# ---

topics:
  - "foo:0"
  - "bar:1"
  - "bar:3"

# ---

topics:
  - "foo:0,bar:1,bar:3"

# ---

topics:
  - "foo:0-5"
```

### [](#transaction_isolation_level)`transaction_isolation_level`

The isolation level for handling transactional messages.
This setting determines how transactions are processed and affects data consistency guarantees.

**Type**: `string`

**Default**: `read_uncommitted`

| Option | Summary |
| --- | --- |
| read_committed | If set, only committed transactional records are processed. |
| read_uncommitted | If set, then uncommitted records are processed. |

---

# Page 104: kafka

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/kafka.md

---

# kafka

---
title: kafka
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/kafka
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/kafka.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/kafka.adoc
categories: "[\"Services\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/kafka/)[Output](/redpanda-cloud/develop/connect/components/outputs/kafka/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/kafka/ "View the Self-Managed version of this component")

> ⚠️ **WARNING: Deprecated in 4.68.0**
>
> This component is deprecated and will be removed in the next major version release. Consider moving to the unified [`redpanda` input](../redpanda/) and [`redpanda` output](../../outputs/redpanda/) components.

Connects to Kafka brokers and consumes one or more topics.
#### Common

```yml
inputs:
  label: ""
  kafka:
    addresses: [] # No default (required)
    topics: [] # No default (required)
    target_version: "" # No default (optional)
    consumer_group: ""
    checkpoint_limit: 1024
    auto_replay_nacks: true
```

#### Advanced

```yml
inputs:
  label: ""
  kafka:
    addresses: [] # No default (required)
    topics: [] # No default (required)
    target_version: "" # No default (optional)
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    sasl:
      mechanism: none
      user: ""
      password: ""
      access_token: ""
      token_cache: ""
      token_key: ""
    consumer_group: ""
    client_id: benthos
    instance_id: "" # No default (optional)
    rack_id: ""
    start_from_oldest: true
    checkpoint_limit: 1024
    auto_replay_nacks: true
    timely_nacks_maximum_wait: "" # No default (optional)
    commit_period: 1s
    max_processing_period: 100ms
    extract_tracing_map: "" # No default (optional)
    group:
      session_timeout: 10s
      heartbeat_interval: 3s
      rebalance_timeout: 60s
    fetch_buffer_cap: 256
    multi_header: false
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

Offsets are managed within Kafka under the specified consumer group, and partitions for each topic are automatically balanced across members of the consumer group.

The Kafka input allows parallel processing of messages from different topic partitions, and messages of the same topic partition are processed with a maximum parallelism determined by the field [`checkpoint_limit`](#checkpoint_limit).

To enforce ordered processing of partition messages, set the [`checkpoint_limit`](#checkpoint_limit) to `1`, which makes sure that a message is only processed after the previous message is delivered.

Batching messages before processing can be enabled using the [`batching`](#batching) field, and this batching is performed per-partition such that messages of a batch will always originate from the same partition. This batching mechanism is capable of creating batches of greater size than the [`checkpoint_limit`](#checkpoint_limit), in which case the next batch will only be created upon delivery of the current one.

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

- kafka\_key
- kafka\_topic
- kafka\_partition
- kafka\_offset
- kafka\_lag
- kafka\_timestamp\_ms
- kafka\_timestamp\_unix
- kafka\_tombstone\_message
- All existing message headers (version 0.11+)

The field `kafka_lag` is the calculated difference between the high water mark offset of the partition at the time of ingestion and the current message offset.

You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries).

## [](#ordering)Ordering

By default, messages of a topic partition can be processed in parallel, up to a limit determined by the field `checkpoint_limit`. However, if strictly ordered processing is required, this value must be set to `1` so that the messages of each partition are processed in lock-step. When doing so, it is recommended that you perform batching at this component for performance, as it will not be possible to batch lock-stepped messages at the output level.

## [](#troubleshooting)Troubleshooting

If you’re seeing issues writing to or reading from Kafka with this component then it’s worth trying out the newer [`kafka_franz` input](../kafka_franz/).

- I’m seeing logs that report `Failed to connect to kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)`, but the brokers are definitely reachable.

  Unfortunately, this error message appears for a wide range of connection problems, even when the broker endpoint can be reached. Double check your authentication configuration and also ensure that you have [enabled TLS](#tls-enabled) if applicable.

## [](#fields)Fields

### [](#addresses)`addresses[]`

A list of broker addresses to connect to.
If an item of the list contains commas it will be expanded into multiple addresses. **Type**: `array` ```yaml # Examples: addresses: - "localhost:9092" # --- addresses: - "localhost:9041,localhost:9042" # --- addresses: - "localhost:9041" - "localhost:9042" ``` ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. 
**Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#checkpoint_limit)`checkpoint_limit` The maximum number of messages of the same topic and partition that can be processed at a given time. Increasing this limit enables parallel processing and batching at the output level to work on individual partitions. Any given offset will not be committed unless all messages under that offset are delivered in order to preserve at least once delivery guarantees. **Type**: `int` **Default**: `1024` ### [](#client_id)`client_id` An identifier for the client connection. **Type**: `string` **Default**: `benthos` ### [](#commit_period)`commit_period` The period of time between each commit of the current partition offsets. Offsets are always committed during shutdown. **Type**: `string` **Default**: `1s` ### [](#consumer_group)`consumer_group` An identifier for the consumer group of the connection. This field can be explicitly made empty in order to disable stored offsets for the consumed topic partitions. **Type**: `string` **Default**: `""` ### [](#extract_tracing_map)`extract_tracing_map` EXPERIMENTAL: A [Bloblang mapping](../../../guides/bloblang/about/) that attempts to extract an object containing tracing propagation information, which will then be used as the root tracing span for the message. 
The specification of the extracted fields must match the format used by the service wide tracer. **Type**: `string` ```yaml # Examples: extract_tracing_map: root = @ # --- extract_tracing_map: root = this.meta.span ``` ### [](#fetch_buffer_cap)`fetch_buffer_cap` The maximum number of unprocessed messages to fetch at a given time. **Type**: `int` **Default**: `256` ### [](#group)`group` Tuning parameters for consumer group synchronization. **Type**: `object` ### [](#group-heartbeat_interval)`group.heartbeat_interval` A period in which heartbeats should be sent out. **Type**: `string` **Default**: `3s` ### [](#group-rebalance_timeout)`group.rebalance_timeout` A period after which rebalancing is abandoned if unresolved. **Type**: `string` **Default**: `60s` ### [](#group-session_timeout)`group.session_timeout` A period after which a consumer of the group is kicked after no heartbeats. **Type**: `string` **Default**: `10s` ### [](#instance_id)`instance_id` When you specify a [`consumer_group`](#consumer_group), assign a unique value to `instance_id` to help brokers identify each input after restarts and prevent unnecessary rebalances. **Type**: `string` ### [](#max_processing_period)`max_processing_period` A maximum estimate for the time taken to process a message, this is used for tuning consumer group synchronization. **Type**: `string` **Default**: `100ms` ### [](#multi_header)`multi_header` Decode headers into lists to allow handling of multiple values with the same key **Type**: `bool` **Default**: `false` ### [](#rack_id)`rack_id` A rack identifier for this client. **Type**: `string` **Default**: `""` ### [](#sasl)`sasl` Enables SASL authentication. **Type**: `object` ### [](#sasl-access_token)`sasl.access_token` A static OAUTHBEARER access token **Type**: `string` **Default**: `""` ### [](#sasl-mechanism)`sasl.mechanism` The SASL authentication mechanism, if left empty SASL authentication is not used. 

**Type**: `string`

**Default**: `none`

| Option | Summary |
| --- | --- |
| OAUTHBEARER | OAuth Bearer based authentication. |
| PLAIN | Plain text authentication. NOTE: When using plain text auth it is extremely likely that you’ll also need to enable TLS. |
| SCRAM-SHA-256 | Authentication using the SCRAM-SHA-256 mechanism. |
| SCRAM-SHA-512 | Authentication using the SCRAM-SHA-512 mechanism. |
| none | Default, no SASL authentication. |

### [](#sasl-password)`sasl.password`

A PLAIN password. It is recommended that you use environment variables to populate this field.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
password: ${PASSWORD}
```

### [](#sasl-token_cache)`sasl.token_cache`

The name of a [`cache`](../../caches/about/) resource to query for OAUTHBEARER tokens, as an alternative to using a static `access_token`.

**Type**: `string`

**Default**: `""`

### [](#sasl-token_key)`sasl.token_key`

The key used to query the cache for tokens. Required when using a `token_cache`.

**Type**: `string`

**Default**: `""`

### [](#sasl-user)`sasl.user`

A PLAIN username. It is recommended that you use environment variables to populate this field.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
user: ${USER}
```

### [](#start_from_oldest)`start_from_oldest`

Whether to consume from the oldest available offset. If set to `false`, messages are consumed from the latest offset. The setting is applied when a new consumer group is created or when the saved offset no longer exists.

**Type**: `bool`

**Default**: `true`

### [](#target_version)`target_version`

The version of the Kafka protocol to use. This limits the capabilities used by the client and should ideally match the version of your brokers.
Defaults to the oldest supported stable version. **Type**: `string` ```yaml # Examples: target_version: 2.1.0 # --- target_version: 3.1.0 ``` ### [](#timely_nacks_maximum_wait)`timely_nacks_maximum_wait` EXPERIMENTAL: Specify a maximum period of time in which each message can be consumed and awaiting either acknowledgement or rejection before rejection is instead forced. This can be useful for avoiding situations where certain downstream components can result in blocked confirmation of delivery that exceeds SLAs. Accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. 
This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#topics)`topics[]` A list of topics to consume from. Multiple comma separated topics can be listed in a single element. Partitions are automatically distributed across consumers of a topic. Alternatively, it’s possible to specify explicit partitions to consume from with a colon after the topic name, e.g. `foo:0` would consume the partition 0 of the topic foo. This syntax supports ranges, e.g. `foo:0-10` would consume partitions 0 through to 10 inclusive. **Type**: `array` ```yaml # Examples: topics: - foo - bar # --- topics: - "foo,bar" # --- topics: - "foo:0" - "bar:1" - "bar:3" # --- topics: - "foo:0,bar:1,bar:3" # --- topics: - "foo:0-5" ``` --- # Page 105: microsoft_sql_server_cdc **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/microsoft_sql_server_cdc.md --- # microsoft\_sql\_server\_cdc --- title: microsoft_sql_server_cdc latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/microsoft_sql_server_cdc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/microsoft_sql_server_cdc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/microsoft_sql_server_cdc.adoc categories: "[Services]" description: Enables Change Data Capture by consuming from Microsoft SQL Server's change tables. 
page-git-created-date: "2025-10-24"
page-git-modified-date: "2025-10-24"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/microsoft_sql_server_cdc/ "View the Self-Managed version of this component")

Enables Change Data Capture by consuming from Microsoft SQL Server’s change tables.

#### Common

```yaml
inputs:
  label: ""
  microsoft_sql_server_cdc:
    connection_string: "" # No default (required)
    stream_snapshot: false
    max_parallel_snapshot_tables: 1
    snapshot_max_batch_size: 1000
    include: [] # No default (required)
    exclude: [] # No default (optional)
    checkpoint_cache: "" # No default (optional)
    checkpoint_cache_table_name: rpcn.CdcCheckpointCache
    checkpoint_cache_connection_string: "" # No default (optional)
    checkpoint_cache_key: microsoft_sql_server_cdc
    checkpoint_limit: 1024
    stream_backoff_interval: 5s
    auto_replay_nacks: true
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

#### Advanced

```yaml
inputs:
  label: ""
  microsoft_sql_server_cdc:
    connection_string: "" # No default (required)
    stream_snapshot: false
    max_parallel_snapshot_tables: 1
    snapshot_max_batch_size: 1000
    include: [] # No default (required)
    exclude: [] # No default (optional)
    checkpoint_cache: "" # No default (optional)
    checkpoint_cache_table_name: rpcn.CdcCheckpointCache
    checkpoint_cache_connection_string: "" # No default (optional)
    checkpoint_cache_key: microsoft_sql_server_cdc
    checkpoint_limit: 1024
    stream_backoff_interval: 5s
    auto_replay_nacks: true
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

Streams changes from a Microsoft SQL Server database for Change Data Capture (CDC). If `stream_snapshot` is set to `true`, the existing data in the database is streamed as well.
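As a minimal sketch of how these fields might be combined, the fragment below snapshots two tables and then streams their changes. The environment variable name and the table names are hypothetical, and the batching values are arbitrary; only the field names come from this page.

```yaml
microsoft_sql_server_cdc:
  # Hypothetical environment variable that holds the full connection string.
  connection_string: "${MSSQL_CONNECTION_STRING}"
  # Stream the existing rows first, then switch to incremental changes.
  stream_snapshot: true
  snapshot_max_batch_size: 1000
  include:
    - dbo.orders            # hypothetical tables; CDC must be enabled on each
    - dbo.customers
  exclude:
    - dbo.orders_archive    # hypothetical table filtered out after include
  # Wait this long between polls when no new CDC data is available.
  stream_backoff_interval: 5s
  batching:
    count: 100              # illustrative value
    period: 1s              # illustrative value
```

Because no `checkpoint_cache` is set in this sketch, the input falls back to the default checkpoint table, so the connecting user needs permission to create tables and stored procedures.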
## [](#metadata)Metadata This input adds the following metadata fields to each message: - schema (Schema of the table that the message originated from) - table (Name of the table that the message originated from) - operation (Type of operation that generated the message: "read", "delete", "insert", or "update\_before" and "update\_after". "read" is from messages that are read in the initial snapshot phase.) - lsn (the Log Sequence Number in Microsoft SQL Server) ## [](#permissions)Permissions To use the default Microsoft SQL Server cache, the user must have permissions to create tables and stored procedures. Refer to [`checkpoint_cache_table_name`](#checkpoint_cache_table_name) for additional details. ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether to automatically replay messages that are rejected (nacked) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure. Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams, as the original shape of the data is discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#batching)`batching` Configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. 
**Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#checkpoint_cache)`checkpoint_cache` A [cache resource](../../caches/about/) to store the current Log Sequence Number (LSN) position. This enables the connector to resume from the last processed position after restarts, preventing data loss and duplicate processing. The cache stores the highest LSN that has been successfully delivered downstream. **Type**: `string` ### [](#checkpoint_cache_connection_string)`checkpoint_cache_connection_string` An optional connection string for a remote Microsoft SQL Server to use for the checkpoint cache. When set, this creates the checkpoint cache table on the remote server instead of the source database. If `checkpoint_cache` is also set, that takes precedence. 

**Type**: `string`

```yaml
# Examples:
checkpoint_cache_connection_string: sqlserver://username:password@remotehost/instance?param1=value&param2=value
```

### [](#checkpoint_cache_key)`checkpoint_cache_key`

The key to use to store the snapshot position in `checkpoint_cache`. An alternative key can be provided if multiple CDC inputs share the same cache.

**Type**: `string`

**Default**: `microsoft_sql_server_cdc`

### [](#checkpoint_cache_table_name)`checkpoint_cache_table_name`

The multipart identifier for the checkpoint cache table name. If no `checkpoint_cache` field is specified, this input automatically creates a table and stored procedure under the `rpcn` schema to act as a checkpoint cache. This table stores the latest processed Log Sequence Number (LSN) that has been successfully delivered, allowing Redpanda Connect to resume from that point upon restart rather than reconsuming the entire change table.

**Type**: `string`

**Default**: `rpcn.CdcCheckpointCache`

```yaml
# Examples:
checkpoint_cache_table_name: dbo.checkpoint_cache
```

### [](#checkpoint_limit)`checkpoint_limit`

The maximum number of messages that can be processed concurrently before applying back pressure. Higher values enable better parallelization and batching but increase memory usage. Messages are processed in LSN order, and a given LSN is only acknowledged after all previous LSNs have been successfully delivered, ensuring at-least-once guarantees.

**Type**: `int`

**Default**: `1024`

### [](#connection_string)`connection_string`

The connection string for the Microsoft SQL Server database. Use the format `sqlserver://username:password@host/instance?param1=value&param2=value`. For Windows Authentication, use `sqlserver://host/instance?trusted_connection=yes`. Include additional parameters like `TrustServerCertificate=true` for self-signed certificates or `encrypt=disable` to disable encryption.

**Type**: `string`

```yaml
# Examples:
connection_string: sqlserver://username:password@host/instance?param1=value&param2=value
```

### [](#exclude)`exclude[]`

Regular expressions for tables to exclude from CDC streaming. Use this to filter out specific tables from the include patterns. Table names should follow the `schema.table` format. Exclude patterns are applied after include patterns, allowing you to include broad patterns while excluding specific tables.

**Type**: `array`

```yaml
# Examples:
exclude:
  - dbo.privatetable
```

### [](#include)`include[]`

Regular expressions for tables to include in CDC streaming. Specify table names using the format `schema.table` (such as `dbo.orders`, `sales.customers`). Each pattern is treated as a regular expression, allowing wildcards and pattern matching. All specified tables must have CDC enabled in SQL Server.

**Type**: `array`

```yaml
# Examples:
include:
  - dbo.products
```

### [](#max_parallel_snapshot_tables)`max_parallel_snapshot_tables`

The number of tables that are processed in parallel during the snapshot processing stage.

**Type**: `int`

**Default**: `1`

### [](#snapshot_max_batch_size)`snapshot_max_batch_size`

The maximum number of rows to stream in a single batch during the initial snapshot phase. Larger batch sizes can improve throughput for initial data loads but may increase memory usage. This setting only applies when `stream_snapshot` is enabled.

**Type**: `int`

**Default**: `1000`

### [](#stream_backoff_interval)`stream_backoff_interval`

The time interval to wait between polling attempts when no new CDC data is available. For low-traffic tables, increasing this value reduces database load and network traffic. Use Go duration format like `5s`, `30s`, or `1m`. Shorter intervals provide lower latency for new changes but increase server load.
**Type**: `string` **Default**: `5s` ```yaml # Examples: stream_backoff_interval: 5s # --- stream_backoff_interval: 1m ``` ### [](#stream_snapshot)`stream_snapshot` Whether to stream a snapshot of all existing data before streaming CDC changes. When enabled, the connector first queries all existing table data, then switches to streaming incremental changes from the transaction log. Set to `false` to start streaming only new changes from the current LSN position. **Type**: `bool` **Default**: `false` ```yaml # Examples: stream_snapshot: true ``` --- # Page 106: mongodb_cdc **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/mongodb_cdc.md --- # mongodb\_cdc --- title: mongodb_cdc latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/mongodb_cdc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/mongodb_cdc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/mongodb_cdc.adoc page-git-created-date: "2025-03-11" page-git-modified-date: "2025-03-18" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/mongodb_cdc/ "View the Self-Managed version of this component") Streams data changes from a MongoDB replica set, using MongoDB’s [change streams](https://www.mongodb.com/docs/manual/changeStreams/) to capture data updates. 
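As a minimal sketch of this input running in snapshot mode, the fragment below streams existing documents from two collections and then follows their changes. The connection URL, database and collection names, environment variable names, and cache label are all hypothetical; only the field names come from this page.

```yaml
mongodb_cdc:
  url: "mongodb://mongo.example.internal:27017"   # hypothetical replica set address
  database: "shop"                                # hypothetical database name
  username: "${MONGODB_USER}"                     # hypothetical environment variables
  password: "${MONGODB_PASSWORD}"
  collections:
    - orders                                      # hypothetical collection names
    - customers
  # Label of a cache resource assumed to be defined elsewhere in the pipeline.
  checkpoint_cache: "cdc_checkpoints"
  checkpoint_interval: 5s
  # Capture existing documents before streaming changes.
  stream_snapshot: true
  snapshot_parallelism: 4                         # illustrative value
```

Setting `stream_snapshot` to `false` in the same fragment would skip the initial read and start from the latest oplog position instead.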
#### Common

```yml
inputs:
  label: ""
  mongodb_cdc:
    url: "" # No default (required)
    database: "" # No default (required)
    username: ""
    password: ""
    collections: [] # No default (required)
    checkpoint_key: mongodb_cdc_checkpoint
    checkpoint_cache: "" # No default (required)
    checkpoint_interval: 5s
    checkpoint_limit: 1000
    read_batch_size: 1000
    read_max_wait: 1s
    stream_snapshot: false
    snapshot_parallelism: 1
    auto_replay_nacks: true
```

#### Advanced

```yml
inputs:
  label: ""
  mongodb_cdc:
    url: "" # No default (required)
    database: "" # No default (required)
    username: ""
    password: ""
    collections: [] # No default (required)
    checkpoint_key: mongodb_cdc_checkpoint
    checkpoint_cache: "" # No default (required)
    checkpoint_interval: 5s
    checkpoint_limit: 1000
    read_batch_size: 1000
    read_max_wait: 1s
    stream_snapshot: false
    snapshot_parallelism: 1
    snapshot_auto_bucket_sharding: false
    document_mode: update_lookup
    json_marshal_mode: canonical
    app_name: benthos
    auto_replay_nacks: true
```

## [](#prerequisites)Prerequisites

- MongoDB version 6 or later
- Network access from the cluster where your Redpanda Connect pipeline is running to the source database environment. For detailed networking information, including how to set up a VPC peering connection, see [Redpanda Cloud Networking](../../../../../networking/).
- A MongoDB database running as a [replica set](https://www.mongodb.com/docs/manual/replication/#replication-in-mongodb) or in a [sharded cluster](https://www.mongodb.com/docs/manual/sharding/) using replica set [protocol version 1](https://www.mongodb.com/docs/manual/reference/replica-configuration/#rsconf.protocolVersion).
- A MongoDB database using the [WiredTiger](https://www.mongodb.com/docs/manual/core/wiredtiger/#storage-wiredtiger) storage engine.

## [](#enable-connectivity-from-cloud-based-data-sources-byoc)Enable connectivity from cloud-based data sources (BYOC)

To establish a secure connection between a cloud-based data source and Redpanda Connect, you must add the NAT Gateway IP address of your Redpanda cluster to the allowlist of your data source.

## [](#data-capture-method)Data capture method

The `mongodb_cdc` input uses [change streams](https://www.mongodb.com/docs/manual/changeStreams/) to capture data changes. By default, change streams do not propagate _all_ changes to Redpanda Connect. To capture all changes in a MongoDB cluster, including deletions, enable pre- and post-image saving for the cluster and [required collections](#collections). For more information, see [`document_mode` options](#document_mode) and the [MongoDB documentation](https://www.mongodb.com/docs/manual/changeStreams/#change-streams-with-document-pre--and-post-images).

## [](#data-replication)Data replication

Redpanda Connect allows you to specify which [database collections](#collections) in your source database to receive changes from.

You can also run the `mongodb_cdc` input in one of two modes, depending on whether you need a snapshot of existing data before streaming updates.

- Snapshot mode: Redpanda Connect first captures a snapshot of all data in the selected collections and streams the contents before processing changes from the last recorded [operations log (oplog)](https://www.mongodb.com/docs/manual/core/replica-set-oplog/) position.
- Streaming mode: Redpanda Connect skips the snapshot and processes only the most recent data changes, starting from the latest oplog position.

### [](#snapshot-mode)Snapshot mode

If you set the [`stream_snapshot` field](#stream_snapshot) to `true`, Redpanda Connect connects to your MongoDB database and does the following to capture a snapshot of all data in the selected collections:

1. Records the latest oplog position.
2. Determines the strategy for splitting the snapshot data into shards or chunks for more efficient processing:
    1. If [`snapshot_auto_bucket_sharding`](#snapshot_auto_bucket_sharding) is set to `false`, the internal `$splitVector` command is used to compute shards.
    2. If [`snapshot_auto_bucket_sharding`](#snapshot_auto_bucket_sharding) is set to `true`, the [`$bucketAuto`](https://www.mongodb.com/docs/manual/reference/operator/aggregation/bucketAuto/) aggregation stage is used instead. This setting is for environments, such as MongoDB Atlas, where the `$splitVector` command is not available.
3. This input then uses the number of connections specified in [`snapshot_parallelism`](#snapshot_parallelism) to read the selected collections.

    > 📝 **NOTE**
    >
    > If the pipeline restarts during this process, Redpanda Connect must start the snapshot capture from scratch to store the current oplog position in the [`checkpoint_cache`](#checkpoint_cache).
4. Finally, the input uses the stored oplog position to catch up with changes that occurred during snapshot processing.

### [](#streaming-mode)Streaming mode

If you set the [`stream_snapshot` field](#stream_snapshot) to `false`, Redpanda Connect connects to your MongoDB database and starts processing data changes from the latest oplog position. If the pipeline restarts, Redpanda Connect resumes processing updates from the last oplog position written to the [`checkpoint_cache`](#checkpoint_cache).

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

- `operation`: The type of data change that generated the message: `read`, `create`, `update`, `replace`, or `delete`. A `read` operation occurs when the initial snapshot of the database is processed.
- `collection`: The name of the collection from which the message originated.
- `operation_time`: The time the data change was written to the [operations log (oplog)](https://www.mongodb.com/docs/manual/core/replica-set-oplog/) in the form of a Binary JSON (BSON) timestamp: `{"t": <seconds>, "i": <increment>}`, where `t` is the number of seconds since the Unix epoch and `i` is an incrementing ordinal for operations within that second.

## [](#fields)Fields

### [](#app_name)`app_name`

The client application name.

**Type**: `string`

**Default**: `benthos`

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether to automatically replay rejected messages (negative acknowledgements) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure.

Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data is discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#checkpoint_cache)`checkpoint_cache`

Specify a [`cache` resource](../../caches/about/) to store the oplog position for the most recent data update streamed to Redpanda Connect. After a restart, Redpanda Connect can continue processing changes from this position, avoiding the need to reprocess all collection updates.

**Type**: `string`

### [](#checkpoint_interval)`checkpoint_interval`

The interval between writing checkpoints to the cache.

**Type**: `string`

**Default**: `5s`

### [](#checkpoint_key)`checkpoint_key`

The key identifier used to store the oplog position in [`checkpoint_cache`](#checkpoint_cache). If you have multiple `mongodb_cdc` inputs sharing the same cache, you can provide an alternative key.

**Type**: `string`

**Default**: `mongodb_cdc_checkpoint`

### [](#checkpoint_limit)`checkpoint_limit`

The maximum number of in-flight messages emitted from this input. Increasing this limit enables parallel processing and batching at the output level. To preserve at-least-once guarantees, any given oplog position is not acknowledged until all messages under that offset are delivered.
**Type**: `int`

**Default**: `1000`

### [](#collections)`collections[]`

A list of collections to stream changes from. Specify each collection name as a separate item.

**Type**: `array`

### [](#database)`database`

The name of the MongoDB database to stream changes from.

**Type**: `string`

### [](#document_mode)`document_mode`

The mode in which MongoDB emits document changes to Redpanda Connect, specifically updates and deletes.

**Type**: `string`

**Default**: `update_lookup`

| Option | Summary |
| --- | --- |
| partial_update | In this mode, update operations contain only a description of the update, which follows this schema: `{ "_id": , "operations": [ # type == set means that the value was updated like so: # root.foo."bar.baz" = "world" {"path": ["foo", "bar.baz"], "type": "set", "value":"world"}, # type == unset means that the value was deleted like so: # root.qux = deleted() {"path": ["qux"], "type": "unset", "value": null}, # type == truncatedArray means that the array at that path was truncated to value number of elements # root.array = this.array.slice(2) {"path": ["array"], "type": "truncatedArray", "value": 2} ] }` |
| pre_and_post_images | Uses pre- and post-image collections to emit the full documents for update and delete operations. To use and configure this mode, see the setup steps in the MongoDB documentation. |
| update_lookup | In this mode, insert, replace, and update operations emit the full document, and delete operations populate only the `_id` field. Document updates look up the full document. This corresponds to the `updateLookup` option. See the MongoDB documentation for more information. |

### [](#json_marshal_mode)`json_marshal_mode`

Controls the format used to convert a message from BSON to JSON when it is received by Redpanda Connect.

**Type**: `string`

**Default**: `canonical`

| Option | Summary |
| --- | --- |
| canonical | A string format that emphasizes type preservation at the expense of readability and interoperability. That is, conversion from canonical to BSON will generally preserve type information except in certain specific cases. |
| relaxed | A string format that emphasizes readability and interoperability at the expense of type preservation. That is, conversion from relaxed format to BSON can lose type information. |

### [](#password)`password`

The password to connect to the database.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

### [](#read_batch_size)`read_batch_size`

The number of documents to fetch in each message batch from MongoDB.

**Type**: `int`

**Default**: `1000`

### [](#read_max_wait)`read_max_wait`

The maximum duration MongoDB waits to accumulate the [`read_batch_size`](#read_batch_size) documents on a change stream before returning the batch to Redpanda Connect.

**Type**: `string`

**Default**: `1s`

### [](#snapshot_auto_bucket_sharding)`snapshot_auto_bucket_sharding`

Uses the [`$bucketAuto`](https://www.mongodb.com/docs/manual/reference/operator/aggregation/bucketAuto/) command instead of the default, `$splitVector`, to split the snapshot data into chunks for processing. This is required for environments, such as MongoDB Atlas, where the `$splitVector` command is not available.

To enable parallel processing in these environments:

- Set this field to `true`.
- Set `stream_snapshot` to `true`.
- Increase `snapshot_parallelism` to a value greater than `1`.

**Type**: `bool`

**Default**: `false`

### [](#snapshot_parallelism)`snapshot_parallelism`

Specifies the number of connections to use when reading the initial snapshot from one or more collections. Increase this number to enable parallel processing of the snapshot.
This feature uses the `$splitVector` command to split snapshot data into chunks for more efficient processing. This field is only applicable when `stream_snapshot` is set to `true`. **Type**: `int` **Default**: `1` ### [](#stream_snapshot)`stream_snapshot` When set to `true`, this input streams a snapshot of all existing data in the source collections before streaming data changes. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target MongoDB server. **Type**: `string` ```yaml # Examples: url: mongodb://localhost:27017 ``` ### [](#username)`username` The username to connect to the database. **Type**: `string` **Default**: `""` --- # Page 107: mongodb **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/mongodb.md --- # mongodb --- title: mongodb latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/mongodb page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/mongodb.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/mongodb.adoc categories: "[\"Services\"]" page-git-created-date: "2025-06-25" page-git-modified-date: "2025-06-25" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/mongodb/)[Cache](/redpanda-cloud/develop/connect/components/caches/mongodb/)[Output](/redpanda-cloud/develop/connect/components/outputs/mongodb/)[Processor](/redpanda-cloud/develop/connect/components/processors/mongodb/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/mongodb/ "View the Self-Managed version of this component") Executes a query and creates a message for each document received. 
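As a rough illustration of the description above, the following sketch queries a single collection and emits one message per matching document. It assumes the standard top-level `input` key of a Redpanda Connect pipeline configuration. The label, database, collection, document field names (`status`, `created_at`), and environment variable names are hypothetical placeholders rather than values taken from this documentation.

```yaml
# Minimal sketch of a mongodb input; all names below are illustrative assumptions.
input:
  label: "shipped_orders"              # hypothetical label
  mongodb:
    url: mongodb://localhost:27017
    database: shop                     # hypothetical database name
    collection: orders                 # hypothetical collection name
    username: "${MONGODB_USERNAME}"    # hypothetical variable; avoid inline credentials
    password: "${MONGODB_PASSWORD}"    # hypothetical variable; avoid inline credentials
    operation: find
    json_marshal_mode: relaxed
    # Bloblang mapping that builds the MongoDB query document
    query: |
      root.status = "shipped"
    batch_size: 1000
    sort:
      created_at: -1                   # -1 sorts in descending order
    limit: 500
```

The `sort` and `limit` fields apply to the `find` operation only, so they would not carry over if `operation` were switched to `aggregate`.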
#### Common

```yml
inputs:
  label: ""
  mongodb:
    url: "" # No default (required)
    database: "" # No default (required)
    username: ""
    password: ""
    collection: "" # No default (required)
    query: "" # No default (required)
    auto_replay_nacks: true
    batch_size: "" # No default (optional)
    sort: "" # No default (optional)
    limit: "" # No default (optional)
```

#### Advanced

```yml
inputs:
  label: ""
  mongodb:
    url: "" # No default (required)
    database: "" # No default (required)
    username: ""
    password: ""
    app_name: benthos
    collection: "" # No default (required)
    operation: find
    json_marshal_mode: canonical
    query: "" # No default (required)
    auto_replay_nacks: true
    batch_size: "" # No default (optional)
    sort: "" # No default (optional)
    limit: "" # No default (optional)
```

Once the documents from the query are exhausted, this input shuts down, allowing the pipeline to gracefully terminate (or the next input in a [sequence](../sequence/) to execute).

## [](#fields)Fields

### [](#app_name)`app_name`

The client application name.

**Type**: `string`

**Default**: `benthos`

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false`, these messages are deleted instead. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#batch_size)`batch_size`

An explicit number of documents to batch up before flushing them for processing. Must be greater than `0`. Operations: `find`, `aggregate`

**Type**: `int`

```yaml
# Examples:
batch_size: 1000
```

### [](#collection)`collection`

The collection to select from.

**Type**: `string`

### [](#database)`database`

The name of the target MongoDB database.
**Type**: `string` ### [](#json_marshal_mode)`json_marshal_mode` The json\_marshal\_mode setting is optional and controls the format of the output message. **Type**: `string` **Default**: `canonical` | Option | Summary | | --- | --- | | canonical | A string format that emphasizes type preservation at the expense of readability and interoperability. That is, conversion from canonical to BSON will generally preserve type information except in certain specific cases. | | relaxed | A string format that emphasizes readability and interoperability at the expense of type preservation.That is, conversion from relaxed format to BSON can lose type information. | ### [](#limit)`limit` An explicit maximum number of documents to return. Operations: `find` **Type**: `int` ### [](#operation)`operation` The mongodb operation to perform. **Type**: `string` **Default**: `find` **Options**: `find`, `aggregate` ### [](#password)`password` The password to connect to the database. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#query)`query` Bloblang expression describing MongoDB query. **Type**: `string` ```yaml # Examples: query: |- root.from = {"$lte": timestamp_unix()} root.to = {"$gte": timestamp_unix()} ``` ### [](#sort)`sort` An object specifying fields to sort by, and the respective sort order (`1` ascending, `-1` descending). Note: The driver currently appears to support only one sorting key. Operations: `find` **Type**: `int` ```yaml # Examples: sort: name: 1 # --- sort: age: -1 ``` ### [](#url)`url` The URL of the target MongoDB server. **Type**: `string` ```yaml # Examples: url: mongodb://localhost:27017 ``` ### [](#username)`username` The username to connect to the database. 
**Type**: `string` **Default**: `""` --- # Page 108: mqtt **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/mqtt.md --- # mqtt --- title: mqtt latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/mqtt page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/mqtt.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/mqtt.adoc categories: "[\"Services\"]" page-git-created-date: "2024-11-07" page-git-modified-date: "2024-11-07" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/mqtt/)[Output](/redpanda-cloud/develop/connect/components/outputs/mqtt/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/mqtt/ "View the Self-Managed version of this component") Subscribe to topics on MQTT brokers. 
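To make the one-line description above more concrete, here is a minimal sketch of an `mqtt` input that subscribes to a single topic. It assumes the standard top-level `input` key of a Redpanda Connect pipeline configuration. The label, client ID, and topic name are hypothetical placeholders, and the broker URL reuses the `tcp://localhost:1883` example from the `urls[]` field reference.

```yaml
# Minimal sketch of an mqtt input; label, client ID, and topic are illustrative assumptions.
input:
  label: "sensor_feed"                 # hypothetical label
  mqtt:
    urls:
      - "tcp://localhost:1883"
    client_id: "connect_sensor_reader" # hypothetical client identifier
    dynamic_client_id_suffix: nanoid   # appends a generated suffix on each run
    connect_timeout: 30s
    topics:
      - "sensors/temperature"          # hypothetical topic
    qos: 1
    clean_session: true
    auto_replay_nacks: true
```

Setting `dynamic_client_id_suffix` is optional. It is included here because running several pipelines with an identical `client_id` against one broker is a common source of connection conflicts in MQTT deployments.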
#### Common

```yml
inputs:
  label: ""
  mqtt:
    urls: [] # No default (required)
    client_id: ""
    connect_timeout: 30s
    topics: [] # No default (required)
    auto_replay_nacks: true
```

#### Advanced

```yml
inputs:
  label: ""
  mqtt:
    urls: [] # No default (required)
    client_id: ""
    dynamic_client_id_suffix: "" # No default (optional)
    connect_timeout: 30s
    will:
      enabled: false
      qos: 0
      retained: false
      topic: ""
      payload: ""
    user: ""
    password: ""
    keepalive: 30
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    topics: [] # No default (required)
    qos: 1
    clean_session: true
    auto_replay_nacks: true
```

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

- mqtt\_duplicate
- mqtt\_qos
- mqtt\_retained
- mqtt\_topic
- mqtt\_message\_id

You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries).

## [](#fields)Fields

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#clean_session)`clean_session`

Set whether the connection is non-persistent.

**Type**: `bool`

**Default**: `true`

### [](#client_id)`client_id`

An identifier for the client connection.

**Type**: `string`

**Default**: `""`

### [](#connect_timeout)`connect_timeout`

The maximum amount of time to wait in order to establish a connection before the attempt is abandoned.
**Type**: `string` **Default**: `30s` ```yaml # Examples: connect_timeout: 1s # --- connect_timeout: 500ms ``` ### [](#dynamic_client_id_suffix)`dynamic_client_id_suffix` Append a dynamically generated suffix to the specified `client_id` on each run of the pipeline. This can be useful when clustering Redpanda Connect producers. **Type**: `string` | Option | Summary | | --- | --- | | nanoid | append a nanoid of length 21 characters | ### [](#keepalive)`keepalive` Max seconds of inactivity before a keepalive message is sent. **Type**: `int` **Default**: `30` ### [](#password)`password` A password to connect with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#qos)`qos` The level of delivery guarantee to enforce. Has options 0, 1, 2. **Type**: `int` **Default**: `1` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#topics)`topics[]` A list of topics to consume from. **Type**: `array` ### [](#urls)`urls[]` A list of URLs to connect to. Use the format `scheme://host:port`, where: - `scheme` is one of the following: `tcp`, `ssl`, `ws` - `host` is the IP address or hostname - `port` is the port on which the MQTT broker accepts connections If an item in the list contains commas, it is expanded into multiple URLs. **Type**: `array` ```yaml # Examples: urls: - "tcp://localhost:1883" ``` ### [](#user)`user` A username to connect with. **Type**: `string` **Default**: `""` ### [](#will)`will` Set last will message in case of Redpanda Connect failure **Type**: `object` ### [](#will-enabled)`will.enabled` Whether to enable last will messages. **Type**: `bool` **Default**: `false` ### [](#will-payload)`will.payload` Set payload for last will message. **Type**: `string` **Default**: `""` ### [](#will-qos)`will.qos` Set QoS for last will message. Valid values are: 0, 1, 2. **Type**: `int` **Default**: `0` ### [](#will-retained)`will.retained` Set retained for last will message. **Type**: `bool` **Default**: `false` ### [](#will-topic)`will.topic` Set topic for last will message. 
**Type**: `string` **Default**: `""` --- # Page 109: mysql_cdc **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/mysql_cdc.md --- # mysql\_cdc --- title: mysql_cdc latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/mysql_cdc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/mysql_cdc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/mysql_cdc.adoc page-git-created-date: "2025-02-20" page-git-modified-date: "2025-03-18" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/mysql_cdc/ "View the Self-Managed version of this component") Streams data changes from a MySQL database, using MySQL’s binary log to capture data updates. This input is built on the [`mysql-canal` library](https://github.com/go-mysql-org/go-mysql?tab=readme-ov-file#replication) but uses a custom approach for streaming historical data. 
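As a sketch of how the pieces described above might fit together, the following configuration streams a snapshot of two tables and then follows the binary log. It assumes the standard top-level `input` and `cache_resources` keys of a Redpanda Connect pipeline configuration. The labels are hypothetical, the DSN and table names reuse the placeholder examples from the field reference, and the in-memory cache is an assumption chosen only to keep the example short.

```yaml
# Minimal sketch of a mysql_cdc input; labels and the cache choice are illustrative assumptions.
input:
  label: "mysql_changes"               # hypothetical label
  mysql_cdc:
    flavor: mysql
    dsn: user:password@tcp(localhost:3306)/database
    tables:
      - table1
      - table2
    stream_snapshot: true              # snapshot existing rows before streaming changes
    snapshot_max_batch_size: 1000
    checkpoint_cache: binlog_cache     # must match the label of a cache resource
    checkpoint_key: mysql_binlog_position
    checkpoint_limit: 1024

cache_resources:
  - label: binlog_cache                # hypothetical cache label
    memory: {}                         # assumption: in-memory cache, contents are lost on restart
```

An in-memory cache does not survive a pipeline restart, so the stored binlog position would be lost. A persistent cache is likely the better choice wherever resuming from the last position matters.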
#### Common

```yml
inputs:
  label: ""
  mysql_cdc:
    flavor: mysql
    dsn: "" # No default (required)
    tables: [] # No default (required)
    checkpoint_cache: "" # No default (required)
    checkpoint_key: mysql_binlog_position
    snapshot_max_batch_size: 1000
    stream_snapshot: "" # No default (required)
    auto_replay_nacks: true
    checkpoint_limit: 1024
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

#### Advanced

```yml
inputs:
  label: ""
  mysql_cdc:
    flavor: mysql
    dsn: "" # No default (required)
    tables: [] # No default (required)
    checkpoint_cache: "" # No default (required)
    checkpoint_key: mysql_binlog_position
    snapshot_max_batch_size: 1000
    max_reconnect_attempts: 10
    stream_snapshot: "" # No default (required)
    auto_replay_nacks: true
    checkpoint_limit: 1024
    tls:
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    aws:
      enabled: false
      region: "" # No default (optional)
      endpoint: "" # No default (required)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
      roles: [] # No default (optional)
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

The `mysql_cdc` input uses MySQL’s [binary log (`binlog`)](https://dev.mysql.com/doc/refman/8.0/en/binary-log.html) to capture changes made to a MySQL database in real time and streams them to Redpanda Connect. Redpanda Connect allows you to specify which [database tables](#tables) in your source database to receive changes from. There are also [two replication modes](#choose-a-replication-mode) to choose from.

## [](#prerequisites)Prerequisites

- MySQL version 8 or later
- Network access from the cluster where your Redpanda Connect pipeline is running to the source database environment.
For detailed networking information, including how to set up a VPC peering connection, see [Redpanda Cloud Networking](../../../../../networking/).
- A MySQL instance with binary logging enabled

### [](#configuration-resources)Configuration resources

#### Cloud platforms

- [Change data capture on Amazon RDS for MySQL](https://aws.amazon.com/blogs/database/enable-change-data-capture-on-amazon-rds-for-mysql-applications-that-are-using-xa-transactions/)
- [Azure MySQL Database (CDC)](https://learn.microsoft.com/en-us/fabric/real-time-hub/add-source-mysql-database-cdc)
- [Google Cloud SQL for MySQL](https://cloud.google.com/datastream/docs/configure-cloudsql-mysql)

#### Self-hosted MySQL

- [Binary Logging Options and Variables](https://dev.mysql.com/doc/refman/8.4/en/replication-options-binary-log.html)

## [](#choose-a-replication-mode)Choose a replication mode

You can run the `mysql_cdc` input in one of two modes, depending on whether you need a snapshot of existing data.

- Snapshot mode: Redpanda Connect first captures a snapshot of all data in the selected tables and streams the contents before processing changes from the last recorded binlog position.
- Streaming mode: Redpanda Connect skips the snapshot and processes only the most recent data changes, starting from the latest binlog position.

### [](#snapshot-mode)Snapshot mode

If you set the [`stream_snapshot` field](#stream_snapshot) to `true`, Redpanda Connect connects to your MySQL database and does the following to capture a snapshot of all data in the selected tables:

1. Executes the `FLUSH TABLES WITH READ LOCK` query to write any outstanding table updates to disk, and locks the tables.
2. Runs the `START TRANSACTION WITH CONSISTENT SNAPSHOT` statement to create a new transaction with a consistent view of all data, capturing the state of the database at the moment the transaction started.
3. Reads the current binlog position.
4. Runs the `UNLOCK TABLES` statement to release the database.
5. Preserves the initial transaction for data integrity.

> 📝 **NOTE**
>
> If the pipeline restarts during this process, Redpanda Connect must start the snapshot capture from scratch to store the current binlog position in the [`checkpoint_cache`](#checkpoint_cache).

After the snapshot is taken, the input executes `SELECT` statements to extract data from the selected tables in two stages:

1. The input finds the primary keys of a table.
2. It selects the data ordered by primary key.

Finally, the input uses the stored binlog position to catch up with changes that occurred during snapshot processing.

### [](#streaming-mode)Streaming mode

If you set the [`stream_snapshot` field](#stream_snapshot) to `false`, Redpanda Connect connects to your MySQL database and starts processing data changes from the latest binlog position. If the pipeline restarts, Redpanda Connect resumes processing updates from the last binlog position written to the [`checkpoint_cache`](#checkpoint_cache).

## [](#binlog-rotation)Binlog rotation

While the `mysql_cdc` input is streaming changes to Redpanda Connect, your MySQL server may rotate the binlog file. When this occurs, Redpanda Connect flushes the existing message batch and stores the new binlog position so that it can resume processing using the latest offset.

## [](#data-mappings)Data mappings

The following table shows how selected MySQL data types are mapped to data types supported in Redpanda Connect. All other data types are mapped to string values.
| MySQL data type | Bloblang value |
| --- | --- |
| TEXT, VARCHAR | A string value, for example: "this data" |
| BINARY, VARBINARY, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB | An array of byte values, for example: [byte1, byte2, byte3] |
| DECIMAL, NUMERIC, TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, YEAR | A standard numeric type, for example: 123 |
| FLOAT, DOUBLE | A 64-bit decimal (float64), for example: 123.1234 |
| DATETIME, TIMESTAMP | A Bloblang timestamp, for example: 1257894000000 2009-11-10 23:00:00 +0000 UTC |
| SET | An array of strings, for example: ["apple", "banana", "orange"] |
| JSON | A map object of the JSON, for example: {"red": 1, "blue": 2, "green": 3} |

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

- `operation`: The type of database operation that generated the message: `read`, `insert`, `update`, or `delete`. A `read` operation occurs when a snapshot of the database is processed.
- `table`: The name of the database table from which the message originated.
- `binlog_position`: The [Binary Log (binlog)](https://dev.mysql.com/doc/refman/8.0/en/binary-log.html) position of each data update streamed from the source MySQL database. No `binlog_position` is set for data extracted from the initial snapshot. The `binlog_position` values are strings that you can sort to determine the order in which data updates occurred.

## [](#fields)Fields

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether to automatically replay rejected messages (negative acknowledgements) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure. Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data is discarded immediately upon consumption and mutation.
**Type**: `bool` **Default**: `true` ### [](#aws)`aws` AWS IAM authentication configuration for MySQL instances. When enabled, IAM credentials are used to generate temporary authentication tokens instead of a static password. **Type**: `object` ### [](#aws-enabled)`aws.enabled` Enable AWS IAM authentication for MySQL. When enabled, an IAM authentication token is generated and used as the password. When using IAM authentication ensure `max_reconnect_attempts` is set to a low value to ensure it can refresh credentials. **Type**: `bool` **Default**: `false` ### [](#aws-endpoint)`aws.endpoint` The MySQL endpoint hostname (e.g., mydb.abc123.us-east-1.rds.amazonaws.com). **Type**: `string` ### [](#aws-id)`aws.id` The ID of credentials to use. **Type**: `string` ### [](#aws-region)`aws.region` The AWS region where the MySQL instance is located. If no region is specified then the environment default will be used. **Type**: `string` ### [](#aws-role)`aws.role` Optional AWS IAM role ARN to assume for authentication. Alternatively, use `roles` array for role chaining instead. **Type**: `string` ### [](#aws-role_external_id)`aws.role_external_id` Optional external ID for the role assumption. Only used with the `role` field. Alternatively, use `roles` array for role chaining instead. **Type**: `string` ### [](#aws-roles)`aws.roles[]` Optional array of AWS IAM roles to assume for authentication. Roles can be assumed in sequence, enabling chaining for purposes such as cross-account access. Each role can optionally specify an external ID. **Type**: `object` ### [](#aws-roles-role)`aws.roles[].role` AWS IAM role ARN to assume. **Type**: `string` **Default**: `""` ### [](#aws-roles-role_external_id)`aws.roles[].role_external_id` Optional external ID for the role assumption. **Type**: `string` **Default**: `""` ### [](#aws-secret)`aws.secret` The secret for the credentials being used. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#aws-token)`aws.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. 

**Type**: `processor`

```yaml
# Examples:
processors:
  - archive:
      format: concatenate

# ---

processors:
  - archive:
      format: lines

# ---

processors:
  - archive:
      format: json_array
```

### [](#checkpoint_cache)`checkpoint_cache`

Specify a `cache` resource to store the binlog position of the most recent data update delivered to Redpanda Connect. After a restart, Redpanda Connect can continue processing changes from this last known position, avoiding the need to reprocess all table updates.

**Type**: `string`

### [](#checkpoint_key)`checkpoint_key`

The key identifier used to store the binlog position in [`checkpoint_cache`](#checkpoint_cache). If you have multiple `mysql_cdc` inputs sharing the same cache, you can provide an alternative key.

**Type**: `string`

**Default**: `mysql_binlog_position`

### [](#checkpoint_limit)`checkpoint_limit`

The maximum number of messages that this input can process at a given time. Increasing this limit enables parallel processing and batching at the output level. To preserve at-least-once guarantees, any given binlog position is not acknowledged until all messages under that offset are delivered.

**Type**: `int`

**Default**: `1024`

### [](#dsn)`dsn`

The data source name (DSN) of the MySQL database from which you want to stream updates. Use the format `user:password@tcp(localhost:3306)/database`.

**Type**: `string`

```yaml
# Examples:
dsn: user:password@tcp(localhost:3306)/database
```

### [](#flavor)`flavor`

The type of MySQL database to connect to.

**Type**: `string`

**Default**: `mysql`

| Option | Summary |
| --- | --- |
| mariadb | MariaDB flavored databases. |
| mysql | MySQL flavored databases. |

### [](#max_reconnect_attempts)`max_reconnect_attempts`

The maximum number of times the MySQL driver tries to re-establish a broken connection before Connect attempts reconnection. A zero or negative number means infinite retry attempts.

**Type**: `int`

**Default**: `10`

### [](#snapshot_max_batch_size)`snapshot_max_batch_size`

The maximum number of table rows to fetch in each batch when taking a snapshot. This option is only available when `stream_snapshot` is set to `true`.

**Type**: `int`

**Default**: `1000`

### [](#stream_snapshot)`stream_snapshot`

When set to `true`, this input streams a snapshot of all existing data in the source database before streaming data changes. To use this setting, all database tables that you want to replicate _must_ have a primary key.

**Type**: `bool`

### [](#tables)`tables[]`

A list of the database table names to stream changes from. Specify each table name as a separate item.

**Type**: `array`

```yaml
# Examples:
tables:
  - table1
  - table2
```

### [](#tls)`tls`

Using this field overrides the SSL/TLS settings in the environment and DSN.

**Type**: `object`

### [](#tls-client_certs)`tls.client_certs[]`

A list of client certificates to use. For each certificate, specify either the fields `cert` and `key`, or `cert_file` and `key_file`, but not both.

**Type**: `object`

**Default**: `[]`

```yaml
# Examples:
client_certs:
  - cert: foo
    key: bar

# ---

client_certs:
  - cert_file: ./example.pem
    key_file: ./example.key
```

### [](#tls-client_certs-cert)`tls.client_certs[].cert`

A plain text certificate to use.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file`

The path of a certificate to use.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-key)`tls.client_certs[].key`

A plain text certificate key to use.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-key_file)`tls.client_certs[].key_file`

The path of a certificate key to use.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-password)`tls.client_certs[].password`

A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete `pbeWithMD5AndDES-CBC` algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
password: foo

# ---

password: ${KEY_PASSWORD}
```

### [](#tls-enable_renegotiation)`tls.enable_renegotiation`

Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`.

**Type**: `bool`

**Default**: `false`

### [](#tls-root_cas)`tls.root_cas`

An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas: |-
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
```

### [](#tls-root_cas_file)`tls.root_cas_file`

An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas_file: ./root_cas.pem
```

### [](#tls-skip_cert_verify)`tls.skip_cert_verify`

Whether to skip server side certificate verification.

**Type**: `bool`

**Default**: `false`

---

# Page 110: nats_jetstream

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/nats_jetstream.md

---

# nats\_jetstream

---
title: nats_jetstream
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/nats_jetstream
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/nats_jetstream.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/nats_jetstream.adoc
categories: "[\"Services\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/nats_jetstream/)[Output](/redpanda-cloud/develop/connect/components/outputs/nats_jetstream/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/nats_jetstream/ "View the Self-Managed version of this component")

Reads messages from NATS JetStream subjects.

#### Common

```yml
inputs:
  label: ""
  nats_jetstream:
    urls: [] # No default (required)
    queue: "" # No default (optional)
    subject: "" # No default (optional)
    durable: "" # No default (optional)
    stream: "" # No default (optional)
    bind: "" # No default (optional)
    deliver: all
```

#### Advanced

```yml
inputs:
  label: ""
  nats_jetstream:
    urls: [] # No default (required)
    max_reconnects: "" # No default (optional)
    queue: "" # No default (optional)
    subject: "" # No default (optional)
    durable: "" # No default (optional)
    stream: "" # No default (optional)
    bind: "" # No default (optional)
    create_stream: false
    deliver: all
    ack_wait: 30s
    max_ack_pending: 1024
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    tls_handshake_first: false
    auth:
      nkey_file: "" # No default (optional)
      nkey: "" # No default (optional)
      user_credentials_file: "" # No default (optional)
      user_jwt: "" # No default (optional)
      user_nkey_seed: "" # No default (optional)
      user: "" # No default (optional)
      password: "" # No default (optional)
      token: "" # No default (optional)
    extract_tracing_map: "" # No default (optional)
```

## [](#consume-mirrored-streams)Consume mirrored streams

When a stream being consumed is mirrored in a different JetStream domain, the stream cannot be resolved from the subject name alone. You must specify the stream name as well as the subject (if applicable).

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

```text
- nats_subject
- nats_sequence_stream
- nats_sequence_consumer
- nats_num_delivered
- nats_num_pending
- nats_domain
- nats_timestamp_unix_nano
```

You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries).
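
As a rough illustration of the mirrored-stream and metadata notes above, the following fragments sketch an input that names both the stream and the subject, followed by a `mapping` processor that copies two of the metadata fields into the message body. The server URL, the `ORDERS_MIRROR` stream name, the `orders.>` subject, and the `source_subject` and `stream_sequence` field names are all hypothetical, and the sketch assumes that message payloads are JSON objects.

```yaml
# Input fragment (all values are placeholders).
# Both stream and subject are set, as described for mirrored streams.
nats_jetstream:
  urls:
    - "nats://127.0.0.1:4222"
  stream: ORDERS_MIRROR
  subject: orders.>
  deliver: all

# ---

# Processor fragment: copy metadata added by the input into the payload.
# Assumes the payload is a JSON object.
mapping: |
  root = this
  root.source_subject = metadata("nats_subject")
  root.stream_sequence = metadata("nats_sequence_stream")
```

In fields that support interpolation, the same metadata value could be referenced as `${! metadata("nats_subject") }`.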

## [](#connection-name)Connection name

When monitoring and managing a production [NATS system](https://docs.nats.io/nats-concepts/overview), it is often useful to know which connection a message was sent or received from. To achieve this, set the connection name option when creating a NATS connection. Redpanda Connect can then automatically set the connection name to the NATS component label, so that monitoring tools between NATS and Redpanda Connect can stay in sync.

## [](#authentication)Authentication

A number of Redpanda Connect components use NATS services. Each of these components supports optional, advanced authentication parameters for [NKeys](https://docs.nats.io/nats-server/configuration/securing_nats/auth_intro/nkey_auth) and [user credentials](https://docs.nats.io/using-nats/developer/connecting/creds). For an in-depth guide, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/nats_admin/security/jwt).

### [](#nkeys)NKeys

NATS server can use NKeys in several ways for authentication. The simplest approach is to configure the server with a list of users’ public keys. The server can then generate a challenge for each connection request from a client, and the client must respond to the challenge by signing it with its private NKey, configured in the `nkey_file` or `nkey` field.

For more details, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth).

### [](#user-credentials)User credentials

NATS server also supports decentralized authentication based on JSON Web Tokens (JWTs). When a server is configured to use this authentication scheme, clients need a [user JWT](https://docs.nats.io/nats-server/configuration/securing_nats/jwt#json-web-tokens) and a corresponding [NKey secret](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth) to connect.

You can use either of the following methods to supply the user JWT and NKey secret:

- In the `user_credentials_file` field, enter the path to a file containing both the private key and the JWT. You can generate the file using the [nsc tool](https://docs.nats.io/nats-tools/nsc).
- In the `user_jwt` field, enter a plain text JWT, and in the `user_nkey_seed` field, enter the plain text NKey seed or private key.

For more details about authentication using JWTs, see the [NATS documentation](https://docs.nats.io/using-nats/developer/connecting/creds).

## [](#fields)Fields

### [](#ack_wait)`ack_wait`

The maximum amount of time the NATS server should wait for an ack from the consumer.

**Type**: `string`

**Default**: `30s`

```yaml
# Examples:
ack_wait: 100ms

# ---

ack_wait: 5m
```

### [](#auth)`auth`

Optional configuration of NATS authentication parameters.

**Type**: `object`

### [](#auth-nkey)`auth.nkey`

Your NKey seed or private key for NATS authentication. NKeys provide secure, cryptographic authentication without passwords.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

```yaml
# Examples:
nkey: UDXU4RCSJNZOIQHZNWXHXORDPRTGNJAHAHFRGZNEEJCPQTT2M7NLCNF4
```

### [](#auth-nkey_file)`auth.nkey_file`

An optional file containing an NKey seed.

**Type**: `string`

```yaml
# Examples:
nkey_file: ./seed.nk
```

### [](#auth-password)`auth.password`

An optional plain text password (given along with the corresponding user name).

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#auth-token)`auth.token`

An optional plain text token.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#auth-user)`auth.user`

An optional plain text user name (given along with the corresponding user password).

**Type**: `string`

### [](#auth-user_credentials_file)`auth.user_credentials_file`

An optional file containing user credentials which consist of a user JWT and corresponding NKey seed.

**Type**: `string`

```yaml
# Examples:
user_credentials_file: ./user.creds
```

### [](#auth-user_jwt)`auth.user_jwt`

An optional plaintext user JWT to use along with the corresponding user NKey seed.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#auth-user_nkey_seed)`auth.user_nkey_seed`

An optional plaintext user NKey seed to use along with the user JWT.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#bind)`bind`

Indicates that the subscription should use an existing consumer.

**Type**: `bool`

### [](#create_stream)`create_stream`

Whether to automatically create the stream if it doesn’t exist (requires the `stream` field to be set).

**Type**: `bool`

**Default**: `false`

### [](#deliver)`deliver`

Determines which messages to deliver when consuming without a durable subscriber.

**Type**: `string`

**Default**: `all`

| Option | Summary |
| --- | --- |
| all | Deliver all available messages. |
| last | Deliver starting with the last published message. |
| last_per_subject | Deliver starting with the last published message per subject. |
| new | Deliver starting from now, not taking into account any previous messages. |

### [](#durable)`durable`

Preserve the state of your consumer under a durable name.

**Type**: `string`

### [](#extract_tracing_map)`extract_tracing_map`

EXPERIMENTAL: A [Bloblang mapping](../../../guides/bloblang/about/) that attempts to extract an object containing tracing propagation information, which will then be used as the root tracing span for the message. The specification of the extracted fields must match the format used by the service wide tracer.

**Type**: `string`

```yaml
# Examples:
extract_tracing_map: root = @

# ---

extract_tracing_map: root = this.meta.span
```

### [](#max_ack_pending)`max_ack_pending`

The maximum number of outstanding acks to be allowed before consuming is halted.

**Type**: `int`

**Default**: `1024`

### [](#max_reconnects)`max_reconnects`

The maximum number of times to attempt to reconnect to the server. If negative, it will never stop trying to reconnect.

**Type**: `int`

### [](#queue)`queue`

An optional queue group to consume as.

**Type**: `string`

### [](#stream)`stream`

A stream to consume from. Either a subject or stream must be specified.

**Type**: `string`

### [](#subject)`subject`

A subject to consume from. Supports wildcards for consuming multiple subjects. Either a subject or stream must be specified.

**Type**: `string`

```yaml
# Examples:
subject: foo.bar.baz

# ---

subject: foo.*.baz

# ---

subject: foo.bar.*

# ---

subject: foo.>
```

### [](#tls)`tls`

Custom TLS settings can be used to override system defaults.

**Type**: `object`

### [](#tls-client_certs)`tls.client_certs[]`

A list of client certificates to use. For each certificate, specify either the fields `cert` and `key`, or `cert_file` and `key_file`, but not both.

**Type**: `object`

**Default**: `[]`

```yaml
# Examples:
client_certs:
  - cert: foo
    key: bar

# ---

client_certs:
  - cert_file: ./example.pem
    key_file: ./example.key
```

### [](#tls-client_certs-cert)`tls.client_certs[].cert`

A plain text certificate to use.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file`

The path of a certificate to use.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-key)`tls.client_certs[].key`

A plain text certificate key to use.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-key_file)`tls.client_certs[].key_file`

The path of a certificate key to use.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-password)`tls.client_certs[].password`

A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete `pbeWithMD5AndDES-CBC` algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
password: foo

# ---

password: ${KEY_PASSWORD}
```

### [](#tls-enable_renegotiation)`tls.enable_renegotiation`

Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`.

**Type**: `bool`

**Default**: `false`

### [](#tls-enabled)`tls.enabled`

Whether custom TLS settings are enabled.

**Type**: `bool`

**Default**: `false`

### [](#tls-root_cas)`tls.root_cas`

An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas: |-
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
```

### [](#tls-root_cas_file)`tls.root_cas_file`

An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas_file: ./root_cas.pem
```

### [](#tls-skip_cert_verify)`tls.skip_cert_verify`

Whether to skip server side certificate verification.

**Type**: `bool`

**Default**: `false`

### [](#tls_handshake_first)`tls_handshake_first`

Whether to perform the initial TLS handshake before sending the NATS INFO protocol message. This is required when connecting to some NATS servers that expect TLS to be established immediately after connection, before any protocol negotiation.

**Type**: `bool`

**Default**: `false`

### [](#urls)`urls[]`

A list of URLs to connect to. If a list item contains commas, it will be expanded into multiple URLs.

**Type**: `array`

```yaml
# Examples:
urls:
  - "nats://127.0.0.1:4222"

# ---

urls:
  - "nats://username:password@127.0.0.1:4222"
```

---

# Page 111: nats_kv

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/nats_kv.md

---

# nats\_kv

---
title: nats_kv
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/nats_kv
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/nats_kv.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/nats_kv.adoc
categories: "[\"Services\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/nats_kv/)[Cache](/redpanda-cloud/develop/connect/components/caches/nats_kv/)[Output](/redpanda-cloud/develop/connect/components/outputs/nats_kv/)[Processor](/redpanda-cloud/develop/connect/components/processors/nats_kv/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/nats_kv/ "View the Self-Managed version of this component")

Watches for updates in a NATS key-value bucket.

#### Common

```yml
inputs:
  label: ""
  nats_kv:
    urls: [] # No default (required)
    bucket: "" # No default (required)
    key: ">"
    auto_replay_nacks: true
```

#### Advanced

```yml
inputs:
  label: ""
  nats_kv:
    urls: [] # No default (required)
    max_reconnects: "" # No default (optional)
    bucket: "" # No default (required)
    key: ">"
    auto_replay_nacks: true
    ignore_deletes: false
    include_history: false
    meta_only: false
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    tls_handshake_first: false
    auth:
      nkey_file: "" # No default (optional)
      nkey: "" # No default (optional)
      user_credentials_file: "" # No default (optional)
      user_jwt: "" # No default (optional)
      user_nkey_seed: "" # No default (optional)
      user: "" # No default (optional)
      password: "" # No default (optional)
      token: "" # No default (optional)
```

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

```text
- nats_kv_key
- nats_kv_bucket
- nats_kv_revision
- nats_kv_delta
- nats_kv_operation
- nats_kv_created
```

## [](#connection-name)Connection name

When monitoring and managing a production [NATS system](https://docs.nats.io/nats-concepts/overview), it is often useful to know which connection a message was sent or received from. To achieve this, set the connection name option when creating a NATS connection. Redpanda Connect can then automatically set the connection name to the NATS component label, so that monitoring tools between NATS and Redpanda Connect can stay in sync.

## [](#authentication)Authentication

A number of Redpanda Connect components use NATS services. Each of these components supports optional, advanced authentication parameters for [NKeys](https://docs.nats.io/nats-server/configuration/securing_nats/auth_intro/nkey_auth) and [user credentials](https://docs.nats.io/using-nats/developer/connecting/creds).
For an in-depth guide, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/nats_admin/security/jwt).

### [](#nkeys)NKeys

NATS server can use NKeys in several ways for authentication. The simplest approach is to configure the server with a list of users’ public keys. The server can then generate a challenge for each connection request from a client, and the client must respond to the challenge by signing it with its private NKey, configured in the `nkey_file` or `nkey` field.

For more details, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth).

### [](#user-credentials)User credentials

NATS server also supports decentralized authentication based on JSON Web Tokens (JWTs). When a server is configured to use this authentication scheme, clients need a [user JWT](https://docs.nats.io/nats-server/configuration/securing_nats/jwt#json-web-tokens) and a corresponding [NKey secret](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth) to connect.

You can use either of the following methods to supply the user JWT and NKey secret:

- In the `user_credentials_file` field, enter the path to a file containing both the private key and the JWT. You can generate the file using the [nsc tool](https://docs.nats.io/nats-tools/nsc).
- In the `user_jwt` field, enter a plain text JWT, and in the `user_nkey_seed` field, enter the plain text NKey seed or private key.

For more details about authentication using JWTs, see the [NATS documentation](https://docs.nats.io/using-nats/developer/connecting/creds).

## [](#fields)Fields

### [](#auth)`auth`

Optional configuration of NATS authentication parameters.

**Type**: `object`

### [](#auth-nkey)`auth.nkey`

Your NKey seed or private key for NATS authentication. NKeys provide secure, cryptographic authentication without passwords.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

```yaml
# Examples:
nkey: UDXU4RCSJNZOIQHZNWXHXORDPRTGNJAHAHFRGZNEEJCPQTT2M7NLCNF4
```

### [](#auth-nkey_file)`auth.nkey_file`

An optional file containing an NKey seed.

**Type**: `string`

```yaml
# Examples:
nkey_file: ./seed.nk
```

### [](#auth-password)`auth.password`

An optional plain text password (given along with the corresponding user name).

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#auth-token)`auth.token`

An optional plain text token.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#auth-user)`auth.user`

An optional plain text user name (given along with the corresponding user password).

**Type**: `string`

### [](#auth-user_credentials_file)`auth.user_credentials_file`

An optional file containing user credentials which consist of a user JWT and corresponding NKey seed.

**Type**: `string`

```yaml
# Examples:
user_credentials_file: ./user.creds
```

### [](#auth-user_jwt)`auth.user_jwt`

An optional plaintext user JWT to use along with the corresponding user NKey seed.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#auth-user_nkey_seed)`auth.user_nkey_seed`

An optional plaintext user NKey seed to use along with the corresponding user JWT.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false`, these messages are instead deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#bucket)`bucket`

The name of the KV bucket.

**Type**: `string`

```yaml
# Examples:
bucket: my_kv_bucket
```

### [](#ignore_deletes)`ignore_deletes`

Do not send delete markers as messages.

**Type**: `bool`

**Default**: `false`

### [](#include_history)`include_history`

Include all the history per key, not just the last one.

**Type**: `bool`

**Default**: `false`

### [](#key)`key`

Key to watch for updates. Can include wildcards.

**Type**: `string`

**Default**: `>`

```yaml
# Examples:
key: foo.bar.baz

# ---

key: foo.*.baz

# ---

key: foo.bar.*

# ---

key: foo.>
```

### [](#max_reconnects)`max_reconnects`

The maximum number of times to attempt to reconnect to the server. If negative, it will never stop trying to reconnect.

**Type**: `int`

### [](#meta_only)`meta_only`

Retrieve only the metadata of the entry.

**Type**: `bool`

**Default**: `false`

### [](#tls)`tls`

Custom TLS settings can be used to override system defaults.

**Type**: `object`

### [](#tls-client_certs)`tls.client_certs[]`

A list of client certificates to use.
For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. 
Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#tls_handshake_first)`tls_handshake_first` Whether to perform the initial TLS handshake before sending the NATS INFO protocol message. This is required when connecting to some NATS servers that expect TLS to be established immediately after connection, before any protocol negotiation. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. If a list item contains commas, it will be expanded into multiple URLs. 
**Type**: `array`

```yaml
# Examples:
urls:
  - "nats://127.0.0.1:4222"

# ---

urls:
  - "nats://username:password@127.0.0.1:4222"
```

---

# Page 112: nats

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/nats.md

---

# nats

---
title: nats
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/nats
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/nats.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/nats.adoc
categories: "[\"Services\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Input

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/nats/ "View the Self-Managed version of this component")

Subscribe to a NATS subject.
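
As an illustration, the following sketch subscribes to a wildcard subject as a member of a queue group. The server address, subject, and queue name are placeholder values, not defaults, and the fragment is assumed to be nested under the input section of a pipeline configuration.

```yaml
# Hypothetical values for illustration; replace them with your own.
nats:
  urls:
    - "nats://127.0.0.1:4222"
  subject: "orders.*"      # "*" matches exactly one token, for example orders.created
  queue: "order-workers"   # each message goes to one member of the queue group
  auto_replay_nacks: true
  send_ack: true
```

Using a queue group lets several pipeline instances share the load of one subject. Omit `queue` if every instance should receive every message.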
#### Common ```yml inputs: label: "" nats: urls: [] # No default (required) subject: "" # No default (required) queue: "" # No default (optional) auto_replay_nacks: true send_ack: true ``` #### Advanced ```yml inputs: label: "" nats: urls: [] # No default (required) max_reconnects: "" # No default (optional) subject: "" # No default (required) queue: "" # No default (optional) auto_replay_nacks: true send_ack: true nak_delay: "" # No default (optional) prefetch_count: 500000 tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] tls_handshake_first: false auth: nkey_file: "" # No default (optional) nkey: "" # No default (optional) user_credentials_file: "" # No default (optional) user_jwt: "" # No default (optional) user_nkey_seed: "" # No default (optional) user: "" # No default (optional) password: "" # No default (optional) token: "" # No default (optional) extract_tracing_map: "" # No default (optional) ``` ## [](#metadata)Metadata This input adds the following metadata fields to each message: ```text - nats_subject - nats_reply_subject - All message headers (when supported by the connection) ``` You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#connection-name)Connection name When monitoring and managing a production [NATS system](https://docs.nats.io/nats-concepts/overview), it is often useful to know which connection a message was sent or received from. To achieve this, set the connection name option when creating a NATS connection. Redpanda Connect can then automatically set the connection name to the NATS component label, so that monitoring tools between NATS and Redpanda Connect can stay in sync. ## [](#authentication)Authentication A number of Redpanda Connect components use NATS services. 
Each of these components support optional, advanced authentication parameters for [NKeys](https://docs.nats.io/nats-server/configuration/securing_nats/auth_intro/nkey_auth) and [user credentials](https://docs.nats.io/using-nats/developer/connecting/creds). For an in-depth guide, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/nats_admin/security/jwt). ### [](#nkeys)NKeys NATS server can use NKeys in several ways for authentication. The simplest approach is to configure the server with a list of user’s public keys. The server can then generate a challenge for each connection request from a client, and the client must respond to the challenge by signing it with its private NKey, configured in the `nkey_file` or `nkey` field. For more details, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth). ### [](#user-credentials)User credentials NATS server also supports decentralized authentication based on JSON Web Tokens (JWTs). When a server is configured to use this authentication scheme, clients need a [user JWT](https://docs.nats.io/nats-server/configuration/securing_nats/jwt#json-web-tokens) and a corresponding [NKey secret](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth) to connect. You can use either of the following methods to supply the user JWT and NKey secret: - In the `user_credentials_file` field, enter the path to a file containing both the private key and the JWT. You can generate the file using the [nsc tool](https://docs.nats.io/nats-tools/nsc). - In the `user_jwt` field, enter a plain text JWT, and in the `user_nkey_seed` field, enter the plain text NKey seed or private key. For more details about authentication using JWTs, see the [NATS documentation](https://docs.nats.io/using-nats/developer/connecting/creds). ## [](#fields)Fields ### [](#auth)`auth` Optional configuration of NATS authentication parameters. 
**Type**: `object` ### [](#auth-nkey)`auth.nkey` Your NKey seed or private key for NATS authentication. NKeys provide secure, cryptographic authentication without passwords. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: nkey: UDXU4RCSJNZOIQHZNWXHXORDPRTGNJAHAHFRGZNEEJCPQTT2M7NLCNF4 ``` ### [](#auth-nkey_file)`auth.nkey_file` An optional file containing a NKey seed. **Type**: `string` ```yaml # Examples: nkey_file: ./seed.nk ``` ### [](#auth-password)`auth.password` An optional plain text password (given along with the corresponding user name). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-token)`auth.token` An optional plain text token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user)`auth.user` An optional plain text user name (given along with the corresponding user password). **Type**: `string` ### [](#auth-user_credentials_file)`auth.user_credentials_file` An optional file containing user credentials which consist of a user JWT and corresponding NKey seed. **Type**: `string` ```yaml # Examples: user_credentials_file: ./user.creds ``` ### [](#auth-user_jwt)`auth.user_jwt` An optional plaintext user JWT to use along with the corresponding user NKey seed. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user_nkey_seed)`auth.user_nkey_seed` An optional plaintext user NKey seed to use along with the corresponding user JWT. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#extract_tracing_map)`extract_tracing_map` EXPERIMENTAL: A [Bloblang mapping](../../../guides/bloblang/about/) that attempts to extract an object containing tracing propagation information, which will then be used as the root tracing span for the message. The specification of the extracted fields must match the format used by the service wide tracer. **Type**: `string` ```yaml # Examples: extract_tracing_map: root = @ # --- extract_tracing_map: root = this.meta.span ``` ### [](#max_reconnects)`max_reconnects` The maximum number of times to attempt to reconnect to the server. If negative, it will never stop trying to reconnect. **Type**: `int` ### [](#nak_delay)`nak_delay` An optional delay duration on redelivering a message when negatively acknowledged. 
**Type**: `string` ```yaml # Examples: nak_delay: 1m ``` ### [](#prefetch_count)`prefetch_count` The maximum number of messages to pull at a time. **Type**: `int` **Default**: `500000` ### [](#queue)`queue` An optional queue group to consume as. **Type**: `string` ### [](#send_ack)`send_ack` Whether an automatic acknowledgment is sent as a reply to each message. When enabled, these replies are sent only when data has been delivered to all outputs. **Type**: `bool` **Default**: `true` ### [](#subject)`subject` A subject to consume from. Supports wildcards for consuming multiple subjects. Either a subject or stream must be specified. **Type**: `string` ```yaml # Examples: subject: foo.bar.baz # --- subject: foo.*.baz # --- subject: foo.bar.* # --- subject: foo.> ``` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. 
This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#tls_handshake_first)`tls_handshake_first` Whether to perform the initial TLS handshake before sending the NATS INFO protocol message. This is required when connecting to some NATS servers that expect TLS to be established immediately after connection, before any protocol negotiation. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. If a list item contains commas, it will be expanded into multiple URLs. **Type**: `array` ```yaml # Examples: urls: - "nats://127.0.0.1:4222" # --- urls: - "nats://username:password@127.0.0.1:4222" ``` --- # Page 113: oracledb_cdc **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/oracledb_cdc.md --- # oracledb\_cdc --- title: oracledb_cdc latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/oracledb_cdc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/oracledb_cdc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/oracledb_cdc.adoc categories: "[Services]" description: Enables Change Data Capture by consuming from OracleDB. 
page-git-created-date: "2026-03-31" page-git-modified-date: "2026-03-31" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/oracledb_cdc/ "View the Self-Managed version of this component") Enables Change Data Capture by consuming from OracleDB. Streams changes from an Oracle database for Change Data Capture (CDC). Additionally, if `stream_snapshot` is set to true, existing data in the database is also streamed. #### Common ```yml inputs: label: "" oracledb_cdc: connection_string: "" # No default (required) wallet_path: "" # No default (optional) wallet_password: "" # No default (optional) stream_snapshot: false max_parallel_snapshot_tables: 1 snapshot_max_batch_size: 1000 logminer: scn_window_size: 20000 backoff_interval: 5s mining_interval: 300ms strategy: online_catalog max_transaction_events: 0 lob_enabled: true include: [] # No default (required) exclude: [] # No default (optional) checkpoint_cache: "" # No default (optional) checkpoint_cache_table_name: RPCN.CDC_CHECKPOINT_CACHE checkpoint_cache_key: oracledb_cdc checkpoint_limit: 1024 auto_replay_nacks: true batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml inputs: label: "" oracledb_cdc: connection_string: "" # No default (required) wallet_path: "" # No default (optional) wallet_password: "" # No default (optional) stream_snapshot: false max_parallel_snapshot_tables: 1 snapshot_max_batch_size: 1000 logminer: scn_window_size: 20000 backoff_interval: 5s mining_interval: 300ms strategy: online_catalog max_transaction_events: 0 lob_enabled: true include: [] # No default (required) exclude: [] # No default (optional) checkpoint_cache: "" # No default (optional) checkpoint_cache_table_name: RPCN.CDC_CHECKPOINT_CACHE checkpoint_cache_key: oracledb_cdc checkpoint_limit: 1024 auto_replay_nacks: true batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` ## [](#metadata)Metadata This 
input adds the following metadata fields to each message:

- `database_schema`: The database schema of the table that the message originated from.
- `table_name`: The name of the table that the message originated from.
- `operation`: The type of operation that generated the message: "read", "delete", "insert", or "update". Messages read during the initial snapshot phase have the operation "read".
- `scn`: The System Change Number in Oracle.
- `schema`: The table schema, for use with schema-aware downstream processors such as `schema_registry_encode`. When new columns are detected in CDC events, the schema is automatically refreshed from the Oracle catalog. Dropped columns are reflected after a connector restart.

## [](#permissions)Permissions

When using the default Oracle-based cache, the Connect user requires permission to create tables and stored procedures, and the `rpcn` schema must already exist. See `checkpoint_cache_table_name` for more information.

## [](#fields)Fields

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#batching)`batching`

Allows you to configure a [batching policy](../../../configuration/batching/).

**Type**: `object`

```yaml
# Examples:
batching:
  byte_size: 5000
  count: 0
  period: 1s

# ---

batching:
  count: 10
  period: 1s

# ---

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### [](#batching-byte_size)`batching.byte_size`

The number of bytes at which the batch should be flushed. If `0`, size-based batching is disabled.
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#checkpoint_cache)`checkpoint_cache` A [cache resource](../../caches/about/) to use for storing the current System Change Number (SCN) that has been successfully delivered. This allows Redpanda Connect to continue from that SCN upon restart, rather than consume the entire state of OracleDB redo logs. If not set, the default Oracle-based cache is used. See `checkpoint_cache_table_name` for more information. **Type**: `string` ### [](#checkpoint_cache_key)`checkpoint_cache_key` The key to use to store the snapshot position in `checkpoint_cache`. An alternative key can be provided if multiple CDC inputs share the same cache. 
**Type**: `string` **Default**: `oracledb_cdc` ### [](#checkpoint_cache_table_name)`checkpoint_cache_table_name` The identifier for the checkpoint cache table name. If no `checkpoint_cache` field is specified, this input will automatically create a table and stored procedure under the `rpcn` schema to act as a checkpoint cache. This table stores the latest processed System Change Number (SCN) that has been successfully delivered, allowing Redpanda Connect to resume from that point upon restart rather than reconsume the entire redo log. **Type**: `string` **Default**: `RPCN.CDC_CHECKPOINT_CACHE` ```yaml # Examples: checkpoint_cache_table_name: RPCN.CHECKPOINT_CACHE ``` ### [](#checkpoint_limit)`checkpoint_limit` The maximum number of messages that can be processed at a given time. Increasing this limit enables parallel processing and batching at the output level. Any given System Change Number (SCN) will not be acknowledged unless all messages under that offset are delivered in order to preserve at least once delivery guarantees. **Type**: `int` **Default**: `1024` ### [](#connection_string)`connection_string` The connection string of the Oracle database to connect to. You can supply additional connection options as URL query parameters, for example: `oracle://user:password@host:1522/service?WALLET=/opt/oracle/wallet&SSL=true`. **Type**: `string` ```yaml # Examples: connection_string: oracle://username:password@host:port/service_name # --- connection_string: oracle://user:password@host:1522/service?WALLET=/opt/oracle/wallet&SSL=true ``` ### [](#exclude)`exclude[]` Regular expressions for tables to exclude. **Type**: `array` ```yaml # Examples: exclude: SCHEMA.PRIVATETABLE ``` ### [](#include)`include[]` Regular expressions for tables to include. **Type**: `array` ```yaml # Examples: include: SCHEMA.PRODUCTS ``` ### [](#logminer)`logminer` LogMiner configuration settings. 
**Type**: `object` ### [](#logminer-backoff_interval)`logminer.backoff_interval` The interval between attempts to check for new changes once all data is processed. For low traffic tables increasing this value can reduce network traffic to the server. **Type**: `string` **Default**: `5s` ```yaml # Examples: backoff_interval: 5s # --- backoff_interval: 1m ``` ### [](#logminer-lob_enabled)`logminer.lob_enabled` When enabled, large object (CLOB, BLOB) columns are included in both snapshot and streaming change events. When disabled, these columns are still present but contain no values. Enabling this option introduces additional performance overhead and increases memory requirements. **Type**: `bool` **Default**: `true` ### [](#logminer-max_transaction_events)`logminer.max_transaction_events` The maximum number of events that can be buffered for a single transaction. If a transaction exceeds this limit it is discarded and its events will not be emitted. Set to 0 to disable the limit. **Type**: `int` **Default**: `0` ### [](#logminer-mining_interval)`logminer.mining_interval` The interval between mining cycles during normal operation. Controls how frequently LogMiner polls for new changes when not caught up. **Type**: `string` **Default**: `300ms` ```yaml # Examples: mining_interval: 100ms # --- mining_interval: 1s ``` ### [](#logminer-scn_window_size)`logminer.scn_window_size` The SCN range to mine per cycle. Each cycle reads changes between the current SCN and current SCN + scn\_window\_size. Smaller values mean more frequent queries with lower memory usage but higher overhead; larger values reduce query frequency and improve throughput at the cost of higher memory usage per cycle. **Type**: `int` **Default**: `20000` ### [](#logminer-strategy)`logminer.strategy` Controls how LogMiner retrieves data dictionary information. `online_catalog` uses the current data dictionary for best performance but cannot capture DDL changes. 
Currently, only `online_catalog` is supported.

**Type**: `string`

**Default**: `online_catalog`

### [](#max_parallel_snapshot_tables)`max_parallel_snapshot_tables`

The number of tables to process in parallel during the snapshot stage.

**Type**: `int`

**Default**: `1`

### [](#snapshot_max_batch_size)`snapshot_max_batch_size`

The maximum number of rows to be streamed in a single batch when taking a snapshot.

**Type**: `int`

**Default**: `1000`

### [](#stream_snapshot)`stream_snapshot`

If set to `true`, the connector queries all existing data as part of the snapshot process. Otherwise, it starts from the current System Change Number (SCN) position.

**Type**: `bool`

**Default**: `false`

```yaml
# Examples:
stream_snapshot: true
```

### [](#wallet_password)`wallet_password`

Password for the `ewallet.p12` PKCS#12 wallet file. Only use this when the wallet directory contains `ewallet.p12` rather than `cwallet.sso`.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#wallet_path)`wallet_path`

Path to the Oracle Wallet directory. When set, this automatically enables SSL. The directory must contain either `cwallet.sso` (auto-login, does not require a password) or `ewallet.p12` (requires `wallet_password`).
**Type**: `string`

```yaml
# Examples:
wallet_path: /opt/oracle/wallet
```

---

# Page 114: otlp_grpc

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/otlp_grpc.md

---

# otlp\_grpc

---
title: otlp_grpc
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/otlp_grpc
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/otlp_grpc.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/otlp_grpc.adoc
page-git-created-date: "2026-01-23"
page-git-modified-date: "2026-01-23"
---

**Type:** Input

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/otlp_grpc/ "View the Self-Managed version of this component")

Receive OpenTelemetry traces, logs, and metrics via the OTLP/gRPC protocol.

Exposes an OpenTelemetry Collector gRPC receiver that accepts traces, logs, and metrics. Telemetry data is received in OTLP protobuf format and converted to individual Redpanda OTEL v1 protobuf messages. Each signal (span, log record, or metric) becomes a separate message with embedded Resource and Scope metadata, optimized for Kafka partitioning.
#### Common ```yml inputs: label: "" otlp_grpc: encoding: json address: 0.0.0.0:4317 rate_limit: "" ``` #### Advanced ```yml inputs: label: "" otlp_grpc: encoding: json address: 0.0.0.0:4317 tls: enabled: false cert_file: "" key_file: "" auth_token: "" max_recv_msg_size: 4194304 rate_limit: "" tcp: reuse_addr: false reuse_port: false schema_registry: url: "" # No default (required) timeout: 5s tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] oauth2: enabled: false client_key: "" client_secret: "" token_url: "" scopes: [] endpoint_params: {} oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} common_subject: "" trace_subject: "" log_subject: "" metric_subject: "" ``` ## [](#protocols)Protocols This input supports OTLP/gRPC on the default port 4317 using the standard OTLP protobuf format for all signal types (traces, logs, metrics). ## [](#output-format)Output format Each OTLP export request is unbatched into individual messages: - **Traces**: One message per span - **Logs**: One message per log record - **Metrics**: One message per metric Messages are encoded in Redpanda OTEL v1 protobuf format. ## [](#metadata)Metadata This input adds the following metadata fields to each message: - `signal_type` - The signal type: "trace", "log", or "metric" You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#authentication)Authentication When `auth_token` is configured, clients must include the token in the gRPC metadata. 
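
On the server side, a minimal sketch of enabling token authentication might look like the following fragment. The environment variable name `OTLP_AUTH_TOKEN` is a hypothetical placeholder, not a name defined by Redpanda. Store the real token as described in [Manage Secrets](../../../configuration/secret-management/) rather than writing it into the configuration.

```yaml
# Fragment of an otlp_grpc input; OTLP_AUTH_TOKEN is a hypothetical name.
otlp_grpc:
  address: 0.0.0.0:4317
  auth_token: "${OTLP_AUTH_TOKEN}"  # clients then send "authorization: Bearer <token>"
```

With this in place, clients are expected to supply the same token value in their request metadata.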
### [](#go-client-example)Go client example

```go
import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
)

ctx := context.Background()

// Attach the bearer token as gRPC metadata on every export request.
exporter, err := otlptracegrpc.New(ctx,
	otlptracegrpc.WithEndpoint("localhost:4317"),
	otlptracegrpc.WithInsecure(), // or WithTLSCredentials() for TLS
	otlptracegrpc.WithHeaders(map[string]string{
		"authorization": "Bearer your-token-here",
	}),
)
```

### [](#environment-variable)Environment variable

```bash
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer your-token-here"
```

## [](#rate-limiting)Rate limiting

An optional rate limit resource can be specified to throttle incoming requests. When the rate limit is breached, requests receive a `ResourceExhausted` gRPC status code.

## [](#fields)Fields

### [](#address)`address`

The address to listen on for gRPC connections.

**Type**: `string`

**Default**: `0.0.0.0:4317`

### [](#auth_token)`auth_token`

Optional bearer token for authentication. When set, requests must include `authorization: Bearer <token>` metadata.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

### [](#encoding)`encoding`

Encoding format for messages in the batch.

**Type**: `string`

**Default**: `json`

**Options**: `protobuf`, `json`

### [](#max_recv_msg_size)`max_recv_msg_size`

Maximum size of gRPC messages to receive, in bytes.

**Type**: `int`

**Default**: `4194304`

### [](#rate_limit)`rate_limit`

An optional rate limit resource to throttle requests.

**Type**: `string`

**Default**: `""`

### [](#schema_registry)`schema_registry`

Optional Schema Registry configuration for adding Schema Registry wire format headers to messages.

**Type**: `object`

### [](#schema_registry-basic_auth)`schema_registry.basic_auth`

Allows you to specify basic authentication.
**Type**: `object` ### [](#schema_registry-basic_auth-enabled)`schema_registry.basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-basic_auth-password)`schema_registry.basic_auth.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-basic_auth-username)`schema_registry.basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#schema_registry-common_subject)`schema_registry.common_subject` Schema subject name for the common protobuf schema. Only used when encoding is 'protobuf'. Defaults to 'redpanda-otel-common' for protobuf encoding or 'redpanda-otel-common-json' for JSON encoding. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt)`schema_registry.jwt` Beta Allows you to specify JWT authentication. **Type**: `object` ### [](#schema_registry-jwt-claims)`schema_registry.jwt.claims` A value used to identify the claims that issued the JWT. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-enabled)`schema_registry.jwt.enabled` Whether to use JWT authentication in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-jwt-headers)`schema_registry.jwt.headers` Add optional key/value headers to the JWT. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-private_key_file)`schema_registry.jwt.private_key_file` A file with the PEM encoded via PKCS1 or PKCS8 as private key. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt-signing_method)`schema_registry.jwt.signing_method` A method used to sign the token such as RS256, RS384, RS512 or EdDSA. 
**Type**: `string` **Default**: `""` ### [](#schema_registry-log_subject)`schema_registry.log_subject` Schema subject name for log data. Defaults to 'redpanda-otel-logs' for protobuf encoding or 'redpanda-otel-logs-json' for JSON encoding. **Type**: `string` **Default**: `""` ### [](#schema_registry-metric_subject)`schema_registry.metric_subject` Schema subject name for metric data. Defaults to 'redpanda-otel-metrics' for protobuf encoding or 'redpanda-otel-metrics-json' for JSON encoding. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth)`schema_registry.oauth` Allows you to specify open authentication via OAuth version 1. **Type**: `object` ### [](#schema_registry-oauth-access_token)`schema_registry.oauth.access_token` A value used to gain access to the protected resources on behalf of the user. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-access_token_secret)`schema_registry.oauth.access_token_secret` A secret provided in order to establish ownership of a given access token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_key)`schema_registry.oauth.consumer_key` A value used to identify the client to the service provider. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_secret)`schema_registry.oauth.consumer_secret` A secret used to establish ownership of the consumer key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-enabled)`schema_registry.oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-oauth2)`schema_registry.oauth2` Allows you to specify open authentication via OAuth version 2 using the client credentials token flow. **Type**: `object` ### [](#schema_registry-oauth2-client_key)`schema_registry.oauth2.client_key` A value used to identify the client to the token provider. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth2-client_secret)`schema_registry.oauth2.client_secret` A secret used to establish ownership of the client key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth2-enabled)`schema_registry.oauth2.enabled` Whether to use OAuth version 2 in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-oauth2-endpoint_params)`schema_registry.oauth2.endpoint_params` A list of optional endpoint parameters, values should be arrays of strings. **Type**: `object` **Default**: `{}` ```yaml # Examples: endpoint_params: audience: - https://example.com resource: - https://api.example.com ``` ### [](#schema_registry-oauth2-scopes)`schema_registry.oauth2.scopes[]` A list of optional requested permissions. **Type**: `array` **Default**: `[]` ### [](#schema_registry-oauth2-token_url)`schema_registry.oauth2.token_url` The URL of the token provider. **Type**: `string` **Default**: `""` ### [](#schema_registry-timeout)`schema_registry.timeout` HTTP client timeout for Schema Registry requests. **Type**: `string` **Default**: `5s` ### [](#schema_registry-tls)`schema_registry.tls` Custom TLS settings can be used to override system defaults. 
**Type**: `object` ### [](#schema_registry-tls-client_certs)`schema_registry.tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#schema_registry-tls-client_certs-cert)`schema_registry.tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-cert_file)`schema_registry.tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key)`schema_registry.tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key_file)`schema_registry.tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-password)`schema_registry.tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#schema_registry-tls-enable_renegotiation)`schema_registry.tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-enabled)`schema_registry.tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-root_cas)`schema_registry.tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#schema_registry-tls-root_cas_file)`schema_registry.tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#schema_registry-tls-skip_cert_verify)`schema_registry.tls.skip_cert_verify` Whether to skip server side certificate verification. 
**Type**: `bool` **Default**: `false` ### [](#schema_registry-trace_subject)`schema_registry.trace_subject` Schema subject name for trace data. Defaults to 'redpanda-otel-traces' for protobuf encoding or 'redpanda-otel-traces-json' for JSON encoding. **Type**: `string` **Default**: `""` ### [](#schema_registry-url)`schema_registry.url` Schema Registry URL for schema operations. **Type**: `string` ```yaml # Examples: url: http://localhost:8081 ``` ### [](#tcp)`tcp` TCP listener socket configuration. **Type**: `object` ### [](#tcp-reuse_addr)`tcp.reuse_addr` Enable SO\_REUSEADDR, allowing binding to ports in TIME\_WAIT state. Useful for graceful restarts and config reloads where the server needs to rebind to the same port immediately after shutdown. **Type**: `bool` **Default**: `false` ### [](#tcp-reuse_port)`tcp.reuse_port` Enable SO\_REUSEPORT, allowing multiple sockets to bind to the same port for load balancing across multiple processes/threads. **Type**: `bool` **Default**: `false` ### [](#tls)`tls` TLS configuration for gRPC. **Type**: `object` ### [](#tls-cert_file)`tls.cert_file` Path to the TLS certificate file. **Type**: `string` **Default**: `""` ### [](#tls-enabled)`tls.enabled` Enable TLS connections. **Type**: `bool` **Default**: `false` ### [](#tls-key_file)`tls.key_file` Path to the TLS key file. 
**Type**: `string` **Default**: `""` --- # Page 115: otlp_http **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/otlp_http.md --- # otlp\_http --- title: otlp_http latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/otlp_http page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/otlp_http.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/otlp_http.adoc page-git-created-date: "2026-01-23" page-git-modified-date: "2026-01-23" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/otlp_http/)[Output](/redpanda-cloud/develop/connect/components/outputs/otlp_http/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/otlp_http/ "View the Self-Managed version of this component") Receive OpenTelemetry traces, logs, and metrics via OTLP/HTTP protocol. Exposes an OpenTelemetry Collector HTTP receiver that accepts traces, logs, and metrics via HTTP. Telemetry data is received in OTLP format (both protobuf and JSON) at standard OTLP endpoints and converted to individual Redpanda OTEL v1 protobuf messages. Each signal (span, log record, or metric) becomes a separate message with embedded Resource and Scope metadata, optimized for Kafka partitioning. 
#### Common ```yml inputs: label: "" otlp_http: encoding: json address: 0.0.0.0:4318 rate_limit: "" ``` #### Advanced ```yml inputs: label: "" otlp_http: encoding: json address: 0.0.0.0:4318 tls: enabled: false cert_file: "" key_file: "" auth_token: "" read_timeout: 10s write_timeout: 10s max_body_size: 4194304 rate_limit: "" tcp: reuse_addr: false reuse_port: false schema_registry: url: "" # No default (required) timeout: 5s tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] oauth2: enabled: false client_key: "" client_secret: "" token_url: "" scopes: [] endpoint_params: {} oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} common_subject: "" trace_subject: "" log_subject: "" metric_subject: "" ``` ## [](#endpoints)Endpoints This input exposes the following standard OTLP HTTP endpoints: - `/v1/traces` - OpenTelemetry traces - `/v1/logs` - OpenTelemetry logs - `/v1/metrics` - OpenTelemetry metrics ## [](#protocols)Protocols This input supports OTLP/HTTP on the default port 4318. It accepts both: - `application/x-protobuf` - OTLP protobuf format - `application/json` - OTLP JSON format ## [](#output-format)Output format Each OTLP export request is unbatched into individual messages: - **Traces**: One message per span - **Logs**: One message per log record - **Metrics**: One message per metric Messages are encoded in Redpanda OTEL v1 protobuf format. ## [](#metadata)Metadata This input adds the following metadata fields to each message: - `signal_type` - The signal type: "trace", "log", or "metric" You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). 
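The endpoint, content type, and signal type lists above can be tied together in a short client-side sketch. This is a minimal illustration that uses only the Go standard library. The `otlpPaths` map and the `newOTLPRequest` helper are hypothetical names introduced here, not part of Redpanda Connect or the OpenTelemetry SDK, and the sketch assumes the input is listening on `localhost:4318`.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// otlpPaths maps each signal type to the standard OTLP/HTTP path
// that the input exposes. The map itself is illustrative.
var otlpPaths = map[string]string{
	"trace":  "/v1/traces",
	"log":    "/v1/logs",
	"metric": "/v1/metrics",
}

// newOTLPRequest builds a POST request that carries an OTLP JSON payload.
// baseURL, signal, and token are hypothetical parameters for this sketch.
// An empty token omits the Authorization header.
func newOTLPRequest(baseURL, signal, token string, payload []byte) (*http.Request, error) {
	path, ok := otlpPaths[signal]
	if !ok {
		return nil, fmt.Errorf("unknown signal type %q", signal)
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+path, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	// An empty export request; a real client would include spans here.
	payload := []byte(`{"resourceSpans":[]}`)
	req, err := newOTLPRequest("http://localhost:4318", "trace", "", payload)
	if err != nil {
		panic(err)
	}
	// Prints: POST http://localhost:4318/v1/traces application/json
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Content-Type"))
}
```

To send the request, pass it to `http.DefaultClient.Do(req)`. For protobuf payloads, the same helper would set `Content-Type: application/x-protobuf` instead.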
## [](#authentication)Authentication

When `auth_token` is configured, clients must include the token in the HTTP `Authorization` header.

### [](#go-client-example)Go client example

```go
import (
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
)

// ctx is a context.Context supplied by the caller.
exporter, err := otlptracehttp.New(ctx,
	otlptracehttp.WithEndpoint("localhost:4318"),
	otlptracehttp.WithInsecure(), // or WithTLSClientConfig() for TLS
	otlptracehttp.WithHeaders(map[string]string{
		"Authorization": "Bearer your-token-here",
	}),
)
```

### [](#curl-example)cURL example

```bash
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/x-protobuf" \
  -H "Authorization: Bearer your-token-here" \
  --data-binary @traces.pb
```

### [](#environment-variable)Environment variable

```bash
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer your-token-here"
```

## [](#rate-limiting)Rate limiting

You can specify an optional rate limit resource to throttle incoming requests. Requests that exceed the rate limit receive a 429 (Too Many Requests) response.

## [](#fields)Fields

### [](#address)`address`

The address to listen on for HTTP connections.

**Type**: `string`

**Default**: `0.0.0.0:4318`

### [](#auth_token)`auth_token`

Optional bearer token for authentication. When set, requests must include an `Authorization: Bearer <token>` header.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

### [](#encoding)`encoding`

Encoding format for messages in the batch.

**Type**: `string`

**Default**: `json`

**Options**: `protobuf`, `json`

### [](#max_body_size)`max_body_size`

Maximum size of HTTP request body in bytes.
**Type**: `int` **Default**: `4194304` ### [](#rate_limit)`rate_limit` An optional rate limit resource to throttle requests. **Type**: `string` **Default**: `""` ### [](#read_timeout)`read_timeout` Maximum duration for reading the entire request. **Type**: `string` **Default**: `10s` ### [](#schema_registry)`schema_registry` Optional Schema Registry configuration for adding Schema Registry wire format headers to messages. **Type**: `object` ### [](#schema_registry-basic_auth)`schema_registry.basic_auth` Allows you to specify basic authentication. **Type**: `object` ### [](#schema_registry-basic_auth-enabled)`schema_registry.basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-basic_auth-password)`schema_registry.basic_auth.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-basic_auth-username)`schema_registry.basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#schema_registry-common_subject)`schema_registry.common_subject` Schema subject name for the common protobuf schema. Only used when encoding is 'protobuf'. Defaults to 'redpanda-otel-common' for protobuf encoding or 'redpanda-otel-common-json' for JSON encoding. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt)`schema_registry.jwt` Beta Allows you to specify JWT authentication. **Type**: `object` ### [](#schema_registry-jwt-claims)`schema_registry.jwt.claims` A value used to identify the claims that issued the JWT. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-enabled)`schema_registry.jwt.enabled` Whether to use JWT authentication in requests. 
**Type**: `bool` **Default**: `false` ### [](#schema_registry-jwt-headers)`schema_registry.jwt.headers` Add optional key/value headers to the JWT. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-private_key_file)`schema_registry.jwt.private_key_file` A file with the PEM encoded via PKCS1 or PKCS8 as private key. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt-signing_method)`schema_registry.jwt.signing_method` A method used to sign the token such as RS256, RS384, RS512 or EdDSA. **Type**: `string` **Default**: `""` ### [](#schema_registry-log_subject)`schema_registry.log_subject` Schema subject name for log data. Defaults to 'redpanda-otel-logs' for protobuf encoding or 'redpanda-otel-logs-json' for JSON encoding. **Type**: `string` **Default**: `""` ### [](#schema_registry-metric_subject)`schema_registry.metric_subject` Schema subject name for metric data. Defaults to 'redpanda-otel-metrics' for protobuf encoding or 'redpanda-otel-metrics-json' for JSON encoding. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth)`schema_registry.oauth` Allows you to specify open authentication via OAuth version 1. **Type**: `object` ### [](#schema_registry-oauth-access_token)`schema_registry.oauth.access_token` A value used to gain access to the protected resources on behalf of the user. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-access_token_secret)`schema_registry.oauth.access_token_secret` A secret provided in order to establish ownership of a given access token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_key)`schema_registry.oauth.consumer_key` A value used to identify the client to the service provider. 
**Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_secret)`schema_registry.oauth.consumer_secret` A secret used to establish ownership of the consumer key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-enabled)`schema_registry.oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-oauth2)`schema_registry.oauth2` Allows you to specify open authentication via OAuth version 2 using the client credentials token flow. **Type**: `object` ### [](#schema_registry-oauth2-client_key)`schema_registry.oauth2.client_key` A value used to identify the client to the token provider. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth2-client_secret)`schema_registry.oauth2.client_secret` A secret used to establish ownership of the client key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth2-enabled)`schema_registry.oauth2.enabled` Whether to use OAuth version 2 in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-oauth2-endpoint_params)`schema_registry.oauth2.endpoint_params` A list of optional endpoint parameters, values should be arrays of strings. **Type**: `object` **Default**: `{}` ```yaml # Examples: endpoint_params: audience: - https://example.com resource: - https://api.example.com ``` ### [](#schema_registry-oauth2-scopes)`schema_registry.oauth2.scopes[]` A list of optional requested permissions. 
**Type**: `array` **Default**: `[]` ### [](#schema_registry-oauth2-token_url)`schema_registry.oauth2.token_url` The URL of the token provider. **Type**: `string` **Default**: `""` ### [](#schema_registry-timeout)`schema_registry.timeout` HTTP client timeout for Schema Registry requests. **Type**: `string` **Default**: `5s` ### [](#schema_registry-tls)`schema_registry.tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#schema_registry-tls-client_certs)`schema_registry.tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#schema_registry-tls-client_certs-cert)`schema_registry.tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-cert_file)`schema_registry.tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key)`schema_registry.tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key_file)`schema_registry.tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-password)`schema_registry.tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. 
The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#schema_registry-tls-enable_renegotiation)`schema_registry.tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-enabled)`schema_registry.tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-root_cas)`schema_registry.tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#schema_registry-tls-root_cas_file)`schema_registry.tls.root_cas_file` An optional path of a root certificate authority file to use. 
This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#schema_registry-tls-skip_cert_verify)`schema_registry.tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#schema_registry-trace_subject)`schema_registry.trace_subject` Schema subject name for trace data. Defaults to 'redpanda-otel-traces' for protobuf encoding or 'redpanda-otel-traces-json' for JSON encoding. **Type**: `string` **Default**: `""` ### [](#schema_registry-url)`schema_registry.url` Schema Registry URL for schema operations. **Type**: `string` ```yaml # Examples: url: http://localhost:8081 ``` ### [](#tcp)`tcp` TCP listener socket configuration. **Type**: `object` ### [](#tcp-reuse_addr)`tcp.reuse_addr` Enable SO\_REUSEADDR, allowing binding to ports in TIME\_WAIT state. Useful for graceful restarts and config reloads where the server needs to rebind to the same port immediately after shutdown. **Type**: `bool` **Default**: `false` ### [](#tcp-reuse_port)`tcp.reuse_port` Enable SO\_REUSEPORT, allowing multiple sockets to bind to the same port for load balancing across multiple processes/threads. **Type**: `bool` **Default**: `false` ### [](#tls)`tls` TLS configuration for HTTP. **Type**: `object` ### [](#tls-cert_file)`tls.cert_file` Path to the TLS certificate file. **Type**: `string` **Default**: `""` ### [](#tls-enabled)`tls.enabled` Enable TLS connections. **Type**: `bool` **Default**: `false` ### [](#tls-key_file)`tls.key_file` Path to the TLS key file. **Type**: `string` **Default**: `""` ### [](#write_timeout)`write_timeout` Maximum duration for writing the response. 
**Type**: `string` **Default**: `10s` --- # Page 116: postgres_cdc **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/postgres_cdc.md --- # postgres\_cdc --- title: postgres_cdc latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/postgres_cdc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/postgres_cdc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/postgres_cdc.adoc page-git-created-date: "2024-12-05" page-git-modified-date: "2025-03-20" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/postgres_cdc/ "View the Self-Managed version of this component") Streams data changes from a PostgreSQL database using logical replication. There is also a configuration option to [stream all existing data](#stream_snapshot) from the database. ```yml inputs: label: "" postgres_cdc: dsn: "" # No default (required) include_transaction_markers: false stream_snapshot: false snapshot_batch_size: 1000 schema: "" # No default (required) tables: [] # No default (required) checkpoint_limit: 1024 temporary_slot: false slot_name: "" # No default (required) pg_standby_timeout: 10s pg_wal_monitor_interval: 3s max_parallel_snapshot_tables: 1 auto_replay_nacks: true batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` The `postgres_cdc` input uses logical replication to capture changes made to a PostgreSQL database in real time and streams them to Redpanda Connect. Redpanda Connect uses this replication method to allow you to choose which database tables in your source database to receive changes from. 
There are also [two replication modes](#choose-a-replication-mode) to choose from, and an [option to receive TOAST and deleted values](#receive-toast-and-deleted-values) in your data updates. ## [](#prerequisites)Prerequisites - PostgreSQL version 14 or later - Network access from the cluster where your Redpanda Connect pipeline is running to the source database environment. For detailed networking information, including how to set up a VPC peering connection, see [Redpanda Cloud Networking](../../../../../networking/). - Logical replication enabled on your PostgreSQL cluster To check whether logical replication is already enabled, run the following query: ```SQL SHOW wal_level; ``` If the `wal_level` value is `logical`, you can start to use this connector. Otherwise, choose from the following sets of instructions to update your replication settings. ### Cloud platforms - [Amazon RDS for PostgreSQL DB](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Concepts.General.FeatureSupport.LogicalReplication.html) - [Azure Database for PostgreSQL](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-logical#prerequisites-for-logical-replication-and-logical-decoding) - [Google Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres/replication/configure-logical-replication), including creating a user with replication privileges - [Neon](https://neon.tech/docs/guides/logical-replication-guide) ### Self-Hosted PostgreSQL Use an account with sufficient permissions (superuser) to update your replication settings. 1. Open the `postgresql.conf` file. 2. Find the `wal_level` parameter. 3. Update the parameter value to `wal_level = logical`. If you already use replication slots, you may need to increase the limit on replication slots (`max_replication_slots`). The `max_wal_senders` parameter value must also be greater than or equal to `max_replication_slots`. 4. Restart the PostgreSQL server. 
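The constraints in the steps above (`wal_level` set to `logical`, and `max_wal_senders` greater than or equal to `max_replication_slots`) can be sanity-checked before you restart the server. The following is a minimal sketch, assuming you have already collected the three values yourself, for example from `SHOW` queries. The `checkReplicationSettings` function and the settings map are hypothetical helpers for illustration, not part of Redpanda Connect or PostgreSQL.

```go
package main

import (
	"fmt"
	"strconv"
)

// checkReplicationSettings returns a list of problems found in the given
// settings. The map is a hypothetical stand-in for values read from
// postgresql.conf or from SHOW queries.
func checkReplicationSettings(settings map[string]string) []string {
	var problems []string
	if settings["wal_level"] != "logical" {
		problems = append(problems, "wal_level must be logical")
	}
	slots, errSlots := strconv.Atoi(settings["max_replication_slots"])
	senders, errSenders := strconv.Atoi(settings["max_wal_senders"])
	if errSlots != nil || errSenders != nil {
		problems = append(problems, "max_replication_slots and max_wal_senders must be integers")
	} else if senders < slots {
		problems = append(problems, "max_wal_senders must be >= max_replication_slots")
	}
	return problems
}

func main() {
	// Example values that violate both constraints.
	problems := checkReplicationSettings(map[string]string{
		"wal_level":             "replica",
		"max_replication_slots": "10",
		"max_wal_senders":       "5",
	})
	for _, p := range problems {
		fmt.Println(p)
	}
}
```

An empty result means the values satisfy the constraints described above. It does not confirm that the server has been restarted with them.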
For this input to connect to your database successfully, also make sure that the database allows replication connections.

1. Open the `pg_hba.conf` file.
2. Add or update the following line:

   ```yaml
   host replication <username> <ip-address>/32 md5
   ```

   Replace the following placeholders with your own values:

   - `<username>`: The username from an account with superuser privileges.
   - `<ip-address>`: The IP address of the server where you are running Redpanda Connect.
3. Restart the PostgreSQL server.

## [](#choose-a-replication-mode)Choose a replication mode

When you run a pipeline that uses the `postgres_cdc` input, Redpanda Connect connects to your PostgreSQL database and creates a replication slot. The replication slot uses a copy of the Write-Ahead Log (WAL) file to subscribe to changes in your database records as they are applied to the database.

There are two replication modes you can choose from: snapshot mode and streaming mode. In snapshot mode, Redpanda Connect first takes a snapshot of the database and streams the contents before processing changes from the WAL. In streaming mode, Redpanda Connect directly processes changes from the WAL starting from the most recent changes without taking a snapshot first.

For local testing, you can use the [example pipeline on this page](#example-pipeline), which runs in snapshot mode.

### [](#snapshot-mode)Snapshot mode

If you set the [`stream_snapshot` field](#stream_snapshot) to `true`, Redpanda Connect:

1. Creates a snapshot of your database.
2. Streams the contents of the tables specified in the `postgres_cdc` input.
3. Starts processing changes in the WAL that occurred since the snapshot was taken, and streams them to Redpanda Connect.

Once the initial replication process is complete, the snapshot is removed and the input keeps a connection open to the database so that it can receive data updates.

If the pipeline restarts during the replication process, Redpanda Connect resumes processing data changes from where it left off.
If there are other interruptions while the snapshot is taken, you may need to restart the snapshot process. For more information, see [Troubleshoot replication failures](#troubleshoot-replication-failures).

### [](#streaming-mode)Streaming mode

If you set the [`stream_snapshot` field](#stream_snapshot) to `false`, Redpanda Connect starts processing data changes from the end of the WAL. If the pipeline restarts, Redpanda Connect resumes processing data changes from the last acknowledged position in the WAL.

## [](#monitor-the-replication-process)Monitor the replication process

You can monitor the initial replication of data using the following metrics:

| Metric name | Description |
| --- | --- |
| `replication_lag_bytes` | Indicates how far the connector is lagging behind the source database when processing the transaction log. |
| `postgres_snapshot_progress` | Shows the progress of snapshot processing for each table. |

## [](#troubleshoot-replication-failures)Troubleshoot replication failures

If the database snapshot fails, the replication slot has only an incomplete record of the existing data in your database. To maintain data integrity, you must drop the replication slot manually in your source database and run the Redpanda Connect pipeline again.

```SQL
SELECT pg_drop_replication_slot('SLOT_NAME');
```

Replace `SLOT_NAME` with the value of the `slot_name` field in your pipeline configuration.

## [](#receive-toast-and-deleted-values)Receive TOAST and deleted values

For full visibility of all data updates, you can also choose to stream [TOAST](https://www.postgresql.org/docs/current/storage-toast.html) and deleted values. To enable this option, run the following query on your source database, replacing `large_data` with the name of your table:

```SQL
ALTER TABLE large_data REPLICA IDENTITY FULL;
```

## [](#data-mappings)Data mappings

The following table shows how selected PostgreSQL data types are mapped to data types supported in Redpanda Connect. All other data types are mapped to string values.
| PostgreSQL data type | Bloblang value |
| --- | --- |
| TEXT, TIMESTAMP, UUID, VARCHAR | JSON strings, for example: `"this data"` |
| BOOL | Boolean JSON fields, for example: `true` or `false` |
| Numeric types (INT4) | JSON number types, for example: `1` |
| JSONB | JSON objects, for example: `{ "message": "message text" }` |
| INTEGER[] | An array of integer values, for example: `[1,2,3]` |
| TEXT[] | An array of string values, for example: `["value1", "value2", "value3"]` |
| INET | A string that contains an IP address, for example: `"192.168.1.1"` |
| POINT | A string that represents a point in a two-dimensional plane, for example: `(x, y)` |
| TSRANGE | A string that includes range bounds, for example: `[2010-01-01 14:30, 2010-01-01 15:30)` |
| TSVECTOR | A string that includes vector data, for example: `"'the':2 'question':3 'is':4"` |

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

- `table`: The name of the database table from which the message originated.
- `operation`: The type of database operation that generated the message, such as `read`, `insert`, `update`, `delete`, `begin`, and `commit`. A `read` operation occurs when a snapshot of the database is processed. The `begin` and `commit` operations are only included if the `include_transaction_markers` field is set to `true`.
- `lsn`: The [Log Sequence Number](https://www.postgresql.org/docs/current/datatype-pg-lsn.html) of each data update from the source PostgreSQL database. The `lsn` values are strings that can be sorted to determine the order in which data updates were written to the WAL.

## [](#fields)Fields

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether to automatically replay rejected messages (negative acknowledgements) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure. Set `auto_replay_nacks` to `false` to delete rejected messages.
Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data is discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#aws)`aws` AWS IAM authentication configuration for PostgreSQL instances. When enabled, IAM credentials are used to generate temporary authentication tokens instead of a static password. This is useful for connecting to Amazon RDS or Aurora PostgreSQL instances with IAM database authentication enabled. The generated tokens are valid for 15 minutes and are automatically refreshed. For more information about AWS credentials configuration, see the [credentials for AWS](../../../guides/cloud/aws/) guide. **Type**: `object` ### [](#aws-enabled)`aws.enabled` Enable AWS IAM authentication for PostgreSQL. When enabled, an IAM authentication token is generated and used as the password. **Type**: `bool` **Default**: `false` ### [](#aws-endpoint)`aws.endpoint` The PostgreSQL endpoint hostname (e.g., mydb.abc123.us-east-1.rds.amazonaws.com). **Type**: `string` ### [](#aws-id)`aws.id` The ID of credentials to use. **Type**: `string` ### [](#aws-region)`aws.region` The AWS region where the PostgreSQL instance is located. If no region is specified then the environment default will be used. **Type**: `string` ### [](#aws-role)`aws.role` Optional AWS IAM role ARN to assume for authentication. Alternatively, use `roles` array for role chaining instead. **Type**: `string` ### [](#aws-role_external_id)`aws.role_external_id` Optional external ID for the role assumption. Only used with the `role` field. Alternatively, use `roles` array for role chaining instead. **Type**: `string` ### [](#aws-roles)`aws.roles[]` Optional array of AWS IAM roles to assume for authentication. Roles can be assumed in sequence, enabling chaining for purposes such as cross-account access. Each role can optionally specify an external ID. 
**Type**: `object` ### [](#aws-roles-role)`aws.roles[].role` AWS IAM role ARN to assume. **Type**: `string` **Default**: `""` ### [](#aws-roles-role_external_id)`aws.roles[].role_external_id` Optional external ID for the role assumption. **Type**: `string` **Default**: `""` ### [](#aws-secret)`aws.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#aws-token)`aws.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. 
**Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#checkpoint_limit)`checkpoint_limit` The maximum number of messages that this input can process at a given time. Increasing this limit enables parallel processing, and batching at the output level. To preserve at-least-once guarantees, any given log sequence number (LSN) is not acknowledged until all messages under that offset are delivered. **Type**: `int` **Default**: `1024` ### [](#dsn)`dsn` The data source name (DSN) of the PostgreSQL database from which you want to stream updates. Use the format `postgres://[user[:password]@][netloc][:port][/dbname][?param1=value1&…​]`. For example, if you wanted to disable SSL in a secure environment, you would add `sslmode=disable` to the connection string. **Type**: `string` ```yaml # Examples: dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable ``` ### [](#heartbeat_interval)`heartbeat_interval` The interval between heartbeat messages, which Redpanda Connect writes to the WAL using the `pg_logical_emit_message` function. Heartbeat messages are useful when you subscribe to data changes from tables with low activity, while other tables in the database have higher-frequency updates. Heartbeat messages allow Redpanda Connect to periodically acknowledge new messages even when no data updates occur. 
Each acknowledgement advances the committed point in the WAL, which ensures that PostgreSQL can safely reclaim older log segments, preventing excessive disk space usage. Set `heartbeat_interval` to `0s` to disable heartbeats.

**Type**: `string`

**Default**: `1h`

```yaml
# Examples:
heartbeat_interval: 0s

# ---

heartbeat_interval: 24h
```

### [](#include_transaction_markers)`include_transaction_markers`

When set to `true`, creates empty messages for the `BEGIN` and `COMMIT` operations that start and complete each transaction. Messages with the `operation` metadata field set to `begin` or `commit` have null message payloads.

**Type**: `bool`

**Default**: `false`

### [](#max_parallel_snapshot_tables)`max_parallel_snapshot_tables`

Specify the maximum number of tables that are processed in parallel when the initial snapshot of the source database is taken.

**Type**: `int`

**Default**: `1`

### [](#pg_standby_timeout)`pg_standby_timeout`

Specify the standby timeout after which an idle connection is refreshed to keep the connection alive.

**Type**: `string`

**Default**: `10s`

```yaml
# Examples:
pg_standby_timeout: 30s
```

### [](#pg_wal_monitor_interval)`pg_wal_monitor_interval`

How often to report changes to the replication lag and write them to Redpanda Connect metrics.

**Type**: `string`

**Default**: `3s`

```yaml
# Examples:
pg_wal_monitor_interval: 6s
```

### [](#schema)`schema`

The PostgreSQL schema from which to replicate data.

**Type**: `string`

```yaml
# Examples:
schema: public

# ---

schema: "MyCaseSensitiveSchemaNeedingQuotes"
```

### [](#slot_name)`slot_name`

The name of the PostgreSQL logical replication slot to use. If not provided, a random name is generated unless you create a replication slot manually before starting replication.

**Type**: `string`

```yaml
# Examples:
slot_name: my_test_slot
```

### [](#snapshot_batch_size)`snapshot_batch_size`

The number of table rows to fetch in each batch when querying the snapshot.
This option is only available when `stream_snapshot` is set to `true`. **Type**: `int` **Default**: `1000` ```yaml # Examples: snapshot_batch_size: 10000 ``` ### [](#stream_snapshot)`stream_snapshot` When set to `true`, this input streams a snapshot of all existing data in the source database before streaming data changes. To use this setting, all database tables that you want to replicate _must_ have a primary key. **Type**: `bool` **Default**: `false` ```yaml # Examples: stream_snapshot: true ``` ### [](#tables)`tables[]` A list of database table names to include in the snapshot and logical replication. Specify each table name as a separate item. **Type**: `array` ```yaml # Examples: tables: - my_table_1 - "MyCaseSensitiveTableNeedingQuotes" ``` ### [](#temporary_slot)`temporary_slot` If set to `true`, the input creates a temporary replication slot that is automatically dropped when the connection to your source database is closed. You might use this option to: - Avoid data accumulating in the replication slot when a pipeline is paused or stopped - Test the connector If the pipeline is restarted, another data snapshot is taken before data updates are streamed. **Type**: `bool` **Default**: `false` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#unchanged_toast_value)`unchanged_toast_value` Specify the value to emit when unchanged [TOAST values](#receive-toast-and-deleted-values) appear in the message stream. Unchanged values occur for data updates and deletes when `REPLICA IDENTITY` is not set to `FULL`. **Type**: `unknown` **Default**: ```yaml null ``` ```yaml # Examples: unchanged_toast_value: __redpanda_connect_unchanged_toast_value__ ``` ## [](#example-pipeline)Example pipeline You can run the following pipeline locally to check that data updates are streamed from your source database to Redpanda Connect. All transactions are written to stdout. 
```yml
input:
  label: "postgres_cdc"
  postgres_cdc:
    dsn: postgres://user:password@host:port/dbname
    include_transaction_markers: false
    slot_name: test_slot_native_decoder
    snapshot_batch_size: 100000
    stream_snapshot: true
    temporary_slot: true
    schema: schema_name
    tables:
      - table_name
cache_resources:
  - label: data_caching
    file:
      directory: /tmp/cache
output:
  label: main
  stdout: {}
```

---

# Page 117: read_until

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/read_until.md

---

# read\_until

---
title: read_until
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/read_until
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/read_until.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/read_until.adoc
categories: "[\"Utility\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/read_until/ "View the Self-Managed version of this component")

Reads messages from a child input until a consumed message passes a [Bloblang query](../../../guides/bloblang/about/), at which point the input closes. You can also configure a timeout after which the input is closed if no new messages arrive in that period.

```yml
inputs:
  label: ""
  read_until:
    input: "" # No default (required)
    check: "" # No default (optional)
    idle_timeout: "" # No default (optional)
    restart_input: false
```

Messages are read continuously while the query check returns false. When the query returns true, the message that triggered the check is sent out and the input is closed. Use this to define inputs where the stream should end once a certain message appears.
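
As a minimal sketch of the check behavior, the following configuration reuses the `this.type == "foo"` query and the Kafka child input that appear in the examples on this page. The topic and consumer group names are placeholders, and `addresses` must be filled in for a real cluster:

```yaml
input:
  read_until:
    # Close the input once a message whose type field is "foo" is consumed.
    # That message is still sent out before the input closes.
    check: this.type == "foo"
    input:
      kafka:
        addresses: [ TODO ]
        topics: [ foo ]
        consumer_group: foogroup
```
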
If the idle timeout is configured, the input will be closed if no new messages arrive after that period of time. Use this field if you want to empty out and close an input that doesn’t have a logical end.

Sometimes inputs close themselves. For example, when the `file` input type reaches the end of a file it will shut down. By default this type will also shut down. If you want the input type to be restarted every time it shuts down until the query check is met, set `restart_input` to `true`.

## [](#metadata)Metadata

A metadata key `benthos_read_until` containing the value `final` is added to the first part of the message that triggers the input to stop.

## [](#fields)Fields

### [](#check)`check`

A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether the input should now be closed.

**Type**: `string`

```yaml
# Examples:
check: this.type == "foo"

# ---

check: count("messages") >= 100
```

### [](#idle_timeout)`idle_timeout`

The maximum amount of time without receiving new messages after which the input is closed.

**Type**: `string`

```yaml
# Examples:
idle_timeout: 5s
```

### [](#input)`input`

The child input to consume from.

**Type**: `input`

### [](#restart_input)`restart_input`

Whether the input should be reopened if it closes itself before the condition has resolved to true.

**Type**: `bool`

**Default**: `false`

## [](#examples)Examples

### [](#consume-n-messages)Consume N Messages

A common reason to use this input is to consume only N messages from an input and then stop. This can easily be done with the [`count` function](../../../guides/bloblang/functions/#count):

```yaml
# Only read 100 messages, and then exit.
input:
  read_until:
    check: count("messages") >= 100
    input:
      kafka:
        addresses: [ TODO ]
        topics: [ foo, bar ]
        consumer_group: foogroup
```

### [](#read-from-a-kafka-and-close-when-empty)Read from Kafka and close when empty

Another common reason to use this input is a job that consumes all messages and exits once the source is empty:

```yaml
# Consumes all messages and exits when the last message was consumed 5s ago.
input:
  read_until:
    idle_timeout: 5s
    input:
      kafka:
        addresses: [ TODO ]
        topics: [ foo, bar ]
        consumer_group: foogroup
```

---

# Page 118: redis_list

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/redis_list.md

---

# redis\_list

---
title: redis_list
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/redis_list
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/redis_list.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/redis_list.adoc
categories: "[\"Services\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Input (see also the [Output](/redpanda-cloud/develop/connect/components/outputs/redis_list/) version)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/redis_list/ "View the Self-Managed version of this component")

Pops messages from the beginning of a Redis list using the BLPop command.
#### Common ```yml inputs: label: "" redis_list: url: "" # No default (required) key: "" # No default (required) auto_replay_nacks: true ``` #### Advanced ```yml inputs: label: "" redis_list: url: "" # No default (required) kind: simple master: "" client_name: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] key: "" # No default (required) auto_replay_nacks: true max_in_flight: 0 timeout: 5s command: blpop ``` ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#client_name)`client_name` Set the client name for the Redis connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#command)`command` The command used to pop elements from the Redis list **Type**: `string` **Default**: `blpop` **Options**: `blpop`, `brpop` ### [](#key)`key` The key of a list to read from. **Type**: `string` ### [](#kind)`kind` Specifies a simple, cluster-aware, or failover-aware redis client. **Type**: `string` **Default**: `simple` **Options**: `simple`, `cluster`, `failover` ### [](#master)`master` Name of the redis master when `kind` is `failover` **Type**: `string` **Default**: `""` ```yaml # Examples: master: mymaster ``` ### [](#max_in_flight)`max_in_flight` Optionally sets a limit on the number of messages that can be flowing through a Redpanda Connect stream pending acknowledgment from the input at any given time. 
Once a message has been either acknowledged or rejected (nacked) it is no longer considered pending. If the input produces logical batches then each batch is considered a single count against the maximum. **WARNING**: Batching policies at the output level will stall if this field limits the number of messages below the batching threshold. Zero (default) or lower implies no limit. **Type**: `int` **Default**: `0` ### [](#timeout)`timeout` The length of time to poll for new messages before reattempting. **Type**: `string` **Default**: `5s` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Troubleshooting** Some cloud hosted instances of Redis (such as Azure Cache) might need some hand holding in order to establish stable connections. Unfortunately, it is often the case that TLS issues will manifest as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems consider setting `enable_renegotiation` to `true`, and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas: |-
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
```

### [](#tls-root_cas_file)`tls.root_cas_file`

An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas_file: ./root_cas.pem
```

### [](#tls-skip_cert_verify)`tls.skip_cert_verify`

Whether to skip server side certificate verification.

**Type**: `bool`

**Default**: `false`

### [](#url)`url`

The URL of the target Redis server. Database is optional and is supplied as the URL path.

**Type**: `string`

```yaml
# Examples:
url: redis://:6379

# ---

url: redis://localhost:6379

# ---

url: redis://foousername:foopassword@redisplace:6379

# ---

url: redis://:foopassword@redisplace:6379

# ---

url: redis://localhost:6379/1

# ---

url: redis://localhost:6379/1,redis://localhost:6380/1
```

---

# Page 119: redis_pubsub

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/redis_pubsub.md

---

# redis\_pubsub

---
title: redis_pubsub
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/redis_pubsub
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/redis_pubsub.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/redis_pubsub.adoc
categories: "[\"Services\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Input (see also the [Output](/redpanda-cloud/develop/connect/components/outputs/redis_pubsub/) version)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/redis_pubsub/ "View the Self-Managed version of this component")

Consume from a Redis publish/subscribe channel using either the SUBSCRIBE or PSUBSCRIBE commands.

#### Common

```yml
inputs:
  label: ""
  redis_pubsub:
    url: "" # No default (required)
    channels: [] # No default (required)
    use_patterns: false
    auto_replay_nacks: true
```

#### Advanced

```yml
inputs:
  label: ""
  redis_pubsub:
    url: "" # No default (required)
    kind: simple
    master: ""
    client_name: redpanda-connect
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    channels: [] # No default (required)
    use_patterns: false
    auto_replay_nacks: true
```

To subscribe to channels using the `PSUBSCRIBE` command, set the field `use_patterns` to `true`. You can then include glob-style patterns in your channel names. For example:

- `h?llo` subscribes to hello, hallo and hxllo
- `h*llo` subscribes to hllo and heeeello
- `h[ae]llo` subscribes to hello and hallo, but not hillo

Use `\` to escape special characters if you want to match them verbatim.

## [](#fields)Fields

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#channels)`channels[]`

A list of channels to consume from.
**Type**: `array` ### [](#client_name)`client_name` Set the client name for the Redis connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#kind)`kind` Specifies a simple, cluster-aware, or failover-aware redis client. **Type**: `string` **Default**: `simple` **Options**: `simple`, `cluster`, `failover` ### [](#master)`master` Name of the redis master when `kind` is `failover` **Type**: `string` **Default**: `""` ```yaml # Examples: master: mymaster ``` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Troubleshooting** Some cloud hosted instances of Redis (such as Azure Cache) might need some hand holding in order to establish stable connections. Unfortunately, it is often the case that TLS issues will manifest as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems consider setting `enable_renegotiation` to `true`, and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... 
-----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target Redis server. Database is optional and is supplied as the URL path. **Type**: `string` ```yaml # Examples: url: redis://:6379 # --- url: redis://localhost:6379 # --- url: redis://foousername:foopassword@redisplace:6379 # --- url: redis://:foopassword@redisplace:6379 # --- url: redis://localhost:6379/1 # --- url: redis://localhost:6379/1,redis://localhost:6380/1 ``` ### [](#use_patterns)`use_patterns` Whether to use the PSUBSCRIBE command, allowing for glob-style patterns within target channel names. 
**Type**: `bool`

**Default**: `false`

---

# Page 120: redis_scan

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/redis_scan.md

---

# redis\_scan

---
title: redis_scan
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/redis_scan
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/redis_scan.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/redis_scan.adoc
categories: "[\"Services\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/redis_scan/ "View the Self-Managed version of this component")

Scans the set of keys in the currently selected database and gets their values, using the `SCAN` and `GET` commands.

#### Common

```yml
inputs:
  label: ""
  redis_scan:
    url: "" # No default (required)
    auto_replay_nacks: true
    match: ""
```

#### Advanced

```yml
inputs:
  label: ""
  redis_scan:
    url: "" # No default (required)
    kind: simple
    master: ""
    client_name: redpanda-connect
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    auto_replay_nacks: true
    match: ""
```

Optionally, iterates only over elements matching a glob-style pattern. For example:

- `*foo*` iterates only over keys that contain `foo`.
- `foo*` iterates only over keys starting with `foo`.
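
The pattern matching described above can be sketched as a minimal `redis_scan` input. The server URL and the `user:` key prefix are illustrative assumptions and do not come from this page. The pattern is quoted because YAML reads an unquoted leading `*` as an alias, which matters for patterns such as `*foo*`.

```yaml
input:
  redis_scan:
    # Hypothetical server address; replace with your own Redis URL.
    url: redis://localhost:6379
    # Hypothetical glob-style pattern: scan only keys beginning with "user:".
    match: "user:*"
```
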
This input generates a message for each key value pair in the following format: ```json {"key":"foo","value":"bar"} ``` ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#client_name)`client_name` Set the client name for the Redis connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#kind)`kind` Specifies a simple, cluster-aware, or failover-aware redis client. **Type**: `string` **Default**: `simple` **Options**: `simple`, `cluster`, `failover` ### [](#master)`master` Name of the redis master when `kind` is `failover` **Type**: `string` **Default**: `""` ```yaml # Examples: master: mymaster ``` ### [](#match)`match` Iterates only elements matching the optional glob-style pattern. By default, it matches all elements. **Type**: `string` **Default**: `""` ```yaml # Examples: match: * # --- match: 1* # --- match: foo* # --- match: foo # --- match: *4* ``` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Troubleshooting** Some cloud hosted instances of Redis (such as Azure Cache) might need some hand holding in order to establish stable connections. Unfortunately, it is often the case that TLS issues will manifest as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems consider setting `enable_renegotiation` to `true`, and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. 
For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. 
Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target Redis server. Database is optional and is supplied as the URL path. 
**Type**: `string` ```yaml # Examples: url: redis://:6379 # --- url: redis://localhost:6379 # --- url: redis://foousername:foopassword@redisplace:6379 # --- url: redis://:foopassword@redisplace:6379 # --- url: redis://localhost:6379/1 # --- url: redis://localhost:6379/1,redis://localhost:6380/1 ``` --- # Page 121: redis_streams **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/redis_streams.md --- # redis\_streams --- title: redis_streams latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/redis_streams page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/redis_streams.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/redis_streams.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/redis_streams/)[Output](/redpanda-cloud/develop/connect/components/outputs/redis_streams/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/redis_streams/ "View the Self-Managed version of this component") Pulls messages from Redis (v5.0+) streams with the XREADGROUP command. The `client_id` should be unique for each consumer of a group. 
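
As a rough sketch of the consumer group behavior described above, the following input reads one stream as a member of a group. The server URL, stream name, group name, and client ID are all hypothetical values chosen for illustration.

```yaml
input:
  redis_streams:
    # Hypothetical server address; replace with your own Redis URL.
    url: redis://localhost:6379
    # Hypothetical stream name.
    streams: [ orders ]
    # Entry key that holds the message body (the documented default).
    body_key: body
    # Hypothetical group shared by every consumer of this stream.
    consumer_group: order_processors
    # Must differ for each consumer in the group.
    client_id: order_processor_1
```

A second pipeline that joins the same group would keep `consumer_group` unchanged and set a different `client_id`, for example `order_processor_2`.
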
#### Common ```yml inputs: label: "" redis_streams: url: "" # No default (required) body_key: body streams: [] # No default (required) auto_replay_nacks: true limit: 10 client_id: "" consumer_group: "" ``` #### Advanced ```yml inputs: label: "" redis_streams: url: "" # No default (required) kind: simple master: "" client_name: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] body_key: body streams: [] # No default (required) auto_replay_nacks: true limit: 10 client_id: "" consumer_group: "" create_streams: true start_from_oldest: true commit_period: 1s timeout: 1s ``` Redis stream entries are key/value pairs, as such it is necessary to specify the key that contains the body of the message. All other keys/value pairs are saved as metadata fields. ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#body_key)`body_key` The field key to extract the raw message from. All other keys will be stored in the message as metadata. **Type**: `string` **Default**: `body` ### [](#client_id)`client_id` An identifier for the client connection. **Type**: `string` **Default**: `""` ### [](#client_name)`client_name` Set the client name for the Redis connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#commit_period)`commit_period` The period of time between each commit of the current offset. Offsets are always committed during shutdown. 
**Type**: `string` **Default**: `1s` ### [](#consumer_group)`consumer_group` An identifier for the consumer group of the stream. **Type**: `string` **Default**: `""` ### [](#create_streams)`create_streams` Create subscribed streams if they do not exist (MKSTREAM option). **Type**: `bool` **Default**: `true` ### [](#kind)`kind` Specifies a simple, cluster-aware, or failover-aware redis client. **Type**: `string` **Default**: `simple` **Options**: `simple`, `cluster`, `failover` ### [](#limit)`limit` The maximum number of messages to consume from a single request. **Type**: `int` **Default**: `10` ### [](#master)`master` Name of the redis master when `kind` is `failover` **Type**: `string` **Default**: `""` ```yaml # Examples: master: mymaster ``` ### [](#start_from_oldest)`start_from_oldest` If an offset is not found for a stream, determines whether to consume from the oldest available offset, otherwise messages are consumed from the latest offset. **Type**: `bool` **Default**: `true` ### [](#streams)`streams[]` A list of streams to consume from. **Type**: `array` ### [](#timeout)`timeout` The length of time to poll for new messages before reattempting. **Type**: `string` **Default**: `1s` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Troubleshooting** Some cloud hosted instances of Redis (such as Azure Cache) might need some hand holding in order to establish stable connections. Unfortunately, it is often the case that TLS issues will manifest as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems consider setting `enable_renegotiation` to `true`, and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. 
**Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. 
**Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target Redis server. Database is optional and is supplied as the URL path. 
**Type**: `string`

```yaml
# Examples:
url: redis://:6379

# ---

url: redis://localhost:6379

# ---

url: redis://foousername:foopassword@redisplace:6379

# ---

url: redis://:foopassword@redisplace:6379

# ---

url: redis://localhost:6379/1

# ---

url: redis://localhost:6379/1,redis://localhost:6380/1
```

---

# Page 122: redpanda_common

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/redpanda_common.md

---

# redpanda\_common

---
title: redpanda_common
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/inputs/redpanda_common
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/redpanda_common.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/redpanda_common.adoc
categories: "[\"Services\"]"
page-git-created-date: "2025-06-25"
page-git-modified-date: "2025-06-25"
---

**Type:** Input ([Input](/redpanda-cloud/develop/connect/components/inputs/redpanda_common/) | [Output](/redpanda-cloud/develop/connect/components/outputs/redpanda_common/))

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/redpanda_common/ "View the Self-Managed version of this component")

> ⚠️ **WARNING: Deprecated in 4.68.0**
>
> This component is deprecated and will be removed in the next major version release. Consider moving to the unified [`redpanda` input](../redpanda/) and [`redpanda` output](../../outputs/redpanda/) components.

Consumes data from a Redpanda (Kafka) broker, using credentials from a common `redpanda` configuration block.
To avoid duplicating Redpanda cluster credentials in your `redpanda_common` input, output, or any other components in your data pipeline, you can use a single [`redpanda` configuration block](../../redpanda/about/). For more details, see the [Pipeline example](#pipeline-example). > 📝 **NOTE** > > If you need to move topic data between Redpanda clusters or other Apache Kafka clusters, consider using the [`redpanda` input](../redpanda/) and [output](../../outputs/redpanda/) instead. #### Common ```yml inputs: label: "" redpanda_common: topics: [] # No default (optional) regexp_topics_include: [] # No default (optional) regexp_topics_exclude: [] # No default (optional) transaction_isolation_level: read_uncommitted consumer_group: "" # No default (optional) auto_replay_nacks: true ``` #### Advanced ```yml inputs: label: "" redpanda_common: topics: [] # No default (optional) regexp_topics_include: [] # No default (optional) regexp_topics_exclude: [] # No default (optional) rack_id: "" instance_id: "" rebalance_timeout: 45s session_timeout: 1m heartbeat_interval: 3s start_offset: earliest fetch_max_bytes: 50MiB fetch_max_wait: 5s fetch_min_bytes: 1B fetch_max_partition_bytes: 1MiB transaction_isolation_level: read_uncommitted consumer_group: "" # No default (optional) commit_period: 5s partition_buffer_bytes: 1MB topic_lag_refresh_period: 5s max_yield_batch_bytes: 32KB auto_replay_nacks: true timely_nacks_maximum_wait: "" # No default (optional) ``` ## [](#pipeline-example)Pipeline example This data pipeline reads data from `topic_A` and `topic_B` on a Redpanda cluster, and then writes the data to `topic_C` on the same cluster. The cluster details are configured within the `redpanda` configuration block, so you only need to configure them once. This is a useful feature when you have multiple inputs and outputs in the same data pipeline that need to connect to the same cluster. 
```none input: redpanda_common: topics: [ topic_A, topic_B ] output: redpanda_common: topic: topic_C key: ${! @id } redpanda: seed_brokers: [ "127.0.0.1:9092" ] tls: enabled: true sasl: - mechanism: SCRAM-SHA-512 password: bar username: foo ``` ## [](#consumer-groups)Consumer groups When you specify a consumer group in your configuration, this input consumes one or more topics and automatically balances the topic partitions across any other connected clients with the same consumer group. Otherwise, topics are consumed in their entirety or with explicit partitions. ### [](#delivery-guarantees)Delivery guarantees If you choose to use consumer groups, the offsets of records received by Redpanda Connect are committed automatically. In the event of restarts, this input uses the committed offsets to resume data consumption where it left off. Redpanda Connect guarantees at-least-once delivery. Records are only confirmed as delivered when all downstream outputs that a record is routed to have also confirmed delivery. ## [](#ordering)Ordering To preserve the order of topic partitions: - Records consumed from each partition are processed and delivered in the order that they are received - Only one batch of records of a given partition is processed at a time This approach means that although records from different partitions may be processed in parallel, records from the same partition are processed in sequential order. ### [](#delivery-errors)Delivery errors The order in which records are delivered may be disrupted by delivery errors and any error-handling mechanisms that start up. Redpanda Connect uses at-least-once delivery unless instructed otherwise, and this includes reattempting delivery of data when the ordering of that data is no longer guaranteed. For example, a batch of records is sent to an output broker and only a subset of records are delivered. 
In this scenario, Redpanda Connect (by default) attempts to deliver the records that failed, even though the failed records may have been sent before records that were delivered successfully.

#### [](#use-a-fallback-output)Use a fallback output

To prevent delivery errors from disrupting the order of records, you must specify a [`fallback`](../../outputs/fallback/) output in your pipeline configuration. When adding a `fallback` output, it is good practice to set the `auto_replay_nacks` field to `false`. This also improves the throughput of your pipeline.

For example, the following configuration includes a `fallback` output. If Redpanda Connect fails to write records to the `foo` topic, it then attempts to write them to a dead letter queue topic (`foo_dlq`), which is retried indefinitely as a way to apply back pressure.

```yaml
output:
  fallback:
    - redpanda_common:
        topic: foo
    - retry:
        output:
          redpanda_common:
            topic: foo_dlq
```

## [](#batching)Batching

Records are processed and delivered from each partition in the same batches as they are received from brokers. Batch sizes are adjusted dynamically to optimize throughput, but you can tune them further using the following configuration fields:

- `fetch_max_partition_bytes`
- `fetch_max_bytes`

You can break batches down further using the [`split`](../../processors/split/) processor.

## [](#metrics)Metrics

This input emits a `redpanda_lag` metric with `topic` and `partition` labels for each consumed topic. The metric records the number of produced messages that remain to be read from each topic/partition pair by the specified consumer group.
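
Because the lag metric is reported for a consumer group, a configuration along the following lines is one way to get meaningful `redpanda_lag` values. This is only a sketch: the topic name, group name, and broker address are hypothetical, and `topic_lag_refresh_period` is shown at its documented default.

```yaml
input:
  redpanda_common:
    # Hypothetical topic to consume from.
    topics: [ topic_A ]
    # Hypothetical group name; lag is measured for this group.
    consumer_group: lag_demo_group
    # Interval between lag refresh cycles (documented default).
    topic_lag_refresh_period: 5s

redpanda:
  # Hypothetical broker address for the shared configuration block.
  seed_brokers: [ "127.0.0.1:9092" ]
```
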
## [](#metadata)Metadata This input adds the following metadata fields to each message: - `kafka_key` - `kafka_topic` - `kafka_partition` - `kafka_offset` - `kafka_lag` - `kafka_timestamp_ms` - `kafka_timestamp_unix` - `kafka_tombstone_message` - All record headers ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether to automatically replay messages that are rejected (nacked) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure. Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams, as the original shape of the data is discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#commit_period)`commit_period` The period of time between each commit of the current partition offsets. Offsets are always committed during shutdown. **Type**: `string` **Default**: `5s` ### [](#consumer_group)`consumer_group` An optional consumer group. When this value is specified: - The partitions of any topics, specified in the `topics` field, are automatically distributed across consumers sharing a consumer group - Partition offsets are automatically committed and resumed under this name Consumer groups are not supported when you specify explicit partitions to consume from in the `topics` field. **Type**: `string` ### [](#fetch_max_bytes)`fetch_max_bytes` The maximum number of bytes that a broker tries to send during a fetch. If individual records are larger than the `fetch_max_bytes` value, brokers will still send them. **Type**: `string` **Default**: `50MiB` ### [](#fetch_max_partition_bytes)`fetch_max_partition_bytes` The maximum number of bytes that are consumed from a single partition in a fetch request. This field is equivalent to the Java setting `fetch.max.partition.bytes`. 
If a single batch is larger than the `fetch_max_partition_bytes` value, the batch is still sent so that the client can make progress. **Type**: `string` **Default**: `1MiB` ### [](#fetch_max_wait)`fetch_max_wait` The maximum period of time a broker can wait for a fetch response to reach the required minimum number of bytes (`fetch_min_bytes`). **Type**: `string` **Default**: `5s` ### [](#fetch_min_bytes)`fetch_min_bytes` The minimum number of bytes that a broker tries to send during a fetch. This field is equivalent to the Java setting `fetch.min.bytes`. **Type**: `string` **Default**: `1B` ### [](#heartbeat_interval)`heartbeat_interval` When you specify a `consumer_group`, `heartbeat_interval` sets how frequently a consumer group member should send heartbeats to Apache Kafka. Apache Kafka uses heartbeats to make sure that a group member’s session is active. You must set `heartbeat_interval` to less than one-third of `session_timeout`. This field is equivalent to the Java `heartbeat.interval.ms` setting and accepts Go duration format strings such as `10s` or `2m`. **Type**: `string` **Default**: `3s` ### [](#instance_id)`instance_id` When you specify a [`consumer_group`](#consumer_group), assign a unique value to `instance_id` to define the group’s static membership, which can prevent unnecessary rebalances during reconnections. When you assign an instance ID, the client does not automatically leave the consumer group when it disconnects. To remove the client, you must use an external admin command on behalf of the instance ID. **Type**: `string` **Default**: `""` ### [](#max_yield_batch_bytes)`max_yield_batch_bytes` The maximum size (in bytes) for each batch yielded by this input. This value must be less than or equal to the `partition_buffer_bytes`. If using Redpanda output, this value should not be greater than the `max_message_bytes` option value (1MB by default), and for high-throughput scenarios they should be equal. 
**Type**: `string`

**Default**: `32KB`

### [](#partition_buffer_bytes)`partition_buffer_bytes`

A buffer size (in bytes) for each consumed partition, which allows the internal queuing of records before they are flushed. Increasing this value may improve throughput but results in higher memory utilization. Each buffer can grow slightly beyond this value.

**Type**: `string`

**Default**: `1MB`

### [](#rack_id)`rack_id`

A rack ID that specifies where the client is physically located. Setting it changes fetch requests to consume from the closest replica instead of the leader replica.

**Type**: `string`

**Default**: `""`

### [](#rebalance_timeout)`rebalance_timeout`

When you specify a [`consumer_group`](#consumer_group), `rebalance_timeout` sets a time limit for all consumer group members to complete their work and commit offsets after a rebalance has begun. The timeout excludes the time taken to detect a failed or late heartbeat, which indicates a rebalance is required. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

**Default**: `45s`

### [](#regexp_topics_exclude)`regexp_topics_exclude[]`

A list of regular expression patterns for excluding topics when regex mode is enabled (using `regexp_topics_include` or the deprecated `regexp_topics` boolean). Topics matching any of these patterns are excluded from consumption, even if they match include patterns. Each pattern is a full regular expression evaluated against the complete topic name. Patterns are not anchored by default, so use `^` and `$` for exact matching. Exclude patterns are applied after include patterns, providing fine-grained control over topic selection.

Example: `regexp_topics_exclude: ["^_", ".*-temp$", ".*-test.*"]` excludes topics starting with an underscore, ending with `-temp`, or containing `-test`.

**Type**: `array`

### [](#regexp_topics_include)`regexp_topics_include[]`

A list of regular expression patterns for matching topics to consume from.
When specified, the client periodically refreshes the list of matching topics based on the `metadata_max_age` interval. Each pattern is a full regular expression evaluated against the complete topic name. Patterns are not anchored by default, so `logs_.*` matches `my-logs_events` and `logs_errors`. Use `^logs_.*$` to match only topics starting with `logs_`. This field enables regex mode (replacing the deprecated `regexp_topics` boolean) and cannot be used together with explicit `topics` lists. Use `regexp_topics_exclude` to filter out specific patterns from the matched topics.

Example: `regexp_topics_include: ["events_.*", "logs_.*"]` consumes from all topics whose names contain `events_` or `logs_`, because the patterns are not anchored.

**Type**: `array`

```yaml
# Examples:
regexp_topics_include:
  - logs_.*
  - metrics_.*

# ---

regexp_topics_include:
  - "events_[0-9]+"
```

### [](#session_timeout)`session_timeout`

When you specify a `consumer_group`, `session_timeout` sets the maximum interval between heartbeats sent by a consumer group member to the broker. If a broker doesn’t receive a heartbeat from a group member before the timeout expires, it removes the member from the consumer group and initiates a rebalance. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

**Default**: `1m`

### [](#start_offset)`start_offset`

Specify the offset from which this input starts or restarts consuming messages. Restarts occur when the `OffsetOutOfRange` error is seen during a fetch.

**Type**: `string`

**Default**: `earliest`

| Option | Summary |
| --- | --- |
| committed | Prevents consuming a partition in a group if the partition has no prior commits. Corresponds to Kafka’s auto.offset.reset=none option. |
| earliest | Start from the earliest offset. Corresponds to Kafka’s auto.offset.reset=earliest option. |
| latest | Start from the latest offset. Corresponds to Kafka’s auto.offset.reset=latest option. |

### [](#timely_nacks_maximum_wait)`timely_nacks_maximum_wait`

EXPERIMENTAL: Specify the maximum period of time that a consumed message can await acknowledgement or rejection before rejection is forced. This can be useful for avoiding situations where certain downstream components block confirmation of delivery for longer than SLAs allow. Accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

### [](#topic_lag_refresh_period)`topic_lag_refresh_period`

The interval between refresh cycles. During each cycle, this input queries the Redpanda Connect server to calculate the topic lag, that is, the number of produced messages that remain to be read from each topic/partition pair by the specified consumer group. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

**Default**: `5s`

### [](#topics)`topics[]`

A list of topics to consume from. Use commas to separate multiple topics in a single element.

When a `consumer_group` is specified, partitions are automatically distributed across consumers of a topic. Otherwise, all partitions are consumed.

Alternatively, you can specify explicit partitions to consume by using a colon after the topic name. For example, `foo:0` would consume partition `0` of the topic `foo`. This syntax supports ranges. For example, `foo:0-10` would consume partitions `0` through `10` inclusive.

It is also possible to specify an explicit offset to consume from by adding another colon after the partition. For example, `foo:0:10` would consume partition `0` of the topic `foo` starting from the offset `10`. If the offset is not specified, the `start_offset` field determines which offset to start from.
**Type**: `array` ```yaml # Examples: topics: - foo - bar # --- topics: - things.* # --- topics: - "foo,bar" # --- topics: - "foo:0" - "bar:1" - "bar:3" # --- topics: - "foo:0,bar:1,bar:3" # --- topics: - "foo:0-5" ``` ### [](#transaction_isolation_level)`transaction_isolation_level` The isolation level for handling transactional messages. This setting determines how transactions are processed and affects data consistency guarantees. **Type**: `string` **Default**: `read_uncommitted` | Option | Summary | | --- | --- | | read_committed | If set, only committed transactional records are processed. | | read_uncommitted | If set, then uncommitted records are processed. | --- # Page 123: redpanda_migrator **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/redpanda_migrator.md --- # redpanda\_migrator --- title: redpanda_migrator page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/redpanda_migrator page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/redpanda_migrator.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/redpanda_migrator.adoc # Beta release status page-beta: "true" page-git-created-date: "2024-10-02" page-git-modified-date: "2025-01-28" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. 
---

**Type:** Input

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/redpanda_migrator/ "View the Self-Managed version of this component")

Unified Kafka consumer for migrating data between Kafka/Redpanda clusters. Use this input with the [`redpanda_migrator` output](../../outputs/redpanda_migrator/) to safely transfer topic data, ACLs, schemas, and consumer group offsets between clusters. This component is designed for migration scenarios.

#### Common

```yml
input:
  label: ""
  redpanda_migrator:
    seed_brokers: [] # No default (required)
    topics: [] # No default (optional)
    regexp_topics_include: [] # No default (optional)
    regexp_topics_exclude: [] # No default (optional)
    transaction_isolation_level: read_uncommitted
    consumer_group: "" # No default (optional)
    schema_registry:
      url: "" # No default (required)
      timeout: 5s
      tls:
        enabled: false
        skip_cert_verify: false
        enable_renegotiation: false
        root_cas: ""
        root_cas_file: ""
        client_certs: []
      oauth:
        enabled: false
        consumer_key: ""
        consumer_secret: ""
        access_token: ""
        access_token_secret: ""
      basic_auth:
        enabled: false
        username: ""
        password: ""
      jwt:
        enabled: false
        private_key_file: ""
        signing_method: ""
        claims: {}
        headers: {}
    auto_replay_nacks: true
```

#### Advanced

```yml
input:
  label: ""
  redpanda_migrator:
    seed_brokers: [] # No default (required)
    client_id: redpanda-connect
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    sasl: [] # No default (optional)
    metadata_max_age: 1m
    request_timeout_overhead: 10s
    conn_idle_timeout: 20s
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    topics: [] # No default (optional)
    regexp_topics_include: [] # No default (optional)
    regexp_topics_exclude: [] # No default (optional)
    rack_id: ""
    instance_id: ""
    rebalance_timeout: 45s
    session_timeout: 1m
    heartbeat_interval: 3s
    start_offset: earliest
    fetch_max_bytes: 50MiB
    fetch_max_wait: 5s
    fetch_min_bytes: 1B
    fetch_max_partition_bytes: 1MiB
    transaction_isolation_level: read_uncommitted
    consumer_group: "" # No default (optional)
    commit_period: 5s
    partition_buffer_bytes: 1MB
    topic_lag_refresh_period: 5s
    max_yield_batch_bytes: 32KB
    schema_registry:
      url: "" # No default (required)
      timeout: 5s
      tls:
        enabled: false
        skip_cert_verify: false
        enable_renegotiation: false
        root_cas: ""
        root_cas_file: ""
        client_certs: []
      oauth:
        enabled: false
        consumer_key: ""
        consumer_secret: ""
        access_token: ""
        access_token_secret: ""
      basic_auth:
        enabled: false
        username: ""
        password: ""
      jwt:
        enabled: false
        private_key_file: ""
        signing_method: ""
        claims: {}
        headers: {}
    auto_replay_nacks: true
```

The `redpanda_migrator` input:

- Reads a batch of messages from a broker.
- Waits for the `redpanda_migrator` output to acknowledge the writes before updating the Kafka consumer group offset.
- Provides the same delivery guarantees and ordering semantics as the [`redpanda` input](../redpanda/).

Specify a consumer group to make this input consume one or more topics and automatically balance the topic partitions across any other connected clients with the same consumer group. Otherwise, topics are consumed in their entirety or with explicit partitions.

This input requires a corresponding `redpanda_migrator` output in the same pipeline. Each pipeline must have both input and output components configured. For capabilities, guarantees, scheduling, and examples, see the output documentation.

## [](#requirements)Requirements

- Must be paired with a `redpanda_migrator` output in the same pipeline.
- Requires access to a source Kafka or Redpanda cluster.
- Consumer group configuration is recommended for partition balancing.
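As a rough illustration of these requirements, the following sketch pairs the input with a `redpanda_migrator` output in one pipeline and sets a consumer group. The broker addresses, topic name, and group name are placeholders, and the output's `seed_brokers` field is assumed from the paired output's documentation rather than described on this page.

```yaml
# Minimal sketch, not a complete migration configuration.
# All addresses and names below are hypothetical placeholders.
input:
  redpanda_migrator:
    seed_brokers:
      - "source-broker:9092"       # source cluster (placeholder)
    topics:
      - orders                     # topic to migrate (placeholder)
    consumer_group: migrator_group # recommended for partition balancing

output:
  redpanda_migrator:
    seed_brokers:
      - "destination-broker:9092"  # destination cluster (placeholder, assumed field)
```

Because the input commits offsets only after the output acknowledges its writes, both components must live in the same pipeline as shown here.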
## [](#multiple-migrator-pairs)Multiple migrator pairs

When using multiple migrator pairs in a single pipeline, coordination is based on the `label` field. The label of the input and output must match exactly for correct pairing. If labels do not match, migration fails for that pair.

## [](#performance-tuning-for-high-throughput)Performance tuning for high throughput

For workloads with high message rates or large messages, adjust the following settings to optimize throughput:

On this input component:

- `partition_buffer_bytes`: Set to 2MB to increase per-partition buffer size
- `max_yield_batch_bytes`: Set to 1MB to allow larger batches to be yielded

On the paired `redpanda_migrator` output component:

- `max_in_flight`: Set to the total number of partitions being copied in parallel (up to all partitions in the cluster)

> 📝 **NOTE**
>
> Setting `max_yield_batch_bytes` over 1MB is counter-productive unless you change the broker settings to allow bigger messages or batches. The `partition_buffer_bytes` setting allows for partition readahead.

## [](#metrics)Metrics

This input emits an `input_redpanda_migrator_lag` metric with `topic` and `partition` labels for each consumed topic. This metric records the number of produced messages that remain to be read from each topic/partition pair by the specified consumer group. Monitor this metric to track migration progress and detect bottlenecks.

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

- kafka\_key
- kafka\_topic
- kafka\_partition
- kafka\_offset
- kafka\_lag
- kafka\_timestamp\_ms
- kafka\_timestamp\_unix
- All record headers

## [](#fields)Fields

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether to automatically replay messages that are rejected (nacked) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure.

Set `auto_replay_nacks` to `false` to delete rejected messages.
Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data is discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#client_id)`client_id`

An identifier for the client connection.

**Type**: `string`

**Default**: `redpanda-connect`

### [](#commit_period)`commit_period`

The period of time between each commit of the current partition offsets. Offsets are always committed during shutdown.

**Type**: `string`

**Default**: `5s`

### [](#conn_idle_timeout)`conn_idle_timeout`

The maximum duration that connections can remain idle before they are automatically closed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

**Default**: `20s`

### [](#consumer_group)`consumer_group`

An optional consumer group. When specified, the partitions of specified topics are automatically distributed across consumers sharing a consumer group, and partition offsets are automatically committed and resumed under this name. Consumer groups are not supported when explicit partitions are specified to consume from in the `topics` field.

**Type**: `string`

### [](#fetch_max_bytes)`fetch_max_bytes`

The maximum number of bytes that a broker tries to send during a fetch. If individual records are larger than the `fetch_max_bytes` value, brokers still send them.

**Type**: `string`

**Default**: `50MiB`

### [](#fetch_max_partition_bytes)`fetch_max_partition_bytes`

The maximum number of bytes that are consumed from a single partition in a fetch request. This field is equivalent to the Java setting `max.partition.fetch.bytes`. If a single batch is larger than the `fetch_max_partition_bytes` value, the batch is still sent so that the client can make progress.
**Type**: `string` **Default**: `1MiB` ### [](#fetch_max_wait)`fetch_max_wait` The maximum period of time a broker can wait for a fetch response to reach the required minimum number of bytes (`fetch_min_bytes`). **Type**: `string` **Default**: `5s` ### [](#fetch_min_bytes)`fetch_min_bytes` The minimum number of bytes that a broker tries to send during a fetch. This field is equivalent to the Java setting `fetch.min.bytes`. **Type**: `string` **Default**: `1B` ### [](#heartbeat_interval)`heartbeat_interval` When you specify a `consumer_group`, `heartbeat_interval` sets how frequently a consumer group member should send heartbeats to Apache Kafka. Apache Kafka uses heartbeats to make sure that a group member’s session is active. You must set `heartbeat_interval` to less than one-third of `session_timeout`. This field is equivalent to the Java `heartbeat.interval.ms` setting and accepts Go duration format strings such as `10s` or `2m`. **Type**: `string` **Default**: `3s` ### [](#instance_id)`instance_id` When you specify a [`consumer_group`](#consumer_group), assign a unique value to `instance_id` to define the group’s static membership, which can prevent unnecessary rebalances during reconnections. When you assign an instance ID, the client does not automatically leave the consumer group when it disconnects. To remove the client, you must use an external admin command on behalf of the instance ID. **Type**: `string` **Default**: `""` ### [](#max_yield_batch_bytes)`max_yield_batch_bytes` The maximum size (in bytes) for each batch yielded by this input. This value must be less than or equal to the `partition_buffer_bytes`. If using Redpanda output, this value should not be greater than the `max_message_bytes` option value (1MB by default), and for high-throughput scenarios they should be equal. **Type**: `string` **Default**: `32KB` ### [](#metadata_max_age)`metadata_max_age` The maximum period of time after which metadata is refreshed. 
This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. Lower values provide more responsive topic and partition discovery but may increase broker load. Higher values reduce broker queries but can delay detection of topology changes.

**Type**: `string`

**Default**: `1m`

### [](#partition_buffer_bytes)`partition_buffer_bytes`

A buffer size (in bytes) for each consumed partition, which allows the internal queuing of records before they are flushed. Increasing this value may improve throughput but results in higher memory utilization. Each buffer can grow slightly beyond this value.

**Type**: `string`

**Default**: `1MB`

### [](#rack_id)`rack_id`

A rack identifier for this client.

**Type**: `string`

**Default**: `""`

### [](#rebalance_timeout)`rebalance_timeout`

When you specify a [`consumer_group`](#consumer_group), `rebalance_timeout` sets a time limit for all consumer group members to complete their work and commit offsets after a rebalance has begun. The timeout excludes the time taken to detect a failed or late heartbeat, which indicates a rebalance is required. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

**Default**: `45s`

### [](#regexp_topics_exclude)`regexp_topics_exclude[]`

A list of regular expression patterns for excluding topics when regex mode is enabled (using `regexp_topics_include` or the deprecated `regexp_topics` boolean). Topics matching any of these patterns are excluded from consumption, even if they match include patterns. Each pattern is a full regular expression evaluated against the complete topic name. Patterns are not anchored by default, so use `^` and `$` for exact matching. Exclude patterns are applied after include patterns, providing fine-grained control over topic selection.

Example: `regexp_topics_exclude: ["^_", ".*-temp$", ".*-test.*"]` excludes topics starting with an underscore, ending with `-temp`, or containing `-test`.
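The interaction between include and exclude patterns can be sketched as follows. The topic names are hypothetical, and the fragment shows only the two pattern fields, not a complete input configuration.

```yaml
# Illustrative fragment only; topic names are hypothetical.
regexp_topics_include:
  - "^logs_.*$"   # anchored: only topics whose names start with logs_
regexp_topics_exclude:
  - "^_"          # skip topics starting with an underscore
  - ".*-temp$"    # skip topics ending with -temp
```

With these patterns, a topic named `logs_app` would be consumed, while `logs_app-temp` would match the include pattern but then be removed by the exclude pattern.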
**Type**: `array`

### [](#regexp_topics_include)`regexp_topics_include[]`

A list of regular expression patterns for matching topics to consume from. When specified, the client periodically refreshes the list of matching topics based on the `metadata_max_age` interval. Each pattern is a full regular expression evaluated against the complete topic name. Patterns are not anchored by default, so `logs_.*` matches `my-logs_events` and `logs_errors`. Use `^logs_.*$` to match only topics starting with `logs_`. This field enables regex mode (replacing the deprecated `regexp_topics` boolean) and cannot be used together with explicit `topics` lists. Use `regexp_topics_exclude` to filter out specific patterns from the matched topics.

Example: `regexp_topics_include: ["events_.*", "logs_.*"]` consumes from all topics starting with `events_` or `logs_`.

**Type**: `array`

```yaml
# Examples:
regexp_topics_include:
  - logs_.*
  - metrics_.*

# ---

regexp_topics_include:
  - "events_[0-9]+"
```

### [](#request_timeout_overhead)`request_timeout_overhead`

Grants an additional buffer or overhead to requests that have timeout fields defined. This field is based on the behavior of Apache Kafka’s `request.timeout.ms` parameter.

**Type**: `string`

**Default**: `10s`

### [](#sasl)`sasl[]`

Specify one or more methods of SASL authentication, which are tried in order. If the broker supports the first mechanism, all connections use that mechanism. If the first mechanism fails, the client picks the first supported mechanism. Connections fail if the broker does not support any client mechanisms.

**Type**: `object`

```yaml
# Examples:
sasl:
  - mechanism: SCRAM-SHA-512
    password: bar
    username: foo
```

### [](#sasl-aws)`sasl[].aws`

Contains AWS specific fields for when the `mechanism` is set to `AWS_MSK_IAM`.

**Type**: `object`

### [](#sasl-aws-credentials)`sasl[].aws.credentials`

Optional manual configuration of AWS credentials to use.
More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#sasl-aws-credentials-from_ec2_role)`sasl[].aws.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#sasl-aws-credentials-id)`sasl[].aws.credentials.id` The ID of credentials to use. **Type**: `string` ### [](#sasl-aws-credentials-profile)`sasl[].aws.credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#sasl-aws-credentials-role)`sasl[].aws.credentials.role` A role ARN to assume. **Type**: `string` ### [](#sasl-aws-credentials-role_external_id)`sasl[].aws.credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#sasl-aws-credentials-secret)`sasl[].aws.credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#sasl-aws-credentials-token)`sasl[].aws.credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#sasl-aws-endpoint)`sasl[].aws.endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#sasl-aws-region)`sasl[].aws.region` The AWS region to target. **Type**: `string` ### [](#sasl-aws-tcp)`sasl[].aws.tcp` TCP socket configuration. **Type**: `object` ### [](#sasl-aws-tcp-connect_timeout)`sasl[].aws.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. 
**Type**: `string` **Default**: `0s` ### [](#sasl-aws-tcp-keep_alive)`sasl[].aws.tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#sasl-aws-tcp-keep_alive-count)`sasl[].aws.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#sasl-aws-tcp-keep_alive-idle)`sasl[].aws.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-keep_alive-interval)`sasl[].aws.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-tcp_user_timeout)`sasl[].aws.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-extensions)`sasl[].extensions` Key/value pairs to add to OAUTHBEARER authentication requests. **Type**: `string` ### [](#sasl-mechanism)`sasl[].mechanism` The SASL mechanism to use. **Type**: `string` | Option | Summary | | --- | --- | | AWS_MSK_IAM | AWS IAM based authentication as specified by the 'aws-msk-iam-auth' java library. | | OAUTHBEARER | OAuth Bearer based authentication. | | PLAIN | Plain text authentication. | | REDPANDA_CLOUD_SERVICE_ACCOUNT | Redpanda Cloud Service Account authentication when running in Redpanda Cloud. | | SCRAM-SHA-256 | SCRAM based authentication as specified in RFC5802. | | SCRAM-SHA-512 | SCRAM based authentication as specified in RFC5802. | | none | Disable sasl authentication | ### [](#sasl-password)`sasl[].password` A password to provide for PLAIN or SCRAM-\* authentication. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#sasl-token)`sasl[].token` The token to use for a single session’s OAUTHBEARER authentication. **Type**: `string` **Default**: `""` ### [](#sasl-username)`sasl[].username` A username to provide for PLAIN or SCRAM-\* authentication. **Type**: `string` **Default**: `""` ### [](#schema_registry)`schema_registry` Configuration for schema registry integration. Enables migration of schema subjects, versions, and compatibility settings between clusters. **Type**: `object` ### [](#schema_registry-basic_auth)`schema_registry.basic_auth` Allows you to specify basic authentication. **Type**: `object` ### [](#schema_registry-basic_auth-enabled)`schema_registry.basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-basic_auth-password)`schema_registry.basic_auth.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-basic_auth-username)`schema_registry.basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt)`schema_registry.jwt` Beta Allows you to specify JWT authentication. **Type**: `object` ### [](#schema_registry-jwt-claims)`schema_registry.jwt.claims` A value used to identify the claims that issued the JWT. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-enabled)`schema_registry.jwt.enabled` Whether to use JWT authentication in requests. 
**Type**: `bool` **Default**: `false` ### [](#schema_registry-jwt-headers)`schema_registry.jwt.headers` Add optional key/value headers to the JWT. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-private_key_file)`schema_registry.jwt.private_key_file` A file with the PEM encoded via PKCS1 or PKCS8 as private key. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt-signing_method)`schema_registry.jwt.signing_method` A method used to sign the token such as RS256, RS384, RS512 or EdDSA. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth)`schema_registry.oauth` Allows you to specify open authentication via OAuth version 1. **Type**: `object` ### [](#schema_registry-oauth-access_token)`schema_registry.oauth.access_token` A value used to gain access to the protected resources on behalf of the user. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-access_token_secret)`schema_registry.oauth.access_token_secret` A secret provided in order to establish ownership of a given access token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_key)`schema_registry.oauth.consumer_key` A value used to identify the client to the service provider. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_secret)`schema_registry.oauth.consumer_secret` A secret used to establish ownership of the consumer key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-enabled)`schema_registry.oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-timeout)`schema_registry.timeout` HTTP client timeout for schema registry requests. **Type**: `string` **Default**: `5s` ### [](#schema_registry-tls)`schema_registry.tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#schema_registry-tls-client_certs)`schema_registry.tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#schema_registry-tls-client_certs-cert)`schema_registry.tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-cert_file)`schema_registry.tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key)`schema_registry.tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key_file)`schema_registry.tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-password)`schema_registry.tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. 
The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#schema_registry-tls-enable_renegotiation)`schema_registry.tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-enabled)`schema_registry.tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-root_cas)`schema_registry.tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#schema_registry-tls-root_cas_file)`schema_registry.tls.root_cas_file` An optional path of a root certificate authority file to use. 
This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#schema_registry-tls-skip_cert_verify)`schema_registry.tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#schema_registry-url)`schema_registry.url` The base URL of the schema registry service. Required for schema migration functionality. **Type**: `string` ```yaml # Examples: url: http://localhost:8081 # --- url: https://schema-registry.example.com:8081 ``` ### [](#seed_brokers)`seed_brokers[]` A list of broker addresses to connect to in order. Use commas to separate multiple addresses in a single list item. **Type**: `array` ```yaml # Examples: seed_brokers: - "localhost:9092" # --- seed_brokers: - "foo:9092" - "bar:9092" # --- seed_brokers: - "foo:9092,bar:9092" ``` ### [](#session_timeout)`session_timeout` When you specify a `consumer_group`, `session_timeout` sets the maximum interval between heartbeats sent by a consumer group member to the broker. If a broker doesn’t receive a heartbeat from a group member before the timeout expires, it removes the member from the consumer group and initiates a rebalance. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `1m` ### [](#start_offset)`start_offset` Specify the offset from which this input starts or restarts consuming messages. Restarts occur when the `OffsetOutOfRange` error is seen during a fetch. **Type**: `string` **Default**: `earliest` | Option | Summary | | --- | --- | | committed | Prevents consuming a partition in a group if the partition has no prior commits. Corresponds to Kafka’s auto.offset.reset=none option | | earliest | Start from the earliest offset. 
Corresponds to Kafka’s auto.offset.reset=earliest option. | | latest | Start from the latest offset. Corresponds to Kafka’s auto.offset.reset=latest option. | ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. 
When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#topic_lag_refresh_period)`topic_lag_refresh_period` The interval between refresh cycles. During each cycle, this input queries the Redpanda Connect server to calculate the topic lag, which is the number of produced messages that remain to be read from each topic/partition pair by the specified consumer group. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `5s` ### [](#topics)`topics[]` A list of topics to consume from. Use commas to separate multiple topics in a single element. When a `consumer_group` is specified, partitions are automatically distributed across consumers of a topic. Otherwise, all partitions are consumed. Alternatively, you can specify explicit partitions to consume by using a colon after the topic name. For example, `foo:0` would consume the partition `0` of the topic `foo`. 
This syntax supports ranges. For example, `foo:0-10` would consume partitions `0` through to `10` inclusive. It is also possible to specify an explicit offset to consume from by adding another colon after the partition. For example, `foo:0:10` would consume the partition `0` of the topic `foo` starting from the offset `10`. If the offset is not present (or remains unspecified) then the field `start_offset` determines which offset to start from. **Type**: `array` ```yaml # Examples: topics: - foo - bar # --- topics: - things.* # --- topics: - "foo,bar" # --- topics: - "foo:0" - "bar:1" - "bar:3" # --- topics: - "foo:0,bar:1,bar:3" # --- topics: - "foo:0-5" ``` ### [](#transaction_isolation_level)`transaction_isolation_level` The isolation level for handling transactional messages. This setting determines how transactions are processed and affects data consistency guarantees. **Type**: `string` **Default**: `read_uncommitted` | Option | Summary | | --- | --- | | read_committed | If set, only committed transactional records are processed. | | read_uncommitted | If set, then uncommitted records are processed. | ## [](#troubleshooting)Troubleshooting - Ensure the input and output `label` fields match exactly. - Both input and output must be present in the pipeline. - Verify consumer group configuration for partition balancing. - Monitor the lag metric for stalled migration. 
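The first two checks above can be pictured with a minimal pipeline sketch. It assumes that the paired `redpanda_migrator` output accepts a `seed_brokers` list in the same way as this input; confirm that against the output reference. The label `migrator`, the broker addresses, the topic name, and the consumer group name are hypothetical placeholders, not values taken from this page.

```yaml
# Sketch only: every value below is a placeholder.
input:
  label: migrator                  # must match the output label exactly
  redpanda_migrator:
    seed_brokers: [ "source-broker:9092" ]
    topics: [ "orders" ]
    consumer_group: migrator_group # partitions are balanced across group members
    start_offset: earliest

output:
  label: migrator                  # same label as the input
  redpanda_migrator:
    # Assumed field: check the output reference before relying on it.
    seed_brokers: [ "destination-broker:9092" ]
```

If the two labels differ, or if either component is missing from the pipeline, start with the first two items in the list above.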
## [](#suggested-reading)Suggested reading - [`redpanda_migrator` output](../../outputs/redpanda_migrator/) - [Migrating from legacy components](../../../guides/migrate-unified-redpanda-migrator/) --- # Page 124: redpanda **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/redpanda.md --- # redpanda --- title: redpanda latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/redpanda page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/redpanda.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/redpanda.adoc page-git-created-date: "2024-11-19" page-git-modified-date: "2025-04-25" --- **Type:** Input (other component types with this name: [Cache](/redpanda-cloud/develop/connect/components/caches/redpanda/), [Output](/redpanda-cloud/develop/connect/components/outputs/redpanda/), [Tracer](/redpanda-cloud/develop/connect/components/tracers/redpanda/)) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/redpanda/ "View the Self-Managed version of this component") Consumes topic data from one or more Kafka brokers. 
#### Common ```yml inputs: label: "" redpanda: seed_brokers: [] # No default (optional) topics: [] # No default (optional) regexp_topics_include: [] # No default (optional) regexp_topics_exclude: [] # No default (optional) transaction_isolation_level: read_uncommitted consumer_group: "" # No default (optional) auto_replay_nacks: true ``` #### Advanced ```yml inputs: label: "" redpanda: seed_brokers: [] # No default (optional) client_id: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] sasl: [] # No default (optional) metadata_max_age: 1m request_timeout_overhead: 10s conn_idle_timeout: 20s tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s topics: [] # No default (optional) regexp_topics_include: [] # No default (optional) regexp_topics_exclude: [] # No default (optional) rack_id: "" instance_id: "" rebalance_timeout: 45s session_timeout: 1m heartbeat_interval: 3s start_offset: earliest fetch_max_bytes: 50MiB fetch_max_wait: 5s fetch_min_bytes: 1B fetch_max_partition_bytes: 1MiB transaction_isolation_level: read_uncommitted consumer_group: "" # No default (optional) commit_period: 5s partition_buffer_bytes: 1MB topic_lag_refresh_period: 5s max_yield_batch_bytes: 32KB unordered_processing: enabled: false checkpoint_limit: 1024 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) auto_replay_nacks: true timely_nacks_maximum_wait: "" # No default (optional) extract_tracing_map: "" # No default (optional) ``` ## [](#consumer-groups)Consumer groups When you specify a consumer group in your configuration, this input consumes one or more topics and automatically balances the topic partitions across any other connected clients with the same consumer group. Otherwise, topics are consumed in their entirety or with explicit partitions. 
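The two consumption modes described above can be sketched as follows. These are illustrative fragments only: the broker address, topic names, and group name are placeholders, and the top-level `input` key is assumed to mirror the `output` key used in this page's fallback example.

```yaml
# Mode 1: consumer group. Partitions of foo and bar are balanced across
# every client that connects with the same consumer_group value.
input:
  redpanda:
    seed_brokers: [ "localhost:9092" ]
    topics: [ "foo", "bar" ]
    consumer_group: example_group
```

```yaml
# Mode 2: explicit partitions, no consumer group.
# Consumes partition 0 of foo and partitions 1 through 3 of bar.
input:
  redpanda:
    seed_brokers: [ "localhost:9092" ]
    topics: [ "foo:0", "bar:1-3" ]
```

The second fragment omits `consumer_group` because consumer groups are not supported when explicit partitions are listed in `topics`.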
## [](#delivery-guarantees)Delivery guarantees If you choose to use consumer groups, the offsets of records received by Redpanda Connect are committed automatically. In the event of restarts, this input uses the committed offsets to resume data consumption where it left off. Redpanda Connect guarantees at-least-once delivery. Records are only confirmed as delivered when all downstream outputs that a record is routed to have also confirmed delivery. ## [](#ordering)Ordering To preserve the order of records within each topic partition: - Records consumed from each partition are processed and delivered in the order that they are received - Only one batch of records of a given partition is processed at a time This approach means that although records from different partitions may be processed in parallel, records from the same partition are processed in sequential order. ### [](#delivery-errors)Delivery errors The order in which records are delivered may be disrupted by delivery errors and any error-handling mechanisms that start up. Redpanda Connect leans towards at-least-once delivery unless instructed otherwise, and this includes reattempting delivery of data when the ordering of that data is no longer guaranteed. For example, a batch of records is sent to an output broker and only a subset of the records is delivered. In this scenario, Redpanda Connect (by default) attempts to redeliver the records that failed, even though those records may originally have been sent before records that were delivered successfully. #### [](#use-a-fallback-output)Use a fallback output To prevent delivery errors from disrupting the order of records, you must specify a [`fallback`](../../outputs/fallback/) output in your pipeline configuration. When adding a `fallback` output, it is good practice to set the `auto_replay_nacks` field to `false`. This also improves the throughput of your pipeline. For example, the following configuration includes a `fallback` output. 
If Redpanda Connect fails to write records to the `foo` topic, it then attempts to write them into a dead letter queue topic (`foo_dlq`), which is retried indefinitely as a way to apply back pressure. ```yaml output: fallback: - redpanda_common: topic: foo - retry: output: redpanda_common: topic: foo_dlq ``` ## [](#batching)Batching Records are processed and delivered from each partition in the same batches as they are received from brokers. Batches are dynamically sized in order to optimize throughput, but you can tune them further using the following configuration fields: - `fetch_max_partition_bytes` - `fetch_max_bytes` You can break batches down further using the [`split`](../../processors/split/) processor. ## [](#metrics)Metrics This input emits a `redpanda_lag` metric with `topic` and `partition` labels for each consumed topic. The metric records the number of produced messages that remain to be read from each topic/partition pair by the specified consumer group. ## [](#metadata)Metadata This input adds the following metadata fields to each message: - `kafka_key` - `kafka_topic` - `kafka_partition` - `kafka_offset` - `kafka_lag` - `kafka_timestamp_ms` - `kafka_timestamp_unix` - `kafka_tombstone_message` - All record headers ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether to automatically replay messages that are rejected (nacked) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure. Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high-throughput streams, as the original shape of the data is discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#client_id)`client_id` An identifier for the client connection. 
**Type**: `string` **Default**: `redpanda-connect` ### [](#commit_period)`commit_period` The period of time between each commit of the current partition offsets. Offsets are always committed during shutdown. **Type**: `string` **Default**: `5s` ### [](#conn_idle_timeout)`conn_idle_timeout` The maximum duration that connections can remain idle before they are automatically closed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `20s` ### [](#consumer_group)`consumer_group` An optional consumer group. When this value is specified: - The partitions of any topics, specified in the `topics` field, are automatically distributed across consumers sharing a consumer group - Partition offsets are automatically committed and resumed under this name Consumer groups are not supported when you specify explicit partitions to consume from in the `topics` field. **Type**: `string` ### [](#extract_tracing_map)`extract_tracing_map` EXPERIMENTAL: A [Bloblang mapping](../../../guides/bloblang/about/) that attempts to extract an object containing tracing propagation information, which will then be used as the root tracing span for the message. The specification of the extracted fields must match the format used by the service wide tracer. **Type**: `string` ```yaml # Examples: extract_tracing_map: root = @ # --- extract_tracing_map: root = this.meta.span ``` ### [](#fetch_max_bytes)`fetch_max_bytes` The maximum number of bytes that a broker tries to send during a fetch. If individual records are larger than the `fetch_max_bytes` value, brokers will still send them. **Type**: `string` **Default**: `50MiB` ### [](#fetch_max_partition_bytes)`fetch_max_partition_bytes` The maximum number of bytes that are consumed from a single partition in a fetch request. This field is equivalent to the Java setting `fetch.max.partition.bytes`. 
If a single batch is larger than the `fetch_max_partition_bytes` value, the batch is still sent so that the client can make progress. **Type**: `string` **Default**: `1MiB` ### [](#fetch_max_wait)`fetch_max_wait` The maximum period of time a broker can wait for a fetch response to reach the required minimum number of bytes (`fetch_min_bytes`). **Type**: `string` **Default**: `5s` ### [](#fetch_min_bytes)`fetch_min_bytes` The minimum number of bytes that a broker tries to send during a fetch. This field is equivalent to the Java setting `fetch.min.bytes`. **Type**: `string` **Default**: `1B` ### [](#heartbeat_interval)`heartbeat_interval` When you specify a `consumer_group`, `heartbeat_interval` sets how frequently a consumer group member should send heartbeats to Apache Kafka. Apache Kafka uses heartbeats to make sure that a group member’s session is active. You must set `heartbeat_interval` to less than one-third of `session_timeout`. This field is equivalent to the Java `heartbeat.interval.ms` setting and accepts Go duration format strings such as `10s` or `2m`. **Type**: `string` **Default**: `3s` ### [](#instance_id)`instance_id` When you specify a [`consumer_group`](#consumer_group), assign a unique value to `instance_id` to define the group’s static membership, which can prevent unnecessary rebalances during reconnections. When you assign an instance ID, the client does not automatically leave the consumer group when it disconnects. To remove the client, you must use an external admin command on behalf of the instance ID. **Type**: `string` **Default**: `""` ### [](#max_yield_batch_bytes)`max_yield_batch_bytes` The maximum size (in bytes) for each batch yielded by this input. This value must be less than or equal to the `partition_buffer_bytes`. If using Redpanda output, this value should not be greater than the `max_message_bytes` option value (1MB by default), and for high-throughput scenarios they should be equal. 
**Type**: `string` **Default**: `32KB` ### [](#metadata_max_age)`metadata_max_age` The maximum period of time after which metadata is refreshed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. Lower values provide more responsive topic and partition discovery but may increase broker load. Higher values reduce broker queries but can delay detection of topology changes. **Type**: `string` **Default**: `1m` ### [](#partition_buffer_bytes)`partition_buffer_bytes` A buffer size (in bytes) for each consumed partition, which allows the internal queuing of records before they are flushed. Increasing this value may improve throughput but results in higher memory utilization. Each buffer can grow slightly beyond this value. **Type**: `string` **Default**: `1MB` ### [](#rack_id)`rack_id` A rack specifies where the client is physically located, and changes fetch requests to consume from the closest replica as opposed to the leader replica. **Type**: `string` **Default**: `""` ### [](#rebalance_timeout)`rebalance_timeout` When you specify a [`consumer_group`](#consumer_group), `rebalance_timeout` sets a time limit for all consumer group members to complete their work and commit offsets after a rebalance has begun. The timeout excludes the time taken to detect a failed or late heartbeat, which indicates a rebalance is required. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `45s` ### [](#regexp_topics_exclude)`regexp_topics_exclude[]` A list of regular expression patterns for excluding topics when regex mode is enabled (using `regexp_topics_include` or the deprecated `regexp_topics` boolean). Topics matching any of these patterns will be excluded from consumption, even if they match include patterns. Each pattern is a full regular expression evaluated against the complete topic name. Patterns are not anchored by default, so use `^` and `$` for exact matching. 
Exclude patterns are applied after include patterns, providing fine-grained control over topic selection. Example: `regexp_topics_exclude: ["^_", ".*-temp$", ".*-test.*"]` excludes topics starting with underscore, ending with `-temp`, or containing `-test`. **Type**: `array` ### [](#regexp_topics_include)`regexp_topics_include[]` A list of regular expression patterns for matching topics to consume from. When specified, the client will periodically refresh the list of matching topics based on the `metadata_max_age` interval. Each pattern is a full regular expression evaluated against the complete topic name. Patterns are not anchored by default, so `logs_.*` matches both `my-logs_events` and `logs_errors`. Use `^logs_.*$` to match only topics starting with `logs_`. This field enables regex mode (replacing the deprecated `regexp_topics` boolean) and cannot be used together with explicit `topics` lists. Use `regexp_topics_exclude` to filter out specific patterns from the matched topics. Example: `regexp_topics_include: ["events_.*", "logs_.*"]` consumes from all topics whose names contain `events_` or `logs_`. **Type**: `array` ```yaml # Examples: regexp_topics_include: - logs_.* - metrics_.* # --- regexp_topics_include: - "events_[0-9]+" ``` ### [](#request_timeout_overhead)`request_timeout_overhead` Grants an additional buffer or overhead to requests that have timeout fields defined. This field is based on the behavior of Apache Kafka’s `request.timeout.ms` parameter. **Type**: `string` **Default**: `10s` ### [](#sasl)`sasl[]` Specify one or more methods or mechanisms of SASL authentication. They are tried in order. If the broker supports the first SASL mechanism, all connections use it. If the first mechanism fails, the client picks the first supported mechanism. If the broker does not support any client mechanisms, all connections fail. 
**Type**: `object` ```yaml # Examples: sasl: - mechanism: SCRAM-SHA-512 password: bar username: foo ``` ### [](#sasl-aws)`sasl[].aws` Contains AWS specific fields for when the `mechanism` is set to `AWS_MSK_IAM`. **Type**: `object` ### [](#sasl-aws-credentials)`sasl[].aws.credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#sasl-aws-credentials-from_ec2_role)`sasl[].aws.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#sasl-aws-credentials-id)`sasl[].aws.credentials.id` The ID of credentials to use. **Type**: `string` ### [](#sasl-aws-credentials-profile)`sasl[].aws.credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#sasl-aws-credentials-role)`sasl[].aws.credentials.role` A role ARN to assume. **Type**: `string` ### [](#sasl-aws-credentials-role_external_id)`sasl[].aws.credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#sasl-aws-credentials-secret)`sasl[].aws.credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#sasl-aws-credentials-token)`sasl[].aws.credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#sasl-aws-endpoint)`sasl[].aws.endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#sasl-aws-region)`sasl[].aws.region` The AWS region to target. 
**Type**: `string` ### [](#sasl-aws-tcp)`sasl[].aws.tcp` TCP socket configuration. **Type**: `object` ### [](#sasl-aws-tcp-connect_timeout)`sasl[].aws.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-aws-tcp-keep_alive)`sasl[].aws.tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#sasl-aws-tcp-keep_alive-count)`sasl[].aws.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#sasl-aws-tcp-keep_alive-idle)`sasl[].aws.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-keep_alive-interval)`sasl[].aws.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-tcp_user_timeout)`sasl[].aws.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-extensions)`sasl[].extensions` Key/value pairs to add to OAUTHBEARER authentication requests. **Type**: `string` ### [](#sasl-mechanism)`sasl[].mechanism` The SASL mechanism to use. **Type**: `string` | Option | Summary | | --- | --- | | AWS_MSK_IAM | AWS IAM based authentication as specified by the 'aws-msk-iam-auth' java library. | | OAUTHBEARER | OAuth Bearer based authentication. | | PLAIN | Plain text authentication. | | REDPANDA_CLOUD_SERVICE_ACCOUNT | Redpanda Cloud Service Account authentication when running in Redpanda Cloud. 
| | SCRAM-SHA-256 | SCRAM based authentication as specified in RFC5802. | | SCRAM-SHA-512 | SCRAM based authentication as specified in RFC5802. | | none | Disable sasl authentication | ### [](#sasl-password)`sasl[].password` A password to provide for PLAIN or SCRAM-\* authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#sasl-token)`sasl[].token` The token to use for a single session’s OAUTHBEARER authentication. **Type**: `string` **Default**: `""` ### [](#sasl-username)`sasl[].username` A username to provide for PLAIN or SCRAM-\* authentication. **Type**: `string` **Default**: `""` ### [](#seed_brokers)`seed_brokers[]` A list of broker addresses to connect to in order. Use commas to separate multiple addresses in a single list item. Optional when `seed_brokers` is configured in a top-level `redpanda` block. **Type**: `array` ```yaml # Examples: seed_brokers: - "localhost:9092" # --- seed_brokers: - "foo:9092" - "bar:9092" # --- seed_brokers: - "foo:9092,bar:9092" ``` ### [](#session_timeout)`session_timeout` When you specify a `consumer_group`, `session_timeout` sets the maximum interval between heartbeats sent by a consumer group member to the broker. If a broker doesn’t receive a heartbeat from a group member before the timeout expires, it removes the member from the consumer group and initiates a rebalance. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `1m` ### [](#start_offset)`start_offset` Specify the offset from which this input starts or restarts consuming messages. Restarts occur when the `OffsetOutOfRange` error is seen during a fetch. 
**Type**: `string` **Default**: `earliest` | Option | Summary | | --- | --- | | committed | Prevents consuming a partition in a group if the partition has no prior commits. Corresponds to Kafka’s auto.offset.reset=none option | | earliest | Start from the earliest offset. Corresponds to Kafka’s auto.offset.reset=earliest option. | | latest | Start from the latest offset. Corresponds to Kafka’s auto.offset.reset=latest option. | ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. 
**Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timely_nacks_maximum_wait)`timely_nacks_maximum_wait` EXPERIMENTAL: Specify a maximum period of time in which each message can be consumed and awaiting either acknowledgement or rejection before rejection is instead forced. This can be useful for avoiding situations where certain downstream components can result in blocked confirmation of delivery that exceeds SLAs. Accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. 
**Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. 
**Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#topic_lag_refresh_period)`topic_lag_refresh_period` The interval between refresh cycles. 
During each cycle, this input queries the Redpanda Connect server to calculate the topic lag, that is, the number of produced messages that remain to be read from each topic/partition pair by the specified consumer group. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `5s` ### [](#topics)`topics[]` A list of topics to consume from. Use commas to separate multiple topics in a single element. When a `consumer_group` is specified, partitions are automatically distributed across consumers of a topic. Otherwise, all partitions are consumed. Alternatively, you can specify explicit partitions to consume by using a colon after the topic name. For example, `foo:0` would consume the partition `0` of the topic `foo`. This syntax supports ranges. For example, `foo:0-10` would consume partitions `0` through `10` inclusive. It is also possible to specify an explicit offset to consume from by adding another colon after the partition. For example, `foo:0:10` would consume the partition `0` of the topic `foo` starting from the offset `10`. If no offset is specified, the field `start_offset` determines which offset to start from. **Type**: `array` ```yaml # Examples: topics: - foo - bar # --- topics: - things.* # --- topics: - "foo,bar" # --- topics: - "foo:0" - "bar:1" - "bar:3" # --- topics: - "foo:0,bar:1,bar:3" # --- topics: - "foo:0-5" ``` ### [](#transaction_isolation_level)`transaction_isolation_level` The isolation level for handling transactional messages. This setting determines how transactions are processed and affects data consistency guarantees. **Type**: `string` **Default**: `read_uncommitted` | Option | Summary | | --- | --- | | read_committed | If set, only committed transactional records are processed. | | read_uncommitted | If set, then uncommitted records are processed. 
| ### [](#unordered_processing)`unordered_processing` Allows consumers to process messages of any given partition in parallel, which may result in unordered processing. This option enables asynchronous publishing at the output level. The maximum parallelization of each partition is determined by the `checkpoint_limit` field. **Type**: `object` ### [](#unordered_processing-batching)`unordered_processing.batching` Allows you to configure a [batching policy](../../../configuration/batching/) that applies to individual topic partitions in order to batch messages together before flushing them for processing. Batching can be beneficial for performance and useful for windowed processing, and doing so preserves the ordering of topic partitions. **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#unordered_processing-batching-byte_size)`unordered_processing.batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#unordered_processing-batching-check)`unordered_processing.batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#unordered_processing-batching-count)`unordered_processing.batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#unordered_processing-batching-period)`unordered_processing.batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. 
**Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#unordered_processing-batching-processors)`unordered_processing.batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#unordered_processing-checkpoint_limit)`unordered_processing.checkpoint_limit` Determines how many messages of the same partition can be processed in parallel before applying back pressure. When a message of a given offset is delivered to the output, the offset can only be committed once all messages of prior offsets have also been delivered. This ensures at-least-once delivery guarantees. However, this mechanism also increases the likelihood of duplicates in the event of crashes or server faults. Reducing the checkpoint limit mitigates this. **Type**: `int` **Default**: `1024` ### [](#unordered_processing-enabled)`unordered_processing.enabled` Whether to enable the unordered processing of messages from a given partition. 
**Type**: `bool` **Default**: `false` --- # Page 125: resource **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/resource.md --- # resource --- title: resource latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/resource page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/resource.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/resource.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/resource/)[Output](/redpanda-cloud/develop/connect/components/outputs/resource/)[Processor](/redpanda-cloud/develop/connect/components/processors/resource/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/resource/ "View the Self-Managed version of this component") Resource is an input type that channels messages from a resource input, identified by its name. ```yml inputs: label: "" resource: "" ``` Resources allow you to tidy up deeply nested configs. For example, the config: ```yaml input: broker: inputs: - kafka: addresses: [ TODO ] topics: [ foo ] consumer_group: foogroup - gcp_pubsub: project: bar subscription: baz ``` Could also be expressed as: ```yaml input: broker: inputs: - resource: foo - resource: bar input_resources: - label: foo kafka: addresses: [ TODO ] topics: [ foo ] consumer_group: foogroup - label: bar gcp_pubsub: project: bar subscription: baz ``` Resources also allow you to reference a single input in multiple places, such as multiple streams mode configs, or multiple entries in a broker input. 
However, when a resource is referenced more than once the messages it produces are distributed across those references, so each message will only be directed to a single reference, not all of them. --- # Page 126: schema_registry **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/schema_registry.md --- # schema\_registry --- title: schema_registry latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/schema_registry page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/schema_registry.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/schema_registry.adoc categories: "[\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/schema_registry/)[Output](/redpanda-cloud/develop/connect/components/outputs/schema_registry/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/schema_registry/ "View the Self-Managed version of this component") Reads schemas from a schema registry. You can use this connector to extract and back up schemas during a data migration. This input uses the [Franz Kafka Schema Registry client](https://github.com/twmb/franz-go/tree/master/pkg/sr). 
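As a quick orientation before the field reference, here is a minimal sketch of a `schema_registry` input that authenticates with basic authentication and reads only a subset of subjects. It uses only fields documented on this page (`url`, `subject_filter`, `fetch_in_order`, `basic_auth`). The registry URL, the `^orders.*` filter, and the `SCHEMA_REGISTRY_USER` and `SCHEMA_REGISTRY_PASSWORD` variable names are hypothetical placeholders, and the `${...}` interpolation form is assumed from the `${KEY_PASSWORD}` example in the TLS fields. See [Manage Secrets](../../../configuration/secret-management/) for how to supply secrets in your environment.

```yaml
input:
  schema_registry:
    # Hypothetical registry endpoint; replace with your own.
    url: https://registry.example.com:8081
    # Hypothetical filter: only subjects whose names start with "orders".
    subject_filter: ^orders.*
    # Keep the default ordering by schema ID so that referenced schemas
    # are read before the schemas that refer to them.
    fetch_in_order: true
    basic_auth:
      enabled: true
      # Placeholder variable names (assumptions); do not inline credentials.
      username: ${SCHEMA_REGISTRY_USER}
      password: ${SCHEMA_REGISTRY_PASSWORD}
```

Each message produced by this input carries the `schema_registry_subject` and `schema_registry_version` metadata fields described in the Metadata section.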
#### Common ```yml inputs: label: "" schema_registry: url: "" # No default (required) auto_replay_nacks: true ``` #### Advanced ```yml inputs: label: "" schema_registry: url: "" # No default (required) include_deleted: false subject_filter: "" fetch_in_order: true tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] auto_replay_nacks: true oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} ``` ## [](#metadata)Metadata The `schema_registry` input adds the following metadata fields to each message: ```text - schema_registry_subject - schema_registry_version ``` You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#example)Example This example reads all schemas from a schema registry that are associated with subjects matching the `^foo.*` filter, including deleted schemas. ```yaml input: schema_registry: url: http://localhost:8081 include_deleted: true subject_filter: ^foo.* ``` ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether to automatically replay messages that are rejected (nacked) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure. Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data is discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#basic_auth)`basic_auth` Configure basic authentication for requests from this component to your schema registry. **Type**: `object` ### [](#basic_auth-enabled)`basic_auth.enabled` Whether to use basic authentication in requests. 
**Type**: `bool` **Default**: `false` ### [](#basic_auth-password)`basic_auth.password` The password to use for authentication. Used together with `username` for basic authentication or with encrypted private keys for secure access. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#basic_auth-username)`basic_auth.username` The username of the account credentials to authenticate as. Used together with `password` for basic authentication. **Type**: `string` **Default**: `""` ### [](#fetch_in_order)`fetch_in_order` Indicate whether to fetch all schemas from the schema registry service and sort them by ID. Set this value to `true` if you use schemas that refer to other schemas (schema references). **Type**: `bool` **Default**: `true` ### [](#include_deleted)`include_deleted` Include deleted entities. **Type**: `bool` **Default**: `false` ### [](#jwt)`jwt` Beta Configure JSON Web Token (JWT) authentication for secure data transmission from your schema registry to this component. This feature is in beta and may change in future releases. **Type**: `object` ### [](#jwt-claims)`jwt.claims` Values used to pass the identity of the authenticated entity to the service provider. In this case, between this component and the schema registry. **Type**: `object` **Default**: `{}` ### [](#jwt-enabled)`jwt.enabled` Whether to use JWT authentication in requests. **Type**: `bool` **Default**: `false` ### [](#jwt-headers)`jwt.headers` The key/value pairs that identify the type of token and signing algorithm. **Type**: `object` **Default**: `{}` ### [](#jwt-private_key_file)`jwt.private_key_file` A PEM-encoded file containing a private key that is formatted using either PKCS1 or PKCS8 standards. 
**Type**: `string` **Default**: `""` ### [](#jwt-signing_method)`jwt.signing_method` The method used to sign the token, such as RS256, RS384, RS512 or EdDSA. **Type**: `string` **Default**: `""` ### [](#oauth)`oauth` Configure OAuth version 1.0 to give this component authorized access to your schema registry. **Type**: `object` ### [](#oauth-access_token)`oauth.access_token` The value this component can use to gain access to the data in the schema registry. **Type**: `string` **Default**: `""` ### [](#oauth-access_token_secret)`oauth.access_token_secret` The secret that establishes ownership of the `oauth.access_token` in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_key)`oauth.consumer_key` The value used to identify this component or client to your schema registry. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_secret)`oauth.consumer_secret` The secret that establishes ownership of the consumer key in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-enabled)`oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#subject_filter)`subject_filter` Include only subjects which match the regular expression filter, or leave this field value blank to select all subjects. **Type**: `string` **Default**: `""` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. 
This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... 
-----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#url)`url` The base URL of the schema registry service. 
**Type**: `string` --- # Page 127: sequence **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/sequence.md --- # sequence --- title: sequence latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/sequence page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/sequence.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/sequence.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/sequence/ "View the Self-Managed version of this component") Reads messages from a sequence of child inputs in order: it starts with the first input and, once that input gracefully terminates, starts consuming from the next, and so on. #### Common ```yml inputs: label: "" sequence: inputs: [] # No default (required) ``` #### Advanced ```yml inputs: label: "" sequence: sharded_join: type: none id_path: "" iterations: 1 merge_strategy: array inputs: [] # No default (required) ``` This input is useful for consuming from inputs that have an explicit end but must not be consumed in parallel. ## [](#examples)Examples ### [](#end-of-stream-message)End of Stream Message A common use case for `sequence` is to generate a message at the end of the main input. With the following config, once the records within `./dataset.csv` are exhausted, the final payload `{"status":"finished"}` is routed through the pipeline. 
```yaml input: sequence: inputs: - file: paths: [ ./dataset.csv ] scanner: csv: {} - generate: count: 1 mapping: 'root = {"status":"finished"}' ``` ### [](#joining-data-simple)Joining Data (Simple) Redpanda Connect can be used to join unordered data from fragmented datasets in memory by specifying a common identifier field and a number of sharded iterations. For example, given two CSV files, the first called "main.csv", which contains rows of user data: ```csv uuid,name,age AAA,Melanie,34 BBB,Emma,28 CCC,Geri,45 ``` And the second called "hobbies.csv" that, for each user, contains zero or more rows of hobbies: ```csv uuid,hobby CCC,pokemon go AAA,rowing AAA,golf ``` We can parse and join this data into a single dataset: ```json {"uuid":"AAA","name":"Melanie","age":34,"hobbies":["rowing","golf"]} {"uuid":"BBB","name":"Emma","age":28} {"uuid":"CCC","name":"Geri","age":45,"hobbies":["pokemon go"]} ``` With the following config: ```yaml input: sequence: sharded_join: type: full-outer id_path: uuid merge_strategy: array inputs: - file: paths: - ./hobbies.csv - ./main.csv scanner: csv: {} ``` ### [](#joining-data-advanced)Joining Data (Advanced) In this example we are able to join unordered and fragmented data from a combination of CSV files and newline-delimited JSON documents by specifying multiple sequence inputs with their own processors for extracting the structured data. The first file "main.csv" contains straightforward CSV data: ```csv uuid,name,age AAA,Melanie,34 BBB,Emma,28 CCC,Geri,45 ``` And the second file called "hobbies.ndjson" contains JSON documents, one per line, that associate an identifier with an array of hobbies. 
However, these data objects are in a nested format: ```json {"document":{"uuid":"CCC","hobbies":[{"type":"pokemon go"}]}} {"document":{"uuid":"AAA","hobbies":[{"type":"rowing"},{"type":"golf"}]}} ``` And so we will want to map these into a flattened structure before the join, and then we will end up with a single dataset that looks like this: ```json {"uuid":"AAA","name":"Melanie","age":34,"hobbies":["rowing","golf"]} {"uuid":"BBB","name":"Emma","age":28} {"uuid":"CCC","name":"Geri","age":45,"hobbies":["pokemon go"]} ``` With the following config: ```yaml input: sequence: sharded_join: type: full-outer id_path: uuid iterations: 10 merge_strategy: array inputs: - file: paths: [ ./main.csv ] scanner: csv: {} - file: paths: [ ./hobbies.ndjson ] scanner: lines: {} processors: - mapping: | root.uuid = this.document.uuid root.hobbies = this.document.hobbies.map_each(this.type) ``` ## [](#fields)Fields ### [](#inputs)`inputs[]` An array of inputs to read from sequentially. **Type**: `input` ### [](#sharded_join)`sharded_join` EXPERIMENTAL: Provides a way to perform outer joins of arbitrarily structured and unordered data resulting from the input sequence, even when the overall size of the data surpasses the memory available on the machine. When configured the sequence of inputs will be consumed one or more times according to the number of iterations, and when more than one iteration is specified each iteration will process an entirely different set of messages by sharding them by the ID field. Increasing the number of iterations reduces the memory consumption at the cost of needing to fully parse the data each time. Each message must be structured (JSON or otherwise processed into a structured form) and the fields will be aggregated with those of other messages sharing the ID. At the end of each iteration the joined messages are flushed downstream before the next iteration begins, hence keeping memory usage limited. 
**Type**: `object` ### [](#sharded_join-id_path)`sharded_join.id_path` A [dot path](../../../configuration/field_paths/) that points to a common field within messages of each fragmented data set and can be used to join them. Messages that are not structured or are missing this field will be dropped. This field must be set in order to enable joins. **Type**: `string` **Default**: `""` ### [](#sharded_join-iterations)`sharded_join.iterations` The total number of iterations (shards). Increasing this number increases the overall time taken to process the data, but reduces the memory used in the process. The real memory usage required is significantly higher than the real size of the data, and therefore the number of iterations should be at least an order of magnitude higher than the overall size of the dataset divided by the available memory. **Type**: `int` **Default**: `1` ### [](#sharded_join-merge_strategy)`sharded_join.merge_strategy` The chosen strategy to use when a data join would otherwise result in a collision of field values. The strategy `array` means non-array colliding values are placed into an array and colliding arrays are merged. The strategy `replace` replaces old values with new values. The strategy `keep` keeps the old value. **Type**: `string` **Default**: `array` **Options**: `array`, `replace`, `keep` ### [](#sharded_join-type)`sharded_join.type` The type of join to perform. A `full-outer` join ensures that all identifiers seen in any of the input sequences are sent, and is performed by consuming all input sequences before flushing the joined results. An `outer` join consumes all input sequences but only writes data joined from the last input in the sequence, similar to a left or right outer join. With an `outer` join, if an identifier appears multiple times within the final sequence input, it is flushed each time it appears. `full-outter` and `outter` have been deprecated in favour of `full-outer` and `outer`. 
**Type**: `string` **Default**: `none` **Options**: `none`, `full-outer`, `outer`, `full-outter`, `outter` --- # Page 128: sftp **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/sftp.md --- # sftp --- title: sftp latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/sftp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/sftp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/sftp.adoc categories: "[\"Network\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/sftp/)[Output](/redpanda-cloud/develop/connect/components/outputs/sftp/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/sftp/ "View the Self-Managed version of this component") Consumes files from an SFTP server. 
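A minimal sketch of how the fields described on this page might fit together: the input below polls an SFTP directory for CSV files, splits each file into one message per row, and records consumed paths in a cache so that files are not read twice. The `sftp` field names come from this page. The host, the `host:port` form of `address`, the `/upload/*.csv` glob, the `SFTP_USER` and `SFTP_PASSWORD` variable names, the `sftp_seen_files` label, and the use of an in-memory cache resource are all assumptions made for illustration.

```yaml
input:
  sftp:
    # Hypothetical server; the host:port form is an assumption.
    address: sftp.example.com:22
    credentials:
      # Placeholder variable names; see Manage Secrets before inlining values.
      username: ${SFTP_USER}
      password: ${SFTP_PASSWORD}
    paths:
      - /upload/*.csv        # Glob patterns are supported.
    scanner:
      csv: {}                # Emit one message per CSV row.
    watcher:
      enabled: true
      minimum_age: 10s       # Skip files modified in the last 10 seconds.
      poll_interval: 1s
      cache: sftp_seen_files # Must match a cache resource label.

cache_resources:
  # Assumed in-memory cache; its contents are lost on restart, so files
  # may be consumed again after a pipeline restart.
  - label: sftp_seen_files
    memory: {}
```

A larger `minimum_age` lowers the chance of reading a file that is still being uploaded, at the cost of added latency before new files are picked up.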
#### Common ```yml inputs: label: "" sftp: address: "" # No default (required) credentials: username: "" password: "" host_public_key_file: "" # No default (optional) host_public_key: "" # No default (optional) private_key_file: "" # No default (optional) private_key: "" # No default (optional) private_key_pass: "" paths: [] # No default (required) auto_replay_nacks: true scanner: to_the_end: {} watcher: enabled: false minimum_age: 1s poll_interval: 1s cache: "" ``` #### Advanced ```yml inputs: label: "" sftp: address: "" # No default (required) connection_timeout: 30s credentials: username: "" password: "" host_public_key_file: "" # No default (optional) host_public_key: "" # No default (optional) private_key_file: "" # No default (optional) private_key: "" # No default (optional) private_key_pass: "" max_sftp_sessions: 10 paths: [] # No default (required) auto_replay_nacks: true scanner: to_the_end: {} delete_on_finish: false watcher: enabled: false minimum_age: 1s poll_interval: 1s cache: "" ``` ## [](#metadata)Metadata This input adds the following metadata fields to each message: - sftp\_path You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#fields)Fields ### [](#address)`address` The address (hostname or IP address) of the SFTP server to connect to. **Type**: `string` ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. 
**Type**: `bool` **Default**: `true` ### [](#connection_timeout)`connection_timeout` The connection timeout to use when connecting to the target server. **Type**: `string` **Default**: `30s` ### [](#credentials)`credentials` The credentials required to log in to the SFTP server. This can include a username and password, or a private key for secure access. **Type**: `object` ### [](#credentials-host_public_key)`credentials.host_public_key` The raw contents of the SFTP server’s public key, used for host key verification. **Type**: `string` ### [](#credentials-host_public_key_file)`credentials.host_public_key_file` The path to the SFTP server’s public key file, used for host key verification. **Type**: `string` ### [](#credentials-password)`credentials.password` The password to use for authentication. Used together with `username` for basic authentication or with encrypted private keys for secure access. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#credentials-private_key)`credentials.private_key` The private key used to authenticate with the SFTP server. This field provides an alternative to the [`private_key_file`](#credentials-private_key_file). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-private_key_file)`credentials.private_key_file` The path to a private key file used to authenticate with the SFTP server. You can also provide a private key using the [`private_key`](#credentials-private_key) field. 
**Type**: `string`

### [](#credentials-private_key_pass)`credentials.private_key_pass`

A passphrase for the private key.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

### [](#credentials-username)`credentials.username`

The username required to authenticate with the SFTP server.

**Type**: `string`

**Default**: `""`

### [](#delete_on_finish)`delete_on_finish`

Whether to delete files from the server once they are processed.

**Type**: `bool`

**Default**: `false`

### [](#max_sftp_sessions)`max_sftp_sessions`

The maximum number of SFTP sessions.

**Type**: `int`

**Default**: `10`

### [](#paths)`paths[]`

A list of paths to consume sequentially. Glob patterns are supported.

**Type**: `array`

### [](#scanner)`scanner`

The [scanner](../../scanners/about/) by which the stream of bytes consumed will be broken out into individual messages. Scanners are useful for processing large sources of data without holding the entirety of it within memory. For example, the `csv` scanner allows you to process individual CSV rows without loading the entire CSV file in memory at once.

**Type**: `scanner`

**Default**:

```yaml
to_the_end: {}
```

### [](#watcher)`watcher`

An experimental mode in which the input periodically scans the target paths for new files and consumes them. When all files are consumed, the input continues polling for new files.

**Type**: `object`

### [](#watcher-cache)`watcher.cache`

A [cache resource](../../caches/about/) for storing the paths of files already consumed.

**Type**: `string`

**Default**: `""`

### [](#watcher-enabled)`watcher.enabled`

Whether file watching is enabled.
**Type**: `bool` **Default**: `false` ### [](#watcher-minimum_age)`watcher.minimum_age` The minimum period of time since a file was last updated before attempting to consume it. Increasing this period decreases the likelihood that a file will be consumed whilst it is still being written to. **Type**: `string` **Default**: `1s` ```yaml # Examples: minimum_age: 10s # --- minimum_age: 1m # --- minimum_age: 10m ``` ### [](#watcher-poll_interval)`watcher.poll_interval` The interval between each attempt to scan the target paths for new files. **Type**: `string` **Default**: `1s` ```yaml # Examples: poll_interval: 100ms # --- poll_interval: 1s ``` --- # Page 129: slack_users **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/slack_users.md --- # slack\_users --- title: slack_users latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/slack_users page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/slack_users.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/slack_users.adoc page-git-created-date: "2025-05-02" page-git-modified-date: "2025-05-02" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/slack_users/ "View the Self-Managed version of this component") Returns [the full profile](https://api.slack.com/methods/users.list#examples) of all users in your Slack organization using the API method [users.list](https://api.slack.com/methods/users.list). Optionally, you can filter the list of returned users by team ID. This input is useful when you need to: - Join user information to Slack posts. - Ingest user information into a data lakehouse to create joins with other fields. 
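As an illustration of the first use case, the following sketch reads the user list and keeps a few fields that could later be joined to Slack posts. It assumes that each message carries one user object shaped like a `users.list` member (`id`, `real_name`, `profile.email`) and that the token is supplied through a hypothetical `SLACK_BOT_TOKEN` environment variable. Adjust both to match your setup.

```yaml
input:
  slack_users:
    # Hypothetical variable name; the token needs the users:read scope.
    bot_token: "${SLACK_BOT_TOKEN}"

pipeline:
  processors:
    - mapping: |
        # Assumed message shape: one users.list member per message.
        root.user_id = this.id
        root.real_name = this.real_name
        root.email = this.profile.email
```

Trimming the profile early keeps downstream joins small, but which fields to keep is a design choice that depends on what your Slack workspace actually returns.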
```yml
inputs:
  label: ""
  slack_users:
    bot_token: "" # No default (required)
    team_id: ""
    auto_replay_nacks: true
```

## [](#fields)Fields

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether to automatically replay messages that are rejected (nacked) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure.

Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams, as the original shape of the data is discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#bot_token)`bot_token`

Your [Slack bot user’s OAuth token](https://api.slack.com/concepts/token-types), which must have the [`users:read` scope](https://api.slack.com/scopes/users:read) to access your Slack organization.

**Type**: `string`

### [](#team_id)`team_id`

The encoded ID of a Slack team by which to filter the list of returned users, which you can get from the [`team.info` Slack API method](https://api.slack.com/methods/team.info). If `team_id` is left empty, users from all teams within the organization are returned.
**Type**: `string` **Default**: `""` --- # Page 130: slack **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/slack.md --- # slack --- title: slack latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/slack page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/slack.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/slack.adoc page-git-created-date: "2025-05-02" page-git-modified-date: "2025-05-02" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/slack/ "View the Self-Managed version of this component") Connects to Slack using [Socket Mode](https://api.slack.com/apis/socket-mode), and can receive events, interactions (automated and user-initiated), and slash commands. This input is useful for: - Building bots that can query or write data. - Sending events to data warehouses. You could also try pairing this input with Redpanda Connect’s AI processors, which use the prefixes `cohere`, `openai`, and `ollama`. ```yml inputs: label: "" slack: app_token: "" # No default (required) bot_token: "" # No default (required) auto_replay_nacks: true ``` See also: [Examples](#examples) ## [](#metadata)Metadata Each message emitted from this input has an `@type` metadata flag to indicate the event type, either `"events_api"`, `"interactions"`, or `"slash_commands"`. ## [](#fields)Fields ### [](#app_token)`app_token` The app-level token to use to authenticate and connect to Slack. **Type**: `string` ### [](#auto_replay_nacks)`auto_replay_nacks` Whether to automatically replay messages that are rejected (nacked) at the output level. If the cause of rejections is persistent, leaving this option enabled can result in back pressure. 
Set `auto_replay_nacks` to `false` to delete rejected messages. Disabling auto replays can greatly improve memory efficiency of high throughput streams, as the original shape of the data is discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#bot_token)`bot_token`

Your Slack bot user’s OAuth token, which must have the [`connections:write` scope](https://api.slack.com/scopes/connections:write) to access your Slack app’s [Socket Mode WebSocket URL](https://api.slack.com/methods/apps.connections.open).

**Type**: `string`

## [](#examples)Examples

### [](#echo-slackbot)Echo Slackbot

A Slackbot that echoes messages from other users.

```yaml
input:
  slack:
    app_token: "${APP_TOKEN:xapp-demo}"
    bot_token: "${BOT_TOKEN:xoxb-demo}"
pipeline:
  processors:
    - mutation: |
        # ignore hidden or non message events
        if this.event.type != "message" || (this.event.hidden | false) {
          root = deleted()
        }
        # Don't respond to our own messages
        if this.authorizations.any(auth -> auth.user_id == this.event.user) {
          root = deleted()
        }
output:
  slack_post:
    bot_token: "${BOT_TOKEN:xoxb-demo}"
    channel_id: "${!this.event.channel}"
    thread_ts: "${!this.event.ts}"
    text: "ECHO: ${!this.event.text}"
```

---

# Page 131: spicedb_watch

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/spicedb_watch.md

---

# spicedb\_watch

---
title: spicedb_watch
page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments.
latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/spicedb_watch page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/spicedb_watch.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/spicedb_watch.adoc # Beta release status page-beta: "true" page-git-created-date: "2024-11-19" page-git-modified-date: "2024-11-19" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/spicedb_watch/ "View the Self-Managed version of this component") Consumes messages from the [Watch API](https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.WatchService.Watch) of a [SpiceDB](https://authzed.com/docs/spicedb/getting-started/discovering-spicedb) instance. This input is useful if you have downstream applications that need to react to real-time changes in data managed by SpiceDB. 
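To make the use case concrete, here is a minimal sketch that watches SpiceDB and forwards each change to a topic that downstream applications can consume. The `kafka_franz` output, the broker address, the topic name, the Redis URL, and the `SPICEDB_TOKEN` variable are illustrative assumptions rather than values taken from this page.

```yaml
input:
  spicedb_watch:
    endpoint: grpc.authzed.com:443
    # Hypothetical variable name for the SpiceDB bearer token.
    bearer_token: "${SPICEDB_TOKEN}"
    # Cache resource that stores the last acknowledged ZedToken.
    cache: spicedb_cache

cache_resources:
  - label: spicedb_cache
    redis:
      # Hypothetical Redis address.
      url: redis://localhost:6379

output:
  kafka_franz:
    # Hypothetical broker and topic.
    seed_brokers: [ "localhost:9092" ]
    topic: spicedb_changes
```

A cache that persists across restarts is used here so that the stored ZedToken is still available after the pipeline restarts.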
#### Common ```yml inputs: label: "" spicedb_watch: endpoint: "" # No default (required) bearer_token: "" cache: "" # No default (required) ``` #### Advanced ```yml inputs: label: "" spicedb_watch: endpoint: "" # No default (required) bearer_token: "" max_receive_message_bytes: 4MB cache: "" # No default (required) cache_key: authzed.com/spicedb/watch/last_zed_token tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] ``` ## [](#authentication)Authentication For this input to authenticate with your SpiceDB instance, you must provide: - The [`endpoint`](#endpoint) of the SpiceDB instance - A [bearer token](#bearer_token) ## [](#configure-a-cache)Configure a cache You must use a cache resource to store the [ZedToken](https://authzed.com/docs/spicedb/concepts/consistency#zedtokens) (ID) of the latest message consumed and acknowledged by this input. Ideally, the cache should persist across restarts. This means that every time the input is initialized, it starts reading from the newest data updates. The following example uses a [`redis` cache](../../rate_limits/redis/). ```yml # Example input: label: "" spicedb_watch: endpoint: grpc.authzed.com:443 bearer_token: "" cache: "spicedb_cache" cache_resources: - label: "spicedb_cache" redis: url: redis://:6379 ``` To learn more about cache configuration, see the [Caches section](../../caches/about/), which includes a range of cache components. ## [](#fields)Fields ### [](#bearer_token)`bearer_token` The SpiceDB bearer token to use to authenticate with your SpiceDB instance. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ```yaml # Examples: bearer_token: t_your_token_here_1234567deadbeef ``` ### [](#cache)`cache` The [cache resource](#configure-a-cache) that you must configure to store the ZedToken (ID) of the last message processed. The ZedToken is stored in the cache within the `ACK` function of the message. This means that a ZedToken is only stored when a message is successfully routed through all processors and outputs in the data pipeline. **Type**: `string` ### [](#cache_key)`cache_key` The key identifier to use when storing the ZedToken (ID) of the last message received. **Type**: `string` **Default**: `authzed.com/spicedb/watch/last_zed_token` ### [](#endpoint)`endpoint` The endpoint of your SpiceDB instance. **Type**: `string` ```yaml # Examples: endpoint: grpc.authzed.com:443 ``` ### [](#max_receive_message_bytes)`max_receive_message_bytes` The maximum message size (in bytes) this input can receive. If a message exceeds this limit, an `rpc error` is written to the Redpanda Connect logs. **Type**: `string` **Default**: `4MB` ```yaml # Examples: max_receive_message_bytes: 100MB # --- max_receive_message_bytes: 50mib ``` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. 
**Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. 
When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` --- # Page 132: splunk **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/splunk.md --- # splunk --- title: splunk latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/splunk page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/splunk.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/splunk.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/splunk/ "View the Self-Managed version of this component") Consumes messages from Splunk. #### Common ```yml inputs: label: "" splunk: url: "" # No default (required) user: "" # No default (required) password: "" # No default (required) query: "" # No default (required) auto_replay_nacks: true ``` #### Advanced ```yml inputs: label: "" splunk: url: "" # No default (required) user: "" # No default (required) password: "" # No default (required) query: "" # No default (required) tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] auto_replay_nacks: true ``` ## [](#fields)Fields ### [](#auto_replay_nacks)`auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. 
If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. **Type**: `bool` **Default**: `true` ### [](#password)`password` Splunk account password. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#query)`query` Splunk search query. **Type**: `string` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. 
This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` Full HTTP Search API endpoint URL. **Type**: `string` ```yaml # Examples: url: https://foobar.splunkcloud.com/services/search/v2/jobs/export ``` ### [](#user)`user` Splunk account user. **Type**: `string` --- # Page 133: sql_raw **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/sql_raw.md --- # sql\_raw --- title: sql_raw latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/sql_raw page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/sql_raw.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/sql_raw.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/sql_raw/)[Output](/redpanda-cloud/develop/connect/components/outputs/sql_raw/)[Processor](/redpanda-cloud/develop/connect/components/processors/sql_raw/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/sql_raw/ "View the Self-Managed version of this component") Executes a select query and creates a message for each row received. 
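Because this input runs one query and emits its rows as messages, several queries can be chained with the [`sequence`](../sequence/) input so that each runs after the previous one has finished. The following sketch is only an illustration: the `orders` table and both queries are hypothetical, and the DSN points at an assumed local Postgres database.

```yaml
input:
  sequence:
    inputs:
      # First query: one message per row of the person table.
      - sql_raw:
          driver: postgres
          dsn: postgres://foouser:foopass@localhost:5432/testdb?sslmode=disable
          query: "SELECT id, name FROM person;"
      # Second query: starts once the first input has no more rows.
      - sql_raw:
          driver: postgres
          dsn: postgres://foouser:foopass@localhost:5432/testdb?sslmode=disable
          query: "SELECT id, total FROM orders;"
```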
#### Common

```yml
inputs:
  label: ""
  sql_raw:
    driver: "" # No default (required)
    dsn: "" # No default (required)
    query: "" # No default (required)
    args_mapping: "" # No default (optional)
    auto_replay_nacks: true
```

#### Advanced

```yml
inputs:
  label: ""
  sql_raw:
    driver: "" # No default (required)
    dsn: "" # No default (required)
    query: "" # No default (required)
    args_mapping: "" # No default (optional)
    auto_replay_nacks: true
    init_files: [] # No default (optional)
    init_statement: "" # No default (optional)
    conn_max_idle_time: "" # No default (optional)
    conn_max_life_time: "" # No default (optional)
    conn_max_idle: 2
    conn_max_open: "" # No default (optional)
```

When the rows from the query are exhausted, this input shuts down, allowing the pipeline to gracefully terminate or for the next input in a [sequence](../sequence/) to execute.

## [](#examples)Examples

### [](#consumes-an-sql-table-using-a-query-as-an-input)Consumes an SQL table using a query as an input.

Here we perform an aggregate over a list of names in a table, counting the rows whose `last_updated` value is more than 3600 seconds in the past.

```yaml
input:
  sql_raw:
    driver: postgres
    dsn: postgres://foouser:foopass@localhost:5432/testdb?sslmode=disable
    query: "SELECT name, count(*) FROM person WHERE last_updated < $1 GROUP BY name;"
    args_mapping: |
      root = [
        now().ts_unix() - 3600
      ]
```

## [](#fields)Fields

### [](#args_mapping)`args_mapping`

An optional [Bloblang mapping](../../../guides/bloblang/about/) that includes the same number of values in an array as the placeholder arguments in the [`query`](#query) field.

**Type**: `string`

```yaml
# Examples:
args_mapping: root = [ this.cat.meow, this.doc.woofs[0] ]
# ---
args_mapping: root = [ meta("user.id") ]
```

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent.
If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#conn_max_idle)`conn_max_idle`

An optional maximum number of connections in the idle connection pool. If `conn_max_open` is greater than 0 but less than the new `conn_max_idle`, then the new `conn_max_idle` will be reduced to match the `conn_max_open` limit. If the value is less than or equal to 0, no idle connections are retained. The default max idle connections is currently 2. This may change in a future release.

**Type**: `int`

**Default**: `2`

### [](#conn_max_idle_time)`conn_max_idle_time`

An optional maximum amount of time a connection may be idle. Expired connections may be closed lazily before reuse. If the value is less than or equal to 0, connections are not closed due to a connection’s idle time.

**Type**: `string`

### [](#conn_max_life_time)`conn_max_life_time`

An optional maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse. If the value is less than or equal to 0, connections are not closed due to a connection’s age.

**Type**: `string`

### [](#conn_max_open)`conn_max_open`

An optional maximum number of open connections to the database. If `conn_max_idle` is greater than 0 and the new `conn_max_open` is less than `conn_max_idle`, then `conn_max_idle` will be reduced to match the new `conn_max_open` limit. If the value is less than or equal to 0, then there is no limit on the number of open connections. The default is 0 (unlimited).

**Type**: `int`

### [](#driver)`driver`

A database [driver](#drivers) to use.

**Type**: `string`

**Options**: `mysql`, `postgres`, `pgx`, `clickhouse`, `mssql`, `sqlite`, `oracle`, `snowflake`, `trino`, `gocosmos`, `spanner`, `databricks`

### [](#dsn)`dsn`

A Data Source Name to identify the target database.
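Putting `driver`, `dsn`, and the connection pool fields together, a bounded pool might be configured as in the following sketch. The limits and durations are arbitrary illustrative values, not recommendations, and the DSN points at a hypothetical local Postgres database.

```yaml
input:
  sql_raw:
    driver: postgres
    # Hypothetical local database.
    dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable
    query: "SELECT name FROM person;"
    conn_max_open: 4        # at most four open connections
    conn_max_idle: 2        # keep up to two idle connections; kept below conn_max_open
    conn_max_idle_time: 5m  # connections idle for longer than five minutes may be closed
    conn_max_life_time: 30m # connections older than thirty minutes are not reused
```

Keeping `conn_max_idle` at or below `conn_max_open` avoids relying on the automatic reduction described for those fields.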
#### [](#drivers)Drivers

The following table lists the supported drivers and their respective DSN formats. For the placeholder style that each driver expects, see the [`query`](#query) field.

| Driver | Data Source Name Format |
| --- | --- |
| `clickhouse` | `clickhouse://[username[:password]@][netloc][:port]/dbname[?param1=value1&…&paramN=valueN]` |
| `mysql` | `[username[:password]@][protocol[(address)]]/dbname[?param1=value1&…&paramN=valueN]` |
| `postgres` and `pgx` | `postgres://[user[:password]@][netloc][:port][/dbname][?param1=value1&…]` |
| `mssql` | `sqlserver://[user[:password]@][netloc][:port][?database=dbname&param1=value1&…]` |
| `sqlite` | `file:/path/to/filename.db[?param&=value1&…]` |
| `oracle` | `oracle://[username[:password]@][netloc][:port]/service_name?server=server2&server=server3` |
| `snowflake` | `username[:password]@account_identifier/dbname/schemaname[?param1=value&…&paramN=valueN]` |
| `trino` | `http[s]://user[:pass]@host[:port][?parameters]` |
| `gocosmos` | `AccountEndpoint=;AccountKey=[;TimeoutMs=][;Version=][;DefaultDb/Db=][;AutoId=][;InsecureSkipVerify=]` |
| `spanner` | `projects/[PROJECT]/instances/[INSTANCE]/databases/[DATABASE]` |
| `databricks` | `token:@:/` |

The `postgres` and `pgx` drivers enforce SSL by default. You can override this with the parameter `sslmode=disable` if required.

The `pgx` driver is an alternative to the standard `postgres` (pq) driver and comes with extra functionality such as support for array insertion.

The `snowflake` driver supports multiple DSN formats. Please consult [the docs](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Connection_String) for more details.
For [key pair authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth.html#configuring-key-pair-authentication), the DSN has the following format: `@//?warehouse=&role=&authenticator=snowflake_jwt&privateKey=`, where the value for the `privateKey` parameter can be constructed from an unencrypted RSA private key file `rsa_key.p8` using `openssl enc -d -base64 -in rsa_key.p8 | basenc --base64url -w0` (you can use `gbasenc` instead of `basenc` on OSX if you install `coreutils` via Homebrew). If you have a password-encrypted private key, you can decrypt it using `openssl pkcs8 -in rsa_key_encrypted.p8 -out rsa_key.p8`. Also, make sure fields such as the username are URL-encoded. The [`gocosmos`](https://pkg.go.dev/github.com/microsoft/gocosmos) driver is still experimental, but it has support for [hierarchical partition keys](https://learn.microsoft.com/en-us/azure/cosmos-db/hierarchical-partition-keys) as well as [cross-partition queries](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-query-container#cross-partition-query). Please refer to the [SQL notes](https://github.com/microsoft/gocosmos/blob/main/SQL.md) for details. **Type**: `string` ```yaml # Examples: dsn: clickhouse://username:password@host1:9000,host2:9000/database?dial_timeout=200ms&max_execution_time=60 # --- dsn: foouser:foopassword@tcp(localhost:3306)/foodb # --- dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable # --- dsn: oracle://foouser:foopass@localhost:1521/service_name # --- dsn: token:dapi1234567890ab@dbc-a1b2345c-d6e7.cloud.databricks.com:443/sql/1.0/warehouses/abc123def456 ``` ### [](#init_files)`init_files[]` An optional list of file paths containing SQL statements to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Glob patterns are supported, including super globs (double star). 
Care should be taken to ensure that the statements are idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If a statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `array` ```yaml # Examples: init_files: - ./init/*.sql # --- init_files: - ./foo.sql - ./bar.sql ``` ### [](#init_statement)`init_statement` An optional SQL statement to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Care should be taken to ensure that the statement is idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If the statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `string` ```yaml # Examples: init_statement: |- CREATE TABLE IF NOT EXISTS some_table ( foo varchar(50) not null, bar integer, baz varchar(50), primary key (foo) ) WITHOUT ROWID; ``` ### [](#query)`query` The query to execute. The style of placeholder to use depends on the driver, some drivers require question marks (`?`) whereas others expect incrementing dollar signs (`$1`, `$2`, and so on) or colons (`:1`, `:2` and so on). The style to use is outlined in this table: | Driver | Placeholder Style | | --- | --- | | clickhouse | Dollar sign ($) | | gocosmos | Colon (:) | | mysql | Question mark (?) | | mssql | Question mark (?) | | oracle | Colon (:) | | postgres | Dollar sign ($) | | snowflake | Question mark (?) | | spanner | Question mark (?) | | sqlite | Question mark (?) | | trino | Question mark (?) 
| **Type**: `string` ```yaml # Examples: query: SELECT * FROM footable WHERE user_id = $1; ``` --- # Page 134: sql_select **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/sql_select.md --- # sql\_select --- title: sql_select latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/sql_select page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/sql_select.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/sql_select.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/sql_select/)[Processor](/redpanda-cloud/develop/connect/components/processors/sql_select/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/sql_select/ "View the Self-Managed version of this component") Executes a select query and creates a message for each row received. 
#### Common ```yml inputs: label: "" sql_select: driver: "" # No default (required) dsn: "" # No default (required) table: "" # No default (required) columns: [] # No default (required) where: "" # No default (optional) args_mapping: "" # No default (optional) auto_replay_nacks: true ``` #### Advanced ```yml inputs: label: "" sql_select: driver: "" # No default (required) dsn: "" # No default (required) table: "" # No default (required) columns: [] # No default (required) where: "" # No default (optional) args_mapping: "" # No default (optional) prefix: "" # No default (optional) suffix: "" # No default (optional) auto_replay_nacks: true init_files: [] # No default (optional) init_statement: "" # No default (optional) conn_max_idle_time: "" # No default (optional) conn_max_life_time: "" # No default (optional) conn_max_idle: 2 conn_max_open: "" # No default (optional) ``` Once the rows from the query are exhausted this input shuts down, allowing the pipeline to gracefully terminate (or the next input in a [sequence](../sequence/) to execute). ## [](#examples)Examples ### [](#consume-a-table-postgresql)Consume a Table (PostgreSQL) Here we define a pipeline that will consume all rows from a table created within the last hour by comparing the unix timestamp stored in the row column "created\_at": ```yaml input: sql_select: driver: postgres dsn: postgres://foouser:foopass@localhost:5432/testdb?sslmode=disable table: footable columns: [ '*' ] where: created_at >= ? args_mapping: | root = [ now().ts_unix() - 3600 ] ``` ## [](#fields)Fields ### [](#args_mapping)`args_mapping` An optional [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of values matching in size to the number of placeholder arguments in the field `where`. 
**Type**: `string`

```yaml
# Examples:
args_mapping: root = [ "article", now().ts_format("2006-01-02") ]
```

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#columns)`columns[]`

A list of columns to select.

**Type**: `array`

```yaml
# Examples:
columns:
  - "*"

# ---

columns:
  - foo
  - bar
  - baz
```

### [](#conn_max_idle)`conn_max_idle`

An optional maximum number of connections in the idle connection pool. If `conn_max_open` is greater than 0 but less than the new `conn_max_idle`, then the new `conn_max_idle` will be reduced to match the `conn_max_open` limit. If `value <= 0`, no idle connections are retained. The default max idle connections is currently 2. This may change in a future release.

**Type**: `int`

**Default**: `2`

### [](#conn_max_idle_time)`conn_max_idle_time`

An optional maximum amount of time a connection may be idle. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection's idle time.

**Type**: `string`

### [](#conn_max_life_time)`conn_max_life_time`

An optional maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection's age.

**Type**: `string`

### [](#conn_max_open)`conn_max_open`

An optional maximum number of open connections to the database. If `conn_max_idle` is greater than 0 and the new `conn_max_open` is less than `conn_max_idle`, then `conn_max_idle` will be reduced to match the new `conn_max_open` limit.
If `value <= 0`, then there is no limit on the number of open connections. The default is 0 (unlimited).

**Type**: `int`

### [](#driver)`driver`

A database [driver](#drivers) to use.

**Type**: `string`

**Options**: `mysql`, `postgres`, `pgx`, `clickhouse`, `mssql`, `sqlite`, `oracle`, `snowflake`, `trino`, `gocosmos`, `spanner`, `databricks`

### [](#dsn)`dsn`

A Data Source Name to identify the target database.

#### [](#drivers)Drivers

The following is a list of supported drivers, their placeholder style, and their respective DSN formats:

| Driver | Data Source Name Format |
| --- | --- |
| clickhouse | `clickhouse://[username[:password]@][netloc][:port]/dbname[?param1=value1&…&paramN=valueN]` |
| mysql | `[username[:password]@][protocol[(address)]]/dbname[?param1=value1&…&paramN=valueN]` |
| postgres and pgx | `postgres://[user[:password]@][netloc][:port][/dbname][?param1=value1&…]` |
| mssql | `sqlserver://[user[:password]@][netloc][:port][?database=dbname&param1=value1&…]` |
| sqlite | `file:/path/to/filename.db[?param&=value1&…]` |
| oracle | `oracle://[username[:password]@][netloc][:port]/service_name?server=server2&server=server3` |
| snowflake | `username[:password]@account_identifier/dbname/schemaname[?param1=value&…&paramN=valueN]` |
| trino | `http[s]://user[:pass]@host[:port][?parameters]` |
| gocosmos | `AccountEndpoint=;AccountKey=[;TimeoutMs=][;Version=][;DefaultDb/Db=][;AutoId=][;InsecureSkipVerify=]` |
| spanner | `projects/[PROJECT]/instances/[INSTANCE]/databases/[DATABASE]` |
| databricks | `token:@:/` |

Please note that the `postgres` and `pgx` drivers enforce SSL by default, you can override this with the parameter `sslmode=disable` if required.

The `pgx` driver is an alternative to the standard `postgres` (pq) driver and comes with extra functionality such as support for array insertion.

The `snowflake` driver supports multiple DSN formats. Please consult [the docs](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Connection_String) for more details.
For [key pair authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth.html#configuring-key-pair-authentication), the DSN has the following format: `@//?warehouse=&role=&authenticator=snowflake_jwt&privateKey=`, where the value for the `privateKey` parameter can be constructed from an unencrypted RSA private key file `rsa_key.p8` using `openssl enc -d -base64 -in rsa_key.p8 | basenc --base64url -w0` (you can use `gbasenc` instead of `basenc` on OSX if you install `coreutils` via Homebrew). If you have a password-encrypted private key, you can decrypt it using `openssl pkcs8 -in rsa_key_encrypted.p8 -out rsa_key.p8`. Also, make sure fields such as the username are URL-encoded. The [`gocosmos`](https://pkg.go.dev/github.com/microsoft/gocosmos) driver is still experimental, but it has support for [hierarchical partition keys](https://learn.microsoft.com/en-us/azure/cosmos-db/hierarchical-partition-keys) as well as [cross-partition queries](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-query-container#cross-partition-query). Please refer to the [SQL notes](https://github.com/microsoft/gocosmos/blob/main/SQL.md) for details. **Type**: `string` ```yaml # Examples: dsn: clickhouse://username:password@host1:9000,host2:9000/database?dial_timeout=200ms&max_execution_time=60 # --- dsn: foouser:foopassword@tcp(localhost:3306)/foodb # --- dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable # --- dsn: oracle://foouser:foopass@localhost:1521/service_name # --- dsn: token:dapi1234567890ab@dbc-a1b2345c-d6e7.cloud.databricks.com:443/sql/1.0/warehouses/abc123def456 ``` ### [](#init_files)`init_files[]` An optional list of file paths containing SQL statements to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Glob patterns are supported, including super globs (double star). 
Care should be taken to ensure that the statements are idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If a statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `array` ```yaml # Examples: init_files: - ./init/*.sql # --- init_files: - ./foo.sql - ./bar.sql ``` ### [](#init_statement)`init_statement` An optional SQL statement to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Care should be taken to ensure that the statement is idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If the statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `string` ```yaml # Examples: init_statement: |- CREATE TABLE IF NOT EXISTS some_table ( foo varchar(50) not null, bar integer, baz varchar(50), primary key (foo) ) WITHOUT ROWID; ``` ### [](#prefix)`prefix` An optional prefix to prepend to the select query (before SELECT). **Type**: `string` ### [](#suffix)`suffix` An optional suffix to append to the select query. **Type**: `string` ### [](#table)`table` The table to select from. **Type**: `string` ```yaml # Examples: table: foo ``` ### [](#where)`where` An optional where clause to add. Placeholder arguments are populated with the `args_mapping` field. Placeholders should always be question marks, and will automatically be converted to dollar syntax when the postgres or clickhouse drivers are used. **Type**: `string` ```yaml # Examples: where: type = ? and created_at > ? # --- where: user_id = ? 
``` --- # Page 135: timeplus **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/inputs/timeplus.md --- # timeplus --- title: timeplus latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/inputs/timeplus page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/inputs/timeplus.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/timeplus.adoc categories: "[\"Services\"]" page-git-created-date: "2024-11-19" page-git-modified-date: "2024-11-19" --- **Type:** Input ▼ [Input](/redpanda-cloud/develop/connect/components/inputs/timeplus/)[Output](/redpanda-cloud/develop/connect/components/outputs/timeplus/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/inputs/timeplus/ "View the Self-Managed version of this component") Executes a streaming or table query on [Timeplus Enterprise (Cloud or Self-Hosted)](https://docs.timeplus.com/) or the `timeplusd` component, and creates a structured message for each table row received. If you execute a streaming query, this input runs until the query terminates. For table queries, it shuts down after all rows returned by the query are exhausted. ```yml inputs: label: "" timeplus: query: "" # No default (required) url: tcp://localhost:8463 workspace: "" # No default (optional) apikey: "" # No default (optional) username: "" # No default (optional) password: "" # No default (optional) ``` ## [](#examples)Examples ### [](#from-timeplus-enterprise-cloud-via-http)From Timeplus Enterprise Cloud via HTTP You will need to create API Key on Timeplus Enterprise Cloud Web console first and then set the `apikey` field. 
```yaml
input:
  timeplus:
    url: https://us-west-2.timeplus.cloud
    workspace: my_workspace_id
    query: select * from iot
    apikey:
```

### [](#from-timeplus-enterprise-self-hosted-via-http)From Timeplus Enterprise (self-hosted) via HTTP

For self-hosted Timeplus Enterprise, you need to specify the username and password as well as the URL of the App server.

```yaml
input:
  timeplus:
    url: http://localhost:8000
    workspace: my_workspace_id
    query: select * from iot
    username: username
    password: pw
```

### [](#from-timeplus-enterprise-self-hosted-via-tcp)From Timeplus Enterprise (self-hosted) via TCP

Make sure the scheme of the `url` is `tcp`.

```yaml
input:
  timeplus:
    url: tcp://localhost:8463
    query: select * from iot
    username: timeplus
    password: timeplus
```

## [](#fields)Fields

### [](#apikey)`apikey`

The API key for the Timeplus Enterprise REST API. You need to generate the key in the web console of Timeplus Enterprise (Cloud). This field is required if you are reading messages from Timeplus Enterprise (Cloud).

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#password)`password`

The password for the Timeplus application server. This field is required if you are reading messages from Timeplus Enterprise (Self-Hosted) or `timeplusd`.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#query)`query`

The query to execute on Timeplus Enterprise (Cloud or Self-Hosted) or `timeplusd`.
**Type**: `string` ```yaml # Examples: query: select * from iot # --- query: select count(*) from table(iot) ``` ### [](#url)`url` The URL of your Timeplus instance, which should always include the schema and host. **Type**: `string` **Default**: `tcp://localhost:8463` ### [](#username)`username` The username for the Timeplus application server. This field is required if you are reading messages from Timeplus Enterprise (Self-Hosted) or `timeplusd`. **Type**: `string` ### [](#workspace)`workspace` The ID of the workspace you want to read messages from. This field is required if you are connecting to Timeplus Enterprise (Cloud or Self-Hosted) using HTTP. **Type**: `string` --- # Page 136: Logger **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/logger/about.md --- # Logger --- title: Logger latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/logger/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/logger/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/logger/about.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- Redpanda Connect logging prints to stdout (or stderr if your output is stdout) and is formatted as [logfmt](https://brandur.org/logfmt) by default. Use these configuration options to change both the logging formats as well as the destination of logs. 
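As a minimal sketch of changing the log format, the following fragment switches to JSON output with timestamps. It uses only fields from the `logger` schema described on this page; the `DEBUG` level and the `ts` field name are arbitrary choices made for illustration, not recommended values.

```yaml
# Illustrative only: emit JSON logs with a timestamp on every line.
logger:
  level: DEBUG          # assumption: any documented level works here
  format: json          # the default is logfmt
  add_timestamp: true   # the default is false
  timestamp_name: ts    # hypothetical name; the default is "time"
```

Because `timestamp_name` only applies when `add_timestamp` is `true` and `format` is `json`, the three settings are shown together.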
#### Common ```yaml # Common config fields, showing default values logger: level: INFO format: logfmt add_timestamp: false static_fields: '@service': redpanda-connect ``` #### Advanced ```yaml # All config fields, showing default values logger: level: INFO format: logfmt add_timestamp: false level_name: level timestamp_name: time message_name: msg static_fields: '@service': redpanda-connect file: path: "" rotate: false rotate_max_age_days: 0 ``` ## [](#fields)Fields The schema of the `logger` section is as follows: ### [](#level)`level` Set the minimum severity level for emitting logs. **Type**: `string` **Default**: `"INFO"` Options: `OFF` , `FATAL` , `ERROR` , `WARN` , `INFO` , `DEBUG` , `TRACE` , `ALL` , `NONE` ### [](#format)`format` Set the format of emitted logs. **Type**: `string` **Default**: `"logfmt"` Options: `json` , `logfmt` ### [](#add_timestamp)`add_timestamp` Whether to include timestamps in logs. **Type**: `bool` **Default**: `false` ### [](#level_name)`level_name` The name of the level field added to logs when the `format` is `json`. **Type**: `string` **Default**: `"level"` ### [](#timestamp_name)`timestamp_name` The name of the timestamp field added to logs when `add_timestamp` is set to `true` and the `format` is `json`. **Type**: `string` **Default**: `"time"` ### [](#message_name)`message_name` The name of the message field added to logs when the `format` is `json`. **Type**: `string` **Default**: `"msg"` ### [](#static_fields)`static_fields` A map of key/value pairs to add to each structured log. **Type**: `object` **Default**: `{"@service":"redpanda-connect"}` ### [](#file)`file` Experimental: Specify fields for optionally writing logs to a file. **Type**: `object` ### [](#file-path)`file.path` The file path to write logs to, if the file does not exist it will be created. Leave this field empty or unset to disable file based logging. 
**Type**: `string` **Default**: `""` ### [](#file-rotate)`file.rotate` Whether to rotate log files automatically. **Type**: `bool` **Default**: `false` ### [](#file-rotate_max_age_days)`file.rotate_max_age_days` The maximum number of days to retain old log files based on the timestamp encoded in their filename, after which they are deleted. Setting to zero disables this mechanism. **Type**: `int` **Default**: `0` --- # Page 137: Metrics **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/metrics/about.md --- # Metrics --- title: Metrics latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/metrics/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/metrics/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/metrics/about.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- Redpanda Connect emits lots of metrics in order to expose how components configured within your pipeline are behaving. You can configure exactly where these metrics end up with the config field `metrics`, which describes a metrics format and destination. For example, if you wished to push them via the StatsD protocol you could use this configuration: ```yaml metrics: statsd: address: localhost:8125 flush_period: 100ms ``` Redpanda Connect automatically [exports detailed metrics](../../../configuration/monitor-connect/) for each component of your data pipeline to a Prometheus endpoint. ## [](#timings)Timings It’s worth noting that timing metrics within Redpanda Connect are measured in nanoseconds and are therefore named with a `_ns` suffix. However, some exporters do not support this level of precision and are downgraded, or have the unit converted for convenience. 
In these cases the exporter documentation outlines the conversion and why it is made.

## [](#metric-names)Metric names

Each major Redpanda Connect component type emits one or more metrics with the name prefixed by the type. These metrics are intended to provide an overview of behavior, performance and health. Some specific component implementations may provide their own unique metrics on top of these standardized ones; these extra metrics are listed on their respective documentation pages.

### [](#inputs)Inputs

- `input_received`: A count of the number of messages received by the input.
- `input_latency_ns`: Measures the roundtrip latency in nanoseconds from the point at which a message is read up to the moment the message has either been acknowledged by an output, has been stored within a buffer, or has been rejected (nacked).
- `batch_created`: A count of each time an input-level batch has been created using a batching policy. Includes a label `mechanism` describing the particular mechanism that triggered it, one of `count`, `size`, `period`, or `check`.
- `input_connection_up`: For continuous stream based inputs, a count of the number of times the input has successfully established a connection to the target source. For poll based inputs that do not retain an active connection, this value increments once.
- `input_connection_failed`: For continuous stream based inputs, a count of the number of times the input has failed to establish a connection to the target source.
- `input_connection_lost`: For continuous stream based inputs, a count of the number of times the input has lost a previously established connection to the target source.

> ⚠️ **CAUTION**
>
> The behavior of connection metrics may differ based on input type due to certain libraries and protocols obfuscating the concept of a single connection.

### [](#buffers)Buffers

- `buffer_received`: A count of the number of messages written to the buffer.
- `buffer_batch_received`: A count of the number of message batches written to the buffer.
- `buffer_sent`: A count of the number of messages read from the buffer.
- `buffer_batch_sent`: A count of the number of message batches read from the buffer.
- `buffer_latency_ns`: Measures the roundtrip latency in nanoseconds from the point at which a message is read from the buffer up to the moment it has been acknowledged by the output.
- `batch_created`: A count of each time a buffer-level batch has been created using a batching policy. Includes a label `mechanism` describing the particular mechanism that triggered it, one of `count`, `size`, `period`, or `check`.

### [](#processors)Processors

- `processor_received`: A count of the number of messages the processor has been executed upon.
- `processor_batch_received`: A count of the number of message batches the processor has been executed upon.
- `processor_sent`: A count of the number of messages the processor has returned.
- `processor_batch_sent`: A count of the number of message batches the processor has returned.
- `processor_error`: A count of the number of times the processor has errored. In cases where an error is batch-wide the count is incremented by one, and therefore would not match the number of messages.
- `processor_latency_ns`: Latency of message processing in nanoseconds. When a processor acts upon a batch of messages this latency measures the time taken to process all messages of the batch.

### [](#outputs)Outputs

- `output_sent`: A count of the number of messages sent by the output.
- `output_batch_sent`: A count of the number of message batches sent by the output.
- `output_error`: A count of the number of send attempts that have failed. On failed batched sends this count is incremented once only.
- `output_latency_ns`: Latency of writes in nanoseconds. This metric may not be populated by outputs that are pull-based such as the `http_server`.
- `batch_created`: A count of each time an output-level batch has been created using a batching policy. Includes a label `mechanism` describing the particular mechanism that triggered it, one of `count`, `size`, `period`, or `check`.
- `output_connection_up`: For continuous stream based outputs, a count of the number of times the output has successfully established a connection to the target sink. For poll based outputs that do not retain an active connection, this value increments once.
- `output_connection_failed`: For continuous stream based outputs, a count of the number of times the output has failed to establish a connection to the target sink.
- `output_connection_lost`: For continuous stream based outputs, a count of the number of times the output has lost a previously established connection to the target sink.

> ⚠️ **CAUTION**
>
> The behavior of connection metrics may differ based on output type due to certain libraries and protocols obfuscating the concept of a single connection.

### [](#caches)Caches

All cache metrics have a label `operation` denoting the operation that triggered the metric series, one of `add`, `get`, `set`, or `delete`.

- `cache_success`: A count of the number of successful cache operations.
- `cache_error`: A count of the number of cache operations that resulted in an error.
- `cache_latency_ns`: Latency of operations in nanoseconds.
- `cache_not_found`: A count of the number of get operations that yielded no value due to the item not being found. This count is separate from `cache_error`.
- `cache_duplicate`: A count of the number of add operations that were aborted due to the key already existing. This count is separate from `cache_error`.

### [](#rate-limits)Rate limits

- `rate_limit_checked`: A count of the number of times the rate limit has been probed.
- `rate_limit_triggered`: A count of the number of times the rate limit has been triggered by a probe.
- `rate_limit_error`: A count of the number of times the rate limit has errored when probed.

## [](#metric-labels)Metric labels

The standard metric names are unique to the component type, but a Redpanda Connect config may consist of any number of component instantiations. To provide a metrics series that is unique for each instantiation, Redpanda Connect adds labels (or tags) that uniquely identify the instantiation. These labels are as follows:

### [](#path)`path`

The `path` label contains a string representation of the position of a component instantiation within a config in a format that would locate it within a Bloblang mapping, beginning at `root`. This path is a best attempt and may not exactly represent the source component position in all cases and is intended to be used for assisting observability only.

This is the highest cardinality label since paths will change as configs are updated and expanded. It is therefore worth removing this label with a [mapping](#metric-mapping) in cases where you wish to restrict the number of unique metric series.

### [](#label)`label`

The `label` label contains the unique label configured for a component emitting the metric series, or is empty for components that do not have a configured label. This is the most useful label for uniquely identifying a series for a component.

### [](#stream)`stream`

The `stream` label is present in a metric series emitted from a stream config executed when Redpanda Connect is running in streams mode, and is populated with the stream name.
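Returning to the `path` label, the following is a minimal sketch of how it might be dropped to reduce cardinality. It relies only on the `metrics.mapping` behavior described under [Metric mapping](#metric-mapping): labels are addressed as metadata and removed by assigning `deleted()`. The choice of the `prometheus` exporter here is an assumption made for illustration.

```yaml
metrics:
  mapping: |
    # Drop the high-cardinality `path` label from every series.
    # `root` is not assigned, so metric names stay unchanged.
    meta path = deleted()
  prometheus: {}
```

With this mapping, series are distinguished by the `label` label (and `stream` in streams mode), so components without a configured label share a single series per metric name.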
## [](#example)Example The following Redpanda Connect configuration: ```yaml input: label: foo http_server: {} pipeline: processors: - mapping: | root.message = this root.meta.link_count = this.links.length() root.user.age = this.user.age.number() output: label: bar stdout: {} metrics: prometheus: {} ``` Would produce the following metrics series: ```text input_latency_ns{label="foo",path="root.input"} input_received{endpoint="post",label="foo",path="root.input"} input_received{endpoint="websocket",label="foo",path="root.input"} processor_batch_received{label="",path="root.pipeline.processors.0"} processor_batch_sent{label="",path="root.pipeline.processors.0"} processor_error{label="",path="root.pipeline.processors.0"} processor_latency_ns{label="",path="root.pipeline.processors.0"} processor_received{label="",path="root.pipeline.processors.0"} processor_sent{label="",path="root.pipeline.processors.0"} output_batch_sent{label="bar",path="root.output"} output_connection_failed{label="bar",path="root.output"} output_connection_lost{label="bar",path="root.output"} output_connection_up{label="bar",path="root.output"} output_error{label="bar",path="root.output"} output_latency_ns{label="bar",path="root.output"} output_sent{label="bar",path="root.output"} ``` ## [](#metric-mapping)Metric mapping Since Redpanda Connect emits a large variety of metrics it is often useful to restrict or modify the metrics that are emitted. This can be done using the [Bloblang mapping language](../../../guides/bloblang/about/) in the field `metrics.mapping`. This is a mapping executed for each metric that is registered within the Redpanda Connect service and allows you to delete an entire series, modify the series name and delete or modify individual labels. Within the mapping the input document (referenced by the keyword `this`) is a string value containing the metric name, and the resulting document (referenced by the keyword `root`) must be a string value containing the resulting name. 
As is standard in Bloblang mappings, if the value of `root` is not assigned within the mapping then the metric name remains unchanged. If the value of `root` is `deleted()` then the metric series is dropped.

Labels can be referenced as metadata values with the function `meta`, where if the label does not exist in the series being mapped the value `null` is returned. Labels can be changed by using meta assignments, and can be assigned `deleted()` in order to remove them.

For example, the following mapping removes all but the `label` label entirely, which reduces the cardinality of each series. It also rewrites the value of `label` (for some reason) so that labels containing meows now contain woofs. Finally, the mapping restricts the metrics emitted to only three series: one for the input count, one for processor errors, and one for the output count. It does this by looking up metric names in a static array of allowed names, and if a name is not present the `root` is assigned `deleted()`:

```yaml
metrics:
  mapping: |
    # Delete all pre-existing labels
    meta = deleted()

    # Re-add the `label` label with meows replaced with woofs
    meta label = meta("label").replace("meow", "woof")

    # Delete all metric series that aren't in our list
    root = if ![
      "input_received",
      "processor_error",
      "output_sent",
    ].contains(this) { deleted() }

  prometheus:
    use_histogram_timing: false
```

---

# Page 138: none

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/metrics/none.md

---

# none

--- title: none latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/metrics/none page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/metrics/none.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/metrics/none.adoc page-git-created-date:
"2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Buffer ▼ [Buffer](/redpanda-cloud/develop/connect/components/buffers/none/)[Metric](/redpanda-cloud/develop/connect/components/metrics/none/)[Tracer](/redpanda-cloud/develop/connect/components/tracers/none/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/buffers/none/ "View the Self-Managed version of this component") Disable metrics entirely. ```yml # Config fields, showing default values metrics: none: {} mapping: "" ``` --- # Page 139: prometheus **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/metrics/prometheus.md --- # prometheus --- title: prometheus latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/metrics/prometheus page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/metrics/prometheus.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/metrics/prometheus.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/metrics/prometheus/ "View the Self-Managed version of this component") Host endpoints (`/metrics` and `/stats`) for Prometheus scraping. 
#### Common ```yml metrics: prometheus: ``` #### Advanced ```yml metrics: prometheus: use_histogram_timing: false histogram_buckets: [] summary_quantiles_objectives: - error: 0.05 quantile: 0.5 - error: 0.01 quantile: 0.9 - error: 0.001 quantile: 0.99 add_process_metrics: false add_go_metrics: false push_url: "" # No default (optional) push_interval: "" # No default (optional) push_job_name: benthos_push push_basic_auth: username: "" password: "" file_output_path: "" ``` ## [](#fields)Fields ### [](#add_go_metrics)`add_go_metrics` Whether to export Go runtime metrics such as GC pauses in addition to Redpanda Connect metrics. **Type**: `bool` **Default**: `false` ### [](#add_process_metrics)`add_process_metrics` Whether to export process metrics such as CPU and memory usage in addition to Redpanda Connect metrics. **Type**: `bool` **Default**: `false` ### [](#file_output_path)`file_output_path` An optional file path to write all prometheus metrics on service shutdown. **Type**: `string` **Default**: `""` ### [](#histogram_buckets)`histogram_buckets[]` Timing metrics histogram buckets (in seconds). If left empty defaults to DefBuckets ([https://pkg.go.dev/github.com/prometheus/client\_golang/prometheus#pkg-variables](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#pkg-variables)). Applicable when `use_histogram_timing` is set to `true`. **Type**: `float` **Default**: `[]` ### [](#push_basic_auth)`push_basic_auth` The Basic Authentication credentials. **Type**: `object` ### [](#push_basic_auth-password)`push_basic_auth.password` The Basic Authentication password. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#push_basic_auth-username)`push_basic_auth.username` The Basic Authentication username. 
**Type**: `string`

**Default**: `""`

### [](#push_interval)`push_interval`

The period of time between each push when sending metrics to a Push Gateway.

**Type**: `string`

### [](#push_job_name)`push_job_name`

An identifier for push jobs.

**Type**: `string`

**Default**: `benthos_push`

### [](#push_url)`push_url`

An optional [Push Gateway URL](#push-gateway) to push metrics to.

**Type**: `string`

### [](#summary_quantiles_objectives)`summary_quantiles_objectives[]`

A list of timing metrics summary buckets (as quantiles). Applicable when `use_histogram_timing` is set to `false`.

**Type**: `object`

**Default**:

```yaml
- error: 0.05
  quantile: 0.5
- error: 0.01
  quantile: 0.9
- error: 0.001
  quantile: 0.99
```

```yaml
# Examples:
summary_quantiles_objectives:
  - error: 0.05
    quantile: 0.5
  - error: 0.01
    quantile: 0.9
  - error: 0.001
    quantile: 0.99
```

### [](#summary_quantiles_objectives-error)`summary_quantiles_objectives[].error`

Permissible margin of error for quantile calculations. Precise calculations in a streaming context (without prior knowledge of the full dataset) can be resource-intensive. To balance accuracy with computational efficiency, an error margin is introduced. For instance, if the 90th quantile (`0.9`) is determined to be `100ms` with a 1% error margin (`0.01`), the true value will fall within the `[99ms, 101ms]` range.

**Type**: `float`

**Default**: `0`

### [](#summary_quantiles_objectives-quantile)`summary_quantiles_objectives[].quantile`

Quantile value.

**Type**: `float`

**Default**: `0`

### [](#use_histogram_timing)`use_histogram_timing`

Whether to export timing metrics as a histogram. If `false`, a summary is used instead. When exporting histogram timings, the delta values are converted from nanoseconds into seconds in order to better fit within bucket definitions. For more information on histograms and summaries, refer to: [https://prometheus.io/docs/practices/histograms/](https://prometheus.io/docs/practices/histograms/).
**Type**: `bool` **Default**: `false` ## [](#push-gateway)Push gateway The field `push_url` is optional and when set will trigger a push of metrics to a [Prometheus Push Gateway](https://prometheus.io/docs/instrumenting/pushing/) once Redpanda Connect shuts down. It is also possible to specify a `push_interval` which results in periodic pushes. The Push Gateway is useful for when Redpanda Connect instances are short lived. Do not include the "/metrics/jobs/…​" path in the push URL. If the Push Gateway requires HTTP Basic Authentication it can be configured with `push_basic_auth`. --- # Page 140: Outputs **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/about.md --- # Outputs --- title: Outputs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/about.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- An output config section looks like this: ```yaml output: label: my_s3_output aws_s3: bucket: TODO path: '${! meta("kafka_topic") }/${! json("message.id") }.json' # Optional list of processing steps processors: - mapping: '{"message":this,"meta":{"link_count":this.links.length()}}' ``` ## [](#back-pressure)Back pressure Redpanda Connect outputs apply back pressure to components upstream. This means if your output target starts blocking traffic Redpanda Connect will gracefully stop consuming until the issue is resolved. 
## [](#retries)Retries

When a Redpanda Connect output fails to send a message, the error is propagated back up to the input. Depending on the protocol, the message is then either pushed back to the source as a negative acknowledgment (for example, AMQP) or reattempted indefinitely with the commit withheld until success (for example, Kafka).

It’s possible to instead have Redpanda Connect indefinitely retry an output until success with a [`retry`](../retry/) output. Some other outputs, such as the [`broker`](../broker/), might also retry indefinitely depending on their configuration.

## [](#dead-letter-queues)Dead letter queues

It’s possible to create fallback outputs for when an output target fails using a [`fallback`](../fallback/) output:

```yaml
output:
  fallback:
    - aws_sqs:
        url: https://sqs.us-west-2.amazonaws.com/TODO/TODO
        max_in_flight: 20

    - http_client:
        url: http://backup:1234/dlq
        verb: POST
```

## [](#multiplexing-outputs)Multiplexing outputs

There are a few different ways of multiplexing in Redpanda Connect. Here’s a quick run-through:

### [](#interpolation-multiplexing)Interpolation multiplexing

Some output fields support [field interpolation](../../../configuration/interpolation/), which is an easy way to multiplex messages based on their contents in situations where you are multiplexing to the same service.

For example, multiplexing against Kafka topics is a common pattern:

```yaml
output:
  kafka:
    addresses: [ TODO:6379 ]
    topic: ${! meta("target_topic") }
```

Refer to the field documentation for a given output to see if it supports interpolation.

### [](#switch-multiplexing)Switch multiplexing

A more advanced form of multiplexing is to route messages to different output configurations based on a query.
This is easy with the [`switch` output](../switch/):

```yaml
output:
  switch:
    cases:
      - check: this.type == "foo"
        output:
          amqp_1:
            urls: [ amqps://guest:guest@localhost:5672/ ]
            target_address: queue:/the_foos

      - check: this.type == "bar"
        output:
          gcp_pubsub:
            project: dealing_with_mike
            topic: mikes_bars

      - output:
          redis_streams:
            url: tcp://localhost:6379
            stream: everything_else
          processors:
            - mapping: |
                root = this
                root.type = this.type.not_null() | "unknown"
```

## [](#labels)Labels

Outputs have an optional field `label` that can uniquely identify them in observability data such as metrics and logs. This can be useful when running configs with multiple outputs; otherwise, their metrics labels are generated based on their composition. For more information, check out the [metrics documentation](../../metrics/about/).

---

# Page 141: amqp_0_9

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/amqp_0_9.md

---

# amqp\_0\_9

---
title: amqp_0_9
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/outputs/amqp_0_9
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/outputs/amqp_0_9.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/amqp_0_9.adoc
categories: "[\"Services\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Output ▼

[Output](/redpanda-cloud/develop/connect/components/outputs/amqp_0_9/)[Input](/redpanda-cloud/develop/connect/components/inputs/amqp_0_9/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/amqp_0_9/ "View the Self-Managed version of this component")

Sends messages to an AMQP (0.9.1) exchange. AMQP is a messaging protocol used by various message brokers, including RabbitMQ.
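
As a rough sketch of what a complete configuration for this output might look like, the following publishes to a topic exchange and resolves the routing key per message. The label, broker URL, exchange name, and the `routing_key` metadata field are placeholder assumptions chosen for illustration, not values defined by this documentation:

```yaml
output:
  label: orders_amqp                 # hypothetical label
  amqp_0_9:
    urls:
      - amqps://guest:guest@localhost:5671/   # placeholder URL; an amqps URL enables TLS automatically
    exchange: orders                 # hypothetical exchange name
    exchange_declare:
      enabled: true                  # declare the exchange if it does not exist
      type: topic
      durable: true
    # Routing key taken from a hypothetical metadata field on each message
    key: ${! meta("routing_key") }
    persistent: true                 # store delivered messages on disk
    max_in_flight: 64
```

Only fields listed in the reference for this output are used here. In a real deployment, credentials in the URL would normally come from a secret rather than being written inline.
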
#### Common ```yml outputs: label: "" amqp_0_9: urls: [] # No default (required) exchange: "" # No default (required) key: "" type: "" metadata: exclude_prefixes: [] max_in_flight: 64 ``` #### Advanced ```yml outputs: label: "" amqp_0_9: urls: [] # No default (required) exchange: "" # No default (required) exchange_declare: enabled: false type: direct durable: true arguments: "" # No default (optional) key: "" type: "" content_type: application/octet-stream content_encoding: "" correlation_id: "" reply_to: "" expiration: "" message_id: "" user_id: "" app_id: "" metadata: exclude_prefixes: [] priority: "" max_in_flight: 64 persistent: false mandatory: false immediate: false timeout: "" tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] ``` The metadata fields from each message are delivered as headers. TLS is automatically enabled when connecting to an `amqps` URL. However, you can customize [TLS settings](#tls) if required. You can use [function interpolations](../../../configuration/interpolation/#bloblang-queries) to dynamically set values for the following fields: `key`, `exchange`, and `type`. ## [](#fields)Fields ### [](#app_id)`app_id` Set an application ID for each message using a dynamic interpolated expression. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#content_encoding)`content_encoding` The content encoding attribute of each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#content_type)`content_type` The MIME type of each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). 
**Type**: `string` **Default**: `application/octet-stream` ### [](#correlation_id)`correlation_id` Set a unique correlation ID for each message using a dynamic interpolated expression to help match messages to responses. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#exchange)`exchange` The AMQP exchange to publish messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#exchange_declare)`exchange_declare` Passively declares the [target exchange](#exchange) to check whether an exchange with the specified name exists and is configured correctly. If the exchange exists, then the passive declaration verifies that fields specified in this object match its properties. If the target exchange does not exist, this output creates it. **Type**: `object` ### [](#exchange_declare-arguments)`exchange_declare.arguments` Arguments for server-specific implementations of the exchange (optional). You can use arguments to configure additional parameters for exchange types that require them. **Type**: `string` ```yaml # Examples: arguments: alternate-exchange: my-ae ``` ### [](#exchange_declare-durable)`exchange_declare.durable` Whether the declared exchange is durable. **Type**: `bool` **Default**: `true` ### [](#exchange_declare-enabled)`exchange_declare.enabled` Whether to enable exchange declaration. **Type**: `bool` **Default**: `false` ### [](#exchange_declare-type)`exchange_declare.type` The type of the exchange, which determines how messages are routed to queues. > 📝 **NOTE** > > Dots (`.`) in message keys are only enforced in routing keys and message types for `topic` exchanges. **Type**: `string` **Default**: `direct` **Options**: `direct`, `fanout`, `topic`, `headers`, `x-custom` ### [](#expiration)`expiration` Set the TTL of each message in milliseconds. 
This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#immediate)`immediate` Whether to set the immediate flag on published messages. When set to `true`, if there are no active consumers for a queue, the message is dropped instead of waiting. **Type**: `bool` **Default**: `false` ### [](#key)`key` The binding key to set for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#mandatory)`mandatory` Whether to set the mandatory flag on published messages. When set to `true`, a published message that cannot be routed to any queues is returned to the sender. **Type**: `bool` **Default**: `false` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this number to improve throughput. **Type**: `int` **Default**: `64` ### [](#message_id)`message_id` Set a message ID for each message using a dynamic interpolated expression. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#metadata)`metadata` Configure which metadata values are added to messages as headers. This allows you to pass additional context information along with your messages. **Type**: `object` ### [](#metadata-exclude_prefixes)`metadata.exclude_prefixes[]` Provide a list of explicit metadata key prefixes to exclude when adding metadata to sent messages. **Type**: `array` **Default**: `[]` ### [](#persistent)`persistent` Whether to store delivered messages on disk. By default, message delivery is transient. **Type**: `bool` **Default**: `false` ### [](#priority)`priority` Set the priority of each message using a dynamic interpolated expression. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). 
**Type**: `string` **Default**: `""` ```yaml # Examples: priority: 0 # --- priority: ${! meta("amqp_priority") } # --- priority: ${! json("doc.priority") } ``` ### [](#reply_to)`reply_to` Set the name of the queue to which responses are sent using a dynamic interpolated expression. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#timeout)`timeout` The maximum period to wait for a message acknowledgment before abandoning it and attempting a resend. If this value is not set, the system waits indefinitely. **Type**: `string` **Default**: `""` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). 
This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#type)`type` A custom message type to set for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#urls)`urls[]` A list of URLs to connect to. 
This output attempts to connect to each URL in the list, in order, until a successful connection is established. It then continues to use that URL until the connection is closed.

If an item in the list contains commas, it is split into multiple URLs.

**Type**: `array`

```yaml
# Examples:
urls:
  - "amqp://guest:guest@127.0.0.1:5672/"

# ---

urls:
  - "amqp://127.0.0.1:5672/,amqp://127.0.0.2:5672/"

# ---

urls:
  - "amqp://127.0.0.1:5672/"
  - "amqp://127.0.0.2:5672/"
```

### [](#user_id)`user_id`

Set the user ID to the name of the publisher. If this property is set by a publisher, its value must match the name of the user that opened the connection. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

**Default**: `""`

---

# Page 142: aws_dynamodb

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/aws_dynamodb.md

---

# aws\_dynamodb

---
title: aws_dynamodb
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/outputs/aws_dynamodb
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/outputs/aws_dynamodb.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/aws_dynamodb.adoc
categories: "[\"Services\",\"AWS\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Output ▼

[Output](/redpanda-cloud/develop/connect/components/outputs/aws_dynamodb/)[Cache](/redpanda-cloud/develop/connect/components/caches/aws_dynamodb/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/aws_dynamodb/ "View the Self-Managed version of this component")

Inserts items into a DynamoDB table.
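
For orientation, here is a minimal sketch of a complete output configuration, assuming JSON messages that carry an `id` field and a `kafka_topic` metadata value. The table name, region, column names, and batch size are illustrative assumptions only:

```yaml
output:
  aws_dynamodb:
    table: events                    # hypothetical table name
    region: us-east-1                # placeholder region
    string_columns:
      # Assumes each message is JSON with an `id` field
      id: ${! json("id") }
      # Assumes the message carries a `kafka_topic` metadata value
      topic: ${! meta("kafka_topic") }
    json_map_columns:
      payload: .                     # store the whole document as a map column
    max_in_flight: 64
    batching:
      count: 25                      # illustrative batch size
      period: 1s                     # flush incomplete batches after one second
```

Every field in this sketch appears in the configuration reference for this output.
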
#### Common ```yml outputs: label: "" aws_dynamodb: table: "" # No default (required) string_columns: {} json_map_columns: {} max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" aws_dynamodb: table: "" # No default (required) string_columns: {} json_map_columns: {} ttl: "" ttl_key: "" max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) max_retries: 3 backoff: initial_interval: 1s max_interval: 5s max_elapsed_time: 30s ``` The field `string_columns` is a map of column names to string values, where the values are [function interpolated](../../../configuration/interpolation/#bloblang-queries) per message of a batch. This allows you to populate string columns of an item by extracting fields within the document payload or metadata like follows: ```yml string_columns: id: ${!json("id")} title: ${!json("body.title")} topic: ${!meta("kafka_topic")} full_content: ${!content()} ``` The field `json_map_columns` is a map of column names to json paths, where the [dot path](../../../configuration/field_paths/) is extracted from each document and converted into a map value. Both an empty path and the path `.` are interpreted as the root of the document. This allows you to populate map columns of an item like follows: ```yml json_map_columns: user: path.to.user whole_document: . ``` A column name can be empty: ```yml json_map_columns: "": . 
``` In which case the top level document fields will be written at the root of the item, potentially overwriting previously defined column values. If a path is not found within a document the column will not be populated. ## [](#credentials)Credentials By default Redpanda Connect will use a shared credentials file when connecting to AWS services. It’s also possible to set them explicitly at the component level, allowing you to transfer data across accounts. You can find out more in [Amazon Web Services](../../../guides/cloud/aws/). ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. You can find out more [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#backoff)`backoff` Control time intervals between retry attempts. **Type**: `object` ### [](#backoff-initial_interval)`backoff.initial_interval` The initial period to wait between retry attempts. The retry interval increases for each failed attempt, up to the `backoff.max_interval` value. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `1s` ### [](#backoff-max_elapsed_time)`backoff.max_elapsed_time` The maximum period to wait before retry attempts are abandoned. If zero then no limit is used. **Type**: `string` **Default**: `30s` ### [](#backoff-max_interval)`backoff.max_interval` The maximum period to wait between retry attempts. **Type**: `string` **Default**: `5s` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). 
**Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#credentials-2)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). 
**Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#json_map_columns)`json_map_columns` A map of column keys to [field paths](../../../configuration/field_paths/) pointing to value data within messages. **Type**: `string` **Default**: `{}` ```yaml # Examples: json_map_columns: user: path.to.user whole_document: . # --- json_map_columns: "": . ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#max_retries)`max_retries` The maximum number of retries before giving up on the request. If set to zero there is no discrete limit. 
**Type**: `int` **Default**: `3` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#string_columns)`string_columns` A map of column keys to string values to store. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ```yaml # Examples: string_columns: full_content: ${!content()} id: ${!json("id")} title: ${!json("body.title")} topic: ${!meta("kafka_topic")} ``` ### [](#table)`table` The table to store messages in. **Type**: `string` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. 
**Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#ttl)`ttl` An optional TTL to set for items, calculated from the moment the message is sent. **Type**: `string` **Default**: `""` ### [](#ttl_key)`ttl_key` The column key to place the TTL value within. **Type**: `string` **Default**: `""` --- # Page 143: aws_kinesis_firehose **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/aws_kinesis_firehose.md --- # aws\_kinesis\_firehose --- title: aws_kinesis_firehose latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/aws_kinesis_firehose page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/aws_kinesis_firehose.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/aws_kinesis_firehose.adoc categories: "[\"Services\",\"AWS\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/aws_kinesis_firehose/ "View the Self-Managed version of this component") Sends messages to a Kinesis Firehose delivery stream. 
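As a rough sketch of how this output might be configured, the following fragment batches messages before sending them to a delivery stream. The stream name `my-delivery-stream`, the region, and the batch thresholds are placeholder assumptions rather than recommended values. The `backoff` values shown are the documented defaults.

```yaml
output:
  aws_kinesis_firehose:
    stream: my-delivery-stream   # hypothetical delivery stream name
    region: us-east-1            # assumed region
    max_in_flight: 64            # documented default
    batching:
      count: 100                 # flush once 100 messages have accumulated...
      period: 5s                 # ...or after 5 seconds, whichever comes first
    backoff:
      initial_interval: 1s       # documented default
      max_interval: 5s           # documented default
      max_elapsed_time: 30s      # documented default
```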
#### Common

```yml
output:
  label: ""
  aws_kinesis_firehose:
    stream: "" # No default (required)
    max_in_flight: 64
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

#### Advanced

```yml
output:
  label: ""
  aws_kinesis_firehose:
    stream: "" # No default (required)
    max_in_flight: 64
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
    max_retries: 0
    backoff:
      initial_interval: 1s
      max_interval: 5s
      max_elapsed_time: 30s
```

## [](#credentials)Credentials

By default, Redpanda Connect uses a shared credentials file when connecting to AWS services. You can also set credentials explicitly at the component level, which lets you transfer data across accounts. You can find out more in [Amazon Web Services](../../../guides/cloud/aws/).

## [](#performance)Performance

This output performs better when multiple messages are in flight in parallel. You can tune the maximum number of in-flight messages (or message batches) with the `max_in_flight` field.

This output also performs better when messages are sent as batches. Batches can be formed at both the input and output level. You can find out more in the [batching documentation](../../../configuration/batching/).

## [](#fields)Fields

### [](#backoff)`backoff`

Control time intervals between retry attempts.

**Type**: `object`

### [](#backoff-initial_interval)`backoff.initial_interval`

The initial period to wait between retry attempts.
The retry interval increases for each failed attempt, up to the `backoff.max_interval` value. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `1s` ### [](#backoff-max_elapsed_time)`backoff.max_elapsed_time` The maximum period to wait before retry attempts are abandoned. If zero then no limit is used. **Type**: `string` **Default**: `30s` ### [](#backoff-max_interval)`backoff.max_interval` The maximum period to wait between retry attempts. **Type**: `string` **Default**: `5s` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. 
Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#credentials-2)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. 
**Type**: `int` **Default**: `64` ### [](#max_retries)`max_retries` The maximum number of retries before giving up on the request. If set to zero there is no discrete limit. **Type**: `int` **Default**: `0` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#stream)`stream` The stream to publish messages to. **Type**: `string` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. 
**Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` --- # Page 144: aws_kinesis **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/aws_kinesis.md --- # aws\_kinesis --- title: aws_kinesis latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/aws_kinesis page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/aws_kinesis.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/aws_kinesis.adoc categories: "[\"Services\",\"AWS\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/aws_kinesis/)[Input](/redpanda-cloud/develop/connect/components/inputs/aws_kinesis/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/aws_kinesis/ "View the Self-Managed version of this component") Sends messages to a Kinesis stream. 
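A minimal sketch of this output, assuming each message is a JSON document with an `id` field that is suitable as a partition key. The stream name, region, and batch thresholds are hypothetical placeholders.

```yaml
output:
  aws_kinesis:
    stream: my-stream                 # hypothetical stream name; a full stream ARN is also accepted
    partition_key: ${!json("id")}     # assumes every message has a top-level "id" field
    region: us-east-1                 # assumed region
    batching:
      count: 50                       # flush once 50 messages have accumulated...
      period: 1s                      # ...or after 1 second, whichever comes first
```

Because the interpolation is evaluated for each message, messages in the same batch can carry different partition keys.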
#### Common

```yml
output:
  label: ""
  aws_kinesis:
    stream: "" # No default (required)
    partition_key: "" # No default (required)
    max_in_flight: 64
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

#### Advanced

```yml
output:
  label: ""
  aws_kinesis:
    stream: "" # No default (required)
    partition_key: "" # No default (required)
    hash_key: "" # No default (optional)
    max_in_flight: 64
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
    max_retries: 0
    backoff:
      initial_interval: 1s
      max_interval: 5s
      max_elapsed_time: 30s
```

Both the `partition_key` (required) and `hash_key` (optional) fields can be set dynamically using [function interpolations](../../../configuration/interpolation/#bloblang-queries). When sending batched messages, the interpolations are evaluated for each message in the batch.

## [](#credentials)Credentials

By default, Redpanda Connect uses a shared credentials file when connecting to AWS services. You can also set credentials explicitly at the component level, which lets you transfer data across accounts. You can find out more in [Amazon Web Services](../../../guides/cloud/aws/).

## [](#performance)Performance

This output performs better when multiple messages are in flight in parallel. You can tune the maximum number of in-flight messages (or message batches) with the `max_in_flight` field.

This output also performs better when messages are sent as batches.
Batches can be formed at both the input and output level. You can find out more [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#backoff)`backoff` Control time intervals between retry attempts. **Type**: `object` ### [](#backoff-initial_interval)`backoff.initial_interval` The initial period to wait between retry attempts. The retry interval increases for each failed attempt, up to the `backoff.max_interval` value. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `1s` ### [](#backoff-max_elapsed_time)`backoff.max_elapsed_time` The maximum period to wait before retry attempts are abandoned. If zero then no limit is used. **Type**: `string` **Default**: `30s` ### [](#backoff-max_interval)`backoff.max_interval` The maximum period to wait between retry attempts. **Type**: `string` **Default**: `5s` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. 
**Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#credentials-2)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string`

### [](#credentials-token)`credentials.token`

The token for the credentials being used, required when using short term credentials.

**Type**: `string`

### [](#endpoint)`endpoint`

Allows you to specify a custom endpoint for the AWS API.

**Type**: `string`

### [](#hash_key)`hash_key`

An optional hash key for partitioning messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#max_in_flight)`max_in_flight`

The maximum number of parallel message batches to have in flight at any given time.

**Type**: `int`

**Default**: `64`

### [](#max_retries)`max_retries`

The maximum number of retries before giving up on the request. If set to zero there is no discrete limit.

**Type**: `int`

**Default**: `0`

### [](#partition_key)`partition_key`

A required key for partitioning messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#region)`region`

The AWS region to target.

**Type**: `string`

### [](#stream)`stream`

The stream to publish messages to. Streams can either be specified by their name or full ARN.

**Type**: `string`

```yaml
# Examples:
stream: foo

# ---

stream: arn:aws:kinesis:*:111122223333:stream/my-stream
```

### [](#tcp)`tcp`

Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for:

- **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment
- **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections
- **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives
- **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts

Most users should keep the default values.
Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. 
**Type**: `string` **Default**: `0s` --- # Page 145: aws_s3 **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/aws_s3.md --- # aws\_s3 --- title: aws_s3 latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/aws_s3 page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/aws_s3.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/aws_s3.adoc categories: "[\"Services\",\"AWS\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/aws_s3/)[Cache](/redpanda-cloud/develop/connect/components/caches/aws_s3/)[Input](/redpanda-cloud/develop/connect/components/inputs/aws_s3/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/aws_s3/ "View the Self-Managed version of this component") Uploads messages to an Amazon S3 bucket as objects, using the path specified in the `path` field. 
#### Common

```yml
output:
  label: ""
  aws_s3:
    bucket: "" # No default (required)
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    tags: {}
    content_type: application/octet-stream
    metadata:
      exclude_prefixes: []
    max_in_flight: 64
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

#### Advanced

```yml
output:
  label: ""
  aws_s3:
    bucket: "" # No default (required)
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    tags: {}
    content_type: application/octet-stream
    content_encoding: ""
    cache_control: ""
    content_disposition: ""
    content_language: ""
    website_redirect_location: ""
    metadata:
      exclude_prefixes: []
    storage_class: STANDARD
    kms_key_id: ""
    checksum_algorithm: ""
    server_side_encryption: ""
    force_path_style_urls: false
    max_in_flight: 64
    timeout: 5s
    object_canned_acl: private
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
```

To use a different path for each object, use [function interpolation](../../../configuration/interpolation/#bloblang-queries), which is evaluated for each message in a batch.

## [](#metadata)Metadata

Redpanda Connect sends metadata fields as headers. To mutate or remove these values, see the [metadata docs](../../../configuration/metadata/).
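To illustrate the metadata behavior described above, the following sketch uses the `metadata.exclude_prefixes` field to keep some metadata keys from being attached to uploaded objects. The bucket name and the `kafka_` prefix are assumptions chosen for illustration; substitute the prefixes that match the metadata your input actually produces.

```yaml
output:
  aws_s3:
    bucket: my-bucket                 # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.json
    metadata:
      exclude_prefixes:
        - kafka_                      # assumed prefix: skips keys such as kafka_topic
```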
## [](#tags)Tags

The `tags` field accepts key/value pairs to attach to objects as tags, and the values support [interpolation functions](../../../configuration/interpolation/#bloblang-queries):

```yaml
output:
  aws_s3:
    bucket: TODO
    path: ${!counter()}-${!timestamp_unix_nano()}.tar.gz
    tags:
      Key1: Value1
      Timestamp: ${!meta("Timestamp")}
```

## [](#credentials)Credentials

By default, Redpanda Connect uses a shared credentials file when connecting to AWS services. You can also set credentials explicitly at the component level to transfer data across accounts. You can find out more in [AWS credentials](../../../guides/cloud/aws/).

## [](#batching)Batching

It’s common to want to upload messages to S3 as batched archives. The easiest way to do this is to batch your messages at the output level and join the batch of messages with an [`archive`](../../processors/archive/) or [`compress`](../../processors/compress/) processor.

For example, the following configuration uploads messages as a `.tar.gz` archive of documents:

```yaml
output:
  aws_s3:
    bucket: TODO
    path: ${!counter()}-${!timestamp_unix_nano()}.tar.gz
    batching:
      count: 100
      period: 10s
      processors:
        - archive:
            format: tar
        - compress:
            algorithm: gzip
```

This configuration uploads JSON documents as a single large document containing an array of objects:

```yaml
output:
  aws_s3:
    bucket: TODO
    path: ${!counter()}-${!timestamp_unix_nano()}.json
    batching:
      count: 100
      processors:
        - archive:
            format: json_array
```

## [](#bucket-name-format)Bucket name format

The `bucket` field accepts a bucket name only, not an ARN. For example, use `my-bucket`, not `arn:aws:s3:::my-bucket`.

## [](#s3-compatible-storage)S3-compatible storage

The `endpoint` and `force_path_style_urls` fields let you connect to S3-compatible storage services such as Cloudflare R2, MinIO, or DigitalOcean Spaces.
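For a self-hosted S3-compatible service such as MinIO, a configuration might look like the following sketch. The endpoint URL, bucket name, region, and secret names are all hypothetical, and the `${secrets.NAME}` references assume that secrets with those names already exist in your secret store.

```yaml
output:
  aws_s3:
    bucket: my-bucket                          # hypothetical bucket name
    path: ${!uuid_v4()}.json
    endpoint: https://minio.example.com:9000   # hypothetical MinIO endpoint
    force_path_style_urls: true                # path-style URLs help with custom endpoints
    region: us-east-1                          # placeholder region
    credentials:
      id: ${secrets.MINIO_ACCESS_KEY}          # assumed secret name
      secret: ${secrets.MINIO_SECRET_KEY}      # assumed secret name
```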
For Cloudflare R2, set `endpoint` to your account endpoint URL and enable `force_path_style_urls`:

```yaml
output:
  aws_s3:
    bucket: r2-bucket
    path: ${!uuid_v4()}.json
    endpoint: https://<account-id>.r2.cloudflarestorage.com
    force_path_style_urls: true
    region: auto
    credentials:
      id: <access-key-id>
      secret: <secret-access-key>
```

Find your account ID in the Cloudflare dashboard under **R2 > Overview > Account Details**. Generate API credentials under **R2 > Manage R2 API Tokens**.

## [](#performance)Performance

This output performs better when multiple messages are in flight in parallel. You can tune the maximum number of in-flight messages (or message batches) with the `max_in_flight` field.

## [](#fields)Fields

### [](#batching-2)`batching`

Configure a [batching policy](../../../configuration/batching/).

**Type**: `object`

```yaml
# Examples:
batching:
  byte_size: 5000
  count: 0
  period: 1s

# ---

batching:
  count: 10
  period: 1s

# ---

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### [](#batching-byte_size)`batching.byte_size`

The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-check)`batching.check`

A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
check: this.type == "end_of_transaction"
```

### [](#batching-count)`batching.count`

The number of messages after which the batch is flushed. Set to `0` to disable count-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-period)`batching.period`

A period in which an incomplete batch should be flushed regardless of its size.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
period: 1s

# ---

period: 1m

# ---

period: 500ms
```

### [](#batching-processors)`batching.processors[]`

A list of [processors](../../processors/about/) to apply to a batch as it is flushed.
This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#bucket)`bucket` The bucket to upload messages to. **Type**: `string` ### [](#cache_control)`cache_control` The cache control to set for each object. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#checksum_algorithm)`checksum_algorithm` The algorithm used to validate each object during its upload to the Amazon S3 bucket. **Type**: `string` **Default**: `""` **Options**: `CRC32`, `CRC32C`, `SHA1`, `SHA256` ### [](#content_disposition)`content_disposition` The content disposition to set for each object. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#content_encoding)`content_encoding` An optional content encoding to set for each object. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#content_language)`content_language` The content language to set for each object. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#content_type)`content_type` The content type to set for each object. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `application/octet-stream` ### [](#credentials-2)`credentials` Optional manual configuration of AWS credentials to use. 
More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#force_path_style_urls)`force_path_style_urls` Forces the client API to use path style URLs, which helps when connecting to custom endpoints. **Type**: `bool` **Default**: `false` ### [](#kms_key_id)`kms_key_id` An optional server-side encryption key. **Type**: `string` **Default**: `""` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#metadata-2)`metadata` Specify criteria for which metadata values are attached to objects as headers. 
**Type**: `object` ### [](#metadata-exclude_prefixes)`metadata.exclude_prefixes[]` Provide a list of explicit metadata key prefixes to be excluded when adding metadata to sent messages. **Type**: `array` **Default**: `[]` ### [](#object_canned_acl)`object_canned_acl` The object canned ACL value. **Type**: `string` **Default**: `private` **Options**: `private`, `public-read`, `public-read-write`, `authenticated-read`, `aws-exec-read`, `bucket-owner-read`, `bucket-owner-full-control` ### [](#path)`path` The path of each message to upload. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `${!counter()}-${!timestamp_unix_nano()}.txt` ```yaml # Examples: path: ${!counter()}-${!timestamp_unix_nano()}.txt # --- path: ${!meta("kafka_key")}.json # --- path: ${!json("doc.namespace")}/${!json("doc.id")}.json ``` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#server_side_encryption)`server_side_encryption` An optional server-side encryption algorithm. **Type**: `string` **Default**: `""` ### [](#storage_class)`storage_class` The storage class to set for each object. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `STANDARD` **Options**: `STANDARD`, `REDUCED_REDUNDANCY`, `GLACIER`, `STANDARD_IA`, `ONEZONE_IA`, `INTELLIGENT_TIERING`, `DEEP_ARCHIVE` ### [](#tags-2)`tags` Key/value pairs to store with the object as tags. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ```yaml # Examples: tags: Key1: Value1 Timestamp: ${!meta("Timestamp")} ``` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. 
These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timeout)`timeout` The maximum period to wait on an upload before abandoning it and reattempting. 
**Type**: `string` **Default**: `5s` ### [](#website_redirect_location)`website_redirect_location` The website redirect location to set for each object. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` --- # Page 146: aws_sns **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/aws_sns.md --- # aws\_sns --- title: aws_sns latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/aws_sns page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/aws_sns.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/aws_sns.adoc categories: "[\"Services\",\"AWS\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/aws_sns/ "View the Self-Managed version of this component") Sends messages to an AWS SNS topic. 
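As a minimal sketch of this output, the configuration below publishes each message to one topic and sets the subject per message. It uses only fields documented on this page. The topic ARN, account ID, and region are hypothetical placeholders, and the `subject` interpolation assumes every message is a JSON document with a `type` field.

```yaml
output:
  aws_sns:
    # Hypothetical topic ARN; replace with your own.
    topic_arn: arn:aws:sns:us-east-1:123456789012:example-topic
    region: us-east-1
    # Resolved per message; assumes a JSON payload with a `type` field.
    subject: ${! json("type") }
    # Documented defaults, shown explicitly for clarity.
    max_in_flight: 64
    timeout: 5s
```

With no `credentials` block set, the output falls back to the default credential lookup described under Credentials.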
#### Common ```yml outputs: label: "" aws_sns: topic_arn: "" # No default (required) message_group_id: "" # No default (optional) message_deduplication_id: "" # No default (optional) subject: "" # No default (optional) max_in_flight: 64 metadata: exclude_prefixes: [] ``` #### Advanced ```yml outputs: label: "" aws_sns: topic_arn: "" # No default (required) message_group_id: "" # No default (optional) message_deduplication_id: "" # No default (optional) subject: "" # No default (optional) max_in_flight: 64 metadata: exclude_prefixes: [] timeout: 5s region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) ``` ## [](#credentials)Credentials By default Redpanda Connect will use a shared credentials file when connecting to AWS services. It’s also possible to set them explicitly at the component level, allowing you to transfer data across accounts. You can find out more in [Amazon Web Services](../../../guides/cloud/aws/). ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. ## [](#fields)Fields ### [](#credentials-2)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). 
**Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#message_deduplication_id)`message_deduplication_id` An optional deduplication ID to set for messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#message_group_id)`message_group_id` An optional group ID to set for messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). 
**Type**: `string` ### [](#metadata)`metadata` Specify criteria for which metadata values are sent as headers. **Type**: `object` ### [](#metadata-exclude_prefixes)`metadata.exclude_prefixes[]` Provide a list of explicit metadata key prefixes to be excluded when adding metadata to sent messages. **Type**: `array` **Default**: `[]` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#subject)`subject` An optional subject to set for messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. 
Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timeout)`timeout` The maximum period to wait on an upload before abandoning it and reattempting. **Type**: `string` **Default**: `5s` ### [](#topic_arn)`topic_arn` The topic to publish to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` --- # Page 147: aws_sqs **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/aws_sqs.md --- # aws\_sqs --- title: aws_sqs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/aws_sqs page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/aws_sqs.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/aws_sqs.adoc categories: "[\"Services\",\"AWS\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/aws_sqs/)[Input](/redpanda-cloud/develop/connect/components/inputs/aws_sqs/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/aws_sqs/ "View the Self-Managed version of this component") Sends messages to an SQS queue. 
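As a minimal sketch of this output, the configuration below batches messages at the output level and bounds retries. It uses only fields documented on this page. The queue URL, account ID, and region are hypothetical placeholders, and the `kafka_` metadata prefix assumes an upstream Kafka-style input whose metadata you do not want forwarded as message attributes.

```yaml
output:
  aws_sqs:
    # Hypothetical queue URL; replace with your own.
    url: https://sqs.us-east-1.amazonaws.com/123456789012/example-queue
    region: us-east-1
    max_in_flight: 64
    metadata:
      # Assumed prefix: drop Kafka-related metadata so that fewer values
      # compete for the limited number of message attributes.
      exclude_prefixes:
        - kafka_
    batching:
      # Flush after 10 messages or 1 second, whichever comes first.
      count: 10
      period: 1s
    max_records_per_request: 10
    # Give up after three retries instead of retrying without a discrete limit.
    max_retries: 3
    backoff:
      initial_interval: 1s
      max_interval: 5s
      max_elapsed_time: 30s
```

The batch `count` here matches `max_records_per_request`, so a full batch fits in a single SQS request. That pairing is a design choice for the sketch, not a requirement.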
#### Common ```yml outputs: label: "" aws_sqs: url: "" # No default (required) message_group_id: "" # No default (optional) message_deduplication_id: "" # No default (optional) delay_seconds: "" # No default (optional) max_in_flight: 64 metadata: exclude_prefixes: [] batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" aws_sqs: url: "" # No default (required) message_group_id: "" # No default (optional) message_deduplication_id: "" # No default (optional) delay_seconds: "" # No default (optional) max_in_flight: 64 metadata: exclude_prefixes: [] batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) max_records_per_request: 10 region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) max_retries: 0 backoff: initial_interval: 1s max_interval: 5s max_elapsed_time: 30s ``` Metadata values are sent along with the payload as attributes with the data type String. If the number of metadata values in a message exceeds the message attribute limit (10) then the top ten keys ordered alphabetically will be selected. The fields `message_group_id`, `message_deduplication_id` and `delay_seconds` can be set dynamically using [function interpolations](../../../configuration/interpolation/#bloblang-queries), which are resolved individually for each message of a batch. ## [](#credentials)Credentials By default Redpanda Connect will use a shared credentials file when connecting to AWS services. 
It’s also possible to set them explicitly at the component level, allowing you to transfer data across accounts. You can find out more in [Amazon Web Services](../../../guides/cloud/aws/). ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. You can find out more [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#backoff)`backoff` Control time intervals between retry attempts. **Type**: `object` ### [](#backoff-initial_interval)`backoff.initial_interval` The initial period to wait between retry attempts. The retry interval increases for each failed attempt, up to the `backoff.max_interval` value. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `1s` ### [](#backoff-max_elapsed_time)`backoff.max_elapsed_time` The maximum period to wait before retry attempts are abandoned. If zero then no limit is used. **Type**: `string` **Default**: `30s` ### [](#backoff-max_interval)`backoff.max_interval` The maximum period to wait between retry attempts. **Type**: `string` **Default**: `5s` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch should be flushed. Set to `0` to disable size-based batching.
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages at which the batch should be flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, so splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#credentials-2)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume.
**Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#delay_seconds)`delay_seconds` An optional delay time in seconds for the message. The value must be between 0 and 900. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of parallel message batches to have in flight at any given time. **Type**: `int` **Default**: `64` ### [](#max_records_per_request)`max_records_per_request` The maximum number of records delivered in a single SQS request. Enter only values from `0` to `10`. **Type**: `int` **Default**: `10` ### [](#max_retries)`max_retries` The maximum number of retries before giving up on the request. If set to zero there is no discrete limit. **Type**: `int` **Default**: `0` ### [](#message_deduplication_id)`message_deduplication_id` An optional deduplication ID to set for messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#message_group_id)`message_group_id` An optional group ID to set for messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).
**Type**: `string` ### [](#metadata)`metadata` Specify criteria for which metadata values are sent as headers. **Type**: `object` ### [](#metadata-exclude_prefixes)`metadata.exclude_prefixes[]` Provide a list of explicit metadata key prefixes to be excluded when adding metadata to sent messages. **Type**: `array` **Default**: `[]` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. 
**Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#url)`url` The URL of the target SQS queue. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` --- # Page 148: azure_blob_storage **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/azure_blob_storage.md --- # azure\_blob\_storage --- title: azure_blob_storage latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/azure_blob_storage page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/azure_blob_storage.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/azure_blob_storage.adoc categories: "[\"Services\",\"Azure\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/azure_blob_storage/)[Input](/redpanda-cloud/develop/connect/components/inputs/azure_blob_storage/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/azure_blob_storage/ "View the Self-Managed version of this component") Sends message parts as objects to an Azure Blob Storage Account container. Each object is uploaded with the filename specified with the `path` field.
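As a minimal sketch of this output, the configuration below derives the container name and the blob path from each message. It uses only fields documented on this page. The storage account name is a hypothetical placeholder, and the `path` interpolation assumes every message is a JSON document with `doc.namespace` and `doc.id` fields.

```yaml
output:
  azure_blob_storage:
    # Hypothetical account name. With no access key, SAS token, or
    # connection string set, access goes through DefaultAzureCredential.
    storage_account: examplestorageaccount
    # One container per year, resolved per message.
    container: messages-${!timestamp("2006")}
    # One blob per document; assumes JSON payloads with doc.namespace and doc.id.
    path: ${!json("doc.namespace")}/${!json("doc.id")}.json
    # Documented defaults, shown explicitly for clarity.
    blob_type: BLOCK
    public_access_level: PRIVATE
    max_in_flight: 64
```

Messages that resolve to the same `container` and `path` target the same blob, so choose interpolations that are unique per message.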
#### Common ```yml outputs: label: "" azure_blob_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" container: "" # No default (required) path: ${!counter()}-${!timestamp_unix_nano()}.txt max_in_flight: 64 ``` #### Advanced ```yml outputs: label: "" azure_blob_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" container: "" # No default (required) path: ${!counter()}-${!timestamp_unix_nano()}.txt blob_type: BLOCK public_access_level: PRIVATE max_in_flight: 64 ``` In order to have a different path for each object you should use function interpolations described [here](../../../configuration/interpolation/#bloblang-queries), which are calculated per message of a batch. Supports multiple authentication methods but only one of the following is required: - `storage_connection_string` - `storage_account` and `storage_access_key` - `storage_account` and `storage_sas_token` - `storage_account` to access via [DefaultAzureCredential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#DefaultAzureCredential) If multiple are set then the `storage_connection_string` is given priority. If the `storage_connection_string` does not contain the `AccountName` parameter, please specify it in the `storage_account` field. ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. ## [](#fields)Fields ### [](#blob_type)`blob_type` Block and Append blobs are comprised of blocks, and each blob can support up to 50,000 blocks. The default value is `BLOCK`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).
**Type**: `string` **Default**: `BLOCK` **Options**: `BLOCK`, `APPEND` ### [](#container)`container` The container for uploading the messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: container: messages-${!timestamp("2006")} ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#path)`path` The path of each message to upload. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `${!counter()}-${!timestamp_unix_nano()}.txt` ```yaml # Examples: path: ${!counter()}-${!timestamp_unix_nano()}.json # --- path: ${!meta("kafka_key")}.json # --- path: ${!json("doc.namespace")}/${!json("doc.id")}.json ``` ### [](#public_access_level)`public_access_level` The container’s public access level. The default value is `PRIVATE`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `PRIVATE` **Options**: `PRIVATE`, `BLOB`, `CONTAINER` ### [](#storage_access_key)`storage_access_key` The storage account access key. This field is ignored if `storage_connection_string` is set. **Type**: `string` **Default**: `""` ### [](#storage_account)`storage_account` The storage account to access. This field is ignored if `storage_connection_string` is set. **Type**: `string` **Default**: `""` ### [](#storage_connection_string)`storage_connection_string` A storage account connection string. This field is required if `storage_account` and `storage_access_key` / `storage_sas_token` are not set. **Type**: `string` **Default**: `""` ### [](#storage_sas_token)`storage_sas_token` The storage account SAS token. This field is ignored if `storage_connection_string` or `storage_access_key` are set. 
**Type**: `string` **Default**: `""` --- # Page 149: azure_cosmosdb **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/azure_cosmosdb.md --- # azure\_cosmosdb --- title: azure_cosmosdb latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/azure_cosmosdb page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/azure_cosmosdb.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/azure_cosmosdb.adoc categories: "[\"Azure\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/azure_cosmosdb/)[Input](/redpanda-cloud/develop/connect/components/inputs/azure_cosmosdb/)[Processor](/redpanda-cloud/develop/connect/components/processors/azure_cosmosdb/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/azure_cosmosdb/ "View the Self-Managed version of this component") Creates or updates messages as JSON documents in [Azure CosmosDB](https://learn.microsoft.com/en-us/azure/cosmos-db/introduction). 
#### Common ```yml outputs: label: "" azure_cosmosdb: endpoint: "" # No default (optional) account_key: "" # No default (optional) connection_string: "" # No default (optional) database: "" # No default (required) container: "" # No default (required) partition_keys_map: "" # No default (required) operation: Create item_id: "" # No default (optional) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) max_in_flight: 64 ``` #### Advanced ```yml outputs: label: "" azure_cosmosdb: endpoint: "" # No default (optional) account_key: "" # No default (optional) connection_string: "" # No default (optional) database: "" # No default (required) container: "" # No default (required) partition_keys_map: "" # No default (required) operation: Create patch_operations: [] # No default (optional) patch_condition: "" # No default (optional) auto_id: true item_id: "" # No default (optional) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) max_in_flight: 64 ``` When creating documents, each message must have the `id` property (case-sensitive) set (or use `auto_id: true`). It is the unique name that identifies the document, that is, no two documents share the same `id` within a logical partition. The `id` field must not exceed 255 characters. [See details](https://learn.microsoft.com/en-us/rest/api/cosmos-db/documents). The `partition_keys_map` field must resolve to the same value(s) across the entire message batch.
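The two constraints above can be sketched in one configuration. This is an illustrative example, not a reference setup: the endpoint, database, and container names are placeholders borrowed from the field examples on this page, `${COSMOS_ACCOUNT_KEY}` is a hypothetical environment variable reference, and the sketch assumes that the `group_by_value` processor is available in your pipeline and that each message is a JSON document with a `habitat` field.

```yaml
pipeline:
  processors:
    # Regroup each batch so that all messages in a resulting batch share one
    # `habitat` value, and therefore one partition key.
    - group_by_value:
        value: ${! json("habitat") }

output:
  azure_cosmosdb:
    endpoint: https://localhost:8081      # placeholder endpoint
    account_key: ${COSMOS_ACCOUNT_KEY}    # hypothetical environment variable
    database: testdb
    container: testcontainer
    # Must resolve to the same value for every message in a batch.
    partition_keys_map: root = json("habitat")
    operation: Create
    # Generate a random UUID v4 `id` for messages that do not already have one.
    auto_id: true
```

Grouping happens in the pipeline because processors listed under `batching.processors` cannot split a batch into smaller batches.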
## [](#credentials)Credentials You can use one of the following authentication mechanisms: - Set the `endpoint` field and the `account_key` field - Set only the `endpoint` field to use [DefaultAzureCredential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#DefaultAzureCredential) - Set the `connection_string` field ## [](#batching)Batching CosmosDB limits the maximum batch size to 100 messages and the payload must not exceed 2MB ([details here](https://learn.microsoft.com/en-us/azure/cosmos-db/concepts-limits#per-request-limits)). ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. You can find out more [in this doc](../../../configuration/batching/). ## [](#examples)Examples ### [](#create-documents)Create documents Create new documents in the `blobfish` container with partition key `/habitat`. ```yaml output: azure_cosmosdb: endpoint: http://localhost:8080 account_key: C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw== database: blobbase container: blobfish partition_keys_map: root = json("habitat") operation: Create ``` ### [](#patch-documents)Patch documents Execute the Patch operation on documents from the `blobfish` container. ```yaml output: azure_cosmosdb: endpoint: http://localhost:8080 account_key: C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw== database: testdb container: blobfish partition_keys_map: root = json("habitat") item_id: ${! 
json("id") } operation: Patch patch_operations: # Add a new /diet field - operation: Add path: /diet value_map: root = json("diet") # Remove the first location from the /locations array field - operation: Remove path: /locations/0 # Add new location at the end of the /locations array field - operation: Add path: /locations/- value_map: root = "Challenger Deep" ``` ## [](#fields)Fields ### [](#account_key)`account_key` Account key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: account_key: C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw== ``` ### [](#auto_id)`auto_id` Automatically set the item `id` field to a random UUID v4. If the `id` field is already set, then it will not be overwritten. Setting this to `false` can improve performance, since the messages will not have to be parsed. **Type**: `bool` **Default**: `true` ### [](#batching-2)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch should be flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages at which the batch should be flushed.
Set to `0` to disable count-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-period)`batching.period`

A period in which an incomplete batch should be flushed regardless of its size.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
period: 1s

# ---

period: 1m

# ---

period: 500ms
```

### [](#batching-processors)`batching.processors[]`

A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Note that all resulting messages are flushed as a single batch, so splitting the batch into smaller batches using these processors is a no-op.

**Type**: `processor`

```yaml
# Examples:
processors:
  - archive:
      format: concatenate

# ---

processors:
  - archive:
      format: lines

# ---

processors:
  - archive:
      format: json_array
```

### [](#connection_string)`connection_string`

Connection string.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

```yaml
# Examples:
connection_string: AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==;
```

### [](#container)`container`

Container.

**Type**: `string`

```yaml
# Examples:
container: testcontainer
```

### [](#database)`database`

Database.

**Type**: `string`

```yaml
# Examples:
database: testdb
```

### [](#endpoint)`endpoint`

CosmosDB endpoint.

**Type**: `string`

```yaml
# Examples:
endpoint: https://localhost:8081
```

### [](#item_id)`item_id`

ID of item to replace or delete. Only used by the Replace and Delete operations.

This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

```yaml
# Examples:
item_id: ${! 
json("id") } ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#operation)`operation` Operation. **Type**: `string` **Default**: `Create` | Option | Summary | | --- | --- | | Create | Create operation. | | Delete | Delete operation. | | Patch | Patch operation. | | Replace | Replace operation. | | Upsert | Upsert operation. | ### [](#partition_keys_map)`partition_keys_map` A [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to a single partition key value or an array of partition key values of type string, integer or boolean. Currently, hierarchical partition keys are not supported so only one value may be provided. **Type**: `string` ```yaml # Examples: partition_keys_map: root = "blobfish" # --- partition_keys_map: root = 41 # --- partition_keys_map: root = true # --- partition_keys_map: root = null # --- partition_keys_map: root = json("blobfish").depth ``` ### [](#patch_condition)`patch_condition` Patch operation condition. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: patch_condition: from c where not is_defined(c.blobfish) ``` ### [](#patch_operations)`patch_operations[]` Patch operations to be performed when `operation: Patch` . **Type**: `object` ### [](#patch_operations-operation)`patch_operations[].operation` Operation. **Type**: `string` **Default**: `Add` | Option | Summary | | --- | --- | | Add | Add patch operation. | | Increment | Increment patch operation. | | Remove | Remove patch operation. | | Replace | Replace patch operation. | | Set | Set patch operation. | ### [](#patch_operations-path)`patch_operations[].path` Path. 
**Type**: `string`

```yaml
# Examples:
path: /foo/bar/baz
```

### [](#patch_operations-value_map)`patch_operations[].value_map`

A [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to a value of any type that is supported by CosmosDB.

**Type**: `string`

```yaml
# Examples:
value_map: root = "blobfish"

# ---

value_map: root = 41

# ---

value_map: root = true

# ---

value_map: root = json("blobfish").depth

# ---

value_map: root = [1, 2, 3]
```

## [](#cosmosdb-emulator)CosmosDB emulator

To run the [CosmosDB emulator](https://learn.microsoft.com/en-us/azure/cosmos-db/linux-emulator) locally, you can use the following Docker command:

```bash
docker run --rm -it -p 8081:8081 --name=cosmosdb -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=10 -e AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=false mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator
```

Note: `AZURE_COSMOS_EMULATOR_PARTITION_COUNT` controls the number of partitions that will be supported by the emulator. The bigger the value, the longer it takes for the container to start up.

Additionally, instead of installing the container's self-signed certificate, which is exposed at `https://localhost:8081/_explorer/emulator.pem`, you can run [mitmproxy](https://mitmproxy.org/) like so:

```bash
mitmproxy -k --mode "reverse:https://localhost:8081"
```

Then you can access the CosmosDB UI at `http://localhost:8080/_explorer/index.html` and use `http://localhost:8080` as the CosmosDB endpoint.
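The credential, batching, and performance notes on this page can be combined into one configuration. The following is a minimal sketch rather than a tuned setup. It assumes a hypothetical Redpanda Cloud secret named `COSMOS_ACCOUNT_KEY`, a placeholder account endpoint, and documents that already carry an `id` field and a `habitat` field for the partition key. The batch `count` is capped at 100 to stay within the CosmosDB batch size limit described in the Batching section.

```yaml
output:
  azure_cosmosdb:
    # Placeholder endpoint (assumption); replace with your account endpoint.
    endpoint: https://example-account.documents.azure.com:443/
    # Hypothetical secret name; keeps the key out of the configuration itself.
    account_key: ${secrets.COSMOS_ACCOUNT_KEY}
    database: blobbase
    container: blobfish
    partition_keys_map: root = json("habitat")
    operation: Upsert
    # Documents are assumed to already carry an `id`, so skip UUID generation.
    auto_id: false
    max_in_flight: 64
    batching:
      # CosmosDB accepts at most 100 messages per batch.
      count: 100
      # Flush incomplete batches after one second.
      period: 1s
```

If individual documents are large, a `byte_size` threshold below the 2MB payload limit may also be worth setting.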
--- # Page 150: azure_data_lake_gen2 **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/azure_data_lake_gen2.md --- # azure\_data\_lake\_gen2 --- title: azure_data_lake_gen2 latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/azure_data_lake_gen2 page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/azure_data_lake_gen2.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/azure_data_lake_gen2.adoc page-git-created-date: "2024-11-05" page-git-modified-date: "2024-11-05" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/azure_data_lake_gen2/ "View the Self-Managed version of this component") Sends message parts as files to an [Azure Data Lake Gen2](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction) file system. Each file is uploaded with the file name specified in the `path` field. ```yml outputs: label: "" azure_data_lake_gen2: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" filesystem: "" # No default (required) path: ${!counter()}-${!timestamp_unix_nano()}.txt max_in_flight: 64 ``` To specify a different [`path` value](#path) (file name) for each file, use [function interpolations](../../../configuration/interpolation/#bloblang-queries). Function interpolations are calculated for each message in a batch. ## [](#authentication-methods)Authentication methods This output supports multiple authentication methods. 
You must configure at least one method from the following list: - `storage_connection_string` - `storage_account` and `storage_access_key` - `storage_account` and `storage_sas_token` - `storage_account` to access using [DefaultAzureCredential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#DefaultAzureCredential) If you configure multiple authentication methods, the `storage_connection_string` takes precedence. ## [](#performance)Performance Sends multiple messages in flight in parallel for improved performance. You can tune the number of in flight messages (or message batches) with the field `max_in_flight`. ## [](#fields)Fields ### [](#filesystem)`filesystem` The name of the data lake storage file system you want to upload messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: filesystem: messages-${!timestamp("2006")} ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this number to improve throughput until performance plateaus. **Type**: `int` **Default**: `64` ### [](#path)`path` The path (file name) of each message to upload to the data lake storage file system. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `${!counter()}-${!timestamp_unix_nano()}.txt` ```yaml # Examples: path: ${!counter()}-${!timestamp_unix_nano()}.json # --- path: ${!meta("kafka_key")}.json # --- path: ${!json("doc.namespace")}/${!json("doc.id")}.json ``` ### [](#storage_access_key)`storage_access_key` The access key for the storage account. Use this field along with `storage_account` for authentication. This field is ignored when the `storage_connection_string` field is populated. **Type**: `string` **Default**: `""` ### [](#storage_account)`storage_account` The storage account to access. 
This field is ignored when the `storage_connection_string` field is populated. **Type**: `string` **Default**: `""` ### [](#storage_connection_string)`storage_connection_string` The connection string for the storage account. You must enter a value for this field if no other authentication method is specified. > 📝 **NOTE** > > If the `storage_connection_string` field does not contain the `AccountName` parameter value, specify it in the `storage_account` field. **Type**: `string` **Default**: `""` ### [](#storage_sas_token)`storage_sas_token` The SAS token for the storage account. Use this field along with `storage_account` for authentication. This field is ignored when either the `storage_connection_string` or `storage_access_key` fields are populated. **Type**: `string` **Default**: `""` --- # Page 151: azure_queue_storage **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/azure_queue_storage.md --- # azure\_queue\_storage --- title: azure_queue_storage latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/azure_queue_storage page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/azure_queue_storage.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/azure_queue_storage.adoc categories: "[\"Services\",\"Azure\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/azure_queue_storage/)[Input](/redpanda-cloud/develop/connect/components/inputs/azure_queue_storage/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/azure_queue_storage/ "View the Self-Managed version of this component") Sends messages to an Azure Storage Queue. 
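For orientation before the field reference, here is a rough sketch of how this output might be configured. The secret name `AZURE_STORAGE_CONNECTION_STRING`, the `events-` queue prefix, and the `region` message field are assumptions made for illustration only. The fields themselves (`storage_connection_string`, `queue_name`, `ttl`, `max_in_flight`, `batching`) are the ones documented on this page.

```yaml
output:
  azure_queue_storage:
    # Hypothetical secret name holding the storage connection string.
    storage_connection_string: ${secrets.AZURE_STORAGE_CONNECTION_STRING}
    # Interpolated per message: route by an assumed `region` field.
    queue_name: events-${! json("region") }
    # Each message expires from the queue after 36 hours.
    ttl: 36h
    max_in_flight: 64
    batching:
      # Flush after 10 messages or one second, whichever comes first.
      count: 10
      period: 1s
```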
#### Common ```yml outputs: label: "" azure_queue_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" queue_name: "" # No default (required) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" azure_queue_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" queue_name: "" # No default (required) ttl: "" max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` Only one authentication method is required, `storage_connection_string` or `storage_account` and `storage_access_key`. If both are set then the `storage_connection_string` is given priority. In order to set the `queue_name` you can use function interpolations described [here](../../../configuration/interpolation/#bloblang-queries), which are calculated per message of a batch. ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. You can find out more [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#max_in_flight)`max_in_flight` The maximum number of parallel message batches to have in flight at any given time. **Type**: `int` **Default**: `64` ### [](#queue_name)`queue_name` The name of the target Queue Storage queue. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#storage_access_key)`storage_access_key` The storage account access key. This field is ignored if `storage_connection_string` is set. **Type**: `string` **Default**: `""` ### [](#storage_account)`storage_account` The storage account to access. This field is ignored if `storage_connection_string` is set. 
**Type**: `string`

**Default**: `""`

### [](#storage_connection_string)`storage_connection_string`

A storage account connection string. This field is required if `storage_account` and `storage_access_key` / `storage_sas_token` are not set.

**Type**: `string`

**Default**: `""`

### [](#storage_sas_token)`storage_sas_token`

The storage account SAS token. This field is ignored if `storage_connection_string` or `storage_access_key` are set.

**Type**: `string`

**Default**: `""`

### [](#ttl)`ttl`

The TTL of each individual message as a duration string. Defaults to 0, meaning no retention period is set.

This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
ttl: 60s

# ---

ttl: 5m

# ---

ttl: 36h
```

---

# Page 152: azure_table_storage

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/azure_table_storage.md

---

# azure\_table\_storage

---
title: azure_table_storage
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/outputs/azure_table_storage
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/outputs/azure_table_storage.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/azure_table_storage.adoc
categories: "[\"Services\",\"Azure\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/azure_table_storage/)[Input](/redpanda-cloud/develop/connect/components/inputs/azure_table_storage/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/azure_table_storage/ "View the Self-Managed version of this component")

Stores messages in an Azure Table
Storage table. #### Common ```yml outputs: label: "" azure_table_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" table_name: "" # No default (required) partition_key: "" row_key: "" properties: {} max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" azure_table_storage: storage_account: "" storage_access_key: "" storage_connection_string: "" storage_sas_token: "" table_name: "" # No default (required) partition_key: "" row_key: "" properties: {} transaction_type: INSERT max_in_flight: 64 timeout: 5s batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` Only one authentication method is required, `storage_connection_string` or `storage_account` and `storage_access_key`. If both are set then the `storage_connection_string` is given priority. In order to set the `table_name`, `partition_key` and `row_key` you can use function interpolations described [here](../../../configuration/interpolation/#bloblang-queries), which are calculated per message of a batch. If the `properties` are not set in the config, all the `json` fields are marshalled and stored in the table, which will be created if it does not exist. The `object` and `array` fields are marshaled as strings. e.g.: The JSON message: ```json { "foo": 55, "bar": { "baz": "a", "bez": "b" }, "diz": ["a", "b"] } ``` Will store in the table the following properties: ```yml foo: '55' bar: '{ "baz": "a", "bez": "b" }' diz: '["a", "b"]' ``` It’s also possible to use function interpolations to get or transform the properties values, e.g.: ```yml properties: device: '${! json("device") }' timestamp: '${! json("timestamp") }' ``` ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. 
You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. You can find out more [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. 
**Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#max_in_flight)`max_in_flight` The maximum number of parallel message batches to have in flight at any given time. **Type**: `int` **Default**: `64` ### [](#partition_key)`partition_key` The partition key. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ```yaml # Examples: partition_key: ${! json("date") } ``` ### [](#properties)`properties` A map of properties to store into the table. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ### [](#row_key)`row_key` The row key. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ```yaml # Examples: row_key: ${! json("device")}-${!uuid_v4() } ``` ### [](#storage_access_key)`storage_access_key` The storage account access key. This field is ignored if `storage_connection_string` is set. **Type**: `string` **Default**: `""` ### [](#storage_account)`storage_account` The storage account to access. This field is ignored if `storage_connection_string` is set. **Type**: `string` **Default**: `""` ### [](#storage_connection_string)`storage_connection_string` A storage account connection string. This field is required if `storage_account` and `storage_access_key` / `storage_sas_token` are not set. **Type**: `string` **Default**: `""` ### [](#storage_sas_token)`storage_sas_token` The storage account SAS token. This field is ignored if `storage_connection_string` or `storage_access_key` are set. **Type**: `string` **Default**: `""` ### [](#table_name)`table_name` The table to store messages into. 
This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: table_name: ${! meta("kafka_topic") } # --- table_name: ${! json("table") } ``` ### [](#timeout)`timeout` The maximum period to wait on an upload before abandoning it and reattempting. **Type**: `string` **Default**: `5s` ### [](#transaction_type)`transaction_type` Type of transaction operation. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `INSERT` **Options**: `INSERT`, `INSERT_MERGE`, `INSERT_REPLACE`, `UPDATE_MERGE`, `UPDATE_REPLACE`, `DELETE` ```yaml # Examples: transaction_type: ${! json("operation") } # --- transaction_type: ${! meta("operation") } # --- transaction_type: INSERT ``` --- # Page 153: broker **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/broker.md --- # broker --- title: broker latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/broker page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/broker.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/broker.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/broker/)[Input](/redpanda-cloud/develop/connect/components/inputs/broker/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/broker/ "View the Self-Managed version of this component") A meta-output that routes messages to child outputs using a range of brokering [patterns](#patterns). 
Unlike regular outputs, `broker` doesn’t send messages anywhere by itself. Instead, it wraps other outputs and controls how messages are delivered across them. Use `broker` to fan out the same message to multiple destinations (for example, publishing events to Kafka while also writing them to a database), or to distribute messages across a pool of outputs for load balancing or throughput scaling. The delivery pattern determines whether each message is written to all outputs or routed to a single output, and whether writes happen in parallel or in sequence. > 📝 **NOTE** > > The name `broker` refers to the brokering delivery pattern, not a Redpanda broker (cluster node). #### Common ```yml outputs: label: "" broker: pattern: fan_out outputs: [] # No default (required) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" broker: copies: 1 pattern: fan_out outputs: [] # No default (required) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` [Processors](../../processors/about/) can be listed to apply across individual outputs or all outputs: ```yaml output: broker: pattern: fan_out outputs: - resource: foo - resource: bar # Processors only applied to messages sent to bar. processors: - resource: bar_processor # Processors applied to messages sent to all brokered outputs. processors: - resource: general_processor ``` ## [](#fields)Fields ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#copies)`copies` The number of copies of each configured output to spawn. **Type**: `int` **Default**: `1` ### [](#outputs)`outputs[]` A list of child outputs to broker. **Type**: `output` ### [](#pattern)`pattern` The brokering pattern to use. **Type**: `string` **Default**: `fan_out` **Options**: `fan_out`, `fan_out_fail_fast`, `fan_out_sequential`, `fan_out_sequential_fail_fast`, `round_robin`, `greedy` ## [](#patterns)Patterns The broker pattern determines how messages are distributed across outputs. Use `fan_out` (the default) when every output should receive every message. Use `round_robin` or `greedy` when you want to distribute messages across outputs for load balancing rather than duplication. 
The available patterns are:

### [](#fan_out)`fan_out`

With the `fan_out` pattern, every message that passes through Redpanda Connect is sent to all outputs in parallel.

If an output applies back pressure, it blocks all subsequent messages. If an output fails to send a message, the send is retried continuously until it completes or the service shuts down. This mechanism prevents one bad output from causing a larger retry loop in which a good output receives unbounded message duplicates.

Sometimes it is useful to disable the back pressure or retries of certain fan out outputs and instead drop messages that have failed or were blocked. In this case you can wrap outputs with a [`drop_on` output](../drop_on/).

### [](#fan_out_fail_fast)`fan_out_fail_fast`

The same as the `fan_out` pattern, except that output failures are not automatically retried. Use this pattern with caution, as busy retry loops could result in unlimited duplicates being introduced into the non-failing outputs.

### [](#fan_out_sequential)`fan_out_sequential`

Similar to the `fan_out` pattern, except that outputs are written to sequentially: an output is only written to once the preceding output has confirmed receipt of the same message.

If an output applies back pressure, it blocks all subsequent messages. If an output fails to send a message, the send is retried continuously until it completes or the service shuts down. This mechanism prevents one bad output from causing a larger retry loop in which a good output receives unbounded message duplicates.

### [](#fan_out_sequential_fail_fast)`fan_out_sequential_fail_fast`

The same as the `fan_out_sequential` pattern, except that output failures are not automatically retried. Use this pattern with caution, as busy retry loops could result in unlimited duplicates being introduced into the non-failing outputs.
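The fan-out patterns above mention wrapping an output in `drop_on` when failed messages should be dropped rather than retried. A minimal sketch of that arrangement follows. The resource labels `primary_store` and `metrics_sink` are hypothetical and would need to be defined as output resources elsewhere in the configuration.

```yaml
output:
  broker:
    pattern: fan_out
    outputs:
      # Critical destination: keeps the default retry and back pressure behavior.
      - resource: primary_store # hypothetical resource label
      # Best-effort destination: messages that fail to send are dropped
      # instead of being retried, so this output cannot stall the other one.
      - drop_on:
          error: true
          output:
            resource: metrics_sink # hypothetical resource label
```

With this layout, a failure in the best-effort output should not trigger the continuous retry behavior for that branch, while the critical output retains its delivery guarantees.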
### [](#round_robin)`round_robin`

With the `round_robin` pattern, each message is assigned to a single output, cycling through the outputs in order. If an output applies back pressure, it blocks all subsequent messages. If an output fails to send a message, the message is retried with the next output, and so on.

### [](#greedy)`greedy`

The `greedy` pattern results in higher output throughput at the cost of potentially disproportionate message allocations across the outputs. Each message is sent to a single output, which is determined by allowing outputs to claim messages as soon as they are able to process them. As a result, faster outputs may process more messages than slower ones.

---

# Page 154: cache

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/cache.md

---

# cache

---
title: cache
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/outputs/cache
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/outputs/cache.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/cache.adoc
categories: "[\"Services\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/cache/)[Processor](/redpanda-cloud/develop/connect/components/processors/cache/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/cache/ "View the Self-Managed version of this component")

Stores each message in a [cache](../../caches/about/).
#### Common ```yml outputs: label: "" cache: target: "" # No default (required) key: ${!count("items")}-${!timestamp_unix_nano()} max_in_flight: 64 ``` #### Advanced ```yml outputs: label: "" cache: target: "" # No default (required) key: ${!count("items")}-${!timestamp_unix_nano()} ttl: "" # No default (optional) max_in_flight: 64 ``` Caches are configured as [resources](../../caches/about/), and there’s a wide variety to choose from. The `target` field must reference the label of a configured cache resource, as follows: ```yaml output: cache: target: foo key: ${!json("document.id")} cache_resources: - label: foo memcached: addresses: - localhost:11211 default_ttl: 60s ``` To create a unique `key` value per item, use the function interpolations described in [Bloblang queries](../../../configuration/interpolation/#bloblang-queries). ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the maximum number of in-flight messages (or message batches) with the field `max_in_flight`. ## [](#fields)Fields ### [](#key)`key` The key to store messages by. Use function interpolation to derive a unique key for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `${!count("items")}-${!timestamp_unix_nano()}` ```yaml # Examples: key: ${!count("items")}-${!timestamp_unix_nano()} # --- key: ${!json("doc.id")} # --- key: ${!meta("kafka_key")} ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#target)`target` The target cache to store messages in. **Type**: `string` ### [](#ttl)`ttl` The TTL of each individual item as a duration string. After this period an item will be eligible for removal during the next compaction. 
Not all caches support per-key TTLs, and those that do not will fall back to their generally configured TTL setting. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: ttl: 60s # --- ttl: 5m # --- ttl: 36h ``` --- # Page 155: cyborgdb **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/cyborgdb.md --- # cyborgdb --- title: cyborgdb latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/cyborgdb page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/cyborgdb.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/cyborgdb.adoc categories: "[AI]" description: Inserts items into a CyborgDB encrypted vector index. page-git-created-date: "2025-10-09" page-git-modified-date: "2025-10-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/cyborgdb/ "View the Self-Managed version of this component") Inserts items into a CyborgDB encrypted vector index. 
#### Common ```yaml outputs: label: "" cyborgdb: max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) host: "" # No default (required) api_key: "" # No default (required) index_name: redpanda-vectors index_key: "" # No default (required) operation: upsert id: "" # No default (required) vector_mapping: "" # No default (optional) metadata_mapping: "" # No default (optional) ``` #### Advanced ```yaml outputs: label: "" cyborgdb: max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) host: "" # No default (required) api_key: "" # No default (required) index_name: redpanda-vectors index_key: "" # No default (required) create_if_missing: false operation: upsert id: "" # No default (required) vector_mapping: "" # No default (optional) metadata_mapping: "" # No default (optional) ``` This output allows you to write vectors to a CyborgDB encrypted index. CyborgDB provides end-to-end encrypted vector storage with automatic dimension detection and index optimization. All vector data is encrypted client-side before being sent to the server, ensuring complete data privacy. The encryption key never leaves your infrastructure. ## [](#fields)Fields ### [](#api_key)`api_key` The API key for authenticating with the CyborgDB service. This key identifies your account and provides access to your CyborgDB indexes. Keep this key secure and avoid exposing it in logs or version control. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). 
**Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#create_if_missing)`create_if_missing` Whether to create the index if it doesn’t exist. When enabled, CyborgDB automatically detects the vector dimensions from your data and optimizes the index configuration for performance. This is useful for development and testing environments. **Type**: `bool` **Default**: `false` ### [](#host)`host` The host URL for the CyborgDB instance. 
This should include the protocol (https://) and port number if required. For example: `[https://api.cyborgdb.com](https://api.cyborgdb.com)` or `[https://localhost:8080](https://localhost:8080)`. **Type**: `string` ```yaml # Examples: host: api.cyborg.com # --- host: localhost:8000 ``` ### [](#id)`id` A [Bloblang mapping](../../../guides/bloblang/about/) that determines the unique identifier for each vector entry. This ID is used to update existing vectors during upsert operations or to specify which vectors to delete. If not provided, CyborgDB will generate unique IDs automatically. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#index_key)`index_key` The base64-encoded encryption key for the CyborgDB index. This key must be exactly 32 bytes when decoded from base64. All vector data is encrypted client-side using this key before transmission, ensuring complete data privacy. Store this key securely as it cannot be recovered if lost. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: index_key: your-base64-encoded-32-byte-key ``` ### [](#index_name)`index_name` The name of the CyborgDB index to write vectors to. If the index doesn’t exist and `create_if_missing` is enabled, CyborgDB will create it automatically with optimized settings based on your data. **Type**: `string` **Default**: `redpanda-vectors` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#metadata_mapping)`metadata_mapping` An optional [Bloblang mapping](../../../guides/bloblang/about/) that extracts metadata to associate with the vector entry. 
The metadata can contain any JSON-serializable data that helps identify or categorize the vector. This data is stored encrypted alongside the vector. **Type**: `string` ```yaml # Examples: metadata_mapping: root = @ # --- metadata_mapping: root = metadata() # --- metadata_mapping: root = {"summary": this.summary, "category": this.category} ``` ### [](#operation)`operation` The operation to perform against the CyborgDB index. Supported operations: - `upsert`: Insert new vectors or update existing ones (requires `vector_mapping`) - `delete`: Remove vectors from the index (requires `id`) - `query`: Search for similar vectors (requires `vector_mapping`) **Type**: `string` **Default**: `upsert` **Options**: `upsert`, `delete` ### [](#vector_mapping)`vector_mapping` A [Bloblang mapping](../../../guides/bloblang/about/) that extracts the vector from the message. The result must be an array of floating-point numbers representing the vector embeddings. This field is required for `upsert` and `query` operations. 
**Type**: `string` ```yaml # Examples: vector_mapping: root = this.embeddings_vector # --- vector_mapping: root = [1.2, 0.5, 0.76] ``` --- # Page 156: drop_on **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/drop_on.md --- # drop\_on --- title: drop_on latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/drop_on page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/drop_on.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/drop_on.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/drop_on/ "View the Self-Managed version of this component") Attempts to write messages to a child output and if the write fails for one of a list of configurable reasons the message is dropped (acked) instead of being reattempted (or nacked). ```yml outputs: label: "" drop_on: error: false error_patterns: [] # No default (optional) back_pressure: "" # No default (optional) output: "" # No default (required) ``` Regular Redpanda Connect outputs will apply back pressure when downstream services aren’t accessible, and Redpanda Connect retries (or nacks) all messages that fail to be delivered. However, in some circumstances, or for certain output types, we instead might want to relax these mechanisms, which is when this output becomes useful. ## [](#fields)Fields ### [](#back_pressure)`back_pressure` An optional duration string that determines the maximum length of time to wait for a given message to be accepted by the child output before the message should be dropped instead. 
The most common reason for an output to block is when waiting for a lost connection to be re-established. Once a message has been dropped due to back pressure all subsequent messages are dropped immediately until the output is ready to process them again. Note that if `error` is set to `false` and this field is specified then messages dropped due to back pressure will return an error response (are nacked or reattempted). **Type**: `string` ```yaml # Examples: back_pressure: 30s # --- back_pressure: 1m ``` ### [](#error)`error` Whether messages should be dropped when the child output returns an error of any type. For example, this could be when an `http_client` output gets a 4XX response code. To drop only on specific error patterns, use the `error_patterns` field instead. **Type**: `bool` **Default**: `false` ### [](#error_patterns)`error_patterns[]` A list of regular expressions (re2). If the child output returns an error that matches any part of any of these patterns, the message is dropped. **Type**: `array` ```yaml # Examples: error_patterns: - "and that was really bad$" # --- error_patterns: - "roughly [0-9]+ issues occurred" ``` ### [](#output)`output` A child output to wrap with this drop mechanism. 
**Type**: `output` --- # Page 157: drop **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/drop.md --- # drop --- title: drop latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/drop page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/drop.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/drop.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/drop/ "View the Self-Managed version of this component") Drops all messages. ```yml outputs: label: "" drop: {} ``` --- # Page 158: elasticsearch_v8 **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/elasticsearch_v8.md --- # elasticsearch\_v8 --- title: elasticsearch_v8 latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/elasticsearch_v8 page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/elasticsearch_v8.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/elasticsearch_v8.adoc page-git-created-date: "2025-03-12" page-git-modified-date: "2025-03-12" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/elasticsearch_v8/ "View the Self-Managed version of this component") Publishes messages into an [Elasticsearch 
index](https://www.elastic.co/guide/en/elasticsearch/reference/current/documents-indices.html). If the index does not exist, this output creates it using dynamic mapping. > 📝 **NOTE** > > The `elasticsearch_v8` output is based on the [go-elasticsearch/v8](https://github.com/elastic/go-elasticsearch?tab=readme-ov-file) library. For full information about breaking changes from previous versions, see [Elastic’s Migrating to 8.0 guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/migrating-8.0.html#breaking_80_rest_api_changes). To help configure your own `elasticsearch_v8` output, this page includes [example pipeline configurations](#example-pipelines). #### Common ```yml outputs: label: "" elasticsearch_v8: urls: [] # No default (required) index: "" # No default (required) action: "" # No default (required) id: "" # No default (required) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" elasticsearch_v8: urls: [] # No default (required) index: "" # No default (required) action: "" # No default (required) id: "" # No default (required) pipeline: "" routing: "" retry_on_conflict: 0 tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] max_in_flight: 64 basic_auth: enabled: false username: "" password: "" batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` ## [](#set-values-dynamically)Set values dynamically You can use [function interpolations](../../../configuration/interpolation/#bloblang-queries) to dynamically set values for the [`id`](#id) and [`index`](#index) fields, as well as other fields where [function interpolations](../../../configuration/interpolation/#bloblang-queries) are supported. When message batches are sent, interpolations are performed per message. 
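For example, the following minimal sketch derives both the index name and the document ID from each message. It assumes every message is a JSON document with `service` and `id` fields; both field names are hypothetical and used only for illustration.

```yaml
output:
  elasticsearch_v8:
    urls: ['http://localhost:9200']
    # Resolved per message, so a batch can write to several indexes.
    index: 'logs-${! json("service") }'
    # Uses the message's own ID so that re-sent messages overwrite rather than duplicate.
    id: '${! json("id") }'
    action: index
```

Deriving the ID from message content, rather than from a counter or timestamp, makes redelivered messages idempotent.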
## [](#performance)Performance For improved performance, this output sends: - Multiple messages in parallel. Adjust the `max_in_flight` field value to tune the maximum number of in-flight messages (or message batches). - Messages as batches. You can configure batches at both input and output level. For more information, see [Message Batching](../../../configuration/batching/). ## [](#fields)Fields ### [](#action)`action` The action to perform on each document. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). For more information on how the `update` action works, see [Example pipelines](#example-pipelines). **Type**: `string` ### [](#basic_auth)`basic_auth` Configure basic authentication credentials for connecting to Elasticsearch. When enabled, these credentials are sent with each request to authenticate with the cluster. **Type**: `object` ### [](#basic_auth-enabled)`basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#basic_auth-password)`basic_auth.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#basic_auth-username)`basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#batching)`batching` Configure a [batching policy](../../../configuration/batching/). 
**Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#id)`id` Define the ID for indexed messages. Use [function interpolations](../../../configuration/interpolation/#bloblang-queries) to dynamically create a unique ID for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). 
**Type**: `string` ```yaml # Examples: id: ${!counter()}-${!timestamp_unix()} ``` ### [](#index)`index` The Elasticsearch index where messages are published. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#pipeline)`pipeline` Specify the ID of a pipeline to preprocess incoming documents before they are published (optional). This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#retry_on_conflict)`retry_on_conflict` The number of times to retry an update operation when a version conflict occurs. **Type**: `int` **Default**: `0` ### [](#routing)`routing` The routing key to use for the document. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. 
**Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether to enable TLS for secure connections. Set to `true` to enable TLS encryption. Required to be `true` for other TLS options (like `client_certs`, `root_cas`, etc.) to take effect. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. 
Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. This output attempts to connect to each URL in the list, in order, until a successful connection is established. If an item in the list contains commas, it is split into multiple URLs. **Type**: `array` ```yaml # Examples: urls: - "http://localhost:9200" ``` ## [](#example-pipelines)Example pipelines ### Update documents To update documents in the target index, the top level of the request body must include at least one of the following fields: - `doc`: Performs partial updates on a document. - `upsert`: Updates an existing document or inserts a document if it doesn’t exist. - `script`: Performs an update using a scripting language, such as [Elasticsearch’s Painless scripting language](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-painless.html). The following examples show how to configure mapping processors with this output to achieve different types of updates. Example 1: Partial document update ```yaml output: processors: # Sets the metadata ID field to the message ID then # performs a partial update on the document. - mapping: | meta id = this.id root.doc = this elasticsearch_v8: urls: [localhost:9200] # The URL of the Elasticsearch server. index: my_target_index # The name of the Elasticsearch index. id: ${! @id } # Sets the document ID to the value of the metadata ID field. action: update # The action to perform on each document. ``` Example 2: Scripted update ```yaml output: processors: # Sets the metadata ID field to the message ID then # increments the counter field by `1` using a script. 
- mapping: | meta id = this.id root.script.source = "ctx._source.counter += 1" elasticsearch_v8: urls: [localhost:9200] # The URL of the Elasticsearch server. index: my_target_index # The name of the Elasticsearch index. id: ${! @id } # Sets the document ID to the value of the metadata ID field. action: update # The action to perform on each document. ``` Example 3: Upsert ```yaml output: processors: # Sets the metadata ID field to the message ID. # If the product with the specified ID exists, update its product_price to 100. # If the document does not exist, insert a new document with the ID set to 1 # and the `product_price` set to 50. - mapping: | meta id = this.id root.doc.product_price = 100 root.upsert.product_price = 50 elasticsearch_v8: urls: [localhost:9200] # The URL of the Elasticsearch server. index: my_target_index # The name of the Elasticsearch index. id: ${! @id } # Sets the document ID to the value of the metadata ID field. action: update # The action to perform on each document. ``` For more information on the structures and behaviors of `doc`, `upsert`, and `script` fields, see the [Elasticsearch Update API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html). ### Index documents from Redpanda Reads messages from a Redpanda cluster and writes them to an Elasticsearch index using a field from the message as the document ID. ```yaml # Reads messages from a Redpanda cluster. input: redpanda: seed_brokers: [localhost:19092] # The address of the Redpanda broker. topics: ["product_code"] # The topic to consume messages from. consumer_group: "rpcn3" # The consumer group ID. processors: # Sets the metadata ID field to the message ID and # sets the root of the message to the message content. - mapping: | meta id = this.id root = this # Writes messages to the specified Elasticsearch index. output: elasticsearch_v8: urls: ['http://localhost:9200'] # The URL of the Elasticsearch server. 
index: "product_code" # The name of the Elasticsearch index. action: "index" # The action to perform on each document. id: ${! meta("id") } # Sets the document ID to the value of the metadata ID field. ``` ### Index documents from AWS S3 Reads messages from an AWS S3 bucket and writes them to an Elasticsearch index using the S3 key as the ID for the Elasticsearch document. ```yaml # Reads messages from an AWS S3 bucket. input: aws_s3: bucket: "my_bucket" # The name of the S3 bucket. prefix: "prod_inventory/" # A prefix to filter objects in the bucket. scanner: to_the_end: {} # Scans the bucket to the end. # Writes messages to the specified Elasticsearch index. output: elasticsearch_v8: urls: ['http://localhost:9200'] # The URL of the Elasticsearch server. index: "current_prod_inventory" # The name of the Elasticsearch index. action: "index" # The action to perform on each document. id: ${! meta("s3_key") } # Sets the document ID to the S3 key. ``` --- # Page 159: fallback **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/fallback.md --- # fallback --- title: fallback latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/fallback page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/fallback.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/fallback.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/fallback/ "View the Self-Managed version of this component") Attempts to send each message to a child output, starting from the first output on the list. 
If an output attempt fails, the next output in the list is attempted, and so on. ```yml output: label: "" fallback: - label: "" stdout: codec: lines - label: "" file: path: /tmp/fallback.txt codec: lines ``` This pattern is useful for triggering events when certain output targets have failed. For example, if you have an `http_client` output but want to reroute messages whenever the endpoint becomes unreachable, you could use this pattern: ```yaml output: fallback: - http_client: url: http://foo:4195/post/might/become/unreachable retries: 3 retry_period: 1s - http_client: url: http://bar:4196/somewhere/else retries: 3 retry_period: 1s processors: - mapping: 'root = "failed to send this message to foo: " + content()' - file: path: /usr/local/benthos/everything_failed.jsonl ``` ## [](#metadata)Metadata When a given output fails, the message routed to the following output has a metadata value named `fallback_error` containing a string error message that outlines the cause of the failure. The content of this string depends on the particular output. You can use it to enrich the message, or to route the data to an appropriate output using something like a `switch` output. ## [](#batching)Batching When an output within a fallback sequence uses batching, like so: ```yaml output: fallback: - aws_dynamodb: table: foo string_columns: id: ${!json("id")} content: ${!content()} batching: count: 10 period: 1s - file: path: /usr/local/benthos/failed_stuff.jsonl ``` Redpanda Connect makes a best attempt at inferring which specific messages of the batch failed, and only propagates those individual messages to the next fallback tier. However, depending on the output and the error returned, it is sometimes not possible to determine the individual messages that failed. In that case, the whole batch is passed to the next tier to preserve at-least-once delivery guarantees.
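As a rough sketch of how the `fallback_error` metadata described in the Metadata section might be used, the following config wraps each failed message together with its error string before writing it to a file. The endpoint URL, the file path, and the `error` and `payload` field names are illustrative assumptions rather than values taken from the component reference.

```yaml
output:
  fallback:
    # First tier: a hypothetical HTTP endpoint that may become unreachable.
    - http_client:
        url: http://foo:4195/post
        retries: 3
        retry_period: 1s
    # Second tier: only receives messages that the first tier failed to send.
    - file:
        path: /tmp/failed_messages.jsonl # Hypothetical path.
        codec: lines
      processors:
        # Build a new document that pairs the original payload with the
        # failure reason attached to messages routed to this tier.
        - mapping: |
            root = {
              "error": @fallback_error,
              "payload": content().string()
            }
```

With a config along these lines, each line written by the second tier should record both the original payload and the reason the first tier rejected it, which can help when replaying or triaging failed messages.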
--- # Page 160: gcp_bigquery **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/gcp_bigquery.md --- # gcp\_bigquery --- title: gcp_bigquery latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/gcp_bigquery page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/gcp_bigquery.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/gcp_bigquery.adoc categories: "[\"GCP\",\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/gcp_bigquery/ "View the Self-Managed version of this component") Inserts message data as new rows in a Google Cloud BigQuery table. #### Common ```yml output: label: "" gcp_bigquery: project: "" job_project: "" dataset: "" # No default (required) table: "" # No default (required) format: NEWLINE_DELIMITED_JSON max_in_flight: 64 job_labels: {} credentials_json: "" csv: header: [] field_delimiter: "," allow_jagged_rows: false allow_quoted_newlines: false encoding: UTF-8 skip_leading_rows: 1 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml output: label: "" gcp_bigquery: project: "" job_project: "" dataset: "" # No default (required) table: "" # No default (required) format: NEWLINE_DELIMITED_JSON max_in_flight: 64 write_disposition: WRITE_APPEND create_disposition: CREATE_IF_NEEDED ignore_unknown_values: false max_bad_records: 0 auto_detect: false job_labels: {} credentials_json: "" csv: header: [] field_delimiter: "," allow_jagged_rows: false allow_quoted_newlines: false encoding: UTF-8 skip_leading_rows: 1 batching: count: 0 byte_size: 0 period: "" check: "" 
processors: [] # No default (optional) ``` ## [](#credentials)Credentials By default, Redpanda Connect uses a [shared credentials file](../../../guides/cloud/gcp/) when connecting to GCP services. ## [](#format)Format The `gcp_bigquery` output currently supports only `NEWLINE_DELIMITED_JSON`, `CSV` and `PARQUET` formats. To learn more about how to use BigQuery with these formats, see the following documentation: - [`NEWLINE_DELIMITED_JSON`](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json) - [`CSV`](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv) - [`PARQUET`](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet) ### [](#newline-delimited-json)Newline-delimited JSON Each JSON message may contain multiple elements separated by newlines. For example, a single message containing: ```json {"key": "1"} {"key": "2"} ``` Is equivalent to two separate messages: ```json {"key": "1"} ``` And: ```json {"key": "2"} ``` The same is true for the CSV format. ### [](#csv)CSV When the field `csv.header` is specified for the `CSV` format, a header row is inserted as the first line of each message batch. If this field is not provided, then the first message of each message batch must include a header line. ### [](#parquet)Parquet Each message sent to this output must be a Parquet file. You can use the [`parquet_encode` processor](../../processors/parquet_encode/) to convert message data into the correct format. 
For example: ```yaml input: generate: mapping: | root = { "foo": random_int(), "bar": uuid_v4(), "time": now(), } interval: 0 count: 1000 batch_size: 1000 pipeline: processors: - parquet_encode: schema: - name: foo type: INT64 - name: bar type: UTF8 - name: time type: UTF8 default_compression: zstd output: gcp_bigquery: project: "${PROJECT}" dataset: "my_bq_dataset" table: "redpanda_connect_ingest" format: PARQUET ``` ## [](#performance)Performance The `gcp_bigquery` output benefits from sending multiple messages in parallel for improved performance. You can tune the maximum number of in-flight messages (or message batches) with the field `max_in_flight`. This output also sends messages as a batch for improved performance. Redpanda Connect can form batches at both the input and output level. For more information, see [Message Batching](../../../configuration/batching/). ## [](#fields)Fields ### [](#auto_detect)`auto_detect` Whether this component automatically infers the options and schema for `CSV` and `NEWLINE_DELIMITED_JSON` sources. If this value is set to `false` and the destination table doesn’t exist, the output throws an insertion error as it is unable to insert data. > ⚠️ **CAUTION** > > This field delegates schema detection to the GCP BigQuery service. For the `CSV` format, values like `no` may be treated as booleans. **Type**: `bool` **Default**: `false` ### [](#batching)`batching` Configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#create_disposition)`create_disposition` Specifies the circumstances under which a destination table is created. - Use `CREATE_IF_NEEDED` to create the destination table if it does not already exist. Tables are created atomically on successful completion of a job. - Use `CREATE_NEVER` if the destination table must already exist. **Type**: `string` **Default**: `CREATE_IF_NEEDED` **Options**: `CREATE_IF_NEEDED`, `CREATE_NEVER` ### [](#credentials_json)`credentials_json` Sets the [Google Service Account Credentials JSON](https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account) (optional). 
> ⚠️ **WARNING** > > When using [interpolation functions](../../../configuration/interpolation/#bloblang-queries) to populate this field, wrap the function in single quotes, not double quotes. For example, use `'${secrets.GCP_CREDENTIALS_JSON}'` instead of `"${secrets.GCP_CREDENTIALS_JSON}"`. Double quotes cause JSON parsing errors because the credentials already contain JSON content. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#csv-2)`csv` Specify how CSV data is interpreted. **Type**: `object` ### [](#csv-allow_jagged_rows)`csv.allow_jagged_rows` Set to `true` to treat optional missing trailing columns as nulls in CSV data. **Type**: `bool` **Default**: `false` ### [](#csv-allow_quoted_newlines)`csv.allow_quoted_newlines` Whether quoted data sections containing new lines are allowed when reading CSV data. **Type**: `bool` **Default**: `false` ### [](#csv-encoding)`csv.encoding` The character encoding of CSV data. **Type**: `string` **Default**: `UTF-8` **Options**: `UTF-8`, `ISO-8859-1` ### [](#csv-field_delimiter)`csv.field_delimiter` The separator for fields in a CSV file. The output uses this value when reading or exporting data. **Type**: `string` **Default**: `,` ### [](#csv-header)`csv.header[]` A list of values to use as the header for each batch of messages. If not specified, the first line of each message is used as the header. **Type**: `array` **Default**: `[]` ### [](#csv-skip_leading_rows)`csv.skip_leading_rows` The number of rows at the top of a CSV file that BigQuery will skip when reading data. The default value is `1`, which allows Redpanda Connect to add the specified header in the first line of each batch sent to BigQuery. 
**Type**: `int` **Default**: `1` ### [](#dataset)`dataset` The BigQuery Dataset ID. **Type**: `string` ### [](#format-2)`format` The format of each incoming message. **Type**: `string` **Default**: `NEWLINE_DELIMITED_JSON` **Options**: `NEWLINE_DELIMITED_JSON`, `CSV`, `PARQUET` ### [](#ignore_unknown_values)`ignore_unknown_values` Set this value to `true` to ignore values that do not match the schema: - For the `CSV` format, extra values at the end of a line are ignored. - For the `NEWLINE_DELIMITED_JSON` format, values that do not match any column name are ignored. By default, this value is set to `false`, and records containing unknown values are treated as bad records. Use the `max_bad_records` field to customize how bad records are handled. **Type**: `bool` **Default**: `false` ### [](#job_labels)`job_labels` A list of labels to add to the load job. **Type**: `string` **Default**: `{}` ### [](#job_project)`job_project` Specify the project ID in which jobs are executed. If not set, the `project` value is used. **Type**: `string` **Default**: `""` ### [](#max_bad_records)`max_bad_records` The maximum number of bad records to ignore when reading data and [`ignore_unknown_values`](#ignore_unknown_values) is set to `true`. **Type**: `int` **Default**: `0` ### [](#max_in_flight)`max_in_flight` The maximum number of message batches to have in flight at a given time. Increase this value to improve throughput. **Type**: `int` **Default**: `64` ### [](#project)`project` Specify the project ID of the dataset to insert data into. If not set, the project ID is inferred from the project linked to the service account or read from the `GOOGLE_CLOUD_PROJECT` environment variable. **Type**: `string` **Default**: `""` ### [](#table)`table` The table to insert messages into. **Type**: `string` ### [](#write_disposition)`write_disposition` Specifies how existing data in a destination table is treated. 
**Type**: `string` **Default**: `WRITE_APPEND` **Options**: `WRITE_APPEND`, `WRITE_EMPTY`, `WRITE_TRUNCATE` --- # Page 161: gcp_cloud_storage **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/gcp_cloud_storage.md --- # gcp\_cloud\_storage --- title: gcp_cloud_storage latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/gcp_cloud_storage page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/gcp_cloud_storage.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/gcp_cloud_storage.adoc categories: "[\"Services\",\"GCP\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/gcp_cloud_storage/)[Cache](/redpanda-cloud/develop/connect/components/caches/gcp_cloud_storage/)[Input](/redpanda-cloud/develop/connect/components/inputs/gcp_cloud_storage/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/gcp_cloud_storage/ "View the Self-Managed version of this component") Sends message parts as objects to a Google Cloud Storage bucket. Each object is uploaded with the path specified with the `path` field. 
#### Common ```yml output: label: "" gcp_cloud_storage: bucket: "" # No default (required) path: ${!counter()}-${!timestamp_unix_nano()}.txt content_type: application/octet-stream collision_mode: overwrite timeout: 3s credentials_json: "" max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml output: label: "" gcp_cloud_storage: bucket: "" # No default (required) path: ${!counter()}-${!timestamp_unix_nano()}.txt content_type: application/octet-stream content_encoding: "" collision_mode: overwrite chunk_size: 16777216 timeout: 3s credentials_json: "" max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` To use a different path for each object, use the function interpolations described in [Bloblang queries](../../../configuration/interpolation/#bloblang-queries), which are evaluated for each message of a batch. ## [](#metadata)Metadata Metadata fields on messages are sent as headers. To mutate or remove these values, see the [metadata docs](../../../configuration/metadata/). ## [](#credentials)Credentials By default, Redpanda Connect uses a shared credentials file when connecting to GCP services. For more information, see [Google Cloud Platform](../../../guides/cloud/gcp/). ## [](#batching)Batching It’s common to upload messages to Google Cloud Storage as batched archives. The easiest way to do this is to batch your messages at the output level and join the batch of messages with an [`archive`](../../processors/archive/) and/or [`compress`](../../processors/compress/) processor.
For example, to upload messages as a .tar.gz archive of documents, you could use the following config: ```yaml output: gcp_cloud_storage: bucket: TODO path: ${!counter()}-${!timestamp_unix_nano()}.tar.gz batching: count: 100 period: 10s processors: - archive: format: tar - compress: algorithm: gzip ``` Alternatively, to upload JSON documents as a single large document containing an array of objects, you could use: ```yaml output: gcp_cloud_storage: bucket: TODO path: ${!counter()}-${!timestamp_unix_nano()}.json batching: count: 100 processors: - archive: format: json_array ``` ## [](#performance)Performance This output benefits from sending multiple messages in parallel for improved performance. You can tune the maximum number of in-flight messages (or message batches) with the field `max_in_flight`. This output also benefits from sending messages as a batch. Batches can be formed at both the input and output level. For more information, see [Message Batching](../../../configuration/batching/). ## [](#fields)Fields ### [](#batching-2)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching.
**Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#bucket)`bucket` The bucket to upload messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#chunk_size)`chunk_size` An optional chunk size that controls the maximum number of bytes of the object that the writer attempts to send to the server in a single request. If `chunk_size` is set to `0`, chunking is disabled. **Type**: `int` **Default**: `16777216` ### [](#collision_mode)`collision_mode` Determines how file path collisions are handled. The options are: `overwrite`, which replaces the existing file with the new one; `append`, which appends the message bytes to the original file; `error-if-exists`, which returns an error and rejects the message if the file exists; and `ignore`, which leaves the original file unmodified and drops the message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `overwrite` **Options**: `overwrite`, `append`, `error-if-exists`, `ignore` ### [](#content_encoding)`content_encoding` An optional content encoding to set for each object.
This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#content_type)`content_type` The content type to set for each object. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `application/octet-stream` ### [](#credentials_json)`credentials_json` Base64-encoded Google Service Account credentials in JSON format (optional). Use this field to authenticate with Google Cloud services. For more information about creating service account credentials, see [Google’s service account documentation](https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account). This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#max_in_flight)`max_in_flight` The maximum number of message batches to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#path)`path` The path of each message to upload. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `${!counter()}-${!timestamp_unix_nano()}.txt` ```yaml # Examples: path: ${!counter()}-${!timestamp_unix_nano()}.txt # --- path: ${!meta("kafka_key")}.json # --- path: ${!json("doc.namespace")}/${!json("doc.id")}.json ``` ### [](#timeout)`timeout` The maximum period to wait on an upload before abandoning it and reattempting. 
**Type**: `string` **Default**: `3s` ```yaml # Examples: timeout: 1s # --- timeout: 500ms ``` --- # Page 162: gcp_pubsub **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/gcp_pubsub.md --- # gcp\_pubsub --- title: gcp_pubsub latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/gcp_pubsub page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/gcp_pubsub.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/gcp_pubsub.adoc categories: "[\"Services\",\"GCP\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/gcp_pubsub/)[Input](/redpanda-cloud/develop/connect/components/inputs/gcp_pubsub/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/gcp_pubsub/ "View the Self-Managed version of this component") Sends messages to a GCP Cloud Pub/Sub topic. [Metadata](../../../configuration/metadata/) from messages are sent as attributes. 
#### Common ```yml output: label: "" gcp_pubsub: project: "" # No default (required) credentials_json: "" topic: "" # No default (required) endpoint: "" max_in_flight: 64 count_threshold: 100 delay_threshold: 10ms byte_threshold: 1000000 metadata: exclude_prefixes: [] batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml output: label: "" gcp_pubsub: project: "" # No default (required) credentials_json: "" topic: "" # No default (required) endpoint: "" ordering_key: "" # No default (optional) max_in_flight: 64 count_threshold: 100 delay_threshold: 10ms byte_threshold: 1000000 publish_timeout: 1m0s validate_topic: true metadata: exclude_prefixes: [] flow_control: max_outstanding_bytes: -1 max_outstanding_messages: 1000 limit_exceeded_behavior: block batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` For information on how to set up credentials, see [this guide](https://cloud.google.com/docs/authentication/production). ## [](#troubleshooting)Troubleshooting If you consistently see `Failed to send message to gcp_pubsub: context deadline exceeded` error logs without any further information, you may be encountering [https://github.com/benthosdev/benthos/issues/1042](https://github.com/benthosdev/benthos/issues/1042), which occurs when metadata values contain characters that are not valid UTF-8. This frequently occurs when consuming from Kafka, because the key metadata field may be populated with an arbitrary binary value, but the issue is not exclusive to Kafka. If you are blocked by this issue, a workaround is to delete the specific problematic keys: ```yaml pipeline: processors: - mapping: | meta kafka_key = deleted() ``` Or delete all keys with: ```yaml pipeline: processors: - mapping: meta = deleted() ``` ## [](#fields)Fields ### [](#batching)`batching` Configures a batching policy on this output.
While the Pub/Sub client maintains its own internal buffering mechanism, preparing larger batches of messages can further trade off some latency for throughput. **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#byte_threshold)`byte_threshold` Publish a batch when its size in bytes reaches this value.
**Type**: `int` **Default**: `1000000` ### [](#count_threshold)`count_threshold` Publish a Pub/Sub buffer when it contains this many messages. **Type**: `int` **Default**: `100` ### [](#credentials_json)`credentials_json` Base64-encoded Google Service Account credentials in JSON format (optional). Use this field to authenticate with Google Cloud services. For more information about creating service account credentials, see [Google’s service account documentation](https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#delay_threshold)`delay_threshold` Publish a non-empty Pub/Sub buffer after this delay has passed. **Type**: `string` **Default**: `10ms` ### [](#endpoint)`endpoint` An optional endpoint to override the default of `pubsub.googleapis.com:443`. This can be used to connect to a region-specific Pub/Sub endpoint. For a list of valid values, see [this document](https://cloud.google.com/pubsub/docs/reference/service_apis_overview#list_of_regional_endpoints). **Type**: `string` **Default**: `""` ```yaml # Examples: endpoint: us-central1-pubsub.googleapis.com:443 # --- endpoint: us-west3-pubsub.googleapis.com:443 ``` ### [](#flow_control)`flow_control` For a given topic, configures the Pub/Sub client’s internal buffer for messages to be published. **Type**: `object` ### [](#flow_control-limit_exceeded_behavior)`flow_control.limit_exceeded_behavior` Configures the behavior when trying to publish additional messages while the flow controller is full. The available options are `block` (default), `ignore` (disable), and `signal_error` (publish results return an error).
**Type**: `string` **Default**: `block` **Options**: `ignore`, `block`, `signal_error` ### [](#flow_control-max_outstanding_bytes)`flow_control.max_outstanding_bytes` Maximum size of buffered messages to be published. If less than or equal to zero, this is disabled. **Type**: `int` **Default**: `-1` ### [](#flow_control-max_outstanding_messages)`flow_control.max_outstanding_messages` Maximum number of buffered messages to be published. If less than or equal to zero, this is disabled. **Type**: `int` **Default**: `1000` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increasing this may improve throughput. **Type**: `int` **Default**: `64` ### [](#metadata)`metadata` Specify criteria for which metadata values are sent as attributes. All metadata values are sent by default. **Type**: `object` ### [](#metadata-exclude_prefixes)`metadata.exclude_prefixes[]` Provide a list of explicit metadata key prefixes to be excluded when adding metadata to sent messages. **Type**: `array` **Default**: `[]` ### [](#ordering_key)`ordering_key` The ordering key to use for publishing messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#project)`project` The project ID of the topic to publish to. **Type**: `string` ### [](#publish_timeout)`publish_timeout` The maximum length of time to wait before abandoning a publish attempt for a message. **Type**: `string` **Default**: `1m0s` ```yaml # Examples: publish_timeout: 10s # --- publish_timeout: 5m # --- publish_timeout: 60m ``` ### [](#topic)`topic` The topic to publish to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#validate_topic)`validate_topic` Whether to validate the existence of the topic before publishing. If set to `false` and the topic does not exist, messages are lost.
**Type**: `bool` **Default**: `true` --- # Page 163: http_client **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/http_client.md --- # http\_client --- title: http_client latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/http_client page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/http_client.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/http_client.adoc page-git-created-date: "2025-03-04" page-git-modified-date: "2025-03-04" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/http_client/)[Input](/redpanda-cloud/develop/connect/components/inputs/http_client/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/http_client/ "View the Self-Managed version of this component") Sends messages to an HTTP server.
#### Common

```yml
output:
  label: ""
  http_client:
    url: "" # No default (required)
    verb: POST
    headers: {}
    rate_limit: "" # No default (optional)
    timeout: 5s
    max_in_flight: 64
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

#### Advanced

```yml
# All configuration fields, showing default values
output:
  label: ""
  http_client:
    url: "" # No default (required)
    verb: POST
    headers: {}
    metadata:
      include_prefixes: []
      include_patterns: []
    dump_request_log_level: "" # Optional
    oauth:
      enabled: false
      consumer_key: "" # Optional
      consumer_secret: "" # Optional
      access_token: "" # Optional
      access_token_secret: "" # Optional
    oauth2:
      enabled: false
      client_key: "" # Optional
      client_secret: "" # Optional
      token_url: "" # Optional
      scopes: []
      endpoint_params: {}
    basic_auth:
      enabled: false
      username: "" # Optional
      password: "" # Optional
    jwt:
      enabled: false
      private_key_file: "" # Optional
      signing_method: "" # Optional
      claims: {}
      headers: {}
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    rate_limit: "" # No default (optional)
    timeout: 5s
    retry_period: 1s
    max_retry_backoff: 300s
    retries: 3
    backoff_on:
      - 429
    drop_on: []
    successful_on: []
    proxy_url: "" # No default (optional)
    batch_as_multipart: false
    max_in_flight: 64
    batching:
      count: 0
      byte_size: 0
      period: "" # Optional
      check: "" # Optional
      processors: [] # No default (optional)
    multipart: []
```

## [](#message-sends)Message sends

The body of the request sent to the HTTP server is the raw contents of the message payload. If the message has multiple parts (is a batch) and the [`batch_as_multipart`](#batch_as_multipart) field is set to `true`, the batch is sent as a single multipart request according to [RFC1341](https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html). When the field is set to `false` (the default), each message in the batch is sent as an individual request.

When message retries are exhausted, this output rejects a message.
Typically, a pipeline then keeps trying to send the message until it succeeds, while applying back pressure.

## [](#dynamic-url-and-header-settings)Dynamic URL and header settings

You can set the [`url`](#url) and [`headers`](#headers) values dynamically using [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

## [](#performance)Performance

For improved performance, this output sends:

- Multiple messages in parallel. Adjust the `max_in_flight` field value to tune the maximum number of in-flight messages (or message batches).
- Messages as batches. You can configure batches at both input and output level. For more information, see [Message Batching](../../../configuration/batching/).

## [](#fields)Fields

### [](#backoff_on)`backoff_on[]`

A list of status codes that indicate a request failure and trigger retries with an increasing backoff period between attempts.

**Type**: `int`

**Default**:

```yaml
- 429
```

### [](#basic_auth)`basic_auth`

Allows you to specify basic authentication.

**Type**: `object`

### [](#basic_auth-enabled)`basic_auth.enabled`

Whether to use basic authentication in requests.

**Type**: `bool`

**Default**: `false`

### [](#basic_auth-password)`basic_auth.password`

A password to authenticate with.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

### [](#basic_auth-username)`basic_auth.username`

A username to authenticate as.

**Type**: `string`

**Default**: `""`

### [](#batch_as_multipart)`batch_as_multipart`

When set to `true`, sends all messages in a batch as a single request using [RFC1341](https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html). When set to `false`, sends messages in a batch as individual requests.
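As an illustration of how `batch_as_multipart` interacts with batching, the following is a minimal sketch rather than a recommended configuration. The URL is a hypothetical placeholder, and the batch size and period are arbitrary values chosen for the example.

```yaml
output:
  http_client:
    url: https://example.com/ingest   # hypothetical endpoint
    verb: POST
    batch_as_multipart: true          # send each batch as one multipart request
    batching:
      count: 10                       # flush after 10 messages...
      period: 1s                      # ...or after 1 second, whichever comes first
```

With `batch_as_multipart: false` (the default), the same batching policy would result in one request per message in the batch.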
**Type**: `bool` **Default**: `false` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#disable_http2)`disable_http2` Whether to disable HTTP/2. By default, HTTP/2 is enabled. 
**Type**: `bool`

**Default**: `false`

### [](#drop_on)`drop_on[]`

A list of status codes that indicate a request failure where the output should not attempt retries. This helps avoid unnecessary retries for requests that are unlikely to succeed.

> 📝 **NOTE**
>
> In these cases, the _request_ is dropped, but the _message_ that triggered the request is retained.

**Type**: `int`

**Default**: `[]`

### [](#dump_request_log_level)`dump_request_log_level`

EXPERIMENTAL: Set the logging level for the request and response payloads of each HTTP request.

**Type**: `string`

**Default**: `""`

**Options**: `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR`, `FATAL`, `""`

### [](#follow_redirects)`follow_redirects`

Whether to transparently follow redirects, that is, responses with 300-399 status codes. If disabled, the response contains the body, status, and headers from the redirect response, and the output does not make a request to the URL set in the `Location` header of the response.

**Type**: `bool`

**Default**: `true`

### [](#headers)`headers`

A map of headers to add to the request. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

**Default**: `{}`

```yaml
# Examples:
headers:
  Content-Type: application/octet-stream
  traceparent: ${! tracing_span().traceparent }
```

### [](#jwt)`jwt`

Configure JSON Web Token (JWT) authentication. This feature is in beta and may change in future releases. JWTs provide secure, stateless authentication between services.

**Type**: `object`

### [](#jwt-claims)`jwt.claims`

A value used to identify the claims that issued the JWT.

**Type**: `object`

**Default**: `{}`

### [](#jwt-enabled)`jwt.enabled`

Whether to use JWT authentication in requests.

**Type**: `bool`

**Default**: `false`

### [](#jwt-headers)`jwt.headers`

Additional key-value pairs to include in the JWT header (optional). These headers provide extra metadata for JWT processing.
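To show how the `jwt` fields fit together, here is a hedged sketch that combines them with the `private_key_file` and `signing_method` fields. The URL, key path, claim values, and `kid` header are all hypothetical placeholders, not values required by the component.

```yaml
output:
  http_client:
    url: https://api.example.com/events                    # hypothetical endpoint
    jwt:
      enabled: true
      private_key_file: /etc/secrets/jwt_signing_key.pem   # hypothetical path to a PEM-encoded key
      signing_method: RS256                                # must be compatible with the private key
      claims:
        sub: redpanda-connect                              # hypothetical claim values
        aud: api.example.com
      headers:
        kid: example-key-id                                # hypothetical extra JWT header
```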
**Type**: `object` **Default**: `{}` ### [](#jwt-private_key_file)`jwt.private_key_file` Path to a file containing the PEM-encoded private key using PKCS#1 or PKCS#8 format. The private key must be compatible with the algorithm specified in the `signing_method` field. **Type**: `string` **Default**: `""` ### [](#jwt-signing_method)`jwt.signing_method` The cryptographic algorithm used to sign the JWT token. Supported algorithms include RS256, RS384, RS512, and EdDSA. This algorithm must be compatible with the private key specified in the `private_key_file` field. **Type**: `string` **Default**: `""` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#max_retry_backoff)`max_retry_backoff` The maximum period to wait between failed requests. **Type**: `string` **Default**: `300s` ### [](#metadata)`metadata` Specify matching rules that determine which metadata keys to add to the HTTP request as headers (optional). **Type**: `object` ### [](#metadata-include_patterns)`metadata.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#metadata-include_prefixes)`metadata.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#multipart)`multipart[]` EXPERIMENTAL: Create explicit multipart HTTP requests by specifying an array of parts to add to a request. Each part consists of content headers and a data field, which can be populated dynamically. If populated, this field overrides the [default request creation behavior](#message-sends). 
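As a rough sketch of an explicit multipart request, the following configuration builds a request from two parts using the `body`, `content_type`, and `content_disposition` fields of each `multipart` item. The URL and the message fields `this.metadata` and `this.data.part1` are assumptions about the shape of your messages, and `@AttachmentName` assumes a metadata key of that name exists.

```yaml
output:
  http_client:
    url: https://example.com/upload   # hypothetical endpoint
    verb: POST
    multipart:
      # First part: a JSON document taken from a hypothetical `metadata` field
      - content_type: application/json
        content_disposition: 'form-data; name="metadata"'
        body: '${! this.metadata }'
      # Second part: binary content named after a hypothetical metadata key
      - content_type: application/bin
        content_disposition: 'form-data; name="bin"; filename="${! @AttachmentName }"'
        body: '${! this.data.part1 }'
```

The interpolated values are wrapped in single quotes so that the YAML parser treats each one as a plain string.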
**Type**: `object` **Default**: `[]` ### [](#multipart-body)`multipart[].body` The body of the individual message part. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ```yaml # Examples: body: ${! this.data.part1 } ``` ### [](#multipart-content_disposition)`multipart[].content_disposition` The content disposition of the individual message part. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ```yaml # Examples: content_disposition: form-data; name="bin"; filename='${! @AttachmentName } ``` ### [](#multipart-content_type)`multipart[].content_type` The content type of the individual message part. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ```yaml # Examples: content_type: application/bin ``` ### [](#oauth)`oauth` Configure OAuth version 1.0 authentication for secure API access. **Type**: `object` ### [](#oauth-access_token)`oauth.access_token` The value used to gain access to the protected resources on behalf of the user. **Type**: `string` **Default**: `""` ### [](#oauth-access_token_secret)`oauth.access_token_secret` The secret that establishes ownership of the `oauth.access_token` in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_key)`oauth.consumer_key` A value used to identify the client to the service provider. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_secret)`oauth.consumer_secret` The secret that establishes ownership of the consumer key in OAuth 1.0 authentication. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-enabled)`oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#oauth2)`oauth2` Allows you to specify open authentication using OAuth version 2 and the client credentials token flow. **Type**: `object` ### [](#oauth2-client_key)`oauth2.client_key` A value used to identify the client to the token provider. **Type**: `string` **Default**: `""` ### [](#oauth2-client_secret)`oauth2.client_secret` The secret used to establish ownership of the client key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth2-enabled)`oauth2.enabled` Whether to use OAuth version 2 in requests. **Type**: `bool` **Default**: `false` ### [](#oauth2-endpoint_params)`oauth2.endpoint_params` A list of endpoint parameters specified as arrays of strings (optional). **Type**: `object` **Default**: `{}` ```yaml # Examples: endpoint_params: bar: - woof foo: - meow - quack ``` ### [](#oauth2-scopes)`oauth2.scopes[]` A list of requested permissions (optional). **Type**: `array` **Default**: `[]` ### [](#oauth2-token_url)`oauth2.token_url` The URL of the token provider. **Type**: `string` **Default**: `""` ### [](#proxy_url)`proxy_url` A HTTP proxy URL (optional). **Type**: `string` ### [](#rate_limit)`rate_limit` A [rate limit](../../rate_limits/about/) to throttle requests by (optional). **Type**: `string` ### [](#retries)`retries` The maximum number of retry attempts to make. 
**Type**: `int` **Default**: `3` ### [](#retry_period)`retry_period` The initial period to wait between failed requests before retrying. **Type**: `string` **Default**: `1s` ### [](#successful_on)`successful_on[]` A list of HTTP status codes that should be considered as successful, even if they are not 2XX codes. This is useful for handling cases where non-2XX codes indicate that the request was processed successfully, such as `303 See Other` or `409 Conflict`. By default, all 2XX codes are considered successful unless they are specified in `backoff_on` or `drop_on` fields. **Type**: `int` **Default**: `[]` ### [](#timeout)`timeout` A static timeout to apply to requests. **Type**: `string` **Default**: `5s` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. 
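The pairing rules above can be sketched as a file-based mTLS configuration. This is a minimal example under stated assumptions: the URL and file paths are hypothetical placeholders, and `root_cas_file` is only needed when the server certificate is signed by a custom certificate authority.

```yaml
output:
  http_client:
    url: https://internal.example.com/events   # hypothetical endpoint
    tls:
      enabled: true                    # required for client_certs to take effect
      root_cas_file: ./root_cas.pem    # optional: custom certificate authority
      client_certs:
        # File-based pair: cert_file and key_file are used together.
        # Do not mix them with inline cert/key values in the same item.
        - cert_file: ./example.pem
          key_file: ./example.key
```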
**Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. 
**Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL to connect to. 
This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#verb)`verb` A verb to connect with. **Type**: `string` **Default**: `POST` ```yaml # Examples: verb: POST # --- verb: GET # --- verb: DELETE ``` --- # Page 164: iceberg **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/iceberg.md --- # iceberg --- title: iceberg latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/iceberg page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/iceberg.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/iceberg.adoc categories: "[\"Services\",\"AWS\",\"GCP\",\"Azure\"]" description: Fan out Redpanda topics to Apache Iceberg tables using the REST catalog API. page-git-created-date: "2026-03-05" page-git-modified-date: "2026-03-05" --- Fan out Redpanda topics to Apache Iceberg tables using the REST catalog API. This output is well suited for migrating fanout pipelines from Kafka Connect to Redpanda Connect, and supports: - Multiple storage backends (S3, GCS, Azure) - Automatic table creation with schema detection - Partition transforms (year, month, day, hour, bucket, truncate) - Schema evolution (automatic column addition) - Transaction retry logic for concurrent writes ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. 
You can find out more [in this doc](../../../configuration/batching/). ### Common ```yml outputs: label: "" iceberg: catalog: url: "" # No default (required) warehouse: "" # No default (optional) auth: oauth2: server_uri: /v1/oauth/tokens client_id: "" # No default (required) client_secret: "" # No default (required) scope: "" # No default (optional) bearer: "" # No default (optional) aws_sigv4: region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) service: "" # No default (optional) headers: "" # No default (optional) tls_skip_verify: false namespace: "" # No default (required) table: "" # No default (required) storage: aws_s3: bucket: "" # No default (required) region: "" # No default (optional) endpoint: "" # No default (optional) force_path_style_urls: false credentials: id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) gcp_cloud_storage: bucket: "" # No default (required) endpoint: "" # No default (optional) credentials_type: "" # No default (optional) credentials_file: "" # No default (optional) credentials_json: "" # No default (optional) azure_blob_storage: storage_account: "" # No default (required) container: "" # No default (required) endpoint: "" # No default (optional) storage_sas_token: "" # No default (optional) storage_connection_string: "" # No default (optional) storage_access_key: "" # No default (optional) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) max_in_flight: 4 ``` ### Advanced ```yml outputs: label: "" iceberg: catalog: url: "" # No default (required) warehouse: "" # No 
default (optional) auth: oauth2: server_uri: /v1/oauth/tokens client_id: "" # No default (required) client_secret: "" # No default (required) scope: "" # No default (optional) bearer: "" # No default (optional) aws_sigv4: region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) service: "" # No default (optional) headers: "" # No default (optional) tls_skip_verify: false namespace: "" # No default (required) table: "" # No default (required) storage: aws_s3: bucket: "" # No default (required) region: "" # No default (optional) endpoint: "" # No default (optional) force_path_style_urls: false credentials: id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) gcp_cloud_storage: bucket: "" # No default (required) endpoint: "" # No default (optional) credentials_type: "" # No default (optional) credentials_file: "" # No default (optional) credentials_json: "" # No default (optional) azure_blob_storage: storage_account: "" # No default (required) container: "" # No default (required) endpoint: "" # No default (optional) storage_sas_token: "" # No default (optional) storage_connection_string: "" # No default (optional) storage_access_key: "" # No default (optional) schema_evolution: enabled: false partition_spec: () table_location: "" # No default (optional) schema_metadata: "" new_column_type_mapping: "" # No default (optional) commit: manifest_merge_enabled: true max_snapshot_age: 24h max_retries: 3 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) max_in_flight: 4 ``` ## [](#catalog-integration)Catalog integration 
This output works with REST catalog implementations including Apache Polaris, AWS Glue Data Catalog, and Databricks Unity Catalog.

### [](#apache-polaris)Apache Polaris

To use with [Apache Polaris](https://polaris.apache.org):

- Set `catalog.url` to the Polaris REST endpoint (e.g., `http://localhost:8181/api/catalog`).
- Set `catalog.warehouse` to the catalog name configured in Polaris.
- Configure `catalog.auth.oauth2` with client credentials granted access to the catalog.

### [](#aws-glue-data-catalog)AWS Glue Data Catalog

To use with AWS Glue Data Catalog:

- Set `catalog.url` to `https://glue.<region>.amazonaws.com/iceberg` (the REST client appends the API version automatically).
- Set `catalog.warehouse` to your AWS account ID (the Glue catalog identifier).
- Set `schema_evolution.table_location` to an S3 prefix (e.g., `s3://my-bucket/`) since Glue does not automatically assign table locations.
- Configure `catalog.auth.aws_sigv4` with the appropriate region and set `service` to `glue`.
- Configure `storage.aws_s3` with the same bucket and region.

### [](#azure-blob-storage-adls-gen2)Azure Blob Storage (ADLS Gen2)

To use with Azure Data Lake Storage Gen2:

- Configure `storage.azure_blob_storage` with your storage account name and container.
- Authenticate using one of: `storage_access_key` (shared key), `storage_sas_token`, or `storage_connection_string`.
- The storage account must have hierarchical namespace (HNS) enabled for ADLS Gen2 compatibility.

## [](#type-mapping)Type mapping

| Bloblang type | Iceberg type |
| --- | --- |
| string | string |
| bytes | binary |
| bool | boolean |
| number | double |
| timestamp | timestamp (with timezone) |
| object | struct |
| array | list |

## [](#fields)Fields

### [](#batching)`batching`

Allows you to configure a [batching policy](../../../configuration/batching/).
**Type**: `object`

```yaml
# Examples:
batching:
  byte_size: 5000
  count: 0
  period: 1s

# ---

batching:
  count: 10
  period: 1s

# ---

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### [](#batching-byte_size)`batching.byte_size`

The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-check)`batching.check`

A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
check: this.type == "end_of_transaction"
```

### [](#batching-count)`batching.count`

The number of messages after which the batch is flushed. Set to `0` to disable count-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-period)`batching.period`

The period of time after which an incomplete batch is flushed regardless of its size.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
period: 1s

# ---

period: 1m

# ---

period: 500ms
```

### [](#batching-processors)`batching.processors[]`

A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op.

**Type**: `processor`

```yaml
# Examples:
processors:
  - archive:
      format: concatenate

# ---

processors:
  - archive:
      format: lines

# ---

processors:
  - archive:
      format: json_array
```

### [](#catalog)`catalog`

REST catalog configuration.

**Type**: `object`

### [](#catalog-auth)`catalog.auth`

Authentication configuration for the REST catalog. Only one authentication method can be active at a time.

**Type**: `object`

### [](#catalog-auth-aws_sigv4)`catalog.auth.aws_sigv4`

AWS SigV4 authentication (for AWS Glue Data Catalog or API Gateway).
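The AWS Glue Data Catalog steps in the catalog integration section can be combined into a single configuration. The following is a minimal sketch, not a verified deployment: the region, account ID, bucket, namespace, and table name are hypothetical placeholders, and credentials are assumed to come from the environment rather than from the `credentials` block.

```yaml
output:
  iceberg:
    catalog:
      url: https://glue.us-east-1.amazonaws.com/iceberg
      warehouse: "123456789012"          # hypothetical AWS account ID
      auth:
        aws_sigv4:
          region: us-east-1              # hypothetical region
          service: glue                  # sign requests for the Glue service
    namespace: analytics.events
    table: page_views                    # hypothetical table name
    storage:
      aws_s3:
        bucket: my-bucket                # same bucket and region as the table location
        region: us-east-1
    schema_evolution:
      table_location: s3://my-bucket/    # Glue does not assign table locations automatically
```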
**Type**: `object` ### [](#catalog-auth-aws_sigv4-credentials)`catalog.auth.aws_sigv4.credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#catalog-auth-aws_sigv4-credentials-from_ec2_role)`catalog.auth.aws_sigv4.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#catalog-auth-aws_sigv4-credentials-id)`catalog.auth.aws_sigv4.credentials.id` The ID of credentials to use. **Type**: `string` ### [](#catalog-auth-aws_sigv4-credentials-profile)`catalog.auth.aws_sigv4.credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#catalog-auth-aws_sigv4-credentials-role)`catalog.auth.aws_sigv4.credentials.role` A role ARN to assume. **Type**: `string` ### [](#catalog-auth-aws_sigv4-credentials-role_external_id)`catalog.auth.aws_sigv4.credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#catalog-auth-aws_sigv4-credentials-secret)`catalog.auth.aws_sigv4.credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#catalog-auth-aws_sigv4-credentials-token)`catalog.auth.aws_sigv4.credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#catalog-auth-aws_sigv4-endpoint)`catalog.auth.aws_sigv4.endpoint` Allows you to specify a custom endpoint for the AWS API. 
**Type**: `string` ### [](#catalog-auth-aws_sigv4-region)`catalog.auth.aws_sigv4.region` The AWS region to target. **Type**: `string` ### [](#catalog-auth-aws_sigv4-service)`catalog.auth.aws_sigv4.service` AWS service name for SigV4 signing. **Type**: `string` ### [](#catalog-auth-aws_sigv4-tcp)`catalog.auth.aws_sigv4.tcp` TCP socket configuration. **Type**: `object` ### [](#catalog-auth-aws_sigv4-tcp-connect_timeout)`catalog.auth.aws_sigv4.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#catalog-auth-aws_sigv4-tcp-keep_alive)`catalog.auth.aws_sigv4.tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#catalog-auth-aws_sigv4-tcp-keep_alive-count)`catalog.auth.aws_sigv4.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#catalog-auth-aws_sigv4-tcp-keep_alive-idle)`catalog.auth.aws_sigv4.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#catalog-auth-aws_sigv4-tcp-keep_alive-interval)`catalog.auth.aws_sigv4.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#catalog-auth-aws_sigv4-tcp-tcp_user_timeout)`catalog.auth.aws_sigv4.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#catalog-auth-bearer)`catalog.auth.bearer` Static bearer token for authentication. For testing only, not recommended for production. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#catalog-auth-oauth2)`catalog.auth.oauth2` OAuth2 authentication configuration. **Type**: `object` ### [](#catalog-auth-oauth2-client_id)`catalog.auth.oauth2.client_id` OAuth2 client identifier. **Type**: `string` ### [](#catalog-auth-oauth2-client_secret)`catalog.auth.oauth2.client_secret` OAuth2 client secret. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#catalog-auth-oauth2-scope)`catalog.auth.oauth2.scope` OAuth2 scope to request. **Type**: `string` ### [](#catalog-auth-oauth2-server_uri)`catalog.auth.oauth2.server_uri` OAuth2 token endpoint URI. **Type**: `string` **Default**: `/v1/oauth/tokens` ### [](#catalog-headers)`catalog.headers` Custom HTTP headers to include in all requests to the catalog. **Type**: `object` ```yaml # Examples: headers: X-Api-Key: your-api-key ``` ### [](#catalog-tls_skip_verify)`catalog.tls_skip_verify` Skip TLS certificate verification. Not recommended for production. **Type**: `bool` **Default**: `false` ### [](#catalog-url)`catalog.url` The REST catalog endpoint URL. **Type**: `string` ```yaml # Examples: url: http://localhost:8181/api/catalog # --- url: https://polaris.example.com/api/catalog # --- url: https://glue.us-east-1.amazonaws.com/iceberg ``` ### [](#catalog-warehouse)`catalog.warehouse` The REST catalog warehouse. **Type**: `string` ```yaml # Examples: warehouse: redpanda-catalog ``` ### [](#commit)`commit` Commit behavior configuration. 
**Type**: `object` ### [](#commit-manifest_merge_enabled)`commit.manifest_merge_enabled` Merge small manifest files during commits to reduce metadata overhead. **Type**: `bool` **Default**: `true` ### [](#commit-max_retries)`commit.max_retries` Maximum number of times to retry a failed transaction commit. **Type**: `int` **Default**: `3` ### [](#commit-max_snapshot_age)`commit.max_snapshot_age` Maximum age of snapshots to retain for time-travel queries. Set to zero to disable removing old snapshots. **Type**: `string` **Default**: `24h` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `4` ### [](#namespace)`namespace` The Iceberg namespace for the table. A dot-delimited value is split into nested namespaces. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: namespace: analytics.events # --- namespace: production ``` ### [](#schema_evolution)`schema_evolution` Schema evolution configuration. **Type**: `object` ### [](#schema_evolution-enabled)`schema_evolution.enabled` Enable automatic schema evolution. When enabled, new columns are added to the table automatically. **Type**: `bool` **Default**: `false` ### [](#schema_evolution-new_column_type_mapping)`schema_evolution.new_column_type_mapping` An optional Bloblang mapping to customize column types during schema evolution. This mapping is executed for each new column and can override the inferred or schema-metadata-derived type. The mapping receives an object with fields `name` (column name), `path` (dot-separated path), `value` (sample value), `inferred_type` (the type that would be used without this mapping), `message` (the full message body), `namespace`, and `table`.
It must return a string with a valid Iceberg type name: `boolean`, `int`, `long`, `float`, `double`, `string`, `binary`, `date`, `time`, `timestamp`, `timestamptz`, `uuid`, `decimal(p,s)`, or `fixed[n]`. **Type**: `string` ### [](#schema_evolution-partition_spec)`schema_evolution.partition_spec` A Bloblang expression to evaluate when a new table is created to determine the table’s partition spec. The result of the mapping should be an Iceberg partition spec in the same string format as the [Redpanda Streaming Topic Property](https://docs.redpanda.com/current/manage/iceberg/about-iceberg-topics/#use-custom-partitioning). This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `()` ```yaml # Examples: partition_spec: (col1) # --- partition_spec: (nested.col) # --- partition_spec: (year(my_ts_col)) # --- partition_spec: (year(my_ts_col), col2) # --- partition_spec: (hour(my_ts_col), truncate(42, col2)) # --- partition_spec: (day(my_ts_col), bucket(4, nested.col)) # --- partition_spec: (day(my_ts_col), void(`non.nested column.with.dots`), identity(nested.column)) ``` ### [](#schema_evolution-schema_metadata)`schema_evolution.schema_metadata` The name of a message metadata field containing a schema definition. When set, the schema is used to determine column types during schema evolution and table creation instead of inferring types from values. The schema must be in the standard common schema format (the same format used by the `parquet_encode` processor’s `schema_metadata` field). For batches of messages, the first message’s schema is used. **Type**: `string` **Default**: `""` ### [](#schema_evolution-table_location)`schema_evolution.table_location` A prefix used as the location for new tables when the catalog does not automatically assign one. For example, AWS Glue requires explicit table locations. When set, table locations are derived as `{prefix}{namespace}/{table}`.
**Type**: `string` ```yaml # Examples: table_location: s3://my-iceberg-bucket/ ``` ### [](#storage)`storage` Storage backend configuration for data files. Exactly one of `aws_s3`, `gcp_cloud_storage`, or `azure_blob_storage` must be specified. **Type**: `object` ### [](#storage-aws_s3)`storage.aws_s3` S3 storage configuration. **Type**: `object` ### [](#storage-aws_s3-bucket)`storage.aws_s3.bucket` The S3 bucket name. **Type**: `string` ```yaml # Examples: bucket: my-iceberg-data ``` ### [](#storage-aws_s3-credentials)`storage.aws_s3.credentials` Static AWS credentials for S3 access. When not specified, credentials are loaded from the default AWS credential chain. **Type**: `object` ### [](#storage-aws_s3-credentials-id)`storage.aws_s3.credentials.id` The AWS access key ID. **Type**: `string` ### [](#storage-aws_s3-credentials-secret)`storage.aws_s3.credentials.secret` The AWS secret access key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#storage-aws_s3-credentials-token)`storage.aws_s3.credentials.token` The AWS session token, required when using short term credentials. **Type**: `string` ### [](#storage-aws_s3-endpoint)`storage.aws_s3.endpoint` Custom endpoint for S3-compatible storage (e.g., MinIO). **Type**: `string` ```yaml # Examples: endpoint: http://localhost:9000 ``` ### [](#storage-aws_s3-force_path_style_urls)`storage.aws_s3.force_path_style_urls` Forces the client API to use path style URLs, which is often required when connecting to custom endpoints. **Type**: `bool` **Default**: `false` ### [](#storage-aws_s3-region)`storage.aws_s3.region` The AWS region. **Type**: `string` ```yaml # Examples: region: us-west-2 ``` ### [](#storage-azure_blob_storage)`storage.azure_blob_storage` Azure Blob Storage (ADLS Gen2) configuration. 
**Type**: `object` ### [](#storage-azure_blob_storage-container)`storage.azure_blob_storage.container` The Azure blob container name. **Type**: `string` ```yaml # Examples: container: iceberg-data ``` ### [](#storage-azure_blob_storage-endpoint)`storage.azure_blob_storage.endpoint` Custom endpoint for Azure-compatible storage. **Type**: `string` ### [](#storage-azure_blob_storage-storage_access_key)`storage.azure_blob_storage.storage_access_key` Azure storage access key for shared key authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#storage-azure_blob_storage-storage_account)`storage.azure_blob_storage.storage_account` The Azure storage account name. **Type**: `string` ```yaml # Examples: storage_account: mystorageaccount ``` ### [](#storage-azure_blob_storage-storage_connection_string)`storage.azure_blob_storage.storage_connection_string` Azure storage connection string. Use this or other auth methods, not both. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#storage-azure_blob_storage-storage_sas_token)`storage.azure_blob_storage.storage_sas_token` SAS token for authentication. Prefix with the container name followed by a dot if container-specific. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` ### [](#storage-gcp_cloud_storage)`storage.gcp_cloud_storage` Google Cloud Storage configuration. **Type**: `object` ### [](#storage-gcp_cloud_storage-bucket)`storage.gcp_cloud_storage.bucket` The GCS bucket name. **Type**: `string` ```yaml # Examples: bucket: my-iceberg-data ``` ### [](#storage-gcp_cloud_storage-credentials_file)`storage.gcp_cloud_storage.credentials_file` Path to a GCP credentials JSON file. **Type**: `string` ### [](#storage-gcp_cloud_storage-credentials_json)`storage.gcp_cloud_storage.credentials_json` GCP credentials JSON content. Use this or `credentials_file`, not both. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#storage-gcp_cloud_storage-credentials_type)`storage.gcp_cloud_storage.credentials_type` The type of credentials to use. Valid values: `service_account`, `authorized_user`, `impersonated_service_account`, `external_account`. **Type**: `string` ```yaml # Examples: credentials_type: service_account ``` ### [](#storage-gcp_cloud_storage-endpoint)`storage.gcp_cloud_storage.endpoint` Custom endpoint for GCS-compatible storage. **Type**: `string` ### [](#table)`table` The Iceberg table name. Supports interpolation functions for dynamic table names. 
**Type**: `string` ```yaml # Examples: table: user_events # --- table: events_${!meta("topic")} ``` --- # Page 165: inproc **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/inproc.md --- # inproc --- title: inproc latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/inproc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/inproc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/inproc.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/inproc/)[Input](/redpanda-cloud/develop/connect/components/inputs/inproc/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/inproc/ "View the Self-Managed version of this component") ```yml outputs: label: "" inproc: "" ``` Sends data directly to Redpanda Connect inputs by connecting to a unique ID. Multiple inputs can connect to the same inproc ID, in which case messages are dispatched to the connected inputs in a round-robin fashion. However, only one output can hold a given inproc ID at a time: if a collision occurs, the new output replaces the existing one.
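The fan-out behavior described above can be sketched as follows. These are illustrative fragments, not a complete pipeline: the ID `shared_events` is hypothetical, and the input side is assumed to run in the same Redpanda Connect process as the output.

```yaml
# Output side: publishes messages under the hypothetical ID "shared_events".
# Only one output can hold this ID at a time.
inproc: shared_events
```

```yaml
# Input side: every input configured with the same ID receives a share of
# the messages, dispatched round-robin across the connected inputs.
inproc: shared_events
```

If a second output were configured with the same ID, it would replace the first one instead of sharing the ID, so each producing side would typically use its own ID.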
--- # Page 166: kafka_franz **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/kafka_franz.md --- # kafka\_franz --- title: kafka_franz latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/kafka_franz page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/kafka_franz.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/kafka_franz.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/kafka_franz/)[Input](/redpanda-cloud/develop/connect/components/inputs/kafka_franz/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/kafka_franz/ "View the Self-Managed version of this component") > ⚠️ **WARNING: Deprecated in 4.68.0** > > This component is deprecated and will be removed in the next major version release. Consider migrating to the unified [`redpanda` input](../../inputs/redpanda/) and [`redpanda` output](../redpanda/) components. The `kafka_franz` output writes a batch of messages to Kafka brokers and waits for acknowledgment before propagating any acknowledgments back to the input. This output often outperforms the traditional `kafka` output and provides more useful logs and error messages. This output uses the [Franz Kafka client library](https://github.com/twmb/franz-go).
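As a rough illustration of how the fields documented on this page fit together, the fragment below sketches a `kafka_franz` output that batches messages and authenticates with SCRAM over TLS. The broker addresses, topic name, `order_id` message field, and the `KAFKA_USER` and `KAFKA_PASSWORD` variable names are all assumptions made for the example; only field names that appear in this reference are used.

```yaml
# Fragment: the kafka_franz component as it would appear inside the
# output section of a pipeline configuration.
kafka_franz:
  seed_brokers:
    - "broker-0.example.com:9092"   # hypothetical address
    - "broker-1.example.com:9092"   # hypothetical address
  topic: orders                      # hypothetical topic name
  key: ${! json("order_id") }        # assumes JSON messages with an order_id field
  max_in_flight: 10                  # batches sent in parallel (the default)
  batching:
    count: 100                       # flush after 100 messages...
    period: 1s                       # ...or after one second, whichever comes first
  tls:
    enabled: true
  sasl:
    - mechanism: SCRAM-SHA-512
      username: ${KAFKA_USER}        # hypothetical variable name
      password: ${KAFKA_PASSWORD}    # hypothetical variable name; avoid inline secrets
```

Referencing the password through a variable rather than writing it inline follows the caution attached to `sasl[].password`.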
#### Common ```yml outputs: label: "" kafka_franz: seed_brokers: [] # No default (required) topic: "" # No default (required) key: "" # No default (optional) partition: "" # No default (optional) metadata: include_prefixes: [] include_patterns: [] max_in_flight: 10 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" kafka_franz: seed_brokers: [] # No default (required) client_id: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] sasl: [] # No default (optional) metadata_max_age: 1m request_timeout_overhead: 10s conn_idle_timeout: 20s tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s topic: "" # No default (required) key: "" # No default (optional) partition: "" # No default (optional) metadata: include_prefixes: [] include_patterns: [] timestamp_ms: "" # No default (optional) max_in_flight: 10 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) partitioner: "" # No default (optional) idempotent_write: true compression: "" # No default (optional) allow_auto_topic_creation: true timeout: 10s max_message_bytes: 1MiB broker_write_max_bytes: 100MiB ``` ## [](#fields)Fields ### [](#allow_auto_topic_creation)`allow_auto_topic_creation` Enables topics to be auto created if they do not exist when fetching their metadata. **Type**: `bool` **Default**: `true` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#broker_write_max_bytes)`broker_write_max_bytes` The maximum number of bytes this output can write to a broker connection in a single write. This field corresponds to Kafka’s `socket.request.max.bytes`. **Type**: `string` **Default**: `100MiB` ```yaml # Examples: broker_write_max_bytes: 128MB # --- broker_write_max_bytes: 50mib ``` ### [](#client_id)`client_id` An identifier for the client connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#compression)`compression` Set an explicit compression type (optional). The default preference is to use `snappy` when the broker supports it. Otherwise, use `none`. 
**Type**: `string` **Options**: `lz4`, `snappy`, `gzip`, `none`, `zstd` ### [](#conn_idle_timeout)`conn_idle_timeout` The maximum duration that connections can remain idle before they are automatically closed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `20s` ### [](#idempotent_write)`idempotent_write` Enables the idempotent write producer option. This requires the `IDEMPOTENT_WRITE` permission on `CLUSTER`. Disable this option if the `IDEMPOTENT_WRITE` permission is unavailable. **Type**: `bool` **Default**: `true` ### [](#key)`key` An optional key to populate for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of batches to send in parallel at any given time. **Type**: `int` **Default**: `10` ### [](#max_message_bytes)`max_message_bytes` The maximum space (in bytes) that an individual message may use. Messages larger than this value are rejected. This field corresponds to Kafka’s `max.message.bytes`. **Type**: `string` **Default**: `1MiB` ```yaml # Examples: max_message_bytes: 100MB # --- max_message_bytes: 50mib ``` ### [](#metadata)`metadata` Configure which metadata values are added to messages as headers. This allows you to pass additional context information along with your messages. **Type**: `object` ### [](#metadata-include_patterns)`metadata.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#metadata-include_prefixes)`metadata.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. 
**Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#metadata_max_age)`metadata_max_age` The maximum period of time after which metadata is refreshed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. Lower values provide more responsive topic and partition discovery but may increase broker load. Higher values reduce broker queries but can delay detection of topology changes. **Type**: `string` **Default**: `1m` ### [](#partition)`partition` Set a partition for each message (optional). This field is only relevant when the `partitioner` is set to `manual`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). You must provide an interpolation string that is a valid integer. **Type**: `string` ```yaml # Examples: partition: ${! meta("partition") } ``` ### [](#partitioner)`partitioner` Override the default murmur2 hashing partitioner. **Type**: `string` | Option | Summary | | --- | --- | | least_backup | Chooses the least backed up partition (the partition with the fewest buffered records). Partitions are selected per batch. | | manual | Manually select a partition for each message. Requires the `partition` field to be specified. | | murmur2_hash | Kafka’s default hash algorithm that uses a 32-bit murmur2 hash of the key to compute which partition the record will be on. | | round_robin | Round-robins messages through all available partitions. This algorithm has lower throughput and causes higher CPU load on brokers, but can be useful if you want to ensure an even distribution of records to partitions. | ### [](#request_timeout_overhead)`request_timeout_overhead` Grants an additional buffer or overhead to requests that have timeout fields defined.
This field is based on the behavior of Apache Kafka’s `request.timeout.ms` parameter, but with the option to extend the timeout deadline. **Type**: `string` **Default**: `10s` ### [](#sasl)`sasl[]` Specify one or more methods or mechanisms of SASL authentication, which are attempted in order. If the broker supports the first SASL mechanism, all connections use it. If the first mechanism fails, the client picks the first supported mechanism. If the broker does not support any client mechanisms, all connections fail. **Type**: `object` ```yaml # Examples: sasl: - mechanism: SCRAM-SHA-512 password: bar username: foo ``` ### [](#sasl-aws)`sasl[].aws` Contains AWS specific fields for when the `mechanism` is set to `AWS_MSK_IAM`. **Type**: `object` ### [](#sasl-aws-credentials)`sasl[].aws.credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#sasl-aws-credentials-from_ec2_role)`sasl[].aws.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#sasl-aws-credentials-id)`sasl[].aws.credentials.id` The ID of credentials to use. **Type**: `string` ### [](#sasl-aws-credentials-profile)`sasl[].aws.credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#sasl-aws-credentials-role)`sasl[].aws.credentials.role` A role ARN to assume. **Type**: `string` ### [](#sasl-aws-credentials-role_external_id)`sasl[].aws.credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#sasl-aws-credentials-secret)`sasl[].aws.credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#sasl-aws-credentials-token)`sasl[].aws.credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#sasl-aws-endpoint)`sasl[].aws.endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#sasl-aws-region)`sasl[].aws.region` The AWS region to target. **Type**: `string` ### [](#sasl-aws-tcp)`sasl[].aws.tcp` TCP socket configuration. **Type**: `object` ### [](#sasl-aws-tcp-connect_timeout)`sasl[].aws.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-aws-tcp-keep_alive)`sasl[].aws.tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#sasl-aws-tcp-keep_alive-count)`sasl[].aws.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#sasl-aws-tcp-keep_alive-idle)`sasl[].aws.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-keep_alive-interval)`sasl[].aws.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-tcp_user_timeout)`sasl[].aws.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-extensions)`sasl[].extensions` Key/value pairs to add to OAUTHBEARER authentication requests. 
**Type**: `string` ### [](#sasl-mechanism)`sasl[].mechanism` The SASL mechanism to use. **Type**: `string` | Option | Summary | | --- | --- | | AWS_MSK_IAM | AWS IAM based authentication as specified by the 'aws-msk-iam-auth' Java library. | | OAUTHBEARER | OAuth Bearer based authentication. | | PLAIN | Plain text authentication. | | REDPANDA_CLOUD_SERVICE_ACCOUNT | Redpanda Cloud Service Account authentication when running in Redpanda Cloud. | | SCRAM-SHA-256 | SCRAM based authentication as specified in RFC5802. | | SCRAM-SHA-512 | SCRAM based authentication as specified in RFC5802. | | none | Disable SASL authentication. | ### [](#sasl-password)`sasl[].password` A password to provide for PLAIN or SCRAM-\* authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#sasl-token)`sasl[].token` The token to use for a single session’s OAUTHBEARER authentication. **Type**: `string` **Default**: `""` ### [](#sasl-username)`sasl[].username` A username to provide for PLAIN or SCRAM-\* authentication. **Type**: `string` **Default**: `""` ### [](#seed_brokers)`seed_brokers[]` A list of broker addresses to connect to in order. Use commas to separate multiple addresses in a single list item. **Type**: `array` ```yaml # Examples: seed_brokers: - "localhost:9092" # --- seed_brokers: - "foo:9092" - "bar:9092" # --- seed_brokers: - "foo:9092,bar:9092" ``` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability.
These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timeout)`timeout` The maximum period of time to wait for message sends before abandoning the request and retrying. 
**Type**: `string` **Default**: `10s` ### [](#timestamp_ms)`timestamp_ms` Set a timestamp (in milliseconds) for each message (optional). When left empty, the current timestamp is used. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: timestamp_ms: ${! timestamp_unix_milli() } # --- timestamp_ms: ${! metadata("kafka_timestamp_ms") } ``` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#topic)`topic` A topic to write messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). 
**Type**: `string` --- # Page 167: kafka **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/kafka.md --- # kafka --- title: kafka latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/kafka page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/kafka.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/kafka.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/kafka/)[Input](/redpanda-cloud/develop/connect/components/inputs/kafka/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/kafka/ "View the Self-Managed version of this component") > ⚠️ **WARNING: Deprecated in 4.68.0** > > This component is deprecated and will be removed in the next major version release. Please consider moving onto the unified [`redpanda` input](../../inputs/redpanda/) and [`redpanda` output](../redpanda/) components. The `kafka` output writes a batch of messages to Kafka brokers and waits for acknowledgement before propagating any acknowledgements back to the input.
#### Common ```yml outputs: label: "" kafka: addresses: [] # No default (required) topic: "" # No default (required) target_version: "" # No default (optional) key: "" partitioner: fnv1a_hash compression: none static_headers: "" # No default (optional) metadata: exclude_prefixes: [] max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" kafka: addresses: [] # No default (required) tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] sasl: mechanism: none user: "" password: "" access_token: "" token_cache: "" token_key: "" topic: "" # No default (required) client_id: benthos target_version: "" # No default (optional) rack_id: "" key: "" partitioner: fnv1a_hash partition: "" custom_topic_creation: enabled: false partitions: -1 replication_factor: -1 compression: none static_headers: "" # No default (optional) metadata: exclude_prefixes: [] inject_tracing_map: "" # No default (optional) max_in_flight: 64 idempotent_write: false ack_replicas: false max_msg_bytes: 1000000 timeout: 5s retry_as_batch: false batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) max_retries: 0 backoff: initial_interval: 3s max_interval: 10s max_elapsed_time: 30s timestamp_ms: "" # No default (optional) ``` The configuration field `ack_replicas` determines whether Redpanda Connect waits for acknowledgement from all replicas or just a single broker. Both the `key` and `topic` fields can be dynamically set using function interpolations described in [Bloblang queries](../../../configuration/interpolation/#bloblang-queries). [Metadata](../../../configuration/metadata/) will be added to each message sent as headers (version 0.11+), but can be restricted using the field [`metadata`](#metadata). 
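As a rough illustration of the three behaviors described above, the following sketch shows a `kafka` output that waits for all replicas, sets `key` and `topic` per message through interpolation, and keeps some metadata out of the message headers. The broker address, the `user.id` document path, the `target_topic` metadata key, and the `internal_` prefix are hypothetical values chosen for this example; substitute the ones that match your pipeline.

```yaml
kafka:
  addresses:
    - "localhost:9092"                     # hypothetical broker address
  # Resolved per message from a metadata key (hypothetical key name).
  topic: ${! metadata("target_topic") }
  # Resolved per message from the JSON payload (hypothetical path).
  key: ${! json("user.id") }
  # Wait for all replicas rather than a single broker before acknowledging.
  ack_replicas: true
  metadata:
    # Metadata keys starting with this prefix are not sent as headers.
    exclude_prefixes:
      - "internal_"
```

Only fields documented on this page are used here. Whether `ack_replicas: true` is appropriate depends on your durability and latency requirements.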
## [](#strict-ordering-and-retries)Strict ordering and retries When strict ordering is required for messages written to topic partitions, set the field `max_in_flight` to `1` and the field `retry_as_batch` to `true`. You must also ensure that failed batches are never rerouted back to the same output. This can be done by setting the field `max_retries` to `0` and `backoff.max_elapsed_time` to empty, which will apply back pressure indefinitely until the batch is sent successfully. However, this also means that manual intervention will eventually be required in cases where the batch cannot be sent due to configuration problems such as an incorrect `max_msg_bytes` estimate. A less strict but automated alternative is to route failed batches to a dead letter queue using a [`fallback` broker](../fallback/), but this allows subsequent batches to be delivered while those failed batches are dealt with. ## [](#troubleshooting)Troubleshooting If you’re seeing issues writing to or reading from Kafka with this component, try the newer [`kafka_franz` output](../kafka_franz/). - I’m seeing logs that report `Failed to connect to kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)`, but the brokers are definitely reachable. Unfortunately this error message will appear for a wide range of connection problems even when the broker endpoint can be reached. Double check your authentication configuration and also ensure that you have [enabled TLS](#tls-enabled) if applicable. ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the maximum number of in-flight messages (or message batches) with the field `max_in_flight`. This output also benefits from sending messages as a batch. Batches can be formed at both the input and output level.
You can find out more [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#ack_replicas)`ack_replicas` Ensure that messages have been copied across all replicas before acknowledging receipt. **Type**: `bool` **Default**: `false` ### [](#addresses)`addresses[]` A list of broker addresses to connect to. If an item in the list contains commas, it is expanded into multiple addresses. **Type**: `array` ```yaml # Examples: addresses: - "localhost:9092" # --- addresses: - "localhost:9041,localhost:9042" # --- addresses: - "localhost:9041" - "localhost:9042" ``` ### [](#backoff)`backoff` Control time intervals between retry attempts. **Type**: `object` ### [](#backoff-initial_interval)`backoff.initial_interval` The initial period to wait between retry attempts. The retry interval increases for each failed attempt, up to the `backoff.max_interval` value. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `3s` ```yaml # Examples: initial_interval: 50ms # --- initial_interval: 1s ``` ### [](#backoff-max_elapsed_time)`backoff.max_elapsed_time` The maximum overall period of time to spend on retry attempts before the request is aborted. Setting this value to a zeroed duration (such as `0s`) will result in unbounded retries. **Type**: `string` **Default**: `30s` ```yaml # Examples: max_elapsed_time: 1m # --- max_elapsed_time: 1h ``` ### [](#backoff-max_interval)`backoff.max_interval` The maximum period to wait between retry attempts. **Type**: `string` **Default**: `10s` ```yaml # Examples: max_interval: 5s # --- max_interval: 1m ``` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/).
**Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages at which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period after which an incomplete batch is flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, so splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#client_id)`client_id` An identifier for the client connection. **Type**: `string` **Default**: `benthos` ### [](#compression)`compression` The compression algorithm to use.
**Type**: `string` **Default**: `none` **Options**: `none`, `snappy`, `lz4`, `gzip`, `zstd` ### [](#custom_topic_creation)`custom_topic_creation` If enabled, topics will be created with the specified number of partitions and replication factor if they do not already exist. **Type**: `object` ### [](#custom_topic_creation-enabled)`custom_topic_creation.enabled` Whether to enable custom topic creation. **Type**: `bool` **Default**: `false` ### [](#custom_topic_creation-partitions)`custom_topic_creation.partitions` The number of partitions to create for new topics. Leave at `-1` to use the broker-configured default. Otherwise, the value must be >= 1. **Type**: `int` **Default**: `-1` ### [](#custom_topic_creation-replication_factor)`custom_topic_creation.replication_factor` The replication factor to use for new topics. Leave at `-1` to use the broker-configured default. Otherwise, the value must be an odd number that is less than or equal to the number of brokers. **Type**: `int` **Default**: `-1` ### [](#idempotent_write)`idempotent_write` Enable the idempotent write producer option. This requires the `IDEMPOTENT_WRITE` permission on `CLUSTER` and can be disabled if this permission is not available. **Type**: `bool` **Default**: `false` ### [](#inject_tracing_map)`inject_tracing_map` EXPERIMENTAL: A [Bloblang mapping](../../../guides/bloblang/about/) used to inject an object containing tracing propagation information into outbound messages. The specification of the injected fields will match the format used by the service wide tracer. **Type**: `string` ```yaml # Examples: inject_tracing_map: meta = @.merge(this) # --- inject_tracing_map: root.meta.span = this ``` ### [](#key)`key` An optional key to populate for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
**Type**: `int` **Default**: `64` ### [](#max_msg_bytes)`max_msg_bytes` The maximum size in bytes of messages sent to the target topic. **Type**: `int` **Default**: `1000000` ### [](#max_retries)`max_retries` The maximum number of retries before giving up on the request. If set to zero there is no discrete limit. **Type**: `int` **Default**: `0` ### [](#metadata)`metadata` Specify criteria for which metadata values are sent with messages as headers. **Type**: `object` ### [](#metadata-exclude_prefixes)`metadata.exclude_prefixes[]` Provide a list of explicit metadata key prefixes to be excluded when adding metadata to sent messages. **Type**: `array` **Default**: `[]` ### [](#partition)`partition` The manually-specified partition to publish messages to, relevant only when the field `partitioner` is set to `manual`. Must be able to parse as a 32-bit integer. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#partitioner)`partitioner` The partitioning algorithm to use. **Type**: `string` **Default**: `fnv1a_hash` **Options**: `fnv1a_hash`, `murmur2_hash`, `random`, `round_robin`, `manual` ### [](#rack_id)`rack_id` A rack identifier for this client. **Type**: `string` **Default**: `""` ### [](#retry_as_batch)`retry_as_batch` When enabled forces an entire batch of messages to be retried if any individual message fails on a send, otherwise only the individual messages that failed are retried. Disabling this helps to reduce message duplicates during intermittent errors, but also makes it impossible to guarantee strict ordering of messages. **Type**: `bool` **Default**: `false` ### [](#sasl)`sasl` Enables SASL authentication. 
**Type**: `object` ### [](#sasl-access_token)`sasl.access_token` A static OAUTHBEARER access token. **Type**: `string` **Default**: `""` ### [](#sasl-mechanism)`sasl.mechanism` The SASL authentication mechanism. If left empty, SASL authentication is not used. **Type**: `string` **Default**: `none` | Option | Summary | | --- | --- | | OAUTHBEARER | OAuth Bearer based authentication. | | PLAIN | Plain text authentication. NOTE: When using plain text auth it is extremely likely that you’ll also need to enable TLS. | | SCRAM-SHA-256 | Authentication using the SCRAM-SHA-256 mechanism. | | SCRAM-SHA-512 | Authentication using the SCRAM-SHA-512 mechanism. | | none | Default, no SASL authentication. | ### [](#sasl-password)`sasl.password` A PLAIN password. It is recommended that you use environment variables to populate this field. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: ${PASSWORD} ``` ### [](#sasl-token_cache)`sasl.token_cache` The name of a [`cache`](../../caches/about/) resource to query for OAUTHBEARER tokens, as an alternative to a static `access_token`. **Type**: `string` **Default**: `""` ### [](#sasl-token_key)`sasl.token_key` The key to query the cache with for tokens. Required when using a `token_cache`. **Type**: `string` **Default**: `""` ### [](#sasl-user)`sasl.user` A PLAIN username. It is recommended that you use environment variables to populate this field. **Type**: `string` **Default**: `""` ```yaml # Examples: user: ${USER} ``` ### [](#static_headers)`static_headers` An optional map of static headers that should be added to messages in addition to metadata.
**Type**: `string` ```yaml # Examples: static_headers: first-static-header: value-1 second-static-header: value-2 ``` ### [](#target_version)`target_version` The version of the Kafka protocol to use. This limits the capabilities used by the client and should ideally match the version of your brokers. Defaults to the oldest supported stable version. **Type**: `string` ```yaml # Examples: target_version: 2.1.0 # --- target_version: 3.1.0 ``` ### [](#timeout)`timeout` The maximum period of time to wait for message sends before abandoning the request and retrying. **Type**: `string` **Default**: `5s` ### [](#timestamp_ms)`timestamp_ms` Set a timestamp (in milliseconds) for each message (optional). When left empty, the current timestamp is used. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: timestamp_ms: ${! timestamp_unix_milli() } # --- timestamp_ms: ${! metadata("kafka_timestamp_ms") } ``` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#topic)`topic` The topic to publish messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` --- # Page 168: mongodb **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/mongodb.md --- # mongodb --- title: mongodb latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/mongodb page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/mongodb.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/mongodb.adoc categories: "[\"Services\"]" page-git-created-date: "2025-06-25" page-git-modified-date: "2025-06-25" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/mongodb/)[Cache](/redpanda-cloud/develop/connect/components/caches/mongodb/)[Input](/redpanda-cloud/develop/connect/components/inputs/mongodb/)[Processor](/redpanda-cloud/develop/connect/components/processors/mongodb/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/mongodb/ "View the Self-Managed version of this 
component") Inserts items into a MongoDB collection. #### Common ```yml outputs: label: "" mongodb: url: "" # No default (required) database: "" # No default (required) username: "" password: "" collection: "" # No default (required) operation: update-one write_concern: w: majority j: false w_timeout: "" document_map: "" filter_map: "" hint_map: "" upsert: false max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" mongodb: url: "" # No default (required) database: "" # No default (required) username: "" password: "" app_name: benthos collection: "" # No default (required) operation: update-one write_concern: w: majority j: false w_timeout: "" document_map: "" filter_map: "" hint_map: "" upsert: false max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` ## [](#performance)Performance This output benefits from sending multiple messages in flight, in parallel, for improved performance. You can tune the maximum number of in flight messages (or message batches) using the `max_in_flight` field. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. For more information, see [Message Batching](../../../configuration/batching/). ## [](#fields)Fields ### [](#app_name)`app_name` The client application name. **Type**: `string` **Default**: `benthos` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#collection)`collection` The name of the target collection. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#database)`database` The name of the target MongoDB database. **Type**: `string` ### [](#document_map)`document_map` A Bloblang map that represents a document to store in MongoDB, expressed as [extended JSON in canonical form](https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/). The `document_map` parameter is required for the following database operations: `insert-one`, `replace-one`, and `update-one`. 
**Type**: `string` **Default**: `""` ```yaml # Examples: document_map: |- root.a = this.foo root.b = this.bar ``` ### [](#filter_map)`filter_map` A Bloblang map that represents a filter for a MongoDB command, expressed as [extended JSON in canonical form](https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/). The `filter_map` parameter is required for all database operations except `insert-one`. This output uses `filter_map` to find documents for the specified operation. For example, for a `delete-one` operation, the filter map should include the fields required to locate the document for deletion. **Type**: `string` **Default**: `""` ```yaml # Examples: filter_map: |- root.a = this.foo root.b = this.bar ``` ### [](#hint_map)`hint_map` A Bloblang map that represents a hint or index for a MongoDB command to use, expressed as [extended JSON in canonical form](https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/). This map is optional, and is used with all operations except `insert-one`. Define a `hint_map` to improve performance when finding documents in the MongoDB database. **Type**: `string` **Default**: `""` ```yaml # Examples: hint_map: |- root.a = this.foo root.b = this.bar ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this number to improve throughput. **Type**: `int` **Default**: `64` ### [](#operation)`operation` The MongoDB database operation to perform. **Type**: `string` **Default**: `update-one` **Options**: `insert-one`, `delete-one`, `delete-many`, `replace-one`, `update-one` ### [](#password)`password` The password to use for authentication. Used together with `username` for basic authentication or with encrypted private keys for secure access. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#upsert)`upsert` The `upsert` parameter is optional, and only applies to `update-one` and `replace-one` operations. If the filter specified in `filter_map` matches an existing document, this operation updates or replaces the document. Otherwise, a new document is created. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target MongoDB server. **Type**: `string` ```yaml # Examples: url: mongodb://localhost:27017 ``` ### [](#username)`username` The username required to connect to the database. **Type**: `string` **Default**: `""` ### [](#write_concern)`write_concern` The [write concern settings](https://www.mongodb.com/docs/manual/reference/write-concern/) for the MongoDB connection. **Type**: `object` ### [](#write_concern-j)`write_concern.j` Whether to request acknowledgement from MongoDB that write operations have been written to the journal. **Type**: `bool` **Default**: `false` ### [](#write_concern-w)`write_concern.w` Requests acknowledgement that write operations have propagated to the specified number of MongoDB instances. **Type**: `string` **Default**: `majority` ### [](#write_concern-w_timeout)`write_concern.w_timeout` The write concern timeout.
**Type**: `string` **Default**: `""` --- # Page 169: mqtt **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/mqtt.md --- # mqtt --- title: mqtt latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/mqtt page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/mqtt.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/mqtt.adoc categories: "[\"Services\"]" page-git-created-date: "2024-11-07" page-git-modified-date: "2024-11-07" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/mqtt/)[Input](/redpanda-cloud/develop/connect/components/inputs/mqtt/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/mqtt/ "View the Self-Managed version of this component") Pushes messages to an MQTT broker. #### Common ```yml outputs: label: "" mqtt: urls: [] # No default (required) client_id: "" connect_timeout: 30s topic: "" # No default (required) qos: 1 write_timeout: 3s retained: false max_in_flight: 64 ``` #### Advanced ```yml outputs: label: "" mqtt: urls: [] # No default (required) client_id: "" dynamic_client_id_suffix: "" # No default (optional) connect_timeout: 30s will: enabled: false qos: 0 retained: false topic: "" payload: "" user: "" password: "" keepalive: 30 tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] topic: "" # No default (required) qos: 1 write_timeout: 3s retained: false retained_interpolated: "" # No default (optional) max_in_flight: 64 ``` The `topic` field can be dynamically set using function interpolations described [here](../../../configuration/interpolation/#bloblang-queries). 
When sending batched messages these interpolations are performed per message part. ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the maximum number of in-flight messages (or message batches) with the field `max_in_flight`. ## [](#fields)Fields ### [](#client_id)`client_id` An identifier for the client connection. **Type**: `string` **Default**: `""` ### [](#connect_timeout)`connect_timeout` The maximum amount of time to wait in order to establish a connection before the attempt is abandoned. **Type**: `string` **Default**: `30s` ```yaml # Examples: connect_timeout: 1s # --- connect_timeout: 500ms ``` ### [](#dynamic_client_id_suffix)`dynamic_client_id_suffix` Append a dynamically generated suffix to the specified `client_id` on each run of the pipeline. This can be useful when clustering Redpanda Connect producers. **Type**: `string` | Option | Summary | | --- | --- | | nanoid | Append a nanoid of length 21 characters. | ### [](#keepalive)`keepalive` The maximum number of seconds of inactivity before a keepalive message is sent. **Type**: `int` **Default**: `30` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#password)`password` A password to connect with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#qos)`qos` The QoS value to set for each message. Valid values are `0`, `1`, and `2`. **Type**: `int` **Default**: `1` ### [](#retained)`retained` Set message as retained on the topic.
**Type**: `bool` **Default**: `false` ### [](#retained_interpolated)`retained_interpolated` Override the value of `retained` with an interpolable value, this allows it to be dynamically set based on message contents. The value must resolve to either `true` or `false`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. 
Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. 
**Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#topic)`topic` The topic to publish messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#urls)`urls[]` A list of URLs to connect to. Use the format `scheme://host:port`, where: - `scheme` is one of the following: `tcp`, `ssl`, `ws` - `host` is the IP address or hostname - `port` is the port on which the MQTT broker accepts connections If an item in the list contains commas, it is expanded into multiple URLs. **Type**: `array` ```yaml # Examples: urls: - "tcp://localhost:1883" ``` ### [](#user)`user` A username to connect with. **Type**: `string` **Default**: `""` ### [](#will)`will` Set last will message in case of Redpanda Connect failure **Type**: `object` ### [](#will-enabled)`will.enabled` Whether to enable last will messages. **Type**: `bool` **Default**: `false` ### [](#will-payload)`will.payload` Set payload for last will message. **Type**: `string` **Default**: `""` ### [](#will-qos)`will.qos` Set QoS for last will message. Valid values are: 0, 1, 2. **Type**: `int` **Default**: `0` ### [](#will-retained)`will.retained` Set retained for last will message. **Type**: `bool` **Default**: `false` ### [](#will-topic)`will.topic` Set topic for last will message. **Type**: `string` **Default**: `""` ### [](#write_timeout)`write_timeout` The maximum amount of time to wait to write data before the attempt is abandoned. 
**Type**: `string` **Default**: `3s` ```yaml # Examples: write_timeout: 1s # --- write_timeout: 500ms ``` --- # Page 170: nats_jetstream **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/nats_jetstream.md --- # nats\_jetstream --- title: nats_jetstream latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/nats_jetstream page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/nats_jetstream.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/nats_jetstream.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/nats_jetstream/)[Input](/redpanda-cloud/develop/connect/components/inputs/nats_jetstream/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/nats_jetstream/ "View the Self-Managed version of this component") Write messages to a NATS JetStream subject. 
#### Common ```yml outputs: label: "" nats_jetstream: urls: [] # No default (required) subject: "" # No default (required) headers: {} metadata: include_prefixes: [] include_patterns: [] max_in_flight: 1024 ``` #### Advanced ```yml outputs: label: "" nats_jetstream: urls: [] # No default (required) max_reconnects: "" # No default (optional) subject: "" # No default (required) headers: {} metadata: include_prefixes: [] include_patterns: [] max_in_flight: 1024 tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] tls_handshake_first: false auth: nkey_file: "" # No default (optional) nkey: "" # No default (optional) user_credentials_file: "" # No default (optional) user_jwt: "" # No default (optional) user_nkey_seed: "" # No default (optional) user: "" # No default (optional) password: "" # No default (optional) token: "" # No default (optional) inject_tracing_map: "" # No default (optional) ``` ## [](#connection-name)Connection name When monitoring and managing a production [NATS system](https://docs.nats.io/nats-concepts/overview), it is often useful to know which connection sent or received a message. To achieve this, set the connection name option when creating a NATS connection. Redpanda Connect can then automatically set the connection name to the NATS component label, so that monitoring tools between NATS and Redpanda Connect can stay in sync. ## [](#authentication)Authentication A number of Redpanda Connect components use NATS services. Each of these components supports optional, advanced authentication parameters for [NKeys](https://docs.nats.io/nats-server/configuration/securing_nats/auth_intro/nkey_auth) and [user credentials](https://docs.nats.io/using-nats/developer/connecting/creds). For an in-depth guide, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/nats_admin/security/jwt).
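
As a rough sketch of how the connection name and authentication settings described above might fit together, the following configuration labels the output (so that the label can serve as the NATS connection name) and authenticates with a credentials file. The label `orders_jetstream_out`, the server URL, the subject, and the file path are hypothetical placeholders. The top-level `output` key assumes the standard layout of a Redpanda Connect pipeline configuration. Treat this as an illustration under those assumptions, not a recommended production setup.

```yaml
# Illustrative sketch only: all values are placeholders.
output:
  label: orders_jetstream_out      # hypothetical label; intended to appear as the NATS connection name
  nats_jetstream:
    urls:
      - "nats://127.0.0.1:4222"    # placeholder server URL
    subject: orders.created        # placeholder subject
    max_in_flight: 1024            # the documented default, shown for clarity
    auth:
      user_credentials_file: ./user.creds   # file holding the user JWT and NKey seed
```

If a credentials file is not practical in your environment, the `user_jwt` and `user_nkey_seed` fields described below carry the same information as separate values.
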
### [](#nkeys)NKeys A NATS server can use NKeys in several ways for authentication. The simplest approach is to configure the server with a list of users’ public keys. The server then generates a challenge for each connection request from a client, and the client must respond by signing the challenge with its private NKey, which you configure in the `nkey_file` or `nkey` field. For more details, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth). ### [](#user-credentials)User credentials A NATS server also supports decentralized authentication based on JSON Web Tokens (JWTs). When a server is configured to use this authentication scheme, clients need a [user JWT](https://docs.nats.io/nats-server/configuration/securing_nats/jwt#json-web-tokens) and a corresponding [NKey secret](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth) to connect. You can use either of the following methods to supply the user JWT and NKey secret: - In the `user_credentials_file` field, enter the path to a file containing both the private key and the JWT. You can generate the file using the [nsc tool](https://docs.nats.io/nats-tools/nsc). - In the `user_jwt` field, enter a plain text JWT, and in the `user_nkey_seed` field, enter the plain text NKey seed or private key. For more details about authentication using JWTs, see the [NATS documentation](https://docs.nats.io/using-nats/developer/connecting/creds). ## [](#fields)Fields ### [](#auth)`auth` Optional configuration of NATS authentication parameters. **Type**: `object` ### [](#auth-nkey)`auth.nkey` Your NKey seed or private key for NATS authentication. NKeys provide secure, cryptographic authentication without passwords. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly.
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: nkey: UDXU4RCSJNZOIQHZNWXHXORDPRTGNJAHAHFRGZNEEJCPQTT2M7NLCNF4 ``` ### [](#auth-nkey_file)`auth.nkey_file` An optional file containing an NKey seed. **Type**: `string` ```yaml # Examples: nkey_file: ./seed.nk ``` ### [](#auth-password)`auth.password` An optional plain text password (given along with the corresponding user name). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-token)`auth.token` An optional plain text token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user)`auth.user` An optional plain text user name (given along with the corresponding user password). **Type**: `string` ### [](#auth-user_credentials_file)`auth.user_credentials_file` An optional file containing user credentials which consist of a user JWT and corresponding NKey seed. **Type**: `string` ```yaml # Examples: user_credentials_file: ./user.creds ``` ### [](#auth-user_jwt)`auth.user_jwt` An optional plaintext user JWT to use along with the corresponding user NKey seed. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user_nkey_seed)`auth.user_nkey_seed` An optional plaintext user NKey seed to use along with the corresponding user JWT.
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#headers)`headers` Explicit message headers to add to messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ```yaml # Examples: headers: Content-Type: application/json Timestamp: ${!meta("Timestamp")} ``` ### [](#inject_tracing_map)`inject_tracing_map` EXPERIMENTAL: A [Bloblang mapping](../../../guides/bloblang/about/) used to inject an object containing tracing propagation information into outbound messages. The specification of the injected fields will match the format used by the service wide tracer. **Type**: `string` ```yaml # Examples: inject_tracing_map: meta = @.merge(this) # --- inject_tracing_map: root.meta.span = this ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `1024` ### [](#max_reconnects)`max_reconnects` The maximum number of times to attempt to reconnect to the server. If negative, it will never stop trying to reconnect. **Type**: `int` ### [](#metadata)`metadata` Determine which (if any) metadata values should be added to messages as headers. **Type**: `object` ### [](#metadata-include_patterns)`metadata.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#metadata-include_prefixes)`metadata.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. 
**Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#subject)`subject` A subject to write to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: subject: foo.bar.baz # --- subject: ${! meta("kafka_topic") } # --- subject: foo.${! json("meta.type") } ``` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. 
Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. 
**Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#tls_handshake_first)`tls_handshake_first` Whether to perform the initial TLS handshake before sending the NATS INFO protocol message. This is required when connecting to some NATS servers that expect TLS to be established immediately after connection, before any protocol negotiation. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. If a list item contains commas, it will be expanded into multiple URLs. **Type**: `array` ```yaml # Examples: urls: - "nats://127.0.0.1:4222" # --- urls: - "nats://username:password@127.0.0.1:4222" ``` --- # Page 171: nats_kv **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/nats_kv.md --- # nats\_kv --- title: nats_kv latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/nats_kv page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/nats_kv.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/nats_kv.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/nats_kv/)[Cache](/redpanda-cloud/develop/connect/components/caches/nats_kv/)[Input](/redpanda-cloud/develop/connect/components/inputs/nats_kv/)[Processor](/redpanda-cloud/develop/connect/components/processors/nats_kv/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/nats_kv/ "View the Self-Managed version of this component") Put 
messages into a NATS key-value bucket. #### Common ```yml outputs: label: "" nats_kv: urls: [] # No default (required) bucket: "" # No default (required) key: "" # No default (required) max_in_flight: 1024 ``` #### Advanced ```yml outputs: label: "" nats_kv: urls: [] # No default (required) max_reconnects: "" # No default (optional) bucket: "" # No default (required) key: "" # No default (required) max_in_flight: 1024 tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] tls_handshake_first: false auth: nkey_file: "" # No default (optional) nkey: "" # No default (optional) user_credentials_file: "" # No default (optional) user_jwt: "" # No default (optional) user_nkey_seed: "" # No default (optional) user: "" # No default (optional) password: "" # No default (optional) token: "" # No default (optional) ``` The `key` field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries), so you can create a unique key for each message. ## [](#connection-name)Connection name When monitoring and managing a production [NATS system](https://docs.nats.io/nats-concepts/overview), it is often useful to know which connection sent or received a message. To achieve this, set the connection name option when creating a NATS connection. Redpanda Connect can then automatically set the connection name to the NATS component label, so that monitoring tools between NATS and Redpanda Connect can stay in sync. ## [](#authentication)Authentication A number of Redpanda Connect components use NATS services. Each of these components supports optional, advanced authentication parameters for [NKeys](https://docs.nats.io/nats-server/configuration/securing_nats/auth_intro/nkey_auth) and [user credentials](https://docs.nats.io/using-nats/developer/connecting/creds). For an in-depth guide, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/nats_admin/security/jwt).
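
To illustrate the per-message `key` interpolation and the authentication options described above, here is a minimal sketch of a `nats_kv` output. It assumes each message is a JSON document with a hypothetical `device_id` field, that a bucket named `my_kv_bucket` already exists, and that a secret with the hypothetical name `NATS_NKEY` is available through the `${secrets.<NAME>}` reference syntax. The label, URL, and top-level `output` key are likewise assumptions based on a standard Redpanda Connect pipeline layout.

```yaml
# Illustrative sketch only: all names and values are placeholders.
output:
  label: device_state_kv           # hypothetical label; intended to appear as the NATS connection name
  nats_kv:
    urls:
      - "nats://127.0.0.1:4222"    # placeholder server URL
    bucket: my_kv_bucket           # assumes this bucket already exists
    key: devices.${! json("device_id") }   # one key per message, derived from a hypothetical JSON field
    auth:
      nkey: ${secrets.NATS_NKEY}   # hypothetical secret name; avoids placing the seed in the config
```

With a key built this way, messages that share a `device_id` overwrite the same entry in the bucket, while messages with different IDs land under separate keys.
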
### [](#nkeys)NKeys A NATS server can use NKeys in several ways for authentication. The simplest approach is to configure the server with a list of users’ public keys. The server then generates a challenge for each connection request from a client, and the client must respond by signing the challenge with its private NKey, which you configure in the `nkey_file` or `nkey` field. For more details, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth). ### [](#user-credentials)User credentials A NATS server also supports decentralized authentication based on JSON Web Tokens (JWTs). When a server is configured to use this authentication scheme, clients need a [user JWT](https://docs.nats.io/nats-server/configuration/securing_nats/jwt#json-web-tokens) and a corresponding [NKey secret](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth) to connect. You can use either of the following methods to supply the user JWT and NKey secret: - In the `user_credentials_file` field, enter the path to a file containing both the private key and the JWT. You can generate the file using the [nsc tool](https://docs.nats.io/nats-tools/nsc). - In the `user_jwt` field, enter a plain text JWT, and in the `user_nkey_seed` field, enter the plain text NKey seed or private key. For more details about authentication using JWTs, see the [NATS documentation](https://docs.nats.io/using-nats/developer/connecting/creds). ## [](#fields)Fields ### [](#auth)`auth` Optional configuration of NATS authentication parameters. **Type**: `object` ### [](#auth-nkey)`auth.nkey` Your NKey seed or private key for NATS authentication. NKeys provide secure, cryptographic authentication without passwords. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly.
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: nkey: UDXU4RCSJNZOIQHZNWXHXORDPRTGNJAHAHFRGZNEEJCPQTT2M7NLCNF4 ``` ### [](#auth-nkey_file)`auth.nkey_file` An optional file containing an NKey seed. **Type**: `string` ```yaml # Examples: nkey_file: ./seed.nk ``` ### [](#auth-password)`auth.password` An optional plain text password (given along with the corresponding user name). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-token)`auth.token` An optional plain text token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user)`auth.user` An optional plain text user name (given along with the corresponding user password). **Type**: `string` ### [](#auth-user_credentials_file)`auth.user_credentials_file` An optional file containing user credentials which consist of a user JWT and corresponding NKey seed. **Type**: `string` ```yaml # Examples: user_credentials_file: ./user.creds ``` ### [](#auth-user_jwt)`auth.user_jwt` An optional plaintext user JWT to use along with the corresponding user NKey seed. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user_nkey_seed)`auth.user_nkey_seed` An optional plaintext user NKey seed to use along with the corresponding user JWT.
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#bucket)`bucket` The name of the KV bucket. **Type**: `string` ```yaml # Examples: bucket: my_kv_bucket ``` ### [](#key)`key` The key for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: key: foo # --- key: foo.bar.baz # --- key: foo.${! json("meta.type") } ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `1024` ### [](#max_reconnects)`max_reconnects` The maximum number of times to attempt to reconnect to the server. If negative, it will never stop trying to reconnect. **Type**: `int` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#tls_handshake_first)`tls_handshake_first` Whether to perform the initial TLS handshake before sending the NATS INFO protocol message. This is required when connecting to some NATS servers that expect TLS to be established immediately after connection, before any protocol negotiation. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. If a list item contains commas, it will be expanded into multiple URLs. 
**Type**: `array` ```yaml # Examples: urls: - "nats://127.0.0.1:4222" # --- urls: - "nats://username:password@127.0.0.1:4222" ``` --- # Page 172: nats **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/nats.md --- # nats --- title: nats latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/nats page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/nats.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/nats.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/nats/)[Input](/redpanda-cloud/develop/connect/components/inputs/nats/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/nats/ "View the Self-Managed version of this component") Publish to a NATS subject.
#### Common ```yml outputs: label: "" nats: urls: [] # No default (required) subject: "" # No default (required) headers: {} metadata: include_prefixes: [] include_patterns: [] max_in_flight: 64 ``` #### Advanced ```yml outputs: label: "" nats: urls: [] # No default (required) max_reconnects: "" # No default (optional) subject: "" # No default (required) headers: {} metadata: include_prefixes: [] include_patterns: [] max_in_flight: 64 tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] tls_handshake_first: false auth: nkey_file: "" # No default (optional) nkey: "" # No default (optional) user_credentials_file: "" # No default (optional) user_jwt: "" # No default (optional) user_nkey_seed: "" # No default (optional) user: "" # No default (optional) password: "" # No default (optional) token: "" # No default (optional) inject_tracing_map: "" # No default (optional) ``` This output interpolates functions within the subject field. For a full list of functions, see [configuration:interpolation.adoc#bloblang-queries](../../../configuration/interpolation/#bloblang-queries). ## [](#connection-name)Connection name When monitoring and managing a production [NATS system](https://docs.nats.io/nats-concepts/overview), it is often useful to know which connection a message was sent or received from. To achieve this, set the connection name option when creating a NATS connection. Redpanda Connect can then automatically set the connection name to the NATS component label, so that monitoring tools between NATS and Redpanda Connect can stay in sync. ## [](#authentication)Authentication A number of Redpanda Connect components use NATS services. Each of these components support optional, advanced authentication parameters for [NKeys](https://docs.nats.io/nats-server/configuration/securing_nats/auth_intro/nkey_auth) and [user credentials](https://docs.nats.io/using-nats/developer/connecting/creds). 
For an in-depth guide, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/nats_admin/security/jwt). ### [](#nkeys)NKeys NATS server can use NKeys in several ways for authentication. The simplest approach is to configure the server with a list of user’s public keys. The server can then generate a challenge for each connection request from a client, and the client must respond to the challenge by signing it with its private NKey, configured in the `nkey_file` or `nkey` field. For more details, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth). ### [](#user-credentials)User credentials NATS server also supports decentralized authentication based on JSON Web Tokens (JWTs). When a server is configured to use this authentication scheme, clients need a [user JWT](https://docs.nats.io/nats-server/configuration/securing_nats/jwt#json-web-tokens) and a corresponding [NKey secret](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth) to connect. You can use either of the following methods to supply the user JWT and NKey secret: - In the `user_credentials_file` field, enter the path to a file containing both the private key and the JWT. You can generate the file using the [nsc tool](https://docs.nats.io/nats-tools/nsc). - In the `user_jwt` field, enter a plain text JWT, and in the `user_nkey_seed` field, enter the plain text NKey seed or private key. For more details about authentication using JWTs, see the [NATS documentation](https://docs.nats.io/using-nats/developer/connecting/creds). ## [](#fields)Fields ### [](#auth)`auth` Optional configuration of NATS authentication parameters. **Type**: `object` ### [](#auth-nkey)`auth.nkey` Your NKey seed or private key for NATS authentication. NKeys provide secure, cryptographic authentication without passwords. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: nkey: UDXU4RCSJNZOIQHZNWXHXORDPRTGNJAHAHFRGZNEEJCPQTT2M7NLCNF4 ``` ### [](#auth-nkey_file)`auth.nkey_file` An optional file containing a NKey seed. **Type**: `string` ```yaml # Examples: nkey_file: ./seed.nk ``` ### [](#auth-password)`auth.password` An optional plain text password (given along with the corresponding user name). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-token)`auth.token` An optional plain text token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user)`auth.user` An optional plain text user name (given along with the corresponding user password). **Type**: `string` ### [](#auth-user_credentials_file)`auth.user_credentials_file` An optional file containing user credentials which consist of a user JWT and corresponding NKey seed. **Type**: `string` ```yaml # Examples: user_credentials_file: ./user.creds ``` ### [](#auth-user_jwt)`auth.user_jwt` An optional plaintext user JWT to use along with the corresponding user NKey seed. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` ### [](#auth-user_nkey_seed)`auth.user_nkey_seed` An optional plaintext user NKey seed to use along with the corresponding user JWT. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#headers)`headers` Explicit message headers to add to messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ```yaml # Examples: headers: Content-Type: application/json Timestamp: ${!meta("Timestamp")} ``` ### [](#inject_tracing_map)`inject_tracing_map` EXPERIMENTAL: A [Bloblang mapping](../../../guides/bloblang/about/) used to inject an object containing tracing propagation information into outbound messages. The specification of the injected fields will match the format used by the service wide tracer. **Type**: `string` ```yaml # Examples: inject_tracing_map: meta = @.merge(this) # --- inject_tracing_map: root.meta.span = this ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#max_reconnects)`max_reconnects` The maximum number of times to attempt to reconnect to the server. If negative, it will never stop trying to reconnect. **Type**: `int` ### [](#metadata)`metadata` Determine which (if any) metadata values should be added to messages as headers. **Type**: `object` ### [](#metadata-include_patterns)`metadata.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. 
**Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#metadata-include_prefixes)`metadata.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#subject)`subject` The subject to publish to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: subject: foo.bar.baz ``` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. 
This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#tls_handshake_first)`tls_handshake_first` Whether to perform the initial TLS handshake before sending the NATS INFO protocol message. This is required when connecting to some NATS servers that expect TLS to be established immediately after connection, before any protocol negotiation. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. If a list item contains commas, it will be expanded into multiple URLs. **Type**: `array` ```yaml # Examples: urls: - "nats://127.0.0.1:4222" # --- urls: - "nats://username:password@127.0.0.1:4222" ``` --- # Page 173: opensearch **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/opensearch.md --- # opensearch --- title: opensearch latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/opensearch page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/opensearch.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/opensearch.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/opensearch/ "View the Self-Managed version of this component") Publishes messages into an OpenSearch index.
If the index does not exist then it is created with a dynamic mapping. #### Common ```yml outputs: label: "" opensearch: urls: [] # No default (required) index: "" # No default (required) action: "" # No default (required) id: "" # No default (required) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" opensearch: urls: [] # No default (required) index: "" # No default (required) action: "" # No default (required) id: "" # No default (required) pipeline: "" routing: "" tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] max_in_flight: 64 basic_auth: enabled: false username: "" password: "" batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) aws: enabled: false region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) ``` Both the `id` and `index` fields can be dynamically set using function interpolations described [here](../../../configuration/interpolation/#bloblang-queries). When sending batched messages these interpolations are performed per message part. ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. 
For details, see the [batching documentation](../../../configuration/batching/). ## [](#examples)Examples ### [](#updating-documents)Updating Documents When [updating documents](https://opensearch.org/docs/latest/api-reference/document-apis/update-document/), the request body should contain some combination of the `doc`, `upsert`, and `script` fields at the top level. Use mapping processors to build this structure. ```yaml output: processors: - mapping: | meta id = this.id root.doc = this opensearch: urls: [ TODO ] index: foo id: ${! @id } action: update ``` ## [](#fields)Fields ### [](#action)`action` The action to take on the document. This field must resolve to one of the following action types: `index`, `update`, or `delete`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#aws)`aws` Enables and customises connectivity to Amazon Elastic Service. **Type**: `object` ### [](#aws-credentials)`aws.credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#aws-credentials-from_ec2_role)`aws.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#aws-credentials-id)`aws.credentials.id` The ID of credentials to use. **Type**: `string` ### [](#aws-credentials-profile)`aws.credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#aws-credentials-role)`aws.credentials.role` A role ARN to assume. **Type**: `string` ### [](#aws-credentials-role_external_id)`aws.credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#aws-credentials-secret)`aws.credentials.secret` The secret for the credentials being used.
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#aws-credentials-token)`aws.credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#aws-enabled)`aws.enabled` Whether to connect to Amazon Elastic Service. **Type**: `bool` **Default**: `false` ### [](#aws-endpoint)`aws.endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#aws-region)`aws.region` The AWS region to target. **Type**: `string` ### [](#aws-tcp)`aws.tcp` TCP socket configuration. **Type**: `object` ### [](#aws-tcp-connect_timeout)`aws.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#aws-tcp-keep_alive)`aws.tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#aws-tcp-keep_alive-count)`aws.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#aws-tcp-keep_alive-idle)`aws.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#aws-tcp-keep_alive-interval)`aws.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#aws-tcp-tcp_user_timeout)`aws.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. 
**Type**: `string` **Default**: `0s` ### [](#basic_auth)`basic_auth` Allows you to specify basic authentication. **Type**: `object` ### [](#basic_auth-enabled)`basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#basic_auth-password)`basic_auth.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#basic_auth-username)`basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. 
**Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#id)`id` The ID for indexed messages. Interpolation should be used in order to create a unique ID for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: id: ${!counter()}-${!timestamp_unix()} ``` ### [](#index)`index` The index to place messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#pipeline)`pipeline` An optional pipeline id to preprocess incoming documents. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#routing)`routing` The routing key to use for the document. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. 
For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. 
Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. If an item of the list contains commas it will be expanded into multiple URLs. 
**Type**: `array` ```yaml # Examples: urls: - "http://localhost:9200" ``` --- # Page 174: otlp_grpc **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/otlp_grpc.md --- # otlp\_grpc --- title: otlp_grpc latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/otlp_grpc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/otlp_grpc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/otlp_grpc.adoc page-git-created-date: "2026-01-23" page-git-modified-date: "2026-01-23" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/otlp_grpc/)[Input](/redpanda-cloud/develop/connect/components/inputs/otlp_grpc/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/otlp_grpc/ "View the Self-Managed version of this component") Send OpenTelemetry traces, logs, and metrics via OTLP/gRPC protocol. Sends OpenTelemetry telemetry data to a remote collector via OTLP/gRPC protocol. Accepts batches of Redpanda OTEL v1 protobuf messages (spans, log records, or metrics) and converts them to OTLP format for transmission to OpenTelemetry collectors. 
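
For orientation, a minimal configuration for this output might look like the sketch below. The collector address is a hypothetical placeholder (4317 is the port conventionally used for OTLP over gRPC), and the address format your collector expects should be confirmed before use. The remaining values are the documented defaults, apart from `tls.enabled`.

```yaml
output:
  otlp_grpc:
    endpoint: "otel-collector.example.com:4317"  # hypothetical collector address
    compression: gzip                            # documented default; "none" disables compression
    timeout: 30s                                 # documented default
    tls:
      enabled: true                              # assumes the collector serves TLS
    max_in_flight: 64                            # documented default
```
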
#### Common ```yml outputs: label: "" otlp_grpc: endpoint: "" # No default (required) max_in_flight: 64 ``` #### Advanced ```yml outputs: label: "" otlp_grpc: endpoint: "" # No default (required) headers: {} timeout: 30s compression: gzip tls: enabled: false skip_cert_verify: false cert_file: "" key_file: "" tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s oauth2: enabled: false client_key: "" client_secret: "" token_url: "" scopes: [] endpoint_params: {} max_in_flight: 64 ``` ## [](#input-format)Input format Expects messages in Redpanda OTEL v1 protobuf format with metadata: - `signal_type`: "trace", "log", or "metric" Each batch must contain messages of the same signal type. The entire batch is converted to a single OTLP export request and sent via gRPC. ## [](#authentication)Authentication Supports multiple authentication methods: - Bearer token authentication (via `auth_token` field) - OAuth v2 (via `oauth2` configuration block) > 📝 **NOTE** > > OAuth2 requires TLS to be enabled. ## [](#fields)Fields ### [](#compression)`compression` Compression type for gRPC requests. Options: 'gzip' or 'none'. **Type**: `string` **Default**: `gzip` **Options**: `gzip`, `none` ### [](#endpoint)`endpoint` The gRPC endpoint of the remote OTLP collector. **Type**: `string` ### [](#headers)`headers` A map of headers to add to the gRPC request metadata. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ```yaml # Examples: headers: X-Custom-Header: value traceparent: ${! tracing_span().traceparent } ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#oauth2)`oauth2` Allows you to specify open authentication via OAuth version 2 using the client credentials token flow. 
**Type**: `object` ### [](#oauth2-client_key)`oauth2.client_key` A value used to identify the client to the token provider. **Type**: `string` **Default**: `""` ### [](#oauth2-client_secret)`oauth2.client_secret` A secret used to establish ownership of the client key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth2-enabled)`oauth2.enabled` Whether to use OAuth version 2 in requests. **Type**: `bool` **Default**: `false` ### [](#oauth2-endpoint_params)`oauth2.endpoint_params` A list of optional endpoint parameters, values should be arrays of strings. **Type**: `object` **Default**: `{}` ```yaml # Examples: endpoint_params: audience: - https://example.com resource: - https://api.example.com ``` ### [](#oauth2-scopes)`oauth2.scopes[]` A list of optional requested permissions. **Type**: `array` **Default**: `[]` ### [](#oauth2-token_url)`oauth2.token_url` The URL of the token provider. **Type**: `string` **Default**: `""` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. 
**Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timeout)`timeout` Timeout for gRPC requests. **Type**: `string` **Default**: `30s` ### [](#tls)`tls` TLS configuration for gRPC client. **Type**: `object` ### [](#tls-cert_file)`tls.cert_file` Path to the TLS certificate file for client authentication. **Type**: `string` **Default**: `""` ### [](#tls-enabled)`tls.enabled` Enable TLS connections. **Type**: `bool` **Default**: `false` ### [](#tls-key_file)`tls.key_file` Path to the TLS key file for client authentication. **Type**: `string` **Default**: `""` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Skip certificate verification (insecure). 
**Type**: `bool` **Default**: `false` --- # Page 175: otlp_http **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/otlp_http.md --- # otlp\_http --- title: otlp_http latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/otlp_http page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/otlp_http.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/otlp_http.adoc page-git-created-date: "2026-01-23" page-git-modified-date: "2026-01-23" --- **Type:** Output (also available as an [Input](/redpanda-cloud/develop/connect/components/inputs/otlp_http/)) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/otlp_http/ "View the Self-Managed version of this component") Sends OpenTelemetry traces, logs, and metrics to a remote collector over the OTLP/HTTP protocol. The output accepts batches of Redpanda OTEL v1 protobuf messages (spans, log records, or metrics) and converts them to OTLP format for transmission to OpenTelemetry collectors.
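As a minimal sketch of how this output might be configured, the fragment below sets the required `endpoint` and spells out a few documented defaults. The collector address `http://localhost:4318` is an assumption for illustration (4318 is the conventional OTLP/HTTP port). The fragment is meant to sit in the same position as the `otlp_http` block in the configuration listings on this page.

```yaml
# Sketch only: the collector address is an assumed example, not a documented default.
otlp_http:
  # Base URL without a signal path. The output appends /v1/traces, /v1/logs,
  # or /v1/metrics depending on the signal type of each batch.
  endpoint: "http://localhost:4318"
  content_type: protobuf # documented default; `json` is the other option
  timeout: 30s           # documented default
  max_in_flight: 64      # documented default
```

Because each batch must contain messages of a single signal type, a pipeline that carries mixed telemetry would need to group messages by `signal_type` before they reach this output.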
#### Common ```yml outputs: label: "" otlp_http: endpoint: "" # No default (required) max_in_flight: 64 ``` #### Advanced ```yml outputs: label: "" otlp_http: endpoint: "" # No default (required) content_type: protobuf headers: {} timeout: 30s proxy_url: "" follow_redirects: false disable_http2: false tls: enabled: false skip_cert_verify: false cert_file: "" key_file: "" tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} oauth2: enabled: false client_key: "" client_secret: "" token_url: "" scopes: [] endpoint_params: {} max_in_flight: 64 ``` ## [](#input-format)Input format Expects messages in Redpanda OTEL v1 protobuf format with metadata: - `signal_type`: "trace", "log", or "metric" Each batch must contain messages of the same signal type. The entire batch is converted to a single OTLP export request and sent via HTTP POST. ## [](#endpoints)Endpoints The output automatically appends the signal type path to the base endpoint: - Traces: `{endpoint}/v1/traces` - Logs: `{endpoint}/v1/logs` - Metrics: `{endpoint}/v1/metrics` ## [](#content-types)Content types Supports two content types: - `protobuf` (default): `application/x-protobuf` - `json`: `application/json` ## [](#authentication)Authentication Supports multiple authentication methods: - Basic authentication - OAuth v1 - OAuth v2 - JWT ## [](#fields)Fields ### [](#basic_auth)`basic_auth` Allows you to specify basic authentication. **Type**: `object` ### [](#basic_auth-enabled)`basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#basic_auth-password)`basic_auth.password` A password to authenticate with. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#basic_auth-username)`basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#content_type)`content_type` Content type for HTTP requests. Options: 'protobuf' or 'json'. **Type**: `string` **Default**: `protobuf` **Options**: `protobuf`, `json` ### [](#disable_http2)`disable_http2` Whether or not to disable HTTP/2. **Type**: `bool` **Default**: `false` ### [](#endpoint)`endpoint` The HTTP endpoint of the remote OTLP collector (without the signal path). **Type**: `string` ### [](#follow_redirects)`follow_redirects` Transparently follow redirects, i.e. responses with 300-399 status codes. If disabled, the response message will contain the body, status, and headers from the redirect response and the processor will not make a request to the URL set in the Location header of the response. **Type**: `bool` **Default**: `false` ### [](#headers)`headers` A map of headers to add to the request. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ```yaml # Examples: headers: X-Custom-Header: value traceparent: ${! tracing_span().traceparent } ``` ### [](#jwt)`jwt` Beta Allows you to specify JWT authentication. **Type**: `object` ### [](#jwt-claims)`jwt.claims` A value used to identify the claims that issued the JWT. **Type**: `object` **Default**: `{}` ### [](#jwt-enabled)`jwt.enabled` Whether to use JWT authentication in requests. **Type**: `bool` **Default**: `false` ### [](#jwt-headers)`jwt.headers` Add optional key/value headers to the JWT. 
**Type**: `object` **Default**: `{}` ### [](#jwt-private_key_file)`jwt.private_key_file` A file with the PEM encoded via PKCS1 or PKCS8 as private key. **Type**: `string` **Default**: `""` ### [](#jwt-signing_method)`jwt.signing_method` A method used to sign the token such as RS256, RS384, RS512 or EdDSA. **Type**: `string` **Default**: `""` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#oauth)`oauth` Allows you to specify open authentication via OAuth version 1. **Type**: `object` ### [](#oauth-access_token)`oauth.access_token` A value used to gain access to the protected resources on behalf of the user. **Type**: `string` **Default**: `""` ### [](#oauth-access_token_secret)`oauth.access_token_secret` A secret provided in order to establish ownership of a given access token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_key)`oauth.consumer_key` A value used to identify the client to the service provider. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_secret)`oauth.consumer_secret` A secret used to establish ownership of the consumer key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-enabled)`oauth.enabled` Whether to use OAuth version 1 in requests. 
**Type**: `bool` **Default**: `false` ### [](#oauth2)`oauth2` Allows you to specify open authentication via OAuth version 2 using the client credentials token flow. **Type**: `object` ### [](#oauth2-client_key)`oauth2.client_key` A value used to identify the client to the token provider. **Type**: `string` **Default**: `""` ### [](#oauth2-client_secret)`oauth2.client_secret` A secret used to establish ownership of the client key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth2-enabled)`oauth2.enabled` Whether to use OAuth version 2 in requests. **Type**: `bool` **Default**: `false` ### [](#oauth2-endpoint_params)`oauth2.endpoint_params` A list of optional endpoint parameters, values should be arrays of strings. **Type**: `object` **Default**: `{}` ```yaml # Examples: endpoint_params: audience: - https://example.com resource: - https://api.example.com ``` ### [](#oauth2-scopes)`oauth2.scopes[]` A list of optional requested permissions. **Type**: `array` **Default**: `[]` ### [](#oauth2-token_url)`oauth2.token_url` The URL of the token provider. **Type**: `string` **Default**: `""` ### [](#proxy_url)`proxy_url` An optional HTTP proxy URL. **Type**: `string` **Default**: `""` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. 
**Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timeout)`timeout` Timeout for HTTP requests. **Type**: `string` **Default**: `30s` ### [](#tls)`tls` TLS configuration for HTTP client. **Type**: `object` ### [](#tls-cert_file)`tls.cert_file` Path to the TLS certificate file for client authentication. **Type**: `string` **Default**: `""` ### [](#tls-enabled)`tls.enabled` Enable TLS connections. **Type**: `bool` **Default**: `false` ### [](#tls-key_file)`tls.key_file` Path to the TLS key file for client authentication. **Type**: `string` **Default**: `""` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Skip certificate verification (insecure). 
**Type**: `bool` **Default**: `false` --- # Page 176: pinecone **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/pinecone.md --- # pinecone --- title: pinecone latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/pinecone page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/pinecone.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/pinecone.adoc categories: "[\"AI\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/pinecone/ "View the Self-Managed version of this component") Inserts items into a Pinecone index. #### Common ```yml outputs: label: "" pinecone: max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) host: "" # No default (required) api_key: "" # No default (required) operation: upsert-vectors id: "" # No default (required) vector_mapping: "" # No default (optional) metadata_mapping: "" # No default (optional) ``` #### Advanced ```yml outputs: label: "" pinecone: max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) host: "" # No default (required) api_key: "" # No default (required) operation: upsert-vectors namespace: "" id: "" # No default (required) vector_mapping: "" # No default (optional) metadata_mapping: "" # No default (optional) ``` ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. 
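The following fragment is a hedged sketch of raising `max_in_flight` above its default of `64`. The host, the environment variable name, the `id` path, and the value `128` are all hypothetical placeholders chosen for illustration, not recommendations.

```yaml
# Sketch only: host, secret variable, id path, and in-flight value are assumptions.
pinecone:
  host: "example-index.svc.example.pinecone.io" # hypothetical index host
  api_key: "${PINECONE_API_KEY}"                # hypothetical variable; avoid inline secrets
  operation: upsert-vectors                     # documented default
  id: '${! json("id") }'                        # assumes each message has an `id` field
  vector_mapping: root = this.embeddings_vector # taken from the documented example
  max_in_flight: 128                            # illustrative; the default is 64
```

A higher value allows more requests to run in parallel, so it is worth increasing gradually while watching for rate limiting on the Pinecone side.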
This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. For more information, see [Message Batching](../../../configuration/batching/). ## [](#fields)Fields ### [](#api_key)`api_key` The Pinecone API key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages at which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit.
All resulting messages are flushed as a single batch, so splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#host)`host` The host for the Pinecone index. **Type**: `string` ### [](#id)`id` The ID for the index entry in Pinecone. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#metadata_mapping)`metadata_mapping` An optional mapping from the message to the metadata stored with the Pinecone index entry. **Type**: `string` ```yaml # Examples: metadata_mapping: root = @ # --- metadata_mapping: root = metadata() # --- metadata_mapping: root = {"summary": this.summary, "foo": this.other_field} ``` ### [](#namespace)`namespace` The namespace to write to. If left empty, the default namespace is used. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#operation)`operation` The operation to perform against the Pinecone index. **Type**: `string` **Default**: `upsert-vectors` **Options**: `update-vector`, `upsert-vectors`, `delete-vectors` ### [](#vector_mapping)`vector_mapping` The mapping that extracts the vector from the document. The result must be an array of floating-point numbers. Required unless the operation is `delete-vectors`.
**Type**: `string` ```yaml # Examples: vector_mapping: root = this.embeddings_vector # --- vector_mapping: root = [1.2, 0.5, 0.76] ``` --- # Page 177: qdrant **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/qdrant.md --- # qdrant --- title: qdrant latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/qdrant page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/qdrant.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/qdrant.adoc categories: "[\"AI\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output (also available as a [Processor](/redpanda-cloud/develop/connect/components/processors/qdrant/)) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/qdrant/ "View the Self-Managed version of this component") Adds items to a [Qdrant](https://qdrant.tech/) collection. #### Common ```yml outputs: label: "" qdrant: max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) grpc_host: "" # No default (required) api_token: "" collection_name: "" # No default (required) id: "" # No default (required) vector_mapping: "" # No default (required) payload_mapping: root = {} ``` #### Advanced ```yml outputs: label: "" qdrant: max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) grpc_host: "" # No default (required) api_token: "" tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] collection_name: "" # No default (required) id: "" # No default (required) vector_mapping: "" # No
default (required) payload_mapping: root = {} ``` ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the maximum number of in-flight messages (or message batches) with the `max_in_flight` field. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. For more information, see [Message Batching](../../../configuration/batching/). ## [](#fields)Fields ### [](#api_token)`api_token` The Qdrant API token for authentication. Defaults to an empty string. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages at which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size.
**Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#collection_name)`collection_name` The name of the collection in Qdrant. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#grpc_host)`grpc_host` The gRPC host of the Qdrant server. **Type**: `string` ```yaml # Examples: grpc_host: localhost:6334 # --- grpc_host: xyz-example.eu-central.aws.cloud.qdrant.io:6334 ``` ### [](#id)`id` The ID of the point to insert. Can be a UUID string or positive integer. **Type**: `string` ```yaml # Examples: id: root = "dc88c126-679f-49f5-ab85-04b77e8c2791" # --- id: root = 832 ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#payload_mapping)`payload_mapping` An optional mapping of message to payload associated with the point. **Type**: `string` **Default**: `root = {}` ```yaml # Examples: payload_mapping: root = {"field": this.value, "field_2": 987} # --- payload_mapping: root = metadata() ``` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. 
Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#vector_mapping)`vector_mapping` The mapping to extract the vector from the document. 
**Type**: `string` ```yaml # Examples: vector_mapping: root = {"dense_vector": [0.352,0.532,0.754],"sparse_vector": {"indices": [23,325,532],"values": [0.352,0.532,0.532]}, "multi_vector": [[0.352,0.532],[0.352,0.532]]} # --- vector_mapping: root = [1.2, 0.5, 0.76] # --- vector_mapping: root = this.vector # --- vector_mapping: root = [[0.352,0.532,0.532,0.234],[0.352,0.532,0.532,0.234]] # --- vector_mapping: root = {"some_sparse": {"indices":[23,325,532],"values":[0.352,0.532,0.532]}} # --- vector_mapping: root = {"some_multi": [[0.352,0.532,0.532,0.234],[0.352,0.532,0.532,0.234]]} # --- vector_mapping: root = {"some_dense": [0.352,0.532,0.532,0.234]} ``` --- # Page 178: questdb **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/questdb.md --- # questdb --- title: questdb page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/questdb page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/questdb.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/questdb.adoc # Beta release status page-beta: "true" page-git-created-date: "2024-11-07" page-git-modified-date: "2024-11-07" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta Pushes messages to a [QuestDB](https://questdb.io/docs/) table. 
#### Common ```yml outputs: label: "" questdb: max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) address: "" # No default (required) username: "" # No default (optional) password: "" # No default (optional) token: "" # No default (optional) table: "" # No default (required) designated_timestamp_field: "" # No default (optional) designated_timestamp_unit: auto timestamp_string_fields: [] # No default (optional) timestamp_string_format: Jan _2 15:04:05.000000Z0700 symbols: [] # No default (optional) doubles: [] # No default (optional) error_on_empty_messages: false ``` #### Advanced ```yml outputs: label: "" questdb: max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] address: "" # No default (required) username: "" # No default (optional) password: "" # No default (optional) token: "" # No default (optional) retry_timeout: "" # No default (optional) request_timeout: "" # No default (optional) request_min_throughput: "" # No default (optional) table: "" # No default (required) designated_timestamp_field: "" # No default (optional) designated_timestamp_unit: auto timestamp_string_fields: [] # No default (optional) timestamp_string_format: Jan _2 15:04:05.000000Z0700 symbols: [] # No default (optional) doubles: [] # No default (optional) error_on_empty_messages: false ``` > ❗ **IMPORTANT** > > Redpanda Data recommends enabling the dedupe feature on the QuestDB server. For more information about deploying, configuring, and using QuestDB, see the [QuestDB documentation](https://questdb.io/docs/). ## [](#performance)Performance For improved performance, this output sends multiple messages in parallel. You can tune the maximum number of in-flight messages (or message batches), using the `max_in_flight` field. 
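As a rough sketch of the tuning described above, the fragment below raises `max_in_flight` for a QuestDB output. The `address` and `table` values reuse the examples from the field reference on this page, and `128` is an arbitrary illustrative value rather than a recommendation.

```yaml
# Sketch only: the in-flight value is an assumption chosen for illustration.
questdb:
  address: "localhost:9000" # example value from the `address` field reference
  table: trades             # example value from the `table` field reference
  max_in_flight: 128        # illustrative; the default is 64
```

With more messages in flight, retried writes become more likely to overlap, which is one reason the dedupe recommendation above matters.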
You can configure batches at both the input and output level. For more information, see [Message Batching](../../../configuration/batching/). ## [](#fields)Fields ### [](#address)`address` The host and port of the QuestDB server. **Type**: `string` ```yaml # Examples: address: localhost:9000 ``` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. 
**Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#designated_timestamp_field)`designated_timestamp_field` The name of the designated timestamp field in QuestDB. **Type**: `string` ### [](#designated_timestamp_unit)`designated_timestamp_unit` The units used for the designated timestamp field in QuestDB. **Type**: `string` **Default**: `auto` ### [](#doubles)`doubles[]` Columns that must be the `double` type, with `int` as the default. **Type**: `array` ### [](#error_on_empty_messages)`error_on_empty_messages` Whether to mark a message as an error if it is empty after field validation. **Type**: `bool` **Default**: `false` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this value to improve throughput. **Type**: `int` **Default**: `64` ### [](#password)`password` The password to use for basic authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#request_min_throughput)`request_min_throughput` The minimum expected throughput in bytes per second for HTTP requests. If the throughput is lower than this value, the connection times out. The `questdb` output uses this value to calculate an additional timeout on top of the `request_timeout`. This setting is useful for large requests. Set it to `0` to disable this logic. **Type**: `int` ### [](#request_timeout)`request_timeout` The period of time to wait for a response from the QuestDB server in addition to any connection timeout calculated for the `request_min_throughput` field. **Type**: `string` ### [](#retry_timeout)`retry_timeout` The period of time to continue retrying after a failed HTTP request.
The interval between retries is an exponential backoff starting at 10 ms, and doubling after each failed attempt up to a maximum of 1 second. **Type**: `string` ### [](#symbols)`symbols[]` Columns that must be the `symbol` type. String values default to `string` types. **Type**: `array` ### [](#table)`table` The destination table in QuestDB. **Type**: `string` ```yaml # Examples: table: trades ``` ### [](#timestamp_string_fields)`timestamp_string_fields[]` String fields with textual timestamps. **Type**: `array` ### [](#timestamp_string_format)`timestamp_string_format` The timestamp format, which is used when parsing timestamp string fields and uses Golang’s time formatting. **Type**: `string` **Default**: `Jan _2 15:04:05.000000Z0700` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. 
**Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. 
**Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. 
**Type**: `bool` **Default**: `false` ### [](#token)`token` The bearer token to use for authentication, which takes precedence over the basic authentication username and password. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#username)`username` The username to use for basic authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` --- # Page 179: redis_hash **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/redis_hash.md --- # redis\_hash --- title: redis_hash latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/redis_hash page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/redis_hash.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/redis_hash.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/redis_hash/ "View the Self-Managed version of this component") Sets Redis hash objects using the HMSET command. 
#### Common ```yml outputs: label: "" redis_hash: url: "" # No default (required) key: "" # No default (required) walk_metadata: false walk_json_object: false fields: {} max_in_flight: 64 ``` #### Advanced ```yml outputs: label: "" redis_hash: url: "" # No default (required) kind: simple master: "" client_name: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] key: "" # No default (required) walk_metadata: false walk_json_object: false fields: {} max_in_flight: 64 ``` The field `key` supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries), allowing you to create a unique key for each message. The field `fields` allows you to specify an explicit map of field names to interpolated values, which are also evaluated for each message of a batch: ```yaml output: redis_hash: url: tcp://localhost:6379 key: ${!json("id")} fields: topic: ${!meta("kafka_topic")} partition: ${!meta("kafka_partition")} content: ${!json("document.text")} ``` If the field `walk_metadata` is set to `true`, Redpanda Connect walks all metadata fields of each message and adds them to the list of hash fields to set. If the field `walk_json_object` is set to `true`, Redpanda Connect walks each message as a JSON object, extracting each key and the string representation of its value, and adds them to the list of hash fields to set. Hash fields are extracted in the following order: 1. Metadata (if enabled) 2. JSON object (if enabled) 3. Explicit fields Later stages overwrite matching field names set by earlier stages. ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the maximum number of in-flight messages (or message batches) with the field `max_in_flight`. ## [](#fields)Fields ### [](#client_name)`client_name` Set the client name for the Redis connection.
**Type**: `string` **Default**: `redpanda-connect` ### [](#fields-2)`fields` A map of key/value pairs to set as hash fields. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ### [](#key)`key` The key for each message. Use interpolation functions to create a unique key per message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: key: ${! @.kafka_key } # --- key: ${! this.doc.id } # --- key: ${! counter() } ``` ### [](#kind)`kind` Specifies a simple, cluster-aware, or failover-aware Redis client. **Type**: `string` **Default**: `simple` **Options**: `simple`, `cluster`, `failover` ### [](#master)`master` The name of the Redis master when `kind` is `failover`. **Type**: `string` **Default**: `""` ```yaml # Examples: master: mymaster ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this value to improve throughput. **Type**: `int` **Default**: `64` ### [](#tls)`tls` Custom TLS settings that override the system defaults. **Troubleshooting** Some cloud-hosted instances of Redis (such as Azure Cache) might need additional configuration to establish stable connections. TLS issues often manifest as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems, consider setting `enable_renegotiation` to `true` and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate, specify either the fields `cert` and `key`, or the fields `cert_file` and `key_file`, but not both.
**Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. 
**Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target Redis server. Database is optional and is supplied as the URL path. **Type**: `string` ```yaml # Examples: url: redis://:6379 # --- url: redis://localhost:6379 # --- url: redis://foousername:foopassword@redisplace:6379 # --- url: redis://:foopassword@redisplace:6379 # --- url: redis://localhost:6379/1 # --- url: redis://localhost:6379/1,redis://localhost:6380/1 ``` ### [](#walk_json_object)`walk_json_object` Whether to walk each message as a JSON object and add each key/value pair to the list of hash fields to set. 
**Type**: `bool` **Default**: `false` ### [](#walk_metadata)`walk_metadata` Whether all metadata fields of messages should be walked and added to the list of hash fields to set. **Type**: `bool` **Default**: `false` --- # Page 180: redis_list **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/redis_list.md --- # redis\_list --- title: redis_list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/redis_list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/redis_list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/redis_list.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output (also available as an [Input](/redpanda-cloud/develop/connect/components/inputs/redis_list/)) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/redis_list/ "View the Self-Managed version of this component") Pushes messages onto the end of a Redis list (which is created if it doesn’t already exist) using the RPUSH command.
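As a minimal sketch, assuming a local Redis instance at `redis://localhost:6379` and a hypothetical list key `some_list`, the following configuration switches the `command` field from its default `rpush` to `lpush`, so that messages are pushed onto the head of the list instead of the end. It also flushes a batch after 10 messages or after one second, whichever comes first. The values are illustrative only, not recommendations.

```yaml
output:
  redis_list:
    url: redis://localhost:6379 # Assumption: a local, unauthenticated Redis instance
    key: some_list              # Hypothetical list key
    command: lpush              # Push to the head of the list (default: rpush)
    max_in_flight: 64
    batching:
      count: 10                 # Flush after 10 messages...
      period: 1s                # ...or after 1 second, whichever comes first
```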
#### Common ```yml outputs: label: "" redis_list: url: "" # No default (required) key: "" # No default (required) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" redis_list: url: "" # No default (required) kind: simple master: "" client_name: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] key: "" # No default (required) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) command: rpush ``` The field `key` supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries), allowing you to create a unique key for each message. ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the maximum number of in-flight messages (or message batches) with the field `max_in_flight`. This output also benefits from sending messages as a batch. You can configure batches at both the input and output level. For more information, see [Message Batching](../../../configuration/batching/). ## [](#fields)Fields ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch.
**Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#client_name)`client_name` Set the client name for the Redis connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#command)`command` The command used to push elements to the Redis list. **Type**: `string` **Default**: `rpush` **Options**: `rpush`, `lpush` ### [](#key)`key` The key for each message. You can optionally use interpolation functions to create a unique key per message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: key: some_list # --- key: ${! @.kafka_key } # --- key: ${! this.doc.id } # --- key: ${! counter() } ``` ### [](#kind)`kind` Specifies a simple, cluster-aware, or failover-aware Redis client.
**Type**: `string` **Default**: `simple` **Options**: `simple`, `cluster`, `failover` ### [](#master)`master` The name of the Redis master when `kind` is `failover`. **Type**: `string` **Default**: `""` ```yaml # Examples: master: mymaster ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this value to improve throughput. **Type**: `int` **Default**: `64` ### [](#tls)`tls` Custom TLS settings that override the system defaults. **Troubleshooting** Some cloud-hosted instances of Redis (such as Azure Cache) might need additional configuration to establish stable connections. TLS issues often manifest as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems, consider setting `enable_renegotiation` to `true` and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate, specify either the fields `cert` and `key`, or the fields `cert_file` and `key_file`, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... 
-----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a `.pem` extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target Redis server. The database is optional and is supplied as the URL path. **Type**: `string` ```yaml # Examples: url: redis://:6379 # --- url: redis://localhost:6379 # --- url: redis://foousername:foopassword@redisplace:6379 # --- url: redis://:foopassword@redisplace:6379 # --- url: redis://localhost:6379/1 # --- url: redis://localhost:6379/1,redis://localhost:6380/1 ``` --- # Page 181: redis_pubsub **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/redis_pubsub.md --- # redis\_pubsub --- title: redis_pubsub latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/redis_pubsub page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/redis_pubsub.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/redis_pubsub.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output (also available as an [Input](/redpanda-cloud/develop/connect/components/inputs/redis_pubsub/)) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/redis_pubsub/
"View the Self-Managed version of this component") Publishes messages through the Redis PubSub model. It is not possible to guarantee that messages have been received. #### Common ```yml outputs: label: "" redis_pubsub: url: "" # No default (required) channel: "" # No default (required) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" redis_pubsub: url: "" # No default (required) kind: simple master: "" client_name: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] channel: "" # No default (required) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` This output will interpolate functions within the channel field, you can find a list of functions [here](../../../configuration/interpolation/#bloblang-queries). ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. You can find out more [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#channel)`channel` The channel to publish messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#client_name)`client_name` Set the client name for the Redis connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#kind)`kind` Specifies a simple, cluster-aware, or failover-aware Redis client.
**Type**: `string` **Default**: `simple` **Options**: `simple`, `cluster`, `failover` ### [](#master)`master` The name of the Redis master when `kind` is `failover`. **Type**: `string` **Default**: `""` ```yaml # Examples: master: mymaster ``` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this value to improve throughput. **Type**: `int` **Default**: `64` ### [](#tls)`tls` Custom TLS settings that override the system defaults. **Troubleshooting** Some cloud-hosted instances of Redis (such as Azure Cache) might need additional configuration to establish stable connections. TLS issues often manifest as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems, consider setting `enable_renegotiation` to `true` and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate, specify either the fields `cert` and `key`, or the fields `cert_file` and `key_file`, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... 
-----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target Redis server. Database is optional and is supplied as the URL path. **Type**: `string` ```yaml # Examples: url: redis://:6379 # --- url: redis://localhost:6379 # --- url: redis://foousername:foopassword@redisplace:6379 # --- url: redis://:foopassword@redisplace:6379 # --- url: redis://localhost:6379/1 # --- url: redis://localhost:6379/1,redis://localhost:6380/1 ``` --- # Page 182: redis_streams **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/redis_streams.md --- # redis\_streams --- title: redis_streams latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/redis_streams page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/redis_streams.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/redis_streams.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/redis_streams/)[Input](/redpanda-cloud/develop/connect/components/inputs/redis_streams/) **Available in:** Cloud, 
[Self-Managed](/redpanda-connect/components/outputs/redis_streams/ "View the Self-Managed version of this component")

Pushes messages to a Redis (v5.0+) stream using the `XADD` command. The stream is created if it doesn’t already exist.

#### Common

```yml
outputs:
  label: ""
  redis_streams:
    url: "" # No default (required)
    stream: "" # No default (required)
    id: "*"
    body_key: body
    max_length: 0
    max_in_flight: 64
    metadata:
      exclude_prefixes: []
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

#### Advanced

```yml
outputs:
  label: ""
  redis_streams:
    url: "" # No default (required)
    kind: simple
    master: ""
    client_name: redpanda-connect
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    stream: "" # No default (required)
    id: "*"
    body_key: body
    max_length: 0
    max_in_flight: 64
    metadata:
      exclude_prefixes: []
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

To cap the length of the target stream, set `max_length` to a value greater than 0. The cap is approximate: for efficiency, Redis applies it only when it is able to remove a whole macro node.

Redis stream entries are key/value pairs, so you must specify the key that holds the body of the message (`body_key`). All metadata fields of the message are also set as key/value pairs. If there is a key collision between a metadata item and the body, the body takes precedence.

## [](#performance)Performance

This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the maximum number of in-flight messages (or message batches) with the `max_in_flight` field.

This output also benefits from sending messages as a batch. Batches can be formed at both the input and output level. For details, see the [batching documentation](../../../configuration/batching/).
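As a rough sketch of how the two tuning options above can be combined, the following configuration raises `max_in_flight` and adds an output-level batching policy. The URL, stream name, and numeric values are illustrative assumptions rather than recommendations, and only fields documented on this page are used.

```yaml
output:
  redis_streams:
    url: redis://localhost:6379   # placeholder address
    stream: events                # hypothetical stream name
    body_key: body
    max_length: 10000             # rough cap on stream length; 0 (the default) applies no cap
    max_in_flight: 128            # default is 64
    batching:
      count: 100                  # flush once 100 messages are buffered...
      period: 1s                  # ...or flush an incomplete batch after 1s
```

Suitable values depend on the workload, so treat these numbers as a starting point for measurement.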
## [](#fields)Fields

### [](#batching)`batching`

Allows you to configure a [batching policy](../../../configuration/batching/).

**Type**: `object`

```yaml
# Examples:
batching:
  byte_size: 5000
  count: 0
  period: 1s

# ---

batching:
  count: 10
  period: 1s

# ---

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### [](#batching-byte_size)`batching.byte_size`

The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-check)`batching.check`

A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
check: this.type == "end_of_transaction"
```

### [](#batching-count)`batching.count`

The number of messages at which the batch is flushed. Set to `0` to disable count-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-period)`batching.period`

The period of time after which an incomplete batch is flushed regardless of its size.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
period: 1s

# ---

period: 1m

# ---

period: 500ms
```

### [](#batching-processors)`batching.processors[]`

A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, so splitting the batch into smaller batches using these processors is a no-op.

**Type**: `processor`

```yaml
# Examples:
processors:
  - archive:
      format: concatenate

# ---

processors:
  - archive:
      format: lines

# ---

processors:
  - archive:
      format: json_array
```

### [](#body_key)`body_key`

A key to set the raw body of the message to.

**Type**: `string`

**Default**: `body`

### [](#client_name)`client_name`

Set the client name for the Redis connection.
**Type**: `string`

**Default**: `redpanda-connect`

### [](#id)`id`

The entry ID for the stream message. When set to `*` (the default), Redis auto-generates a unique ID based on the current time. Set a custom ID to control message ordering, for example to replay messages in upstream order. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

**Default**: `*`

```yaml
# Examples:
id: "*" # quoted, because a bare * is not a valid YAML scalar

# ---

id: ${! @redis_stream }

# ---

id: ${! this.id }

# ---

id: ${! counter() }-0
```

### [](#kind)`kind`

Specifies a simple, cluster-aware, or failover-aware Redis client.

**Type**: `string`

**Default**: `simple`

**Options**: `simple`, `cluster`, `failover`

### [](#master)`master`

Name of the Redis master when `kind` is `failover`.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
master: mymaster
```

### [](#max_in_flight)`max_in_flight`

The maximum number of messages to have in flight at a given time. Increase this to improve throughput.

**Type**: `int`

**Default**: `64`

### [](#max_length)`max_length`

When greater than zero, enforces a rough cap on the length of the target stream.

**Type**: `int`

**Default**: `0`

### [](#metadata)`metadata`

Specify criteria for which metadata values are included in the message body.

**Type**: `object`

### [](#metadata-exclude_prefixes)`metadata.exclude_prefixes[]`

Provide a list of explicit metadata key prefixes to be excluded when adding metadata to sent messages.

**Type**: `array`

**Default**: `[]`

### [](#stream)`stream`

The stream to add messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#tls)`tls`

Custom TLS settings can be used to override system defaults.

**Troubleshooting**

Some cloud-hosted instances of Redis (such as Azure Cache) might need extra configuration in order to establish stable connections.
Unfortunately, it is often the case that TLS issues will manifest as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems consider setting `enable_renegotiation` to `true`, and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target Redis server. 
Database is optional and is supplied as the URL path.

**Type**: `string`

```yaml
# Examples:
url: redis://:6379

# ---

url: redis://localhost:6379

# ---

url: redis://foousername:foopassword@redisplace:6379

# ---

url: redis://:foopassword@redisplace:6379

# ---

url: redis://localhost:6379/1

# ---

url: redis://localhost:6379/1,redis://localhost:6380/1
```

---

# Page 183: redpanda_common

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/redpanda_common.md

---

# redpanda\_common

---
title: redpanda_common
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/outputs/redpanda_common
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/outputs/redpanda_common.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/redpanda_common.adoc
categories: "[\"Services\"]"
page-git-created-date: "2025-06-25"
page-git-modified-date: "2025-06-25"
---

**Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/redpanda_common/)[Input](/redpanda-cloud/develop/connect/components/inputs/redpanda_common/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/redpanda_common/ "View the Self-Managed version of this component")

> ⚠️ **WARNING: Deprecated in 4.68.0**
>
> This component is deprecated and will be removed in the next major version release. Consider moving to the unified [`redpanda` input](../../inputs/redpanda/) and [`redpanda` output](../redpanda/) components.

Sends data to a Redpanda (Kafka) broker, using credentials from a common `redpanda` configuration block.
To avoid duplicating Redpanda cluster credentials in your `redpanda_common` input, output, or any other components in your data pipeline, you can use a single [`redpanda` configuration block](../../redpanda/about/). For more details, see the [Pipeline example](#pipeline-example). > 📝 **NOTE** > > If you need to move topic data between Redpanda clusters or other Apache Kafka clusters, consider using the [`redpanda` input](../../inputs/redpanda/) and [output](../redpanda/) instead. #### Common ```yml outputs: label: "" redpanda_common: topic: "" # No default (required) key: "" # No default (optional) partition: "" # No default (optional) metadata: include_prefixes: [] include_patterns: [] max_in_flight: 10 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" redpanda_common: topic: "" # No default (required) key: "" # No default (optional) partition: "" # No default (optional) metadata: include_prefixes: [] include_patterns: [] timestamp_ms: "" # No default (optional) max_in_flight: 10 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` ## [](#pipeline-example)Pipeline example This data pipeline reads data from `topic_A` and `topic_B` on a Redpanda cluster, and then writes the data to `topic_C` on the same cluster. The cluster details are configured within the `redpanda` configuration block, so you only need to configure them once. This is a useful feature when you have multiple inputs and outputs in the same data pipeline that need to connect to the same cluster. ```none input: redpanda_common: topics: [ topic_A, topic_B ] output: redpanda_common: topic: topic_C key: ${! @id } redpanda: seed_brokers: [ "127.0.0.1:9092" ] tls: enabled: true sasl: - mechanism: SCRAM-SHA-512 password: bar username: foo ``` ## [](#fields)Fields ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). 
**Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#key)`key` A key to populate for each message (optional). This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. 
Increase this number to improve throughput until performance plateaus. **Type**: `int` **Default**: `10` ### [](#metadata)`metadata` Configure which metadata values are added to messages as headers. This allows you to pass additional context information along with your messages. **Type**: `object` ### [](#metadata-include_patterns)`metadata.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#metadata-include_prefixes)`metadata.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#partition)`partition` Set a partition for each message (optional). This field is only relevant when the `partitioner` is set to `manual`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). You must provide an interpolation string that is a valid integer. **Type**: `string` ```yaml # Examples: partition: ${! meta("partition") } ``` ### [](#timestamp_ms)`timestamp_ms` Set a timestamp (in milliseconds) for each message (optional). When left empty, the current timestamp is used. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: timestamp_ms: ${! timestamp_unix_milli() } # --- timestamp_ms: ${! metadata("kafka_timestamp_ms") } ``` ### [](#topic)`topic` A topic to write messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). 
**Type**: `string` --- # Page 184: redpanda_migrator **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/redpanda_migrator.md --- # redpanda\_migrator --- title: redpanda_migrator latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/redpanda_migrator page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/redpanda_migrator.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/redpanda_migrator.adoc categories: "[\"Services\"]" page-git-created-date: "2024-10-02" page-git-modified-date: "2024-10-16" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/redpanda_migrator/)[Input](/redpanda-cloud/develop/connect/components/inputs/redpanda_migrator/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/redpanda_migrator/ "View the Self-Managed version of this component") A Kafka producer for migrating data between Kafka/Redpanda clusters. The `redpanda_migrator` output coordinates migration of topics, schemas, and consumer groups from a source Kafka/Redpanda cluster to a destination cluster. > ❗ **IMPORTANT** > > This output **must** be paired with a [`redpanda_migrator` input](../../inputs/redpanda_migrator/) in the same pipeline. Each pipeline requires both input and output components. 
#### Common ```yml outputs: label: "" redpanda_migrator: seed_brokers: [] # No default (required) schema_registry: url: "" # No default (required) timeout: 5s tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} enabled: true interval: 5m include: [] # No default (optional) exclude: [] # No default (optional) subject: "" # No default (optional) versions: all include_deleted: false translate_ids: false normalize: false strict: false max_parallel_http_requests: 10 consumer_groups: enabled: true interval: 1m fetch_timeout: 10s include: [] # No default (optional) exclude: [] # No default (optional) only_empty: false topic: ${! @kafka_topic } topic_replication_factor: "" # No default (optional) sync_topic_acls: false max_in_flight: 10 ``` #### Advanced ```yml outputs: label: "" redpanda_migrator: seed_brokers: [] # No default (required) client_id: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] sasl: [] # No default (optional) metadata_max_age: 1m request_timeout_overhead: 10s conn_idle_timeout: 20s tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s partitioner: "" # No default (optional) idempotent_write: true compression: "" # No default (optional) allow_auto_topic_creation: true timeout: 10s max_message_bytes: 1MiB broker_write_max_bytes: 100MiB schema_registry: url: "" # No default (required) timeout: 5s tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: 
false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} enabled: true interval: 5m include: [] # No default (optional) exclude: [] # No default (optional) subject: "" # No default (optional) versions: all include_deleted: false translate_ids: false normalize: false strict: false max_parallel_http_requests: 10 consumer_groups: enabled: true interval: 1m fetch_timeout: 10s include: [] # No default (optional) exclude: [] # No default (optional) only_empty: false topic: ${! @kafka_topic } topic_replication_factor: "" # No default (optional) sync_topic_interval: 5m sync_topic_acls: false serverless: false provenance_header: redpanda-migrator-provenance offset_header: redpanda-migrator-offset max_in_flight: 10 ``` ## [](#multiple-migrator-pairs)Multiple migrator pairs When using multiple migrator pairs in a pipeline, match the `label` field exactly between input and output components for correct coordination. ## [](#performance-tuning)Performance tuning For high-throughput workloads, adjust the following settings: On this output component: - `max_in_flight`: Set to the total number of partitions being copied in parallel (up to all partitions in the cluster) On the paired [`redpanda_migrator` input component](../../inputs/redpanda_migrator/#performance-tuning): - `partition_buffer_bytes`: Set to 2MB to increase per-partition buffer size - `max_yield_batch_bytes`: Set to 1MB to allow larger batches to be yielded ## [](#synchronization-details)Synchronization details **Topics** - Name resolution with interpolation (default: preserve source name) - Automatic creation with mirrored partition counts - Selectable replication factor (default: inherit from source) - Supported topic configuration keys (serverless-aware subset) - Optional ACL replication: - Excludes `ALLOW WRITE` entries - Downgrades `ALLOW ALL` to `READ` - Preserves resource pattern type and host filters **Schema Registry** - One-shot or periodic syncing - 
Subject selection via include/exclude regex - Subject renaming with interpolation - Versions: `latest` or `all` (default: `all`) - Optional include of soft-deleted subjects - ID handling: translate IDs or keep fixed - Optional schema normalization - Compatibility propagation (per-subject only) - Schema metadata/rules not copied in Serverless mode **Consumer Groups** - Periodic syncing - Group selection using regex - Only `Empty` state groups migrated - Timestamp-based offset translation (approximate) - No rewind guarantee: offsets only move forward - Requires matching partition counts ## [](#how-it-works)How it works - Topics: Synced on demand. First write triggers creation. - Schema Registry: Synced at connect, then as needed. - Consumer Groups: Background loop, filtered by topic mappings. ## [](#guarantees)Guarantees - Topics created with intended partitioning/replication. - Existing topics respected. Mismatches logged. - Consumer group offsets never rewound. - ACL replication excludes unsafe grants. ## [](#limitations)Limitations - Destination Schema Registry must be in `READWRITE` or `IMPORT` mode. - Offset translation is best-effort. - Consumer group migration requires identical partition counts. 
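The options summarized above map onto fields of this output. The following is an illustrative sketch only: the broker address, registry URL, and regular expressions are placeholders, and every field name is taken from the configuration listing on this page.

```yaml
output:
  redpanda_migrator:
    seed_brokers: ["destination:9092"]   # placeholder address
    topic: ${! @kafka_topic }            # default: preserve the source topic name
    sync_topic_acls: true                # opt in to ACL replication (default: false)
    schema_registry:
      url: "http://dest-registry:8081"   # placeholder URL
      include: ["^orders-.*"]            # hypothetical subject pattern
      versions: latest                   # `latest` or `all` (default: all)
      include_deleted: false             # skip soft-deleted subjects
      translate_ids: true
      normalize: true
    consumer_groups:
      interval: 1m
      exclude: ["^migration$"]           # hypothetical: skip the migrator's own group
      only_empty: true                   # migrate only groups in the Empty state
```

This fragment shows only the output side. It still has to be paired with a `redpanda_migrator` input in the same pipeline.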
## [](#metrics)Metrics The component exposes comprehensive metrics for monitoring migration operations: | Metric Name | Type | Labels | Description | | --- | --- | --- | --- | | Topic migration metrics | | | | | redpanda_migrator_topics_created_total | counter | | Total topics created on destination | | redpanda_migrator_topic_create_errors_total | counter | | Topic creation errors | | redpanda_migrator_topic_create_latency_ns | timer | | Topic creation latency (ns) | | Schema Registry migration metrics | | | | | redpanda_migrator_sr_schemas_created_total | counter | | Schemas created in destination registry | | redpanda_migrator_sr_schema_create_errors_total | counter | | Schema creation errors | | redpanda_migrator_sr_schema_create_latency_ns | timer | | Schema creation latency (ns) | | redpanda_migrator_sr_compatibility_updates_total | counter | | Compatibility level updates applied | | redpanda_migrator_sr_compatibility_update_errors_total | counter | | Compatibility update errors | | redpanda_migrator_sr_compatibility_update_latency_ns | timer | | Compatibility update latency (ns) | | Consumer group migration metrics | | | | | redpanda_migrator_cg_offsets_translated_total | counter | group | Offsets translated per consumer group | | redpanda_migrator_cg_offset_translation_errors_total | counter | group | Offset translation errors per group | | redpanda_migrator_cg_offset_translation_latency_ns | timer | group | Offset translation latency per group (ns) | | redpanda_migrator_cg_offsets_committed_total | counter | group | Offsets committed per consumer group | | redpanda_migrator_cg_offset_commit_errors_total | counter | group | Offset commit errors per group | | redpanda_migrator_cg_offset_commit_latency_ns | timer | group | Offset commit latency per group (ns) | | Consumer lag metrics | | | | | redpanda_lag | gauge | topic, partition | Current consumer lag in messages for each topic partition. 
Shows difference between high water mark and current consumer position. | ## [](#examples)Examples ### [](#basic-migration)Basic migration Migrate topics, schemas and consumer groups from source to destination. ```yaml input: redpanda_migrator: seed_brokers: ["source:9092"] topics: ["orders", "payments"] consumer_group: "migration" output: redpanda_migrator: seed_brokers: ["destination:9092"] # Write to the same topic name topic: ${! metadata("kafka_topic") } schema_registry: url: "http://dest-registry:8081" translate_ids: true consumer_groups: interval: 1m ``` ### [](#migration-to-redpanda-serverless)Migration to Redpanda Serverless Migrate from Confluent/Kafka to Redpanda Cloud serverless cluster with authentication. ```yaml input: redpanda_migrator: seed_brokers: ["source-kafka:9092"] regexp_topics_include: - '.' regexp_topics_exclude: - '^_' consumer_group: "migrator_cg" schema_registry: url: "http://source-registry:8081" output: redpanda_migrator: seed_brokers: ["serverless-cluster.redpanda.com:9092"] tls: enabled: true sasl: - mechanism: SCRAM-SHA-256 username: "migrator" password: "migrator" schema_registry: url: "https://serverless-cluster.redpanda.com:8081" basic_auth: enabled: true username: "migrator" password: "migrator" translate_ids: true consumer_groups: exclude: - "migrator_cg" # Exclude the migration consumer group itself serverless: true # Enable serverless mode for restricted configurations ``` ## [](#fields)Fields ### [](#allow_auto_topic_creation)`allow_auto_topic_creation` Enables topics to be auto created if they do not exist when fetching their metadata. **Type**: `bool` **Default**: `true` ### [](#broker_write_max_bytes)`broker_write_max_bytes` The maximum number of bytes this output can write to a broker connection in a single write. This field corresponds to Kafka’s `socket.request.max.bytes`. 
**Type**: `string` **Default**: `100MiB` ```yaml # Examples: broker_write_max_bytes: 128MB # --- broker_write_max_bytes: 50mib ``` ### [](#client_id)`client_id` An identifier for the client connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#compression)`compression` Set an explicit compression type (optional). The default preference is to use `snappy` when the broker supports it. Otherwise, use `none`. **Type**: `string` **Options**: `lz4`, `snappy`, `gzip`, `none`, `zstd` ### [](#conn_idle_timeout)`conn_idle_timeout` The maximum duration that connections can remain idle before they are automatically closed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `20s` ### [](#consumer_groups)`consumer_groups` **Type**: `object` ### [](#consumer_groups-enabled)`consumer_groups.enabled` Whether consumer group offset migration is enabled. When disabled, no consumer group operations are performed. **Type**: `bool` **Default**: `true` ### [](#consumer_groups-exclude)`consumer_groups.exclude[]` Regular expressions for consumer groups to exclude from offset migration. Takes precedence over include patterns. Useful for excluding system or temporary groups. **Type**: `array` ```yaml # Examples: exclude: [".*-test", ".*-temp", "connect-.*"] # --- exclude: ["dev-.*", "local-.*"] ``` ### [](#consumer_groups-fetch_timeout)`consumer_groups.fetch_timeout` Maximum time to wait for data when fetching records for timestamp-based offset translation. Increase for clusters with low message throughput. **Type**: `string` **Default**: `10s` ```yaml # Examples: fetch_timeout: 1s # Fast clusters # --- fetch_timeout: 10s # Slower clusters ``` ### [](#consumer_groups-include)`consumer_groups.include[]` Regular expressions for consumer groups to include in offset migration. If empty, all groups are included (unless excluded). 
**Type**: `array` ```yaml # Examples: include: ["prod-.*", "staging-.*"] # --- include: ["app-.*", "service-.*"] ``` ### [](#consumer_groups-interval)`consumer_groups.interval` How often to synchronise consumer group offsets. Regular syncing helps maintain offset accuracy during ongoing migration. **Type**: `string` **Default**: `1m` ```yaml # Examples: interval: 0s # Disabled # --- interval: 30s # Sync every 30 seconds # --- interval: 5m # Sync every 5 minutes ``` ### [](#consumer_groups-only_empty)`consumer_groups.only_empty` Whether to only migrate Empty consumer groups. When false (default), all statuses except Dead are included; when true, only Empty groups are migrated. **Type**: `bool` **Default**: `false` ### [](#idempotent_write)`idempotent_write` Enable the idempotent write producer option. This requires the `IDEMPOTENT_WRITE` permission on `CLUSTER`. Disable this option if the `IDEMPOTENT_WRITE` permission is unavailable. **Type**: `bool` **Default**: `true` ### [](#max_in_flight)`max_in_flight` The maximum number of batches to send in parallel at any given time. Increase this value to improve throughput during migration. For optimal performance, set this to match the total number of partitions being migrated. Setting it higher than the partition count provides no additional benefit, as each partition can only have one in-flight batch at a time. Example: If migrating 100 partitions, set `max_in_flight: 100` for maximum throughput. **Type**: `int` **Default**: `10` ```yaml # Examples: max_in_flight: 64 # For a cluster with 64 partitions # --- max_in_flight: 128 # For multiple topics with combined 128 partitions ``` ### [](#max_message_bytes)`max_message_bytes` The maximum space in bytes that an individual message may use. Messages larger than this value are rejected. This field corresponds to Kafka’s `max.message.bytes`. 
**Type**: `string`

**Default**: `1MiB`

```yaml
# Examples:
max_message_bytes: 100MB

# ---

max_message_bytes: 50mib
```

### [](#metadata_max_age)`metadata_max_age`

The maximum period of time after which metadata is refreshed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

Lower values provide more responsive topic and partition discovery but may increase broker load. Higher values reduce broker queries but can delay detection of topology changes.

**Type**: `string`

**Default**: `1m`

### [](#offset_header)`offset_header`

The name of a message header to add to migrated records. This header contains the source offset, enabling exact consumer group offset translation during migration.

When left empty, no offset header is added and consumer groups are migrated using timestamp-based positioning. This approach works well for most cases, but may be imprecise for consumer groups with no committed offsets when multiple records share the same timestamp (timestamps have millisecond resolution).

Set this field to enable precise offset translation, especially when migrating consumer groups that are caught up or have minimal lag.

Note: This header is only added when consumer group migration is enabled.

**Type**: `string`

**Default**: `redpanda-migrator-offset`

### [](#partitioner)`partitioner`

Override the default murmur2 hashing partitioner.

**Type**: `string`

| Option | Summary |
| --- | --- |
| least_backup | Chooses the least backed up partition (the partition with the fewest buffered records). Partitions are selected per batch. |
| manual | Manually select a partition for each message. Requires the `partition` field to be specified. |
| murmur2_hash | Kafka’s default hash algorithm that uses a 32-bit murmur2 hash of the key to compute which partition the record will be on. |
| round_robin | Round-robins messages through all available partitions. This algorithm has lower throughput and causes higher CPU load on brokers, but can be useful if you want to ensure an even distribution of records to partitions. |

### [](#provenance_header)`provenance_header`

Header name to add to migrated records indicating their source cluster. When set, each migrated message receives a header with this name containing the source cluster’s seed broker addresses, enabling downstream systems to track message origins for auditing, debugging, or multi-cluster orchestration workflows. If empty, no provenance header is added to messages.

The header value format is a comma-separated list of the source cluster’s `seed_brokers`.

Example: Setting `provenance_header: "rp-source-cluster"` adds a header like `rp-source-cluster: "kafka-1:9092,kafka-2:9092"`.

**Type**: `string`

**Default**: `redpanda-migrator-provenance`

### [](#request_timeout_overhead)`request_timeout_overhead`

Grants an additional buffer or overhead to requests that have timeout fields defined. This field is based on the behavior of Apache Kafka’s `request.timeout.ms` parameter, but with the option to extend the timeout deadline.

**Type**: `string`

**Default**: `10s`

### [](#sasl)`sasl[]`

Specify one or more methods of SASL authentication, which are tried in order. If the broker supports the first mechanism, all connections will use that mechanism. If the first mechanism fails, the client picks the first supported mechanism. Connections fail if the broker does not support any of the client’s mechanisms.

**Type**: `object`

```yaml
# Examples:
sasl:
  - mechanism: SCRAM-SHA-512
    password: bar
    username: foo
```

### [](#sasl-aws)`sasl[].aws`

Contains AWS-specific fields for when the `mechanism` is set to `AWS_MSK_IAM`.

**Type**: `object`

### [](#sasl-aws-credentials)`sasl[].aws.credentials`

Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/).
**Type**: `object` ### [](#sasl-aws-credentials-from_ec2_role)`sasl[].aws.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#sasl-aws-credentials-id)`sasl[].aws.credentials.id` The ID of credentials to use. **Type**: `string` ### [](#sasl-aws-credentials-profile)`sasl[].aws.credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#sasl-aws-credentials-role)`sasl[].aws.credentials.role` A role ARN to assume. **Type**: `string` ### [](#sasl-aws-credentials-role_external_id)`sasl[].aws.credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#sasl-aws-credentials-secret)`sasl[].aws.credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#sasl-aws-credentials-token)`sasl[].aws.credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#sasl-aws-endpoint)`sasl[].aws.endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#sasl-aws-region)`sasl[].aws.region` The AWS region to target. **Type**: `string` ### [](#sasl-aws-tcp)`sasl[].aws.tcp` TCP socket configuration. **Type**: `object` ### [](#sasl-aws-tcp-connect_timeout)`sasl[].aws.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-aws-tcp-keep_alive)`sasl[].aws.tcp.keep_alive` TCP keep-alive probe configuration. 
**Type**: `object` ### [](#sasl-aws-tcp-keep_alive-count)`sasl[].aws.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#sasl-aws-tcp-keep_alive-idle)`sasl[].aws.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-keep_alive-interval)`sasl[].aws.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-tcp_user_timeout)`sasl[].aws.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-extensions)`sasl[].extensions` Key/value pairs to add to OAUTHBEARER authentication requests. **Type**: `string` ### [](#sasl-mechanism)`sasl[].mechanism` The SASL mechanism to use. **Type**: `string` | Option | Summary | | --- | --- | | AWS_MSK_IAM | AWS IAM based authentication as specified by the 'aws-msk-iam-auth' java library. | | OAUTHBEARER | OAuth Bearer based authentication. | | PLAIN | Plain text authentication. | | REDPANDA_CLOUD_SERVICE_ACCOUNT | Redpanda Cloud Service Account authentication when running in Redpanda Cloud. | | SCRAM-SHA-256 | SCRAM based authentication as specified in RFC5802. | | SCRAM-SHA-512 | SCRAM based authentication as specified in RFC5802. | | none | Disable sasl authentication | ### [](#sasl-password)`sasl[].password` A password to provide for PLAIN or SCRAM-\* authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#sasl-token)`sasl[].token` The token to use for a single session’s OAUTHBEARER authentication. **Type**: `string` **Default**: `""` ### [](#sasl-username)`sasl[].username` A username to provide for PLAIN or SCRAM-\* authentication. **Type**: `string` **Default**: `""` ### [](#schema_registry)`schema_registry` Configuration for schema registry integration. Enables migration of schema subjects, versions, and compatibility settings between clusters. **Type**: `object` ### [](#schema_registry-basic_auth)`schema_registry.basic_auth` Allows you to specify basic authentication. **Type**: `object` ### [](#schema_registry-basic_auth-enabled)`schema_registry.basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-basic_auth-password)`schema_registry.basic_auth.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-basic_auth-username)`schema_registry.basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#schema_registry-enabled)`schema_registry.enabled` Whether schema registry migration is enabled. When disabled, no schema operations are performed. **Type**: `bool` **Default**: `true` ### [](#schema_registry-exclude)`schema_registry.exclude[]` Regular expressions for schema subjects to exclude from migration. Takes precedence over include patterns. Note: the migrator consumer group is always ignored. 
**Type**: `array`

```yaml
# Examples:
exclude: [".*-test", ".*-temp"]

# ---

exclude: ["dev-.*", "local-.*"]
```

### [](#schema_registry-include)`schema_registry.include[]`

Regular expressions for schema subjects to include in migration. If empty, all subjects are included (unless excluded). Note: the migrator consumer group is always ignored.

**Type**: `array`

```yaml
# Examples:
include: ["prod-.*", "staging-.*"]

# ---

include: ["user-.*", "order-.*"]
```

### [](#schema_registry-include_deleted)`schema_registry.include_deleted`

Whether to include soft-deleted schemas in migration. Useful for complete migration but may not be supported by all schema registries.

**Type**: `bool`

**Default**: `false`

### [](#schema_registry-interval)`schema_registry.interval`

How often to synchronize schema registry subjects. Set to `0s` for a one-time sync at startup only.

**Type**: `string`

**Default**: `5m`

```yaml
# Examples:
interval: 0s # One-time sync only

# ---

interval: 5m # Sync every 5 minutes

# ---

interval: 30m # Sync every 30 minutes
```

### [](#schema_registry-jwt)`schema_registry.jwt`

Beta: Allows you to specify JWT authentication.

**Type**: `object`

### [](#schema_registry-jwt-claims)`schema_registry.jwt.claims`

A value used to identify the claims that issued the JWT.

**Type**: `object`

**Default**: `{}`

### [](#schema_registry-jwt-enabled)`schema_registry.jwt.enabled`

Whether to use JWT authentication in requests.

**Type**: `bool`

**Default**: `false`

### [](#schema_registry-jwt-headers)`schema_registry.jwt.headers`

Add optional key/value headers to the JWT.

**Type**: `object`

**Default**: `{}`

### [](#schema_registry-jwt-private_key_file)`schema_registry.jwt.private_key_file`

The path of a PEM-encoded private key file in PKCS#1 or PKCS#8 format.

**Type**: `string`

**Default**: `""`

### [](#schema_registry-jwt-signing_method)`schema_registry.jwt.signing_method`

The method used to sign the token, such as `RS256`, `RS384`, `RS512`, or `EdDSA`.
**Type**: `string` **Default**: `""` ### [](#schema_registry-max_parallel_http_requests)`schema_registry.max_parallel_http_requests` Maximum number of parallel HTTP requests to the schema registry. Controls concurrency when syncing multiple schemas. **Type**: `int` **Default**: `10` ### [](#schema_registry-normalize)`schema_registry.normalize` Whether to normalize schemas when creating them in the destination registry. **Type**: `bool` **Default**: `false` ### [](#schema_registry-oauth)`schema_registry.oauth` Allows you to specify open authentication via OAuth version 1. **Type**: `object` ### [](#schema_registry-oauth-access_token)`schema_registry.oauth.access_token` A value used to gain access to the protected resources on behalf of the user. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-access_token_secret)`schema_registry.oauth.access_token_secret` A secret provided in order to establish ownership of a given access token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_key)`schema_registry.oauth.consumer_key` A value used to identify the client to the service provider. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_secret)`schema_registry.oauth.consumer_secret` A secret used to establish ownership of the consumer key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-enabled)`schema_registry.oauth.enabled` Whether to use OAuth version 1 in requests. 
**Type**: `bool` **Default**: `false` ### [](#schema_registry-strict)`schema_registry.strict` Error on unknown schema IDs. Only relevant when translate\_ids is true. When false (default), unknown schema IDs are passed through unchanged, allowing migration of topics with mixed message formats. Note: messages with 0-byte prefixes (e.g., protobuf) cannot be distinguished from schema registry headers and may fail when strict is enabled. **Type**: `bool` **Default**: `false` ### [](#schema_registry-subject)`schema_registry.subject` Template for transforming subject names during migration. Use interpolation to rename subjects systematically. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: subject: prod_${! metadata("schema_registry_subject") } # --- subject: ${! metadata("schema_registry_subject") | replace("dev_", "prod_") } ``` ### [](#schema_registry-timeout)`schema_registry.timeout` HTTP client timeout for schema registry requests. **Type**: `string` **Default**: `5s` ### [](#schema_registry-tls)`schema_registry.tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#schema_registry-tls-client_certs)`schema_registry.tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#schema_registry-tls-client_certs-cert)`schema_registry.tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-cert_file)`schema_registry.tls.client_certs[].cert_file` The path of a certificate to use. 
**Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key)`schema_registry.tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key_file)`schema_registry.tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-password)`schema_registry.tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#schema_registry-tls-enable_renegotiation)`schema_registry.tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-enabled)`schema_registry.tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-root_cas)`schema_registry.tls.root_cas` An optional root certificate authority to use. 
This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#schema_registry-tls-root_cas_file)`schema_registry.tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#schema_registry-tls-skip_cert_verify)`schema_registry.tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#schema_registry-translate_ids)`schema_registry.translate_ids` Whether to translate schema IDs during migration. **Type**: `bool` **Default**: `false` ### [](#schema_registry-url)`schema_registry.url` The base URL of the schema registry service. Required for schema migration functionality. **Type**: `string` ```yaml # Examples: url: http://localhost:8081 # --- url: https://schema-registry.example.com:8081 ``` ### [](#schema_registry-versions)`schema_registry.versions` Which schema versions to migrate. 'latest' migrates only the current version, 'all' migrates complete version history for better compatibility. **Type**: `string` **Default**: `all` **Options**: `latest`, `all` ### [](#seed_brokers)`seed_brokers[]` A list of broker addresses to connect to. Use commas to separate multiple addresses in a single list item. 
**Type**: `array`

```yaml
# Examples:
seed_brokers:
  - "localhost:9092"

# ---

seed_brokers:
  - "foo:9092"
  - "bar:9092"

# ---

seed_brokers:
  - "foo:9092,bar:9092"
```

### [](#serverless)`serverless`

Enable serverless mode for Redpanda Cloud serverless clusters. This restricts topic configurations and schema features to those supported by serverless environments.

**Type**: `bool`

**Default**: `false`

### [](#sync_topic_acls)`sync_topic_acls`

Whether to synchronize topic ACLs from the source cluster to the destination cluster. ACLs are transformed safely: ALLOW WRITE permissions are excluded, and ALLOW ALL is downgraded to ALLOW READ to prevent conflicts.

**Type**: `bool`

**Default**: `false`

### [](#sync_topic_interval)`sync_topic_interval`

How often to synchronize topics from the source cluster to the destination. This creates destination topics for any new source topics, including empty topics with no message flow. Set to `0s` to disable periodic sync (topics are still created on first message).

**Type**: `string`

**Default**: `5m`

```yaml
# Examples:
sync_topic_interval: 0s # Disable periodic sync

# ---

sync_topic_interval: 1m # Sync every minute

# ---

sync_topic_interval: 5m # Sync every 5 minutes
```

### [](#tcp)`tcp`

TCP socket configuration.

**Type**: `object`

### [](#tcp-connect_timeout)`tcp.connect_timeout`

Maximum amount of time a dial will wait for a connect to complete. Zero disables.

**Type**: `string`

**Default**: `0s`

### [](#tcp-keep_alive)`tcp.keep_alive`

TCP keep-alive probe configuration.

**Type**: `object`

### [](#tcp-keep_alive-count)`tcp.keep_alive.count`

Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9.

**Type**: `int`

**Default**: `9`

### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle`

Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes.
**Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timeout)`timeout` The maximum period of time to wait for message sends before abandoning the request and retrying. **Type**: `string` **Default**: `10s` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). 
This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas: |-
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
```

### [](#tls-root_cas_file)`tls.root_cas_file`

Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas_file: ./root_cas.pem
```

### [](#tls-skip_cert_verify)`tls.skip_cert_verify`

Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely.

**Type**: `bool`

**Default**: `false`

### [](#topic)`topic`

A topic to write messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

**Default**: `${! @kafka_topic }`

```yaml
# Examples:
topic: prod_${! @kafka_topic }
```

### [](#topic_replication_factor)`topic_replication_factor`

The replication factor for created topics. If not specified, inherits the replication factor from source topics. Useful when migrating to clusters with different sizes.

**Type**: `int`

```yaml
# Examples:
topic_replication_factor: 3

# ---

topic_replication_factor: 1 # For single-node clusters
```

---

# Page 185: redpanda

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/redpanda.md

---

# redpanda

---
title: redpanda
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/outputs/redpanda
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/outputs/redpanda.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/redpanda.adoc
page-git-created-date: "2024-11-19"
page-git-modified-date: "2025-10-24"
---

**Type:** Output

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/redpanda/ "View the Self-Managed version of this component")

Sends message data to Kafka brokers and waits for acknowledgement before propagating any acknowledgements back to the input.
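As a rough illustration of how the fields documented on this page can fit together, the following sketch configures a `redpanda` output with TLS, SCRAM authentication, explicit compression, and a simple batching policy. The label, broker address, topic name, `order_id` payload field, and environment variable names are hypothetical placeholders, and the top-level `output` key is assumed from the usual Redpanda Connect pipeline layout. Treat it as a starting point, not a recommended production configuration.

```yaml
# Minimal sketch only: every name and address below is a placeholder.
output:
  label: "orders_out"                   # hypothetical label
  redpanda:
    seed_brokers:
      - "broker-0.example.com:9092"     # placeholder broker address
    topic: "orders"                     # hypothetical topic name
    key: ${! json("order_id") }         # assumes each payload has an order_id field
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-256
        username: "${REDPANDA_USERNAME}"  # assumed environment variable
        password: "${REDPANDA_PASSWORD}"  # assumed environment variable; avoid inline secrets
    compression: zstd                   # one of: lz4, snappy, gzip, none, zstd
    max_in_flight: 256                  # the documented default
    batching:
      count: 100                        # flush after 100 messages...
      period: 1s                        # ...or after one second, whichever comes first
```

Combining `batching.count` with `batching.period` bounds both batch size and latency. Leaving `idempotent_write` at its default of `true` assumes the connecting user has the `IDEMPOTENT_WRITE` permission on `CLUSTER`.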
#### Common

```yml
outputs:
  label: ""
  redpanda:
    seed_brokers: [] # No default (optional)
    topic: "" # No default (required)
    key: "" # No default (optional)
    partition: "" # No default (optional)
    metadata:
      include_prefixes: []
      include_patterns: []
    max_in_flight: 256
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

#### Advanced

```yml
outputs:
  label: ""
  redpanda:
    seed_brokers: [] # No default (optional)
    client_id: redpanda-connect
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    sasl: [] # No default (optional)
    metadata_max_age: 1m
    request_timeout_overhead: 10s
    conn_idle_timeout: 20s
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    topic: "" # No default (required)
    key: "" # No default (optional)
    partition: "" # No default (optional)
    metadata:
      include_prefixes: []
      include_patterns: []
    timestamp_ms: "" # No default (optional)
    max_in_flight: 256
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
    inject_tracing_map: "" # No default (optional)
    partitioner: "" # No default (optional)
    idempotent_write: true
    compression: "" # No default (optional)
    allow_auto_topic_creation: true
    timeout: 10s
    max_message_bytes: 1MiB
    broker_write_max_bytes: 100MiB
```

## [](#fields)Fields

### [](#allow_auto_topic_creation)`allow_auto_topic_creation`

Enables topics to be auto created if they do not exist when fetching their metadata.

**Type**: `bool`

**Default**: `true`

### [](#batching)`batching`

Optional explicit batching policy for the output. Note that when batches are formed at the input level they can be expanded by this policy, but not contracted. When consuming data from a Redpanda input it is recommended to tune batches from the input config via the `max_yield_batch_bytes` field, or the `unordered_processing.batching` field if appropriate.
**Type**: `object`

```yaml
# Examples:
batching:
  byte_size: 5000
  count: 0
  period: 1s

# ---

batching:
  count: 10
  period: 1s

# ---

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### [](#batching-byte_size)`batching.byte_size`

The number of bytes at which the batch should be flushed. Set to `0` to disable size-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-check)`batching.check`

A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
check: this.type == "end_of_transaction"
```

### [](#batching-count)`batching.count`

The number of messages at which the batch should be flushed. Set to `0` to disable count-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-period)`batching.period`

A period in which an incomplete batch should be flushed regardless of its size.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
period: 1s

# ---

period: 1m

# ---

period: 500ms
```

### [](#batching-processors)`batching.processors[]`

A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, so splitting the batch into smaller batches using these processors is a no-op.

**Type**: `processor`

```yaml
# Examples:
processors:
  - archive:
      format: concatenate

# ---

processors:
  - archive:
      format: lines

# ---

processors:
  - archive:
      format: json_array
```

### [](#broker_write_max_bytes)`broker_write_max_bytes`

The maximum number of bytes this output can write to a broker connection in a single write. This field corresponds to Kafka’s `socket.request.max.bytes`.
**Type**: `string` **Default**: `100MiB` ```yaml # Examples: broker_write_max_bytes: 128MB # --- broker_write_max_bytes: 50mib ``` ### [](#client_id)`client_id` An identifier for the client connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#compression)`compression` Set an explicit compression type (optional). The default preference is to use `snappy` when the broker supports it. Otherwise, use `none`. **Type**: `string` **Options**: `lz4`, `snappy`, `gzip`, `none`, `zstd` ### [](#conn_idle_timeout)`conn_idle_timeout` The maximum duration that connections can remain idle before they are automatically closed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `20s` ### [](#idempotent_write)`idempotent_write` Enable the idempotent write producer option. This requires the `IDEMPOTENT_WRITE` permission on `CLUSTER`. Disable this option if the `IDEMPOTENT_WRITE` permission is not available. **Type**: `bool` **Default**: `true` ### [](#inject_tracing_map)`inject_tracing_map` EXPERIMENTAL: A [Bloblang mapping](../../../guides/bloblang/about/) used to inject an object containing tracing propagation information into outbound messages. The specification of the injected fields will match the format used by the service wide tracer. **Type**: `string` ```yaml # Examples: inject_tracing_map: meta = @.merge(this) # --- inject_tracing_map: root.meta.span = this ``` ### [](#key)`key` An optional key to populate for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this number to improve throughput until performance plateaus. **Type**: `int` **Default**: `256` ### [](#max_message_bytes)`max_message_bytes` The maximum space (in bytes) that an individual message may use. 
Messages larger than this value are rejected. This field corresponds to Kafka’s `max.message.bytes`. **Type**: `string` **Default**: `1MiB` ```yaml # Examples: max_message_bytes: 100MB # --- max_message_bytes: 50mib ``` ### [](#metadata)`metadata` Configure which metadata values are added to messages as headers. This allows you to pass additional context information along with your messages. **Type**: `object` ### [](#metadata-include_patterns)`metadata.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#metadata-include_prefixes)`metadata.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#metadata_max_age)`metadata_max_age` The maximum period of time after which metadata is refreshed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. Lower values provide more responsive topic and partition discovery but may increase broker load. Higher values reduce broker queries but can delay detection of topology changes. **Type**: `string` **Default**: `1m` ### [](#partition)`partition` Set a partition for each message (optional). This field is only relevant when the `partitioner` is set to `manual`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). You must provide an interpolation string that is a valid integer. **Type**: `string` ```yaml # Examples: partition: ${! meta("partition") } ``` ### [](#partitioner)`partitioner` Override the default murmur2 hashing partitioner. 
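As a minimal sketch of overriding the partitioner, the following config selects partitions manually. The broker address, topic name, and `partition` metadata key are placeholders chosen for illustration. The interpolation must resolve to a valid integer, as noted for the `partition` field.

```yaml
output:
  redpanda:
    # Placeholder broker address and hypothetical topic name.
    seed_brokers: [ "localhost:9092" ]
    topic: example_topic
    # Use the `partition` field instead of hashing the message key.
    partitioner: manual
    # Assumes an earlier step has set a `partition` metadata key
    # that holds a valid integer.
    partition: ${! meta("partition") }
```

With any other `partitioner` value, the `partition` field is not relevant and can be omitted.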
**Type**: `string`

| Option | Summary |
| --- | --- |
| least_backup | Chooses the least backed up partition (the partition with the fewest buffered records). Partitions are selected per batch. |
| manual | Manually selects a partition for each message. Requires the `partition` field to be specified. |
| murmur2_hash | Kafka’s default hash algorithm, which uses a 32-bit murmur2 hash of the key to compute which partition the record is written to. |
| round_robin | Round-robins messages through all available partitions. This algorithm has lower throughput and causes higher CPU load on brokers, but can be useful if you want to ensure an even distribution of records to partitions. |

### [](#request_timeout_overhead)`request_timeout_overhead`

Grants an additional buffer or overhead to requests that have timeout fields defined. This field is based on the behavior of Apache Kafka’s `request.timeout.ms` parameter, but with the option to extend the timeout deadline.

**Type**: `string`

**Default**: `10s`

### [](#sasl)`sasl[]`

Specify one or more methods or mechanisms of SASL authentication, which are attempted in order. If the broker supports the first SASL mechanism, all connections use it. If the first mechanism fails, the client picks the first supported mechanism. If the broker does not support any client mechanisms, all connections fail.

**Type**: `object`

```yaml
# Examples:
sasl:
  - mechanism: SCRAM-SHA-512
    password: bar
    username: foo
```

### [](#sasl-aws)`sasl[].aws`

Contains AWS-specific fields for when the `mechanism` is set to `AWS_MSK_IAM`.

**Type**: `object`

### [](#sasl-aws-credentials)`sasl[].aws.credentials`

Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/).
**Type**: `object` ### [](#sasl-aws-credentials-from_ec2_role)`sasl[].aws.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#sasl-aws-credentials-id)`sasl[].aws.credentials.id` The ID of credentials to use. **Type**: `string` ### [](#sasl-aws-credentials-profile)`sasl[].aws.credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#sasl-aws-credentials-role)`sasl[].aws.credentials.role` A role ARN to assume. **Type**: `string` ### [](#sasl-aws-credentials-role_external_id)`sasl[].aws.credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#sasl-aws-credentials-secret)`sasl[].aws.credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#sasl-aws-credentials-token)`sasl[].aws.credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#sasl-aws-endpoint)`sasl[].aws.endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#sasl-aws-region)`sasl[].aws.region` The AWS region to target. **Type**: `string` ### [](#sasl-aws-tcp)`sasl[].aws.tcp` TCP socket configuration. **Type**: `object` ### [](#sasl-aws-tcp-connect_timeout)`sasl[].aws.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-aws-tcp-keep_alive)`sasl[].aws.tcp.keep_alive` TCP keep-alive probe configuration. 
**Type**: `object` ### [](#sasl-aws-tcp-keep_alive-count)`sasl[].aws.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#sasl-aws-tcp-keep_alive-idle)`sasl[].aws.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-keep_alive-interval)`sasl[].aws.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-tcp_user_timeout)`sasl[].aws.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-extensions)`sasl[].extensions` Key/value pairs to add to OAUTHBEARER authentication requests. **Type**: `string` ### [](#sasl-mechanism)`sasl[].mechanism` The SASL mechanism to use. **Type**: `string` | Option | Summary | | --- | --- | | AWS_MSK_IAM | AWS IAM based authentication as specified by the 'aws-msk-iam-auth' java library. | | OAUTHBEARER | OAuth Bearer based authentication. | | PLAIN | Plain text authentication. | | REDPANDA_CLOUD_SERVICE_ACCOUNT | Redpanda Cloud Service Account authentication when running in Redpanda Cloud. | | SCRAM-SHA-256 | SCRAM based authentication as specified in RFC5802. | | SCRAM-SHA-512 | SCRAM based authentication as specified in RFC5802. | | none | Disable sasl authentication | ### [](#sasl-password)`sasl[].password` A password to provide for PLAIN or SCRAM-\* authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#sasl-token)`sasl[].token` The token to use for a single session’s OAUTHBEARER authentication. **Type**: `string` **Default**: `""` ### [](#sasl-username)`sasl[].username` A username to provide for PLAIN or SCRAM-\* authentication. **Type**: `string` **Default**: `""` ### [](#seed_brokers)`seed_brokers[]` A list of broker addresses to connect to in order. Use commas to separate multiple addresses in a single list item. Optional when `seed_brokers` is configured in a top-level `redpanda` block. **Type**: `array` ```yaml # Examples: seed_brokers: - "localhost:9092" # --- seed_brokers: - "foo:9092" - "bar:9092" # --- seed_brokers: - "foo:9092,bar:9092" ``` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. 
Zero disables. **Type**: `string` **Default**: `0s` ### [](#timeout)`timeout` The maximum period of time to wait for message sends before abandoning the request and retrying. **Type**: `string` **Default**: `10s` ### [](#timestamp_ms)`timestamp_ms` Set a timestamp (in milliseconds) for each message (optional). When left empty, the current timestamp is used. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: timestamp_ms: ${! timestamp_unix_milli() } # --- timestamp_ms: ${! metadata("kafka_timestamp_ms") } ``` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). 
This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#topic)`topic` A topic to write messages to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). 
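Several of the fields on this page can be combined in one `redpanda` output, sketched below under stated assumptions. The broker address, topic name, user name, and `REDPANDA_PASSWORD` environment variable are hypothetical placeholders, and the `key` interpolation assumes JSON messages with an `id` field.

```yaml
output:
  redpanda:
    # Placeholder broker address and hypothetical topic name.
    seed_brokers: [ "broker-0.example.com:9092" ]
    topic: orders
    # Assumes each message is a JSON document with an `id` field.
    key: ${! json("id") }
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-512
        username: example_user
        # Placeholder variable; avoid writing secrets into the config.
        password: ${REDPANDA_PASSWORD}
    compression: zstd
    # Flush a batch at 100 messages or after 1s, whichever comes first.
    batching:
      count: 100
      period: 1s
```

Fields that are not set here, such as `max_in_flight` and `idempotent_write`, keep the defaults listed in the Advanced configuration.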
**Type**: `string`

---

# Page 186: reject_errored

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/reject_errored.md

---

# reject\_errored

---
title: reject_errored
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/outputs/reject_errored
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/outputs/reject_errored.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/reject_errored.adoc
categories: "[\"Utility\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/reject_errored/ "View the Self-Managed version of this component")

Rejects messages that have failed their processing steps, resulting in nack behavior at the input level. Messages that have not failed are sent to a child output.

```yml
# Config fields, showing default values
output:
  label: ""
  reject_errored: null # No default (required)
```

The routing of messages rejected by this output depends on the type of input they came from. For inputs that support propagating nacks upstream, such as AMQP or NATS, the message is nacked. However, for sequential inputs, such as files or Kafka, the messages are simply reprocessed from scratch.

## [](#examples)Examples

### [](#rejecting-failed-messages)Rejecting Failed Messages

The most straightforward use case for this output type is to nack messages that have failed their processing steps.
In this example, our mapping might fail, in which case the messages that failed are rejected and nacked by our input:

```yaml
input:
  nats_jetstream:
    urls: [ nats://127.0.0.1:4222 ]
    subject: foos.pending

pipeline:
  processors:
    - mutation: 'root.age = this.fuzzy.age.int64()'

output:
  reject_errored:
    nats_jetstream:
      urls: [ nats://127.0.0.1:4222 ]
      subject: foos.processed
```

### [](#dlqing-failed-messages)DLQing Failed Messages

Another use case for this output is to send failed messages straight into a dead-letter queue. To do this, use it within a [fallback output](../fallback/), which allows you to specify where these failed messages should go next.

```yaml
pipeline:
  processors:
    - mutation: 'root.age = this.fuzzy.age.int64()'

output:
  fallback:
    - reject_errored:
        http_client:
          url: http://foo:4195/post/might/become/unreachable
          retries: 3
          retry_period: 1s
    - http_client:
        url: http://bar:4196/somewhere/else
        retries: 3
        retry_period: 1s
```

---

# Page 187: reject

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/reject.md

---

# reject

---
title: reject
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/outputs/reject
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/outputs/reject.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/reject.adoc
categories: "[\"Utility\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/reject/ "View the Self-Managed version of this component")

Rejects all messages, treating them as though the output destination failed to publish them.
```yml
# Config fields, showing default values
output:
  label: ""
  reject: ""
```

The routing of messages after this output depends on the type of input they came from. For inputs that support propagating nacks upstream, such as AMQP or NATS, the message is nacked. However, for sequential inputs, such as files or Kafka, the messages are simply reprocessed from scratch.

To learn when this output could be useful, see the [Examples](#examples).

## [](#examples)Examples

### [](#rejecting-failed-messages)Rejecting Failed Messages

This output is particularly useful for routing messages that have failed during processing, where instead of routing them to some sort of dead letter queue we wish to push the error upstream. We can do this with a `switch` output:

```yaml
output:
  switch:
    retry_until_success: false
    cases:
      - check: '!errored()'
        output:
          amqp_1:
            urls: [ amqps://guest:guest@localhost:5672/ ]
            target_address: queue:/the_foos
      - output:
          reject: "processing failed due to: ${! error() }"
```

---

# Page 188: resource

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/resource.md

---

# resource

---
title: resource
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/outputs/resource
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/outputs/resource.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/resource.adoc
categories: "[\"Utility\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/resource/)[Input](/redpanda-cloud/develop/connect/components/inputs/resource/)[Processor](/redpanda-cloud/develop/connect/components/processors/resource/)

**Available
in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/resource/ "View the Self-Managed version of this component")

Resource is an output type that channels messages to a resource output, identified by its name.

```yml
# Config fields, showing default values
output:
  resource: ""
```

Resources allow you to tidy up deeply nested configs. For example, the config:

```yaml
output:
  broker:
    pattern: fan_out
    outputs:
      - kafka:
          addresses: [ TODO ]
          topic: foo
      - gcp_pubsub:
          project: bar
          topic: baz
```

Could also be expressed as:

```yaml
output:
  broker:
    pattern: fan_out
    outputs:
      - resource: foo
      - resource: bar

output_resources:
  - label: foo
    kafka:
      addresses: [ TODO ]
      topic: foo

  - label: bar
    gcp_pubsub:
      project: bar
      topic: baz
```

---

# Page 189: retry

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/retry.md

---

# retry

---
title: retry
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/outputs/retry
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/outputs/retry.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/retry.adoc
categories: "[\"Utility\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/retry/)[Processor](/redpanda-cloud/develop/connect/components/processors/retry/)

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/retry/ "View the Self-Managed version of this component")

Attempts to write messages to a child output. If a write fails for any reason, the message is retried until it succeeds or, if `max_retries` or `backoff.max_elapsed_time` is non-zero, until either limit is reached.
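As a minimal sketch of bounded retries, the following config wraps a child output and stops retrying once either limit is reached. The `http_client` child output and its URL are placeholders chosen for illustration, and the limits are arbitrary example values.

```yaml
output:
  retry:
    # Stop after 5 retries, or after 30s in total, whichever comes first.
    max_retries: 5
    backoff:
      initial_interval: 500ms
      max_interval: 3s
      max_elapsed_time: 30s
    output:
      # Hypothetical child output; any output type can be used here.
      http_client:
        url: http://example.invalid:4195/post
```

Leaving `max_retries` and `backoff.max_elapsed_time` at their zero defaults means the write is retried until it succeeds.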
#### Common

```yml
outputs:
  label: ""
  retry:
    output: "" # No default (required)
```

#### Advanced

```yml
outputs:
  label: ""
  retry:
    max_retries: 0
    backoff:
      initial_interval: 500ms
      max_interval: 3s
      max_elapsed_time: 0s
    output: "" # No default (required)
```

All messages in Redpanda Connect are always retried on an output error, but this would usually involve propagating the error back to the source of the message, whereby it would be reprocessed before reaching the output layer once again.

This output type is useful whenever we wish to avoid reprocessing a message in the event of a failed send. We might, for example, have a deduplication processor that we want to avoid reapplying to the same message more than once in the pipeline.

Rather than retrying the same output, you may wish to retry the send using a different output target (a dead letter queue), in which case you should instead use the [`fallback`](../fallback/) output type.

## [](#fields)Fields

### [](#backoff)`backoff`

Control time intervals between retry attempts.

**Type**: `object`

### [](#backoff-initial_interval)`backoff.initial_interval`

The initial period to wait between retry attempts. The retry interval increases for each failed attempt, up to the `backoff.max_interval` value. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

**Default**: `500ms`

### [](#backoff-max_elapsed_time)`backoff.max_elapsed_time`

The maximum period to wait before retry attempts are abandoned. If set to zero, no limit is applied.

**Type**: `string`

**Default**: `0s`

### [](#backoff-max_interval)`backoff.max_interval`

The maximum period to wait between retry attempts.

**Type**: `string`

**Default**: `3s`

### [](#max_retries)`max_retries`

The maximum number of retries before giving up on the request. If set to zero, there is no discrete limit.

**Type**: `int`

**Default**: `0`

### [](#output)`output`

A child output.
**Type**: `output` --- # Page 190: schema_registry **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/schema_registry.md --- # schema\_registry --- title: schema_registry latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/schema_registry page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/schema_registry.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/schema_registry.adoc categories: "[\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/schema_registry/)[Input](/redpanda-cloud/develop/connect/components/inputs/schema_registry/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/schema_registry/ "View the Self-Managed version of this component") Publishes schemas to a schema registry. This output uses the [Franz Kafka Schema Registry client](https://github.com/twmb/franz-go/tree/master/pkg/sr). 
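One way this output is commonly paired with the `schema_registry` input is to copy schemas from one registry to another. The following is a minimal sketch under stated assumptions: both registry URLs are hypothetical, and the input label matches the default value of the `input_resource` field on this page so that `backfill_dependencies` can read source schemas.

```yaml
input:
  # The label must match the output's `input_resource` value.
  label: schema_registry_input
  schema_registry:
    url: http://source-registry.example.com:8081 # hypothetical source

output:
  schema_registry:
    url: http://destination-registry.example.com:8081 # hypothetical destination
    # Reuse the subject name carried in the message metadata.
    subject: ${! @schema_registry_subject }
    backfill_dependencies: true
    input_resource: schema_registry_input
```

If `backfill_dependencies` is set to `false`, the labeled input is not needed for backfilling and the output can be fed by any input that supplies schemas.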
#### Common ```yml outputs: label: "" schema_registry: url: "" # No default (required) subject: "" # No default (required) max_in_flight: 64 ``` #### Advanced ```yml outputs: label: "" schema_registry: url: "" # No default (required) subject: "" # No default (required) subject_compatibility_level: "" # No default (optional) backfill_dependencies: true translate_ids: false normalize: true remove_metadata: true remove_rule_set: true input_resource: schema_registry_input tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] max_in_flight: 64 oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} ``` ## [](#performance)Performance The `schema_registry` output sends multiple messages in parallel for improved performance. You can use the `max_in_flight` field to tune the maximum number of in-flight messages, or message batches. ## [](#example)Example This example writes schemas to a schema registry instance and logs errors for existing schemas. ```yaml output: fallback: - schema_registry: url: http://localhost:8082 subject: ${! @schema_registry_subject } - switch: cases: - check: '@fallback_error == "request returned status: 422"' output: drop: {} processors: - log: message: | Subject '${! @schema_registry_subject }' version ${! @schema_registry_version } already has schema: ${! content() } - output: reject: ${! @fallback_error } ``` ## [](#fields)Fields ### [](#backfill_dependencies)`backfill_dependencies` Backfill missing schema references and previous schema versions. If set to `true`, you must also configure a [`schema_registry`](../../inputs/schema_registry/) input to read source schemas. 
**Type**: `bool` **Default**: `true` ### [](#basic_auth)`basic_auth` Configure basic authentication for requests from this component to your schema registry. **Type**: `object` ### [](#basic_auth-enabled)`basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#basic_auth-password)`basic_auth.password` The password to use for authentication. Used together with `username` for basic authentication or with encrypted private keys for secure access. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#basic_auth-username)`basic_auth.username` The username of the account credentials to authenticate as. Used together with `password` for basic authentication. **Type**: `string` **Default**: `""` ### [](#input_resource)`input_resource` The label of the [`schema_registry` input](../../inputs/schema_registry/) from which to read source schemas. **Type**: `string` **Default**: `schema_registry_input` ### [](#jwt)`jwt` Beta Configure JSON Web Token (JWT) authentication for secure data transmission from this component to your schema registry. This feature is in beta and may change in future releases. **Type**: `object` ### [](#jwt-claims)`jwt.claims` Values used to pass the identity of the authenticated entity to the service provider. In this case, between this component and the schema registry. **Type**: `object` **Default**: `{}` ### [](#jwt-enabled)`jwt.enabled` Whether to use JWT authentication in requests. **Type**: `bool` **Default**: `false` ### [](#jwt-headers)`jwt.headers` The key/value pairs that identify the type of token and signing algorithm. 
**Type**: `object` **Default**: `{}` ### [](#jwt-private_key_file)`jwt.private_key_file` A PEM-encoded file containing a private key that is formatted using either PKCS1 or PKCS8 standards. **Type**: `string` **Default**: `""` ### [](#jwt-signing_method)`jwt.signing_method` The method used to sign the token, such as RS256, RS384, RS512 or EdDSA. **Type**: `string` **Default**: `""` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this number to improve throughput. **Type**: `int` **Default**: `64` ### [](#normalize)`normalize` Normalize schemas. **Type**: `bool` **Default**: `true` ### [](#oauth)`oauth` Configure OAuth version 1.0 to give this component authorized access to your schema registry. **Type**: `object` ### [](#oauth-access_token)`oauth.access_token` The value this component can use to gain access to the schema registry. **Type**: `string` **Default**: `""` ### [](#oauth-access_token_secret)`oauth.access_token_secret` The secret that establishes ownership of the `oauth.access_token` in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_key)`oauth.consumer_key` The value used to identify this component or client to your schema registry. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_secret)`oauth.consumer_secret` The secret that establishes ownership of the consumer key in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ### [](#oauth-enabled)`oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#remove_metadata)`remove_metadata` Removes metadata fields from schema output. Use this to produce leaner schema definitions for downstream consumers or when metadata is not required. **Type**: `bool` **Default**: `true` ### [](#remove_rule_set)`remove_rule_set` Removes rule set definitions from schema output. Useful for simplifying schemas when rule sets are not required by consumers or applications. **Type**: `bool` **Default**: `true` ### [](#subject)`subject` The subject name. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#subject_compatibility_level)`subject_compatibility_level` The compatibility level for the subject. Can be one of `BACKWARD`, `BACKWARD_TRANSITIVE`, `FORWARD`, `FORWARD_TRANSITIVE`, `FULL`, `FULL_TRANSITIVE`, `NONE`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. 
**Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. 
**Type**: `bool` **Default**: `false` ### [](#translate_ids)`translate_ids` When set to `true`, this field automatically translates the schema ID in each message to match the corresponding schema in the destination schema registry. The updated message is then written to the destination schema registry. **Type**: `bool` **Default**: `false` ### [](#url)`url` The base URL of the schema registry service. **Type**: `string` --- # Page 191: sftp **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/sftp.md --- # sftp --- title: sftp latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/sftp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/sftp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/sftp.adoc categories: "[\"Network\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/sftp/)[Input](/redpanda-cloud/develop/connect/components/inputs/sftp/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/sftp/ "View the Self-Managed version of this component") Writes files to an SFTP server. 
#### Common

```yml
output:
  label: ""
  sftp:
    address: "" # No default (required)
    credentials:
      username: ""
      password: ""
      host_public_key_file: "" # No default (optional)
      host_public_key: "" # No default (optional)
      private_key_file: "" # No default (optional)
      private_key: "" # No default (optional)
      private_key_pass: ""
    path: "" # No default (required)
    codec: all-bytes
    max_in_flight: 64
```

#### Advanced

```yml
output:
  label: ""
  sftp:
    address: "" # No default (required)
    connection_timeout: 30s
    credentials:
      username: ""
      password: ""
      host_public_key_file: "" # No default (optional)
      host_public_key: "" # No default (optional)
      private_key_file: "" # No default (optional)
      private_key: "" # No default (optional)
      private_key_pass: ""
    path: "" # No default (required)
    codec: all-bytes
    max_in_flight: 64
```

To write each object to a different path, use the function interpolations described in [Bloblang queries](../../../configuration/interpolation/#bloblang-queries).

## [](#performance)Performance

This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the maximum number of in-flight messages (or message batches) with the field `max_in_flight`.

## [](#fields)Fields

### [](#address)`address`

The address (hostname or IP address) of the SFTP server to connect to.

**Type**: `string`

### [](#codec)`codec`

The way in which the bytes of messages should be written out into the output data stream. It’s possible to write lines using a custom delimiter with the `delim:x` codec, where `x` is the character sequence to use as the delimiter.

**Type**: `string`

**Default**: `all-bytes`

| Option | Summary |
| --- | --- |
| all-bytes | Only applicable to file-based outputs. Writes each message to a file in full. If the file already exists, the old content is deleted. |
| append | Append each message to the output stream without any delimiter or special encoding. |
| delim:x | Append each message to the output stream followed by a custom delimiter. |
| lines | Append each message to the output stream followed by a line break. |

```yaml
# Examples:
codec: lines

# ---

codec: "delim:\t"

# ---

codec: delim:foobar
```

### [](#connection_timeout)`connection_timeout`

The connection timeout to use when connecting to the target server.

**Type**: `string`

**Default**: `30s`

### [](#credentials)`credentials`

The credentials required to log in to the SFTP server. This can include a username and password, or a private key for secure access.

**Type**: `object`

### [](#credentials-host_public_key)`credentials.host_public_key`

The raw contents of the SFTP server’s public key, used for host key verification.

**Type**: `string`

### [](#credentials-host_public_key_file)`credentials.host_public_key_file`

The path to the SFTP server’s public key file, used for host key verification.

**Type**: `string`

### [](#credentials-password)`credentials.password`

The password to use for authentication. Used together with `username` for basic authentication or with encrypted private keys for secure access.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

### [](#credentials-private_key)`credentials.private_key`

The private key used to authenticate with the SFTP server. This field provides an alternative to the [`private_key_file`](#credentials-private_key_file) field.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#credentials-private_key_file)`credentials.private_key_file`

The path to a private key file used to authenticate with the SFTP server.
You can also provide a private key using the [`private_key`](#credentials-private_key) field. **Type**: `string` ### [](#credentials-private_key_pass)`credentials.private_key_pass` A passphrase for private key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#credentials-username)`credentials.username` The username required to authenticate with the SFTP server. **Type**: `string` **Default**: `""` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#path)`path` The file to save the messages to on the SFTP server. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` --- # Page 192: slack_post **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/slack_post.md --- # slack\_post --- title: slack_post latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/slack_post page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/slack_post.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/slack_post.adoc page-git-created-date: "2025-05-02" page-git-modified-date: "2025-05-02" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/slack_post/ "View the Self-Managed version of this component") Posts a new message to a Slack channel using the Slack API method 
[chat.postMessage](https://api.slack.com/methods/chat.postMessage).

```yml
# Common configuration fields, showing default values
output:
  label: ""
  slack_post:
    bot_token: "" # No default (required)
    channel_id: "" # No default (required)
    thread_ts: "" # No default (optional)
    text: "" # No default (optional)
    blocks: "" # No default (optional)
    markdown: true
    unfurl_links: false
    unfurl_media: true
    link_names: false
```

See also: [Examples](#examples)

## [](#fields)Fields

### [](#blocks)`blocks`

A Bloblang query that should return a JSON array of [Slack blocks](https://api.slack.com/reference/block-kit/blocks). Specify message content in either the `text` field or the `blocks` field, but not both.

**Type**: `string`

### [](#bot_token)`bot_token`

Your Slack bot user’s OAuth token, which must have the correct permissions to post messages to the target Slack channel.

**Type**: `string`

### [](#channel_id)`channel_id`

The encoded ID of the target Slack channel. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#link_names)`link_names`

When set to `true`, this output finds and links to [user groups](https://api.slack.com/reference/surfaces/formatting#mentioning-groups) mentioned in Slack messages.

**Type**: `bool`

**Default**: `false`

### [](#markdown)`markdown`

When set to `true`, this output accepts message content in Markdown format.

**Type**: `bool`

**Default**: `true`

### [](#text)`text`

The text content of the message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). Specify message content in either the `text` field or the `blocks` field, but not both.

**Type**: `string`

**Default**: `""`

### [](#thread_ts)`thread_ts`

Specify the thread timestamp (`ts` value) of another message to post a reply within the same thread. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).
**Type**: `string`

**Default**: `""`

### [](#unfurl_links)`unfurl_links`

When set to `true`, this output provides previews of linked content in Slack messages. For more information about unfurling links, see the [Slack documentation](https://api.slack.com/reference/messaging/link-unfurling).

**Type**: `bool`

**Default**: `false`

### [](#unfurl_media)`unfurl_media`

When set to `true`, this output provides previews of rich content in Slack messages, such as videos or embedded tweets.

**Type**: `bool`

**Default**: `true`

## [](#examples)Examples

### [](#echo-slackbot)Echo Slackbot

A Slackbot that echoes messages from other users.

```yaml
input:
  slack:
    app_token: "${APP_TOKEN:xapp-demo}"
    bot_token: "${BOT_TOKEN:xoxb-demo}"
pipeline:
  processors:
    - mutation: |
        # ignore hidden or non message events
        if this.event.type != "message" || (this.event.hidden | false) {
          root = deleted()
        }
        # Don't respond to our own messages
        if this.authorizations.any(auth -> auth.user_id == this.event.user) {
          root = deleted()
        }
output:
  slack_post:
    bot_token: "${BOT_TOKEN:xoxb-demo}"
    channel_id: "${!this.event.channel}"
    thread_ts: "${!this.event.ts}"
    text: "ECHO: ${!this.event.text}"
```

--- # Page 193: slack_reaction **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/slack_reaction.md --- # slack\_reaction --- title: slack_reaction latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/slack_reaction page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/slack_reaction.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/slack_reaction.adoc categories: "[]" description: Add or remove an emoji reaction to a Slack message.
page-git-created-date: "2025-07-08" page-git-modified-date: "2025-07-08" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/slack_reaction/ "View the Self-Managed version of this component") Add or remove an emoji reaction to a Slack message using [`reactions.add`](https://api.slack.com/methods/reactions.add) and [`reactions.remove`](https://api.slack.com/methods/reactions.remove). ```yaml output: label: "" slack_reaction: bot_token: "" # No default (required) channel_id: "" # No default (required) timestamp: "" # No default (required) emoji: "" # No default (required) action: add max_in_flight: 64 ``` ## [](#fields)Fields ### [](#action)`action` Whether to add or remove the reaction. When set to `add`, the specified emoji reaction is applied to the target message. When set to `remove`, the emoji reaction is removed from the target message. **Type**: `string` **Default**: `add` **Options**: `add`, `remove` ### [](#bot_token)`bot_token` Your Slack Bot User OAuth token used to authenticate the API request. This token must have the necessary `reactions:write` and `channels:read` (or related) scopes. It typically begins with `xoxb-`. **Type**: `string` ### [](#channel_id)`channel_id` The unique Slack channel ID where the target message resides. Channel IDs usually start with `C` for public channels or `G` for private channels. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#emoji)`emoji` The name of the emoji to be added or removed, without surrounding colons. Use the plain emoji name, such as `thumbsup` or `tada`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. 
Increasing this value can improve throughput in high-volume scenarios, but be cautious not to exceed Slack’s API rate limits. **Type**: `int` **Default**: `64` ### [](#timestamp)`timestamp` The timestamp of the message to react to. This is a unique identifier for the message, usually obtained from a previous Slack API call (such as `chat.postMessage` or `conversations.history`). It typically looks like a Unix timestamp with a decimal. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` --- # Page 194: snowflake_put **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/snowflake_put.md --- # snowflake\_put --- title: snowflake_put latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/snowflake_put page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/snowflake_put.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/snowflake_put.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/snowflake_put/ "View the Self-Managed version of this component") > 💡 **TIP** > > Use the [`snowflake_streaming` output](../snowflake_streaming/) for improved performance, cost-effectiveness, and ease of use. Sends messages to Snowflake stages and, optionally, calls Snowpipe to load this data into one or more tables. 
#### Common

```yml
output:
  label: ""
  snowflake_put:
    account: "" # No default (required)
    region: "" # No default (optional)
    cloud: "" # No default (optional)
    user: "" # No default (required)
    password: "" # No default (optional)
    private_key: "" # No default (optional)
    private_key_file: "" # No default (optional)
    private_key_pass: "" # No default (optional)
    role: "" # No default (required)
    database: "" # No default (required)
    warehouse: "" # No default (required)
    schema: "" # No default (required)
    stage: "" # No default (required)
    path: ""
    file_name: ""
    file_extension: ""
    compression: AUTO
    request_id: ""
    snowpipe: "" # No default (optional)
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
    max_in_flight: 1
```

#### Advanced

```yml
output:
  label: ""
  snowflake_put:
    account: "" # No default (required)
    region: "" # No default (optional)
    cloud: "" # No default (optional)
    user: "" # No default (required)
    password: "" # No default (optional)
    private_key: "" # No default (optional)
    private_key_file: "" # No default (optional)
    private_key_pass: "" # No default (optional)
    role: "" # No default (required)
    database: "" # No default (required)
    warehouse: "" # No default (required)
    schema: "" # No default (required)
    stage: "" # No default (required)
    path: ""
    file_name: ""
    file_extension: ""
    upload_parallel_threads: 4
    compression: AUTO
    request_id: ""
    snowpipe: "" # No default (optional)
    client_session_keep_alive: false
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
    max_in_flight: 1
```

To use a different stage and/or Snowpipe for each message, use function interpolations as described in [Bloblang queries](../../../configuration/interpolation/#bloblang-queries).
When using batching, messages are grouped by the calculated stage and Snowpipe and are streamed to individual files in their corresponding stage. Optionally, a Snowpipe `insertFiles` REST API call is made for each individual file.

## [](#credentials)Credentials

Two authentication mechanisms are supported:

- User/password
- Key pair authentication

### [](#userpassword)User/password

This is a basic authentication mechanism that allows you to PUT data into a stage. However, it is not compatible with Snowpipe.

### [](#key-pair-authentication)Key pair authentication

This authentication mechanism allows Snowpipe functionality, but it requires you to configure an RSA private key beforehand. See the [Snowflake documentation](https://docs.snowflake.com/en/user-guide/key-pair-auth.html#configuring-key-pair-authentication) for details on how to set it up and assign the public key to your user.

Note that the Snowflake documentation [used to suggest](https://twitter.com/felipehoffa/status/1560811785606684672) generating an encrypted private key with this command:

```bash
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8
```

However, this command encrypts the key with an algorithm called `pbeWithMD5AndDES-CBC`, which is part of PKCS#5 v1.5 and is considered insecure. Redpanda Connect therefore does not support it. If you wish to use password-protected keys directly, you must encrypt them with PKCS#5 v2.0 by using the following command (as the current Snowflake documentation suggests):

```bash
openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.p8
```

If you have an existing key encrypted with PKCS#5 v1.5, you can re-encrypt it with PKCS#5 v2.0 using this command:

```bash
openssl pkcs8 -in rsa_key_original.p8 -topk8 -v2 des3 -out rsa_key.p8
```

See the [pkcs8 command documentation](https://linux.die.net/man/1/pkcs8) for details on PKCS#5 algorithms.
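
As a minimal sketch, a password-protected key produced with the PKCS#5 v2.0 command above might be referenced from the output as follows. The account, user, role, database, warehouse, and table names are placeholders, and `SNOWFLAKE_KEY_PASS` is a hypothetical variable name; see [Manage Secrets](../../../configuration/secret-management/) for how to supply sensitive values.

```yaml
output:
  snowflake_put:
    account: example_account         # placeholder account name
    user: example_user               # placeholder user that has the public key assigned
    private_key_file: ./rsa_key.p8   # key file written by the openssl command above
    private_key_pass: ${SNOWFLAKE_KEY_PASS} # hypothetical variable holding the key passphrase
    role: EXAMPLE_ROLE               # placeholder
    database: EXAMPLE_DB             # placeholder
    warehouse: EXAMPLE_WH            # placeholder
    schema: PUBLIC
    stage: "@%EXAMPLE_TBL"           # placeholder table stage
```

Keeping the passphrase out of the configuration file itself follows the caution attached to the `private_key_pass` field.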
## [](#batching)Batching It’s common to want to upload messages to Snowflake as batched archives. The easiest way to do this is to batch your messages at the output level and join the batch of messages with an [`archive`](../../processors/archive/) and/or [`compress`](../../processors/compress/) processor. For the optimal batch size, please consult the Snowflake [documentation](https://docs.snowflake.com/en/user-guide/data-load-considerations-prepare.html). ## [](#snowpipe)Snowpipe Given a table called `BENTHOS_TBL` with one column of type `variant`: ```sql CREATE OR REPLACE TABLE BENTHOS_DB.PUBLIC.BENTHOS_TBL(RECORD variant) ``` and the following `BENTHOS_PIPE` Snowpipe: ```sql CREATE OR REPLACE PIPE BENTHOS_DB.PUBLIC.BENTHOS_PIPE AUTO_INGEST = FALSE AS COPY INTO BENTHOS_DB.PUBLIC.BENTHOS_TBL FROM (SELECT * FROM @%BENTHOS_TBL) FILE_FORMAT = (TYPE = JSON COMPRESSION = AUTO) ``` you can configure Redpanda Connect to use the implicit table stage `@%BENTHOS_TBL` as the `stage` and `BENTHOS_PIPE` as the `snowpipe`. In this case, you must set `compression` to `AUTO` and, if using message batching, you’ll need to configure an [`archive`](../../processors/archive/) processor with the `concatenate` format. Since the `compression` is set to `AUTO`, the [gosnowflake](https://github.com/snowflakedb/gosnowflake) client library will compress the messages automatically so you don’t need to add a [`compress`](../../processors/compress/) processor for message batches. If you add `STRIP_OUTER_ARRAY = TRUE` in your Snowpipe `FILE_FORMAT` definition, then you must use `json_array` instead of `concatenate` as the archive processor format. > 📝 **NOTE** > > Only Snowpipes with `FILE_FORMAT` `TYPE` `JSON` are currently supported. 
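
The Snowpipe setup described above could be wired into an output roughly as follows. This is a sketch that reuses the `BENTHOS_DB`, `BENTHOS_TBL`, and `BENTHOS_PIPE` objects from this section; the account, user, role, warehouse, key path, and batch sizes are placeholder values chosen for illustration.

```yaml
output:
  snowflake_put:
    account: example_account         # placeholder
    user: example_user               # placeholder
    private_key_file: ./rsa_key.p8   # Snowpipe requires key pair authentication
    role: EXAMPLE_ROLE               # placeholder
    database: BENTHOS_DB
    warehouse: EXAMPLE_WH            # placeholder
    schema: PUBLIC
    stage: "@%BENTHOS_TBL"           # implicit table stage of BENTHOS_TBL
    compression: AUTO                # the client library compresses each batch
    snowpipe: BENTHOS_PIPE
    batching:
      count: 10                      # illustrative batch size
      period: 3s                     # illustrative flush interval
      processors:
        - archive:
            format: concatenate      # use json_array if the pipe sets STRIP_OUTER_ARRAY = TRUE
```

No `compress` processor appears in the batch processors because `compression: AUTO` already handles compression.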
## [](#snowpipe-troubleshooting)Snowpipe troubleshooting

Snowpipe [provides](https://docs.snowflake.com/en/user-guide/data-load-snowpipe-rest-apis.html) the `insertReport` and `loadHistoryScan` REST API endpoints, which you can use to get information about recent Snowpipe calls. To query them, you first need to generate a valid JWT for your Snowflake account. There are two methods for doing so:

- Using the `snowsql` [utility](https://docs.snowflake.com/en/user-guide/snowsql.html):

  ```bash
  snowsql --private-key-path rsa_key.p8 --generate-jwt -a <account> -u <user>
  ```

- Using the Python `sql-api-generate-jwt` [utility](https://docs.snowflake.com/en/developer-guide/sql-api/authenticating.html#generating-a-jwt-in-python):

  ```bash
  python3 sql-api-generate-jwt.py --private_key_file_path=rsa_key.p8 --account=<account> --user=<user>
  ```

Once you have generated a JWT and stored it in the `JWT_TOKEN` environment variable, you can, for example, query the `insertReport` endpoint using `curl`:

```bash
curl -H "Authorization: Bearer ${JWT_TOKEN}" "https://<account>.snowflakecomputing.com/v1/data/pipes/<database>.<schema>.<snowpipe>/insertReport"
```

If you need to pass a valid `requestId` to any of these Snowpipe REST API endpoints, you can set a [uuid\_v4()](../../../guides/bloblang/functions/#uuid_v4) string in a metadata field called `request_id`, log it with the [`log`](../../processors/log/) processor, and then configure `request_id: ${! @request_id }`. Alternatively, you can [enable debug logging](../../logger/about/) and Redpanda Connect will print the Request IDs that it sends to Snowpipe.

## [](#general-troubleshooting)General troubleshooting

The underlying [`gosnowflake` driver](https://github.com/snowflakedb/gosnowflake) requires write access to the default directory used for temporary files. See the [`os.TempDir`](https://pkg.go.dev/os#TempDir) docs for details on how to change this directory using environment variables.
A silent failure can occur due to [this issue](https://github.com/snowflakedb/gosnowflake/issues/701), where the underlying [`gosnowflake` driver](https://github.com/snowflakedb/gosnowflake) doesn’t return an error and doesn’t log a failure if it can’t figure out the current username. One way to trigger this behavior is by running Redpanda Connect in a Docker container with a non-existent user ID (such as `--user 1000:1000`). ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. You can find out more [in this doc](../../../configuration/batching/). ## [](#examples)Examples ### [](#kafka-realtime-brokers)Kafka / realtime brokers Upload message batches from realtime brokers such as Kafka persisting the batch partition and offsets in the stage path and filename similarly to the [Kafka Connector scheme](https://docs.snowflake.com/en/user-guide/kafka-connector-ts.html#step-1-view-the-copy-history-for-the-table) and call Snowpipe to load them into a table. When batching is configured at the input level, it is done per-partition. ```yaml input: redpanda: seed_brokers: - localhost:9092 topics: - foo consumer_group: rpcn max_yield_batch_bytes: 8MB processors: - mapping: | meta kafka_start_offset = meta("kafka_offset").from(0) meta kafka_end_offset = meta("kafka_offset").from(-1) meta batch_timestamp = if batch_index() == 0 { now() } - mapping: | meta batch_timestamp = if batch_index() != 0 { meta("batch_timestamp").from(0) } output: snowflake_put: account: benthos user: test@benthos.dev private_key_file: path_to_ssh_key.pem role: ACCOUNTADMIN database: BENTHOS_DB warehouse: COMPUTE_WH schema: PUBLIC stage: "@%BENTHOS_TBL" path: benthos/BENTHOS_TBL/${! 
@kafka_partition } file_name: ${! @kafka_start_offset }_${! @kafka_end_offset }_${! meta("batch_timestamp") } upload_parallel_threads: 4 compression: NONE snowpipe: BENTHOS_PIPE ``` ### [](#no-compression)No compression Upload concatenated messages into a `.json` file to a table stage without calling Snowpipe. ```yaml output: snowflake_put: account: benthos user: test@benthos.dev private_key_file: path_to_ssh_key.pem role: ACCOUNTADMIN database: BENTHOS_DB warehouse: COMPUTE_WH schema: PUBLIC stage: "@%BENTHOS_TBL" path: benthos upload_parallel_threads: 4 compression: NONE batching: count: 10 period: 3s processors: - archive: format: concatenate ``` ### [](#parquet-format-with-snappy-compression)Parquet format with snappy compression Upload concatenated messages into a `.parquet` file to a table stage without calling Snowpipe. ```yaml output: snowflake_put: account: benthos user: test@benthos.dev private_key_file: path_to_ssh_key.pem role: ACCOUNTADMIN database: BENTHOS_DB warehouse: COMPUTE_WH schema: PUBLIC stage: "@%BENTHOS_TBL" path: benthos file_extension: parquet upload_parallel_threads: 4 compression: NONE batching: count: 10 period: 3s processors: - parquet_encode: schema: - name: ID type: INT64 - name: CONTENT type: BYTE_ARRAY default_compression: snappy ``` ### [](#automatic-compression)Automatic compression Upload concatenated messages compressed automatically into a `.gz` archive file to a table stage without calling Snowpipe. ```yaml output: snowflake_put: account: benthos user: test@benthos.dev private_key_file: path_to_ssh_key.pem role: ACCOUNTADMIN database: BENTHOS_DB warehouse: COMPUTE_WH schema: PUBLIC stage: "@%BENTHOS_TBL" path: benthos upload_parallel_threads: 4 compression: AUTO batching: count: 10 period: 3s processors: - archive: format: concatenate ``` ### [](#deflate-compression)DEFLATE compression Upload concatenated messages compressed into a `.deflate` archive file to a table stage and call Snowpipe to load them into a table. 
```yaml
output:
  snowflake_put:
    account: benthos
    user: test@benthos.dev
    private_key_file: path_to_ssh_key.pem
    role: ACCOUNTADMIN
    database: BENTHOS_DB
    warehouse: COMPUTE_WH
    schema: PUBLIC
    stage: "@%BENTHOS_TBL"
    path: benthos
    upload_parallel_threads: 4
    compression: DEFLATE
    snowpipe: BENTHOS_PIPE
    batching:
      count: 10
      period: 3s
      processors:
        - archive:
            format: concatenate
        - mapping: |
            root = content().compress("zlib")
```

### [](#raw_deflate-compression)RAW\_DEFLATE compression

Upload concatenated messages compressed into a `.raw_deflate` archive file to a table stage and call Snowpipe to load them into a table.

```yaml
output:
  snowflake_put:
    account: benthos
    user: test@benthos.dev
    private_key_file: path_to_ssh_key.pem
    role: ACCOUNTADMIN
    database: BENTHOS_DB
    warehouse: COMPUTE_WH
    schema: PUBLIC
    stage: "@%BENTHOS_TBL"
    path: benthos
    upload_parallel_threads: 4
    compression: RAW_DEFLATE
    snowpipe: BENTHOS_PIPE
    batching:
      count: 10
      period: 3s
      processors:
        - archive:
            format: concatenate
        - mapping: |
            root = content().compress("flate")
```

## [](#fields)Fields

### [](#account)`account`

Account name, which is the same as the [Account Identifier](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html#where-are-account-identifiers-used). However, when using an [Account Locator](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html#using-an-account-locator-as-an-identifier), the Account Identifier is formatted as `<account_locator>.<region_id>.<cloud>` and this field must be populated with the `<account_locator>` part.

**Type**: `string`

### [](#batching-2)`batching`

Allows you to configure a [batching policy](../../../configuration/batching/).

**Type**: `object`

```yaml
# Examples:
batching:
  byte_size: 5000
  count: 0
  period: 1s

# ---

batching:
  count: 10
  period: 1s

# ---

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### [](#batching-byte_size)`batching.byte_size`

The number of bytes at which the batch should be flushed. Set to `0` to disable size-based batching.
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#client_session_keep_alive)`client_session_keep_alive` Enable Snowflake keepalive mechanism to prevent the client session from expiring after 4 hours (error 390114). **Type**: `bool` **Default**: `false` ### [](#cloud)`cloud` Optional cloud platform field which needs to be populated when using an [Account Locator](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html#using-an-account-locator-as-an-identifier) and it must be set to the `` part of the Account Identifier (`..`). **Type**: `string` ```yaml # Examples: cloud: aws # --- cloud: gcp # --- cloud: azure ``` ### [](#compression)`compression` Compression type. 
**Type**: `string` **Default**: `AUTO` | Option | Summary | | --- | --- | | AUTO | Compression (gzip) is applied automatically by the output and messages must contain plain-text JSON. Default file_extension: gz. | | DEFLATE | Messages must be pre-compressed using the zlib algorithm (with zlib header, RFC1950). Default file_extension: deflate. | | GZIP | Messages must be pre-compressed using the gzip algorithm. Default file_extension: gz. | | NONE | No compression is applied and messages must contain plain-text JSON. Default file_extension: json. | | RAW_DEFLATE | Messages must be pre-compressed using the flate algorithm (without header, RFC1951). Default file_extension: raw_deflate. | | ZSTD | Messages must be pre-compressed using the Zstandard algorithm. Default file_extension: zst. | ### [](#database)`database` Database. **Type**: `string` ### [](#file_extension)`file_extension` Stage file extension. Will be derived from the configured `compression` if not set or empty. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ```yaml # Examples: file_extension: csv # --- file_extension: parquet ``` ### [](#file_name)`file_name` Stage file name. Will be equal to the Request ID if not set or empty. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#max_in_flight)`max_in_flight` The maximum number of parallel message batches to have in flight at any given time. **Type**: `int` **Default**: `1` ### [](#password)`password` An optional password. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#path)`path` Stage path. 
This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#private_key)`private_key` Your private SSH key. When using encrypted keys, you must also set a value for [`private_key_pass`](#private_key_pass). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#private_key_file)`private_key_file` The path to a file containing your private SSH key. When using encrypted keys, you must also set a value for [`private_key_pass`](#private_key_pass). **Type**: `string` ### [](#private_key_pass)`private_key_pass` The passphrase for your private SSH key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#region)`region` Optional region field which needs to be populated when using an [Account Locator](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html#using-an-account-locator-as-an-identifier) and it must be set to the `` part of the Account Identifier (`..`). **Type**: `string` ```yaml # Examples: region: us-west-2 ``` ### [](#request_id)`request_id` Request ID. Will be assigned a random UUID (v4) string if not set or empty. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#role)`role` Role. **Type**: `string` ### [](#schema)`schema` Schema. **Type**: `string` ### [](#snowpipe-2)`snowpipe` An optional Snowpipe name. Use the `` part from `..`. 
This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#stage)`stage` Stage name. Use either one of the [supported](https://docs.snowflake.com/en/user-guide/data-load-local-file-system-create-stage.html) stage types. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#upload_parallel_threads)`upload_parallel_threads` Specifies the number of threads to use for uploading files. **Type**: `int` **Default**: `4` ### [](#user)`user` Username. **Type**: `string` ### [](#warehouse)`warehouse` Warehouse. **Type**: `string` --- # Page 195: snowflake_streaming **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/snowflake_streaming.md --- # snowflake\_streaming --- title: snowflake_streaming latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/snowflake_streaming page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/snowflake_streaming.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/snowflake_streaming.adoc page-git-created-date: "2024-11-19" page-git-modified-date: "2025-02-05" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/snowflake_streaming/ "View the Self-Managed version of this component") Allows Snowflake to ingest data from your data pipeline using [Snowpipe Streaming](https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview). To help you configure your own `snowflake_streaming` output, this page includes [example data pipelines](#example-pipelines). 
#### Common ```yml outputs: label: "" snowflake_streaming: account: "" # No default (required) user: "" # No default (required) role: "" # No default (required) database: "" # No default (required) schema: "" # No default (required) table: "" # No default (required) private_key: "" # No default (optional) private_key_file: "" # No default (optional) private_key_pass: "" # No default (optional) mapping: "" # No default (optional) init_statement: "" # No default (optional) schema_evolution: enabled: "" # No default (required) ignore_nulls: true processors: [] # No default (optional) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) max_in_flight: 4 ``` #### Advanced ```yml outputs: label: "" snowflake_streaming: account: "" # No default (required) url: "" # No default (optional) user: "" # No default (required) role: "" # No default (required) database: "" # No default (required) schema: "" # No default (required) table: "" # No default (required) private_key: "" # No default (optional) private_key_file: "" # No default (optional) private_key_pass: "" # No default (optional) mapping: "" # No default (optional) init_statement: "" # No default (optional) schema_evolution: enabled: "" # No default (required) ignore_nulls: true processors: [] # No default (optional) build_options: parallelism: 1 chunk_size: 50000 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) max_in_flight: 4 channel_prefix: "" # No default (optional) channel_name: "" # No default (optional) offset_token: "" # No default (optional) commit_backoff: initial_interval: 32ms max_interval: 512ms max_elapsed_time: 60s multiplier: 2 message_format: object timestamp_format: 2006-01-02T15:04:05.999999999Z07:00 ``` ## [](#conversion-of-message-data-into-snowflake-table-rows)Conversion of message data into Snowflake table rows Message data conversion to Snowflake table rows is determined by the: - Output message contents. 
- [Schema evolution settings](#schema_evolution). - Schema of the [target Snowflake table](#table). The following scenarios highlight how these three factors affect data written to the target table. > 📝 **NOTE** > > For reduced complexity, consider [turning on schema evolution](#schema_evolution), which automatically creates and updates the Snowflake table schema based on message contents. ### [](#scenario-data-and-table-schema-match-schema-evolution-turned-on-or-off)Scenario: Data and table schema match (schema evolution turned on or off) An output message matches the existing table schema, and the `schema_evolution.enabled` field is set to `true` or `false`. The target Snowflake table has two columns: - `product_id` (NUMBER) - `product_code` (STRING) A pipeline generates the following message: ```json {"product_id": 521, "product_code": "EST-PR"} ``` In this scenario: - The JSON keys in the message (`"product_id"` and `"product_code"`) match column names in the target Snowflake table. - The message values match the column data types. (If there was a data mismatch, the message would be rejected.) - Redpanda Connect inserts the message values into a new row in the target Snowflake table. | product_id | product_code | | --- | --- | | 521 | EST-PR | ### [](#scenario-data-and-table-schema-mismatch-schema-evolution-turned-on)Scenario: Data and table schema mismatch (schema evolution turned on) An output message includes schema updates, and the `schema_evolution.enabled` field is set to `true`. The target Snowflake table has the same two columns as the [previous scenario](#scenario-data-and-table-schema-match-schema-evolution-turned-on-or-off): - `product_id` (NUMBER) - `product_code` (STRING) This time, the pipeline generates the following message: ```json {"product_batch": 11111, "product_color": "yellow"} ``` In this scenario: - The JSON keys (`"product_batch"` and `"product_color"`) do not match column names in the target Snowflake table. 
- As schema evolution is enabled, Redpanda Connect adds two new columns to the target table with data types derived from the output message values. For more information about the mapping of data types, see [Supported data formats for Snowflake columns](#supported-data-formats-for-snowflake-columns). - Redpanda Connect inserts the message values into a new table row. | product_id | product_code | product_batch | product_color | | --- | --- | --- | --- | | (null) | (null) | 11111 | yellow | > 📝 **NOTE** > > You can [configure processors](#schema_evolution-processors) to override the schema updates derived from the message values. ### [](#scenario-data-and-table-schema-mismatch-schema-evolution-turned-off)Scenario: Data and table schema mismatch (schema evolution turned off) An output message includes schema updates, and the `schema_evolution.enabled` field is set to `false`. The target Snowflake table has the same two columns: - `product_id` (NUMBER) - `product_code` (STRING) The pipeline generates the same message as the [previous scenario](#scenario-data-and-table-schema-mismatch-schema-evolution-turned-on): ```json {"product_batch": 11111, "product_color": "yellow"} ``` In this scenario: - The JSON keys (`"product_batch"` and `"product_color"`) do not match any existing column names. - Because schema evolution is turned off, Redpanda Connect ignores the extra column names and values and inserts a row of null values. | product_id | product_code | | --- | --- | | (null) | (null) | ## [](#supported-data-formats-for-snowflake-columns)Supported data formats for Snowflake columns The message data from your output must match the columns in the Snowflake table that you want to write data to. The following table shows you the [column data types supported by Snowflake](https://docs.snowflake.com/en/sql-reference/intro-summary-data-types) and how they correspond to the [Bloblang data types](../../../guides/bloblang/methods/#type) in Redpanda Connect. 
| Snowflake column data type | Bloblang data types | | --- | --- | | CHAR, VARCHAR | string | | BINARY | string or bytes | | NUMBER | number, or string where the string is parsed into a number | | FLOAT, including special values, such as NaN (Not a Number), -inf (negative infinity), and inf (positive infinity) | number | | BOOLEAN | bool, or number where a non-zero number is true | | TIME, DATE, TIMESTAMP | timestamp, or number where the number is converted to a Unix timestamp, or string where the string is parsed using RFC 3339 format | | VARIANT, ARRAY, OBJECT | Any data type converted into JSON | | GEOGRAPHY, GEOMETRY | Not supported | ## [](#authentication)Authentication You can authenticate with Snowflake using an [RSA key pair](https://docs.snowflake.com/en/user-guide/key-pair-auth). Either specify: - A PEM-encoded private key, in the [`private_key` field](#private_key). - The path to a file from which the output can load the private RSA key, in the [`private_key_file` field](#private_key_file). ## [](#performance)Performance For improved performance, this output: - Sends multiple messages in parallel. You can tune the maximum number of in-flight messages (or message batches) with the field `max_in_flight`. - Sends messages as a batch. You can configure batches at both the input and output level. For more information, see [Message Batching](../../../configuration/batching/). ### [](#batch-sizes)Batch sizes Redpanda recommends that every message batch writes at least 16 MiB of compressed output to Snowflake. You can monitor batch sizes using the `snowflake_compressed_output_size_bytes` metric. ### [](#metrics)Metrics This output emits the following metrics. | Metric name | Description | | --- | --- | | snowflake_compressed_output_size_bytes | The size in bytes of each message batch uploaded to Snowflake. | | snowflake_convert_latency_ns | The time taken to convert messages into the Snowflake column data types. 
| | snowflake_serialize_latency_ns | The time taken to serialize the converted columnar data into a file for upload to Snowflake. | | snowflake_build_output_latency_ns | The time taken to build the file that is uploaded to Snowflake. This metric is the sum of snowflake_convert_latency_ns + snowflake_serialize_latency_ns. | | snowflake_upload_latency_ns | The time taken to upload the output file to Snowflake. | | snowflake_register_latency_ns | The time taken to register the uploaded output file with Snowflake. | | snowflake_commit_latency_ns | The time taken to commit the uploaded data updates to the target Snowflake table. | ## [](#fields)Fields ### [](#account)`account` The [Snowflake account name to use](https://docs.snowflake.com/en/user-guide/admin-account-identifier#account-name). Use the format `-` where: - The `` is the name of your Snowflake organization. - The `` is the unique name of your account with your Snowflake organization. To find the correct value for this field, run the following query in Snowflake: ```sql WITH HOSTLIST AS (SELECT * FROM TABLE(FLATTEN(INPUT => PARSE_JSON(SYSTEM$allowlist())))) SELECT REPLACE(VALUE:host,'.snowflakecomputing.com','') AS ACCOUNT_IDENTIFIER FROM HOSTLIST WHERE VALUE:type = 'SNOWFLAKE_DEPLOYMENT_REGIONLESS'; ``` **Type**: `string` ```yaml # Examples: account: ORG-ACCOUNT ``` ### [](#batching)`batching` Lets you configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` The number of bytes at which the batch is flushed. Set to `0` to disable size-based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, and therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#build_options)`build_options` Options for optimizing the build of the output data that is sent to Snowflake. Monitor the `snowflake_build_output_latency_ns` metric to assess whether you need to update these options. **Type**: `object` ### [](#build_options-chunk_size)`build_options.chunk_size` The number of table rows to submit in each chunk for processing. **Type**: `int` **Default**: `50000` ### [](#build_options-parallelism)`build_options.parallelism` The maximum amount of parallel processing to use when building the output for Snowflake. 
**Type**: `int` **Default**: `1` ### [](#channel_name)`channel_name` The channel name to use when connecting to a Snowflake table. Duplicate channel names cause errors and prevent multiple instances of Redpanda Connect from writing at the same time, and so this field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). Redpanda Connect assumes that a message batch contains messages for a single channel, which means that interpolation is only executed on the first message in each batch. If your pipeline uses an input that is partitioned, such as an Apache Kafka topic, batch messages at the input level to make sure all messages are processed by the same channel. You can specify either the `channel_name` or `channel_prefix`, but not both. If neither field is populated, this output creates a channel name based on a table’s fully-qualified name, which results in a single stream per table. > 📝 **NOTE** > > Snowflake limits the number of streams per table to 10,000. If you need to use more than 10,000 streams, contact [Snowflake support](https://www.snowflake.com/en/support/). **Type**: `string` ```yaml # Examples: channel_name: partition-${!@kafka_partition} ``` ### [](#channel_prefix)`channel_prefix` The prefix to use when creating a channel name for connecting to a Snowflake table. Adding a `channel_prefix` avoids the creation of duplicate channel names, which result in errors and prevent multiple instances of Redpanda Connect from writing at the same time. You can specify either the `channel_prefix` or `channel_name`, but not both. If neither field is populated, this output creates a channel name based on a table’s fully-qualified name, which results in a single stream per table. The maximum number of channels open at any time is determined by the value in the `max_in_flight` field. > 📝 **NOTE** > > Snowflake limits the number of streams per table to 10,000. 
If you need to use more than 10,000 streams, contact [Snowflake support](https://www.snowflake.com/en/support/). **Type**: `string` ```yaml # Examples: channel_prefix: channel-${HOST} ``` ### [](#commit_backoff)`commit_backoff` Control how frequently Snowflake is polled to check if data has been committed. **Type**: `object` ### [](#commit_backoff-initial_interval)`commit_backoff.initial_interval` The initial period to wait between status polls. **Type**: `string` **Default**: `32ms` ### [](#commit_backoff-max_elapsed_time)`commit_backoff.max_elapsed_time` The maximum total time to wait for data to be committed. If zero then no limit is used. **Type**: `string` **Default**: `60s` ### [](#commit_backoff-max_interval)`commit_backoff.max_interval` The maximum period to wait between status polls. **Type**: `string` **Default**: `512ms` ### [](#commit_backoff-multiplier)`commit_backoff.multiplier` The factor by which the poll interval grows on each attempt. **Type**: `float` **Default**: `2` ### [](#database)`database` The Snowflake database you want to write data to. **Type**: `string` ```yaml # Examples: database: MY_DATABASE ``` ### [](#init_statement)`init_statement` Optional SQL statements to execute immediately after this output connects to Snowflake for the first time. This is a useful way to initialize tables before processing data. > 📝 **NOTE** > > Make sure your SQL statements are idempotent, so they do not cause issues when run multiple times after service restarts. **Type**: `string` ```yaml # Examples: init_statement: |- CREATE TABLE IF NOT EXISTS mytable (amount NUMBER); # --- init_statement: |- ALTER TABLE t1 ALTER COLUMN c1 DROP NOT NULL; ALTER TABLE t1 ADD COLUMN a2 NUMBER; ``` ### [](#mapping)`mapping` The [Bloblang `mapping`](../../../guides/bloblang/about/) to execute on each message. **Type**: `string` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. 
Increase this number to improve throughput until performance plateaus. **Type**: `int` **Default**: `4` ### [](#message_format)`message_format` The format to expect incoming messages from the rest of the pipeline. **Type**: `string` **Default**: `object` | Option | Summary | | --- | --- | | array | Messages are an array of values where each position matches the ordinal of the column in Snowflake. | | object | Messages are JSON or Bloblang objects where each key is the Snowflake column name and the value is the column value. | ```yaml # Examples: message_format: array ``` ### [](#offset_token)`offset_token` The offset token to use for exactly-once delivery of data to a Snowflake table. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). This output assumes that messages within a batch are in increasing order by offset token. When data is sent on a channel, the offset token of each message in the batch is compared to the latest token processed by the channel. If the offset token is lexicographically less than the latest token, it’s assumed the message is a duplicate and is dropped. Messages must be delivered to the output in order, otherwise they are processed as duplicates and dropped. To avoid dropping retried messages if later messages have succeeded in the meantime, use a dead-letter queue to process failed messages. See the [Ingesting data exactly once from Redpanda](#example-pipelines) example. > 📝 **NOTE** > > If you’re using a numeric value as an offset token, pad the value so that it’s lexicographically ordered in its string representation because offset tokens are compared in string form. For more details, see the [Ingesting data exactly once from Redpanda](#example-pipelines) example. For more information about offset tokens, see [Snowflake Documentation](https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview#offset-tokens). 
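
The padding advice in the note above can be checked with a small standalone sketch. It is written in Python rather than Bloblang, and the `offset_token` helper and the sample offsets are hypothetical, chosen only for illustration. It assumes that Bloblang's `"%016X".format(...)` yields the same zero-padded, 16-digit uppercase hexadecimal string as Python's `:016X` format specifier does for non-negative integers.

```python
def offset_token(offset: int, prefix: str = "offset-") -> str:
    """Hypothetical helper mirroring offset-${!"%016X".format(@kafka_offset)}."""
    # Zero-pad to 16 uppercase hex digits so string order matches numeric order.
    return f"{prefix}{offset:016X}"


# Sample offsets in increasing numeric order (illustrative values only).
offsets = [9, 10, 255, 4096]

# Unpadded decimal tokens: "offset-10" sorts before "offset-9" as a string.
unpadded = [f"offset-{o}" for o in offsets]

# Padded hex tokens keep the same order when compared as strings.
padded = [offset_token(o) for o in offsets]

unpadded_in_order = sorted(unpadded) == unpadded
padded_in_order = sorted(padded) == padded

print("unpadded tokens stay in order:", unpadded_in_order)
print("padded tokens stay in order:  ", padded_in_order)
```

With unpadded tokens, a later message can compare as lexicographically less than an earlier one, and by the rule described above it would be treated as a duplicate and dropped. Fixed-width padding avoids this for as long as the offsets fit in the chosen width.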
**Type**: `string` ```yaml # Examples: offset_token: offset-${!"%016X".format(@kafka_offset)} # --- offset_token: postgres-${!@lsn} ``` ### [](#private_key)`private_key` The PEM-encoded private RSA key to use for authentication with Snowflake. You must specify a value for this field or the `private_key_file` field. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#private_key_file)`private_key_file` A `.p8`, PEM-encoded file to load the private RSA key from. You must specify a value for this field or the `private_key` field. **Type**: `string` ### [](#private_key_pass)`private_key_pass` If the RSA key is encrypted, specify the RSA key passphrase. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#role)`role` The role of the user specified in the `user` field. The user’s role must have the [required privileges](https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview#required-access-privileges) to call the Snowpipe Streaming APIs. For more information about user roles, see the [Snowflake documentation](https://docs.snowflake.com/en/user-guide/admin-user-management#user-roles). **Type**: `string` ```yaml # Examples: role: ACCOUNTADMIN ``` ### [](#schema)`schema` The schema of the Snowflake database you want to write data to. **Type**: `string` ```yaml # Examples: schema: PUBLIC ``` ### [](#schema_evolution)`schema_evolution` Options to control schema updates when messages are written to the Snowflake table. 
**Type**: `object` ### [](#schema_evolution-enabled)`schema_evolution.enabled` Whether schema evolution is enabled. When set to `true`, the Snowflake table is automatically created based on the schema of the first message written to it, if the table does not already exist. As new fields are added to subsequent messages in the pipeline, new columns are created in the Snowflake table. Any required columns are marked as `nullable` if new messages do not include data for them. **Type**: `bool` ### [](#schema_evolution-ignore_nulls)`schema_evolution.ignore_nulls` When set to `true` and schema evolution is enabled, new columns that have `null` values _are not_ added to the Snowflake table. This behavior: - Prevents unnecessary schema changes caused by placeholder or incomplete data. - Avoids creating table columns with incorrect data types. > 📝 **NOTE** > > Redpanda does not recommend updating the default setting unless you are confident about the data type of `null` columns in advance. **Type**: `bool` **Default**: `true` ### [](#schema_evolution-processors)`schema_evolution.processors[]` A series of processors to execute when new columns are added to the Snowflake table. You can use these processors to: - Run side effects when the schema evolves. - Enrich the message with additional information to guide the schema changes. For example, a processor could read the schema from the schema registry that a message was produced with and use that schema to determine the data type of the new column in Snowflake. The input to these processors is an object with the value and name of the new column, the original message, and details of the Snowflake table the output writes to. 
For example: `{"value": 42.3, "name":"new_data_field", "message": {"existing_data_field": 42, "new_data_field": "db_field_name"}, "db": "MY_DATABASE", "schema": "MY_SCHEMA", "table": "MY_TABLE"}` The output from the processors must be a valid message, which contains a string that specifies the column type for the new column in Snowflake. The metadata remains the same as in the original message that triggered the schema update. **Type**: `processor` ```yaml # Examples: processors: - mapping: |- root = match this.value.type() { this == "string" => "STRING" this == "bytes" => "BINARY" this == "number" => "DOUBLE" this == "bool" => "BOOLEAN" this == "timestamp" => "TIMESTAMP" _ => "VARIANT" } ``` ### [](#table)`table` The Snowflake table you want to write data to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: table: MY_TABLE ``` ### [](#timestamp_format)`timestamp_format` The format used to parse string values for `TIMESTAMP`, `TIMESTAMP_LTZ` and `TIMESTAMP_NTZ` columns. The value must be a layout accepted by [time.Parse](https://pkg.go.dev/time#Parse) in Go. **Type**: `string` **Default**: `2006-01-02T15:04:05.999999999Z07:00` ### [](#url)`url` Specify a custom URL to connect to Snowflake. This parameter overrides the default URL, which is automatically generated from the value of `output.snowflake_streaming.account`. By default, the URL is constructed as follows: `https://.snowflakecomputing.com`. **Type**: `string` ```yaml # Examples: url: https://org-account.privatelink.snowflakecomputing.com ``` ### [](#user)`user` Specify a user to run the Snowpipe Stream. To learn how to create a user, see the [Snowflake documentation](https://docs.snowflake.com/en/user-guide/admin-user-management). 
**Type**: `string` ## [](#example-pipelines)Example pipelines The following examples show you how to ingest, process, and write data to Snowflake from: - A PostgreSQL table using change data capture (CDC) - A Redpanda cluster - A REST API that posts JSON payloads to a HTTP server See also: [Ingest data into Snowflake cookbook](../../../cookbooks/snowflake_ingestion/) ### Write data exactly once to a Snowflake table using CDC Send data from a PostgreSQL table and write it to Snowflake exactly once using PostgreSQL logical replication. This example includes some important features: - To make sure that a Snowflake streaming channel does not assume that older data is already committed, the configuration sets a 45-second interval between message batches. This interval prevents a message batch from being sent while another batch is retried. - The log sequence number of each data update from the Write-Ahead Log (WAL) in PostgreSQL makes sure that data is only uploaded once to the `snowflake_streaming` output, and that messages sent to the output are already lexicographically ordered. > 📝 **NOTE** > > To do exactly-once data delivery, it’s important that records are delivered in order to the output, and are correctly partitioned. Before you start, read the [`offset_token`](#offset_token) field description. Alternatively, remove the `offset_token` field to use Redpanda Connect’s default at-least-once delivery model. ```yaml input: postgres_cdc: dsn: postgres://foouser:foopass@localhost:5432/foodb schema: "public" tables: ["my_pg_table"] # Use very large batches. 
Each batch is sent to Snowflake individually, # so to optimize query performance, use the largest file size # your memory allows batching: count: 50000 period: 45s # Set an interval between message batches to prevent multiple batches # from being in flight at once checkpoint_limit: 1 output: snowflake_streaming: # Using the log sequence number makes sure data is only updated exactly once offset_token: "${!@lsn}" # Sending a single ordered log means you can only send one update # at a time and properly increment the offset_token # and use only a single channel. max_in_flight: 1 account: "MYSNOW-ACCOUNT" user: MYUSER role: ACCOUNTADMIN database: "MYDATABASE" schema: "PUBLIC" table: "MY_PG_TABLE" private_key_file: "my/private/key.p8" ``` ### Ingest data exactly once from Redpanda Ingest data from Redpanda using consumer groups, decode the schema using the schema registry, then write the corresponding data into Snowflake. This example includes some important features: - To create multiple Redpanda Connect streams to write to each output table, you need a unique channel prefix per stream. The `channel_prefix` field constructs a unique prefix for each stream using the host name. - To prevent message failures from being retried and changing the order of delivered messages, a dead-letter queue processes them. > 📝 **NOTE** > > To do exactly-once data delivery, it’s important that records are delivered in order to the output, and are correctly partitioned. Before you start, read the [`channel_name`](#channel_name) and [`offset_token`](#offset_token) field descriptions. Alternatively, remove the `offset_token` field to use Redpanda Connect’s default at-least-once delivery model. ```yaml input: redpanda_common: topics: ["my_topic_going_to_snow"] consumer_group: "redpanda_connect_to_snowflake" # Use very large batches. 
Each batch is sent to Snowflake individually, # so to optimize query performance, use the largest file size # your memory allows fetch_max_bytes: 100MiB fetch_min_bytes: 50MiB partition_buffer_bytes: 100MiB pipeline: processors: - schema_registry_decode: url: "redpanda.example.com:8081" basic_auth: enabled: true username: MY_USER_NAME password: "${TODO}" output: fallback: - snowflake_streaming: # To write an ordered stream of messages, each partition in # Apache Kafka gets its own channel. channel_name: "partition-${!@kafka_partition}" # Offsets are lexicographically sorted in string form by padding with # leading zeros offset_token: offset-${!"%016X".format(@kafka_offset)} account: "MYSNOW-ACCOUNT" user: MYUSER role: ACCOUNTADMIN database: "MYDATABASE" schema: "PUBLIC" table: "MYTABLE" private_key_file: "my/private/key.p8" schema_evolution: enabled: true # To prevent delivery failures from changing the order of # delivered records, it's important that they are immediately # sent to a dead-letter queue. - retry: output: redpanda_common: topic: "dead_letter_queue" ``` ### HTTP server to push data to Snowflake Create a HTTP server input that receives HTTP PUT requests with JSON payloads. The payloads are buffered locally then written to Snowflake in batches. To create multiple Redpanda Connect streams to write to each output table, you need a unique channel prefix per stream. In this example, the `channel_prefix` field constructs a unique prefix for each stream using the host name. > 📝 **NOTE** > > Using a buffer to immediately respond to the HTTP requests may result in data loss if there are delivery failures between the output and Snowflake. For more information about the configuration of buffers, see [buffers](../../buffers/memory/). Alternatively, remove the buffer entirely to respond to the HTTP request only once the data is written to Snowflake. 
```yaml input: http_server: path: /snowflake buffer: memory: # Max inflight data before applying backpressure limit: 524288000 # 500MiB # Batching policy, which influences the size of the files sent to Snowflake batch_policy: enabled: true byte_size: 33554432 # 32MiB period: "10s" output: snowflake_streaming: account: "MYSNOW-ACCOUNT" user: MYUSER role: ACCOUNTADMIN database: "MYDATABASE" schema: "PUBLIC" table: "MYTABLE" private_key_file: "my/private/key.p8" channel_prefix: "snowflake-channel-for-${HOST}" schema_evolution: enabled: true ``` --- # Page 196: splunk_hec **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/splunk_hec.md --- # splunk\_hec --- title: splunk_hec latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/splunk_hec page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/splunk_hec.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/splunk_hec.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/splunk_hec/ "View the Self-Managed version of this component") Publishes messages to a Splunk HTTP Event Collector (HEC).
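The following is a minimal sketch of a `splunk_hec` output configuration that uses only fields documented on this page. The index and sourcetype values, the batching thresholds, and the `SPLUNK_HEC_TOKEN` variable name are hypothetical placeholders rather than defaults or recommendations; replace them with values from your own Splunk deployment and secret store.

```yaml
output:
  splunk_hec:
    # Full HEC endpoint of the Splunk deployment (example host)
    url: https://foobar.splunkcloud.com/services/collector/event
    # Hypothetical variable name; avoid placing the token in the config directly
    token: "${SPLUNK_HEC_TOKEN}"
    # Compress request bodies to reduce network usage
    gzip: true
    # Hypothetical event metadata values
    event_index: main
    event_sourcetype: redpanda_connect
    # Flush a batch after 500 messages or 5 seconds, whichever comes first
    batching:
      count: 500
      period: 5s
```

Batching and gzip compression are both optional here. They are included because this output sends data over HTTP, where fewer and smaller requests usually help throughput.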
#### Common ```yml outputs: label: "" splunk_hec: url: "" # No default (required) token: "" # No default (required) gzip: false event_host: "" # No default (optional) event_source: "" # No default (optional) event_sourcetype: "" # No default (optional) event_index: "" # No default (optional) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" splunk_hec: url: "" # No default (required) token: "" # No default (required) gzip: false event_host: "" # No default (optional) event_source: "" # No default (optional) event_sourcetype: "" # No default (optional) event_index: "" # No default (optional) tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` ## [](#performance)Performance This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field `max_in_flight`. This output benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. You can find out more [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0` disables count based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#event_host)`event_host` Set the host value to assign to the event data. Overrides existing host field if present. **Type**: `string` ### [](#event_index)`event_index` Set the index value to assign to the event data. Overrides existing index field if present. **Type**: `string` ### [](#event_source)`event_source` Set the source value to assign to the event data. Overrides existing source field if present. **Type**: `string` ### [](#event_sourcetype)`event_sourcetype` Set the sourcetype value to assign to the event data. Overrides existing sourcetype field if present. 
**Type**: `string` ### [](#gzip)`gzip` Enable gzip compression **Type**: `bool` **Default**: `false` ### [](#max_in_flight)`max_in_flight` The maximum number of messages to have in flight at a given time. Increase this to improve throughput. **Type**: `int` **Default**: `64` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#token)`token` The Splunk HEC token used for authentication.
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#url)`url` Full HTTP Event Collector (HEC) URL. **Type**: `string` ```yaml # Examples: url: https://foobar.splunkcloud.com/services/collector/event ``` --- # Page 197: sql_insert **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/sql_insert.md --- # sql\_insert --- title: sql_insert latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/sql_insert page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/sql_insert.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/sql_insert.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/sql_insert/)[Processor](/redpanda-cloud/develop/connect/components/processors/sql_insert/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/sql_insert/ "View the Self-Managed version of this component") Inserts a row into an SQL database for each message.
#### Common ```yml outputs: label: "" sql_insert: driver: "" # No default (required) dsn: "" # No default (required) table: "" # No default (required) columns: [] # No default (required) args_mapping: "" # No default (required) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" sql_insert: driver: "" # No default (required) dsn: "" # No default (required) table: "" # No default (required) columns: [] # No default (required) args_mapping: "" # No default (required) prefix: "" # No default (optional) suffix: "" # No default (optional) options: [] # No default (optional) max_in_flight: 64 init_files: [] # No default (optional) init_statement: "" # No default (optional) conn_max_idle_time: "" # No default (optional) conn_max_life_time: "" # No default (optional) conn_max_idle: 2 conn_max_open: "" # No default (optional) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` ## [](#examples)Examples ### [](#table-insert-mysql)Table Insert (MySQL) Here we insert rows into a database by populating the columns id, name and topic with values extracted from messages and metadata: ```yaml output: sql_insert: driver: mysql dsn: foouser:foopassword@tcp(localhost:3306)/foodb table: footable columns: [ id, name, topic ] args_mapping: | root = [ this.user.id, this.user.name, meta("kafka_topic"), ] ``` ## [](#dynamic-sql-operations)Dynamic SQL operations The `table` and `columns` fields are static strings that do not support Bloblang interpolation. For dynamic table names, dynamic column lists, DELETE operations, or any other SQL that `sql_insert` cannot express, use the [`sql_raw` output](../sql_raw/) instead. There is no dedicated `sql_delete` output. 
To delete rows, use `sql_raw` with a DELETE statement: ```yaml output: sql_raw: driver: postgres dsn: postgres://user:pass@localhost:5432/mydb?sslmode=disable query: "DELETE FROM my_table WHERE id = $1" args_mapping: root = [ this.id ] ``` To insert into a table determined at runtime, use `sql_raw` with `unsafe_dynamic_query: true`, which enables Bloblang interpolation in the `query` field. > ⚠️ **CAUTION** > > Interpolating user-supplied values into a query can introduce SQL injection risks. Always validate or sanitize the interpolated value beforehand. ```yaml output: sql_raw: driver: postgres dsn: postgres://user:pass@localhost:5432/mydb?sslmode=disable unsafe_dynamic_query: true query: 'INSERT INTO ${! this.table_name } (id, value) VALUES ($1, $2)' args_mapping: root = [ this.id, this.value ] ``` ## [](#fields)Fields ### [](#args_mapping)`args_mapping` A [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of values matching in size to the number of columns specified. **Type**: `string` ```yaml # Examples: args_mapping: root = [ this.cat.meow, this.doc.woofs[0] ] # --- args_mapping: root = [ meta("user.id") ] ``` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. **Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. 
**Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0`, count-based batching is disabled. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#columns)`columns[]` A list of columns to insert. **Type**: `array` ```yaml # Examples: columns: - foo - bar - baz ``` ### [](#conn_max_idle)`conn_max_idle` An optional maximum number of connections in the idle connection pool. If `conn_max_open` is greater than 0 but less than the new `conn_max_idle`, then the new `conn_max_idle` will be reduced to match the `conn_max_open` limit. If `value <= 0`, no idle connections are retained. The default max idle connections is currently 2. This may change in a future release. **Type**: `int` **Default**: `2` ### [](#conn_max_idle_time)`conn_max_idle_time` An optional maximum amount of time a connection may be idle. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection's idle time. **Type**: `string` ### [](#conn_max_life_time)`conn_max_life_time` An optional maximum amount of time a connection may be reused.
Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection's age. **Type**: `string` ### [](#conn_max_open)`conn_max_open` An optional maximum number of open connections to the database. If `conn_max_idle` is greater than 0 and the new `conn_max_open` is less than `conn_max_idle`, then `conn_max_idle` will be reduced to match the new `conn_max_open` limit. If `value <= 0`, then there is no limit on the number of open connections. The default is 0 (unlimited). **Type**: `int` ### [](#driver)`driver` A database [driver](#drivers) to use. **Type**: `string` **Options**: `mysql`, `postgres`, `pgx`, `clickhouse`, `mssql`, `sqlite`, `oracle`, `snowflake`, `trino`, `gocosmos`, `spanner`, `databricks` ### [](#dsn)`dsn` A Data Source Name to identify the target database. #### [](#drivers)Drivers The following is a list of supported drivers and their respective DSN formats: | Driver | Data Source Name Format | | --- | --- | | clickhouse | clickhouse://[username[:password]@][netloc][:port]/dbname[?param1=value1&…&paramN=valueN] | | mysql | [username[:password]@][protocol[(address)]]/dbname[?param1=value1&…&paramN=valueN] | | postgres and pgx | postgres://[user[:password]@][netloc][:port][/dbname][?param1=value1&…] | | mssql | sqlserver://[user[:password]@][netloc][:port][?database=dbname&param1=value1&…] | | sqlite | file:/path/to/filename.db[?param&=value1&…] | | oracle | oracle://[username[:password]@][netloc][:port]/service_name?server=server2&server=server3 | | snowflake | username[:password]@account_identifier/dbname/schemaname[?param1=value&…&paramN=valueN] | | trino | http[s]://user[:pass]@host[:port][?parameters] | | gocosmos | AccountEndpoint=;AccountKey=[;TimeoutMs=][;Version=][;DefaultDb/Db=][;AutoId=][;InsecureSkipVerify=] | | spanner | projects/[PROJECT]/instances/[INSTANCE]/databases/[DATABASE] | | databricks | token:@:/ | Please note that the `postgres` and `pgx` drivers enforce SSL by
default, you can override this with the parameter `sslmode=disable` if required. The `pgx` driver is an alternative to the standard `postgres` (pq) driver and comes with extra functionality such as support for array insertion. The `snowflake` driver supports multiple DSN formats. Please consult [the docs](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Connection_String) for more details. For [key pair authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth.html#configuring-key-pair-authentication), the DSN has the following format: `@//?warehouse=&role=&authenticator=snowflake_jwt&privateKey=`, where the value for the `privateKey` parameter can be constructed from an unencrypted RSA private key file `rsa_key.p8` using `openssl enc -d -base64 -in rsa_key.p8 | basenc --base64url -w0` (you can use `gbasenc` instead of `basenc` on OSX if you install `coreutils` via Homebrew). If you have a password-encrypted private key, you can decrypt it using `openssl pkcs8 -in rsa_key_encrypted.p8 -out rsa_key.p8`. Also, make sure fields such as the username are URL-encoded. The [`gocosmos`](https://pkg.go.dev/github.com/microsoft/gocosmos) driver is still experimental, but it has support for [hierarchical partition keys](https://learn.microsoft.com/en-us/azure/cosmos-db/hierarchical-partition-keys) as well as [cross-partition queries](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-query-container#cross-partition-query). Please refer to the [SQL notes](https://github.com/microsoft/gocosmos/blob/main/SQL.md) for details. 
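URL-style DSNs (for example, those starting with `postgres://`) treat characters such as `@`, `/`, and `:` as delimiters, so credentials that contain them generally need percent-encoding. The following sketch assumes a hypothetical PostgreSQL user whose raw password is `p@ss/word`; the user, password, table, and column names are illustrative only.

```yaml
output:
  sql_insert:
    driver: postgres
    # Hypothetical credentials: the raw password p@ss/word is written
    # percent-encoded (@ becomes %40, / becomes %2F) so the DSN parses correctly.
    dsn: postgres://foouser:p%40ss%2Fword@localhost:5432/foodb?sslmode=disable
    table: footable
    columns: [ id, name ]
    args_mapping: root = [ this.user.id, this.user.name ]
```

Drivers whose DSN is not a URL, such as `mysql`, may follow different escaping rules, so check the driver's own documentation before applying the same encoding there.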
**Type**: `string` ```yaml # Examples: dsn: clickhouse://username:password@host1:9000,host2:9000/database?dial_timeout=200ms&max_execution_time=60 # --- dsn: foouser:foopassword@tcp(localhost:3306)/foodb # --- dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable # --- dsn: oracle://foouser:foopass@localhost:1521/service_name # --- dsn: token:dapi1234567890ab@dbc-a1b2345c-d6e7.cloud.databricks.com:443/sql/1.0/warehouses/abc123def456 ``` ### [](#init_files)`init_files[]` An optional list of file paths containing SQL statements to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Glob patterns are supported, including super globs (double star). Care should be taken to ensure that the statements are idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If a statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `array` ```yaml # Examples: init_files: - ./init/*.sql # --- init_files: - ./foo.sql - ./bar.sql ``` ### [](#init_statement)`init_statement` An optional SQL statement to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Care should be taken to ensure that the statement is idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If the statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. 
**Type**: `string` ```yaml # Examples: init_statement: |- CREATE TABLE IF NOT EXISTS some_table ( foo varchar(50) not null, bar integer, baz varchar(50), primary key (foo) ) WITHOUT ROWID; ``` ### [](#max_in_flight)`max_in_flight` The maximum number of inserts to run in parallel. **Type**: `int` **Default**: `64` ### [](#options)`options[]` A list of keyword options to add before the INTO clause of the query. **Type**: `array` ```yaml # Examples: options: - DELAYED - IGNORE ``` ### [](#prefix)`prefix` An optional prefix to prepend to the insert query (before INSERT). **Type**: `string` ### [](#suffix)`suffix` An optional suffix to append to the insert query. **Type**: `string` ```yaml # Examples: suffix: ON CONFLICT (name) DO NOTHING ``` ### [](#table)`table` The table to insert to. **Type**: `string` ```yaml # Examples: table: foo ``` --- # Page 198: sql_raw **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/sql_raw.md --- # sql\_raw --- title: sql_raw latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/sql_raw page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/sql_raw.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/sql_raw.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/sql_raw/)[Input](/redpanda-cloud/develop/connect/components/inputs/sql_raw/)[Processor](/redpanda-cloud/develop/connect/components/processors/sql_raw/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/sql_raw/ "View the Self-Managed version of this component") Executes an arbitrary SQL query for each 
message. #### Common ```yml outputs: label: "" sql_raw: driver: "" # No default (required) dsn: "" # No default (required) query: "" # No default (optional) args_mapping: "" # No default (optional) queries: [] # No default (optional) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` #### Advanced ```yml outputs: label: "" sql_raw: driver: "" # No default (required) dsn: "" # No default (required) query: "" # No default (optional) unsafe_dynamic_query: false args_mapping: "" # No default (optional) queries: [] # No default (optional) max_in_flight: 64 init_files: [] # No default (optional) init_statement: "" # No default (optional) conn_max_idle_time: "" # No default (optional) conn_max_life_time: "" # No default (optional) conn_max_idle: 2 conn_max_open: "" # No default (optional) batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` For some scenarios where you might use this output, see [Examples](#examples). ## [](#fields)Fields ### [](#args_mapping)`args_mapping` An optional [Bloblang mapping](../../../guides/bloblang/about/) that includes the same number of values in an array as the placeholder arguments in the [`query`](#query) field. **Type**: `string` ```yaml # Examples: args_mapping: root = [ this.cat.meow, this.doc.woofs[0] ] # --- args_mapping: root = [ meta("user.id") ] ``` ### [](#batching)`batching` Allows you to configure a [batching policy](../../../configuration/batching/). **Type**: `object` ```yaml # Examples: batching: byte_size: 5000 count: 0 period: 1s # --- batching: count: 10 period: 1s # --- batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-byte_size)`batching.byte_size` An amount of bytes at which the batch should be flushed. If `0` disables size based batching. 
**Type**: `int` **Default**: `0` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "end_of_transaction" ``` ### [](#batching-count)`batching.count` A number of messages at which the batch should be flushed. If `0`, count-based batching is disabled. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` A period in which an incomplete batch should be flushed regardless of its size. **Type**: `string` **Default**: `""` ```yaml # Examples: period: 1s # --- period: 1m # --- period: 500ms ``` ### [](#batching-processors)`batching.processors[]` A list of [processors](../../processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. **Type**: `processor` ```yaml # Examples: processors: - archive: format: concatenate # --- processors: - archive: format: lines # --- processors: - archive: format: json_array ``` ### [](#conn_max_idle)`conn_max_idle` An optional maximum number of connections in the idle connection pool. If `conn_max_open` is greater than 0 but less than the new `conn_max_idle`, then the new `conn_max_idle` will be reduced to match the `conn_max_open` limit. If `value <= 0`, no idle connections are retained. The default max idle connections is currently 2. This may change in a future release. **Type**: `int` **Default**: `2` ### [](#conn_max_idle_time)`conn_max_idle_time` An optional maximum amount of time a connection may be idle. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection's idle time.
**Type**: `string` ### [](#conn_max_life_time)`conn_max_life_time` An optional maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection's age. **Type**: `string` ### [](#conn_max_open)`conn_max_open` An optional maximum number of open connections to the database. If `conn_max_idle` is greater than 0 and the new `conn_max_open` is less than `conn_max_idle`, then `conn_max_idle` will be reduced to match the new `conn_max_open` limit. If `value <= 0`, then there is no limit on the number of open connections. The default is 0 (unlimited). **Type**: `int` ### [](#driver)`driver` A database [driver](#drivers) to use. **Type**: `string` **Options**: `mysql`, `postgres`, `pgx`, `clickhouse`, `mssql`, `sqlite`, `oracle`, `snowflake`, `trino`, `gocosmos`, `spanner`, `databricks` ### [](#dsn)`dsn` A Data Source Name to identify the target database. #### [](#drivers)Drivers The following is a list of supported drivers and their respective DSN formats: | Driver | Data Source Name Format | | --- | --- | | clickhouse | clickhouse://[username[:password]@][netloc][:port]/dbname[?param1=value1&…&paramN=valueN] | | mysql | [username[:password]@][protocol[(address)]]/dbname[?param1=value1&…&paramN=valueN] | | postgres and pgx | postgres://[user[:password]@][netloc][:port][/dbname][?param1=value1&…] | | mssql | sqlserver://[user[:password]@][netloc][:port][?database=dbname&param1=value1&…] | | sqlite | file:/path/to/filename.db[?param&=value1&…] | | oracle | oracle://[username[:password]@][netloc][:port]/service_name?server=server2&server=server3 | | snowflake | username[:password]@account_identifier/dbname/schemaname[?param1=value&…&paramN=valueN] | | trino | http[s]://user[:pass]@host[:port][?parameters] | | gocosmos | AccountEndpoint=;AccountKey=[;TimeoutMs=][;Version=][;DefaultDb/Db=][;AutoId=][;InsecureSkipVerify=] | | spanner |
projects/[PROJECT]/instances/[INSTANCE]/databases/[DATABASE] | | databricks | token:@:/ | Please note that the `postgres` and `pgx` drivers enforce SSL by default, you can override this with the parameter `sslmode=disable` if required. The `pgx` driver is an alternative to the standard `postgres` (pq) driver and comes with extra functionality such as support for array insertion. The `snowflake` driver supports multiple DSN formats. Please consult [the docs](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Connection_String) for more details. For [key pair authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth.html#configuring-key-pair-authentication), the DSN has the following format: `@//?warehouse=&role=&authenticator=snowflake_jwt&privateKey=`, where the value for the `privateKey` parameter can be constructed from an unencrypted RSA private key file `rsa_key.p8` using `openssl enc -d -base64 -in rsa_key.p8 | basenc --base64url -w0` (you can use `gbasenc` instead of `basenc` on OSX if you install `coreutils` via Homebrew). If you have a password-encrypted private key, you can decrypt it using `openssl pkcs8 -in rsa_key_encrypted.p8 -out rsa_key.p8`. Also, make sure fields such as the username are URL-encoded. The [`gocosmos`](https://pkg.go.dev/github.com/microsoft/gocosmos) driver is still experimental, but it has support for [hierarchical partition keys](https://learn.microsoft.com/en-us/azure/cosmos-db/hierarchical-partition-keys) as well as [cross-partition queries](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-query-container#cross-partition-query). Please refer to the [SQL notes](https://github.com/microsoft/gocosmos/blob/main/SQL.md) for details. 
**Type**: `string` ```yaml # Examples: dsn: clickhouse://username:password@host1:9000,host2:9000/database?dial_timeout=200ms&max_execution_time=60 # --- dsn: foouser:foopassword@tcp(localhost:3306)/foodb # --- dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable # --- dsn: oracle://foouser:foopass@localhost:1521/service_name # --- dsn: token:dapi1234567890ab@dbc-a1b2345c-d6e7.cloud.databricks.com:443/sql/1.0/warehouses/abc123def456 ``` ### [](#init_files)`init_files[]` An optional list of file paths containing SQL statements to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Glob patterns are supported, including super globs (double star). Care should be taken to ensure that the statements are idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If a statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `array` ```yaml # Examples: init_files: - ./init/*.sql # --- init_files: - ./foo.sql - ./bar.sql ``` ### [](#init_statement)`init_statement` An optional SQL statement to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Care should be taken to ensure that the statement is idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If the statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. 
**Type**: `string` ```yaml # Examples: init_statement: |- CREATE TABLE IF NOT EXISTS some_table ( foo varchar(50) not null, bar integer, baz varchar(50), primary key (foo) ) WITHOUT ROWID; ``` ### [](#max_in_flight)`max_in_flight` The maximum number of database statements to execute in parallel. **Type**: `int` **Default**: `64` ### [](#queries)`queries[]` A list of database statements to run in addition to your main [`query`](#query). If you specify multiple queries, they are executed within a single transaction. For more information, see [Examples](#examples). **Type**: `object` ### [](#queries-args_mapping)`queries[].args_mapping` An optional [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of values matching in size to the number of placeholder arguments in the field `query`. **Type**: `string` ```yaml # Examples: args_mapping: root = [ this.cat.meow, this.doc.woofs[0] ] # --- args_mapping: root = [ meta("user.id") ] ``` ### [](#queries-query)`queries[].query` The query to execute. The style of placeholder to use depends on the driver, some drivers require question marks (`?`) whereas others expect incrementing dollar signs (`$1`, `$2`, and so on) or colons (`:1`, `:2` and so on). The style to use is outlined in this table: | Driver | Placeholder Style | |---|---| | `clickhouse` | Dollar sign | | `mysql` | Question mark | | `postgres` | Dollar sign | | `pgx` | Dollar sign | | `mssql` | Question mark | | `sqlite` | Question mark | | `oracle` | Colon | | `snowflake` | Question mark | | `trino` | Question mark | | `gocosmos` | Colon | **Type**: `string` ### [](#queries-when)`queries[].when` An optional [Bloblang mapping](../../../guides/bloblang/about/) that, when set, is evaluated for each message to determine whether to execute this query. The mapping should return a boolean value. The first query in the list whose `when` condition evaluates to `true` (or that has no `when` condition) is executed. 
This enables conditional query routing based on message content or metadata without requiring `unsafe_dynamic_query`. **Type**: `string` ```yaml # Examples: when: root = meta("kafka_tombstone_message") == "true" # --- when: root = this.operation == "delete" ``` ### [](#query)`query` The query to execute. You must include the correct placeholders for the specified database driver. Some drivers use question marks (`?`), whereas others expect incrementing dollar signs (`$1`, `$2`, and so on) or colons (`:1`, `:2`, and so on). | Driver | Placeholder Style | | --- | --- | | clickhouse | Dollar sign ($) | | gocosmos | Colon (:) | | mysql | Question mark (?) | | mssql | Question mark (?) | | oracle | Colon (:) | | postgres | Dollar sign ($) | | snowflake | Question mark (?) | | spanner | Question mark (?) | | sqlite | Question mark (?) | | trino | Question mark (?) | **Type**: `string` ```yaml # Examples: query: INSERT INTO footable (foo, bar, baz) VALUES (?, ?, ?); ``` ### [](#unsafe_dynamic_query)`unsafe_dynamic_query` Whether to enable [interpolation functions](../../../configuration/interpolation/#bloblang-queries) in the query. Great care should be made to ensure your queries are defended against injection attacks. **Type**: `bool` **Default**: `false` ## [](#examples)Examples ### [](#table-insert-mysql)Table Insert (MySQL) Here we insert rows into a database by populating the columns id, name and topic with values extracted from messages and metadata: ```yaml output: sql_raw: driver: mysql dsn: foouser:foopassword@tcp(localhost:3306)/foodb query: "INSERT INTO footable (id, name, topic) VALUES (?, ?, ?);" args_mapping: | root = [ this.user.id, this.user.name, meta("kafka_topic"), ] ``` ### [](#dynamically-creating-tables-postgresql)Dynamically Creating Tables (PostgreSQL) Here we dynamically create output tables transactionally with inserting a record into the newly created table. 
```yaml output: processors: - mapping: | root = this # Prevent SQL injection when using unsafe_dynamic_query meta table_name = "\"" + metadata("table_name").replace_all("\"", "\"\"") + "\"" sql_raw: driver: postgres dsn: postgres://localhost/postgres unsafe_dynamic_query: true queries: - query: | CREATE TABLE IF NOT EXISTS ${!metadata("table_name")} (id varchar primary key, document jsonb); - query: | INSERT INTO ${!metadata("table_name")} (id, document) VALUES ($1, $2) ON CONFLICT (id) DO UPDATE SET document = EXCLUDED.document; args_mapping: | root = [ this.id, this.document.string() ] ``` ### [](#conditional-cdc-queries-postgresql)Conditional CDC Queries (PostgreSQL) Route messages to different SQL operations based on message metadata. Tombstone messages trigger a DELETE, while all other messages perform an upsert. All operations within a batch execute in a single transaction, ordered by Kafka partition. ```yaml output: sql_raw: driver: postgres dsn: postgres://localhost/postgres max_in_flight: 8 batching: count: 100 period: 100ms queries: - when: 'root = meta("kafka_tombstone_message") == "true"' query: 'DELETE FROM users WHERE id = $1' args_mapping: 'root = [this.id]' - query: | INSERT INTO users (id, name, updated_at) VALUES ($1, $2, $3) ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, updated_at = EXCLUDED.updated_at args_mapping: 'root = [this.id, this.name, this.updated_at]' ``` --- # Page 199: switch **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/switch.md --- # switch --- title: switch latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/switch page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/switch.adoc page-edit-url: 
https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/switch.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/switch/)[Processor](/redpanda-cloud/develop/connect/components/processors/switch/)[Scanner](/redpanda-cloud/develop/connect/components/scanners/switch/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/switch/ "View the Self-Managed version of this component") The switch output type allows you to route messages to different outputs based on their contents. #### Common ```yml outputs: label: "" switch: retry_until_success: false cases: [] # No default (required) ``` #### Advanced ```yml outputs: label: "" switch: retry_until_success: false strict_mode: false cases: [] # No default (required) ``` Messages that do not pass the check of a single output case are effectively dropped. In order to prevent this outcome set the field [`strict_mode`](#strict_mode) to `true`, in which case messages that do not pass at least one case are considered failed and will be nacked and/or reprocessed depending on your input. ## [](#examples)Examples ### [](#basic-multiplexing)Basic Multiplexing The most common use for a switch output is to multiplex messages across a range of output destinations. The following config checks the contents of the field `type` of messages and sends `foo` type messages to an `amqp_1` output, `bar` type messages to a `gcp_pubsub` output, and everything else to a `redis_streams` output. Outputs can have their own processors associated with them, and in this example the `redis_streams` output has a processor that enforces the presence of a type field before sending it. 
```yaml output: switch: cases: - check: this.type == "foo" output: amqp_1: urls: [ amqps://guest:guest@localhost:5672/ ] target_address: queue:/the_foos - check: this.type == "bar" output: gcp_pubsub: project: dealing_with_mike topic: mikes_bars - output: redis_streams: url: tcp://localhost:6379 stream: everything_else processors: - mapping: | root = this root.type = this.type | "unknown" ``` ### [](#control-flow)Control Flow The `continue` field allows messages that have passed a case to be tested against the next one also. This can be useful when combining non-mutually-exclusive case checks. In the following example a message that passes both the check of the first case as well as the second will be routed to both. ```yaml output: switch: cases: - check: 'this.user.interests.contains("walks").catch(false)' output: amqp_1: urls: [ amqps://guest:guest@localhost:5672/ ] target_address: queue:/people_what_think_good continue: true - check: 'this.user.dislikes.contains("videogames").catch(false)' output: gcp_pubsub: project: people topic: that_i_dont_want_to_hang_with ``` ## [](#fields)Fields ### [](#cases)`cases[]` A list of switch cases, outlining outputs that can be routed to. **Type**: `object` ```yaml # Examples: cases: - check: this.urls.contains("http://benthos.dev") continue: true output: cache: key: ${!json("id")} target: foo - output: s3: bucket: bar path: ${!json("id")} ``` ### [](#cases-check)`cases[].check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should be routed to the case output. If left empty the case always passes. **Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "foo" # --- check: this.contents.urls.contains("https://benthos.dev/") ``` ### [](#cases-continue)`cases[].continue` Indicates whether, if this case passes for a message, the next case should also be tested. 
**Type**: `bool` **Default**: `false` ### [](#cases-output)`cases[].output` An [output](../about/) for messages that pass the check to be routed to. **Type**: `output` ### [](#retry_until_success)`retry_until_success` If a selected output fails to send a message this field determines whether it is reattempted indefinitely. If set to false the error is instead propagated back to the input level. If a message can be routed to >1 outputs it is usually best to set this to true in order to avoid duplicate messages being routed to an output. **Type**: `bool` **Default**: `false` ### [](#strict_mode)`strict_mode` This field determines whether an error should be reported if no condition is met. If set to true, an error is propagated back to the input level. The default behavior is false, which will drop the message. **Type**: `bool` **Default**: `false` --- # Page 200: sync_response **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/sync_response.md --- # sync\_response --- title: sync_response latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/sync_response page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/sync_response.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/sync_response.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/sync_response/)[Processor](/redpanda-cloud/develop/connect/components/processors/sync_response/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/sync_response/ "View the Self-Managed version of this component") Returns the final message payload back to the 
input origin of the message, where it is dealt with according to that specific input type. ```yml # Config fields, showing default values output: label: "" sync_response: {} ``` --- # Page 201: timeplus **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/outputs/timeplus.md --- # timeplus --- title: timeplus page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/outputs/timeplus page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/outputs/timeplus.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/outputs/timeplus.adoc # Beta release status page-beta: "true" page-git-created-date: "2024-11-05" page-git-modified-date: "2024-11-19" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta **Type:** Output ▼ [Output](/redpanda-cloud/develop/connect/components/outputs/timeplus/)[Input](/redpanda-cloud/develop/connect/components/inputs/timeplus/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/outputs/timeplus/ "View the Self-Managed version of this component") Sends messages to a data stream on [Timeplus Enterprise (Cloud or Self-Hosted)](https://docs.timeplus.com/) using the [Ingest API](https://docs.timeplus.com/ingest-api), or directly to the `timeplusd` component in Timeplus Enterprise. 
#### Common ```yml # Common configuration fields, showing default values output: label: "" timeplus: target: timeplus url: https://us-west-2.timeplus.cloud workspace: "" # No default (optional) stream: "" # No default (required) apikey: "" # No default (optional) username: "" # No default (optional) password: "" # No default (optional) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" ``` #### Advanced ```yml # All configuration fields, showing default values output: label: "" timeplus: target: timeplus url: https://us-west-2.timeplus.cloud workspace: "" # No default (optional) stream: "" # No default (required) apikey: "" # No default (optional) username: "" # No default (optional) password: "" # No default (optional) max_in_flight: 64 batching: count: 0 byte_size: 0 period: "" check: "" processors: [] # No default (optional) ``` This output only accepts structured messages. All messages must: - Contain the same keys. - Use a structure that matches the schema of the destination data stream. If your upstream data source or pipeline returns unstructured messages, such as strings, you can configure an output processor to transform the messages. See the [Unstructured messages](#unstructured-messages) section for examples. ## [](#examples)Examples #### Timeplus Enterprise (Cloud) You must [generate an API key](https://docs.timeplus.com/apikey) using the web console of Timeplus Enterprise (Cloud). ```yaml output: timeplus: workspace: stream: apikey: ``` Replace the following placeholders with your own values: - ``: The ID of the workspace you want to send messages to. - ``: The name of the destination data stream. - ``: The API key for the Ingest API. #### Timeplus Enterprise (Self-Hosted) You must specify the username, password, and URL of the application server. 
```yaml
output:
  timeplus:
    url: http://localhost:8000
    workspace:
    stream:
    username:
    password:
```

Replace the following placeholders with your own values:

- ``: The ID of the workspace you want to send messages to.
- ``: The name of the destination data stream.
- ``: The username for the Timeplus application server.
- ``: The password for the Timeplus application server.

#### timeplusd

You must specify the HTTP port for `timeplusd`.

```yaml
output:
  timeplus:
    url: http://localhost:3218
    stream:
    username:
    password:
```

Replace the following placeholders with your own values:

- ``: The name of the destination data stream.
- ``: The username for the Timeplus application server.
- ``: The password for the Timeplus application server.

### [](#unstructured-messages)Unstructured messages

If your upstream data source or pipeline returns unstructured messages, such as strings, you can configure an output processor to transform them into structured messages and then pass them to the output.

In the following example, the `mapping` processor creates a field called `raw` and uses `content().string()` to store the original string content in it, thereby creating structured messages. If you use this example, you must also add the `raw` field to the destination data stream, so that your message structure matches the schema of your destination data stream.

```yaml
output:
  timeplus:
    workspace:
    stream:
    apikey:
  processors:
    - mapping: |
        root = {}
        root.raw = content().string()
```

## [](#fields)Fields

### [](#target)`target`

The destination platform. For Timeplus Enterprise (Cloud or Self-Hosted), enter `timeplus`. For the `timeplusd` component, enter `timeplusd`.

**Type**: `string`

**Default**: `timeplus`

**Options**: `timeplus`, `timeplusd`

### [](#url)`url`

The URL of your Timeplus instance, which should always include the scheme and host.
**Type**: `string`

**Default**: `https://us-west-2.timeplus.cloud`

```yml
# Examples
url: http://localhost:8000

url: http://127.0.0.1:3218
```

### [](#workspace)`workspace`

The ID of the workspace you want to send messages to. This field is required if the `target` field is set to `timeplus`.

**Type**: `string`

### [](#stream)`stream`

The name of the destination data stream. Make sure the schema of the data stream matches this output.

**Type**: `string`

### [](#apikey)`apikey`

The API key for the Ingest API. You need to generate this in the web console of Timeplus Enterprise (Cloud). This field is required if you are sending messages to Timeplus Enterprise (Cloud).

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#username)`username`

The username for the Timeplus application server. This field is required if you are sending messages to Timeplus Enterprise (Self-Hosted) or `timeplusd`.

**Type**: `string`

### [](#password)`password`

The password for the Timeplus application server. This field is required if you are sending messages to Timeplus Enterprise (Self-Hosted) or `timeplusd`.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#max_in_flight)`max_in_flight`

The maximum number of message batches to have in flight at a given time. Increase this number to improve throughput.

**Type**: `int`

**Default**: `64`

### [](#batching)`batching`

Configure a [batching policy](../../../configuration/batching/).
**Type**: `object` ```yml # Examples batching: byte_size: 5000 count: 0 period: 1s batching: count: 10 period: 1s batching: check: this.contains("END BATCH") count: 0 period: 1m ``` ### [](#batching-count)`batching.count` The number of messages after which the batch is flushed. Set to `0` to disable count-based batching. **Type**: `int` **Default**: `0` ### [](#batching-byte_size)`batching.byte_size` The amount of bytes at which the batch is flushed. Set to `0` to disable size-based batching. **Type**: `int` **Default**: `0` ### [](#batching-period)`batching.period` The period of time after which an incomplete batch is flushed regardless of its size. **Type**: `string` **Default**: `""` ```yml # Examples period: 1s period: 1m period: 500ms ``` ### [](#batching-check)`batching.check` A [Bloblang query](../../../guides/bloblang/about/) that returns a boolean value indicating whether a message should end a batch. **Type**: `string` **Default**: `""` ```yml # Examples check: this.type == "end_of_transaction" ``` ### [](#batching-processors)`batching.processors` For aggregating and archiving message batches, you can add a list of [processors](../../processors/about/) to apply to a batch as it is flushed. All resulting messages are flushed as a single batch even when you configure processors to split the batch into smaller batches. 
**Type**: `array` ```yml # Examples processors: - archive: format: concatenate processors: - archive: format: lines processors: - archive: format: json_array ``` --- # Page 202: a2a_message **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/a2a_message.md --- # a2a\_message --- title: a2a_message latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/a2a_message page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/a2a_message.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/a2a_message.adoc categories: "[AI]" description: Sends messages to an A2A (Agent-to-Agent) protocol agent and returns the response. page-git-created-date: "2026-02-18" page-git-modified-date: "2026-02-18" --- **Available in:** Cloud Sends messages to an A2A (Agent-to-Agent) protocol agent and returns the response. This processor enables Redpanda Connect pipelines to communicate with A2A protocol agents. Currently only JSON-RPC transport is supported. The processor sends a message to the agent and polls for task completion. The agent’s response is returned as the processor output. For more information about the A2A protocol, see [https://a2a-protocol.org/latest/specification](https://a2a-protocol.org/latest/specification) #### Common ```yml processors: label: "" a2a_message: agent_card_url: "" # No default (required) prompt: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" a2a_message: agent_card_url: "" # No default (required) prompt: "" # No default (optional) final_message_only: true ``` ## [](#fields)Fields ### [](#agent_card_url)`agent_card_url` URL for the A2A agent card. 
Can be either a base URL (e.g., `https://example.com`) or a full path to the agent card (e.g., `https://example.com/.well-known/agent.json`). If no path is provided, defaults to `/.well-known/agent.json`. Authentication uses OAuth2 from environment variables.

**Type**: `string`

### [](#final_message_only)`final_message_only`

If true, returns only the text from the final agent message (concatenated from all text parts). If false, returns the complete Message or Task object as structured data with full history, artifacts, and metadata.

Example with `final_message_only: true` (default):

```none
Here is the answer to your question...
```

Example with `final_message_only: false`:

```json
{
  "id": "task-123",
  "contextId": "ctx-456",
  "status": { "state": "completed" },
  "history": [
    {"role": "user", "parts": [{"text": "Your question"}]},
    {"role": "agent", "parts": [{"text": "Here is the answer to your question..."}]}
  ],
  "artifacts": []
}
```

**Type**: `bool`

**Default**: `true`

### [](#prompt)`prompt`

The user prompt to send to the agent. By default, the processor submits the entire payload as a string. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).
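As a minimal sketch of an interpolated prompt, the following processor wraps the message payload in an instruction before sending it to the agent. The agent card URL is the placeholder used above, and the instruction text is a hypothetical example rather than a recommended prompt.

```yaml
pipeline:
  processors:
    - a2a_message:
        agent_card_url: https://example.com
        # Hypothetical prompt: prefixes the raw message payload with an instruction
        prompt: 'Summarize the following support ticket: ${! content().string() }'
```

With `final_message_only` left at its default of `true`, the processor output is the text of the agent's final message.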
**Type**: `string` --- # Page 203: Processors **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/about.md --- # Processors --- title: Processors latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/about.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- Redpanda Connect processors are functions applied to messages passing through a pipeline. The function signature allows a processor to mutate or drop messages depending on the content of the message. There are many types on offer but the most powerful are the [`mapping`](../mapping/) and [`mutation`](../mutation/) processors. Processors are set via config, and depending on where in the config they are placed they will be run either immediately after a specific input (set in the input section), on all messages (set in the pipeline section) or before a specific output (set in the output section). Most processors apply to all messages and can be placed in the pipeline section: ```yaml pipeline: threads: 1 processors: - label: my_cool_mapping mapping: | root.message = this root.meta.link_count = this.links.length() ``` The `threads` field in the pipeline section determines how many parallel processing threads are created. You can read more about parallel processing in the [pipeline guide](../../../configuration/processing_pipelines/). ## [](#labels)Labels Processors have an optional field `label` that can uniquely identify them in observability data such as metrics and logs. 
This can be useful when running configs with multiple nested processors, as otherwise their metrics labels are generated based on their composition. For more information check out the [metrics documentation](../../metrics/about/).

## [](#error-handling)Error handling

Some processors have conditions whereby they might fail. Rather than throw these messages into the abyss, Redpanda Connect still attempts to send these messages onwards, and has mechanisms for filtering, recovering, or dead-letter queuing messages that have failed, which are described in the [error handling guide](../../../configuration/error_handling/).

### [](#error-logs)Error logs

Errors that occur during processing can be roughly separated into two groups: those that are unexpected intermittent errors such as connectivity problems, and those that are logical errors such as bad input data or unmatched schemas.

All processing errors result in the messages being flagged as failed, [error metrics](../../metrics/about/) increasing for the given errored processor, and debug level logs being emitted that describe the error. Only errors that are known to be intermittent are also logged at the error level.

The reason for this behavior is to prevent noisy logging in cases where logical errors are expected and will likely be [handled in config](../../../configuration/error_handling/). However, this can also make it easy to miss logical errors in your configs when they lack error handling.
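For illustration, one way to handle such errors in config is sketched here, assuming the `catch` and `log` processors are available in your environment. The mapping is the one from the pipeline example earlier on this page; the log message text is a hypothetical choice.

```yaml
pipeline:
  processors:
    - mapping: |
        root.message = this
        root.meta.link_count = this.links.length()
    # The processors inside catch run only for messages that an earlier
    # processor flagged as failed
    - catch:
        - log:
            level: ERROR
            message: 'Processing failed: ${! error() }'
```

This surfaces logical errors, such as a message without a `links` field, at the error level without changing the global log level.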
If you suspect you are experiencing processing errors and do not wish to add error handling yet then a quick and easy way to expose those errors is to enable debug level logs with the cli flag `--log.level=debug` or by setting the level in config: ```yaml logger: level: DEBUG ``` ## [](#using-processors-as-outputs)Using processors as outputs It might be the case that a processor that results in a side effect, such as the [`sql_insert`](../sql_insert/) or [`redis`](../redis/) processors, is the only side effect of a pipeline, and therefore could be considered the output. In such cases it’s possible to place these processors within a [`reject` output](../../outputs/reject/) so that they behave the same as regular outputs, where success results in dropping the message with an acknowledgement and failure results in a nack (or retry): ```yaml output: reject: 'failed to send data: ${! error() }' processors: - try: - redis: url: tcp://localhost:6379 command: sadd args_mapping: 'root = [ this.key, this.value ]' - mapping: root = deleted() ``` The way this works is that if your processor with the side effect (`redis` in this case) succeeds then the final `mapping` processor deletes the message which results in an acknowledgement. If the processor fails then the `try` block exits early without executing the `mapping` processor and instead the message is routed to the `reject` output, which nacks the message with an error message containing the error obtained from the `redis` processor. ## [](#batching-and-multiple-part-messages)Batching and multiple-part messages All Redpanda Connect processors support multiple-part messages, which are synonymous with batches. This enables [windowed processing](../../../configuration/windowed_processing/) capabilities. Many processors are able to perform their behaviors on specific parts of a message batch, or on all parts, and have a field `parts` for specifying an array of part indexes they should apply to. 
If the list of target parts is empty, these processors are applied to all message parts.

Part indexes can be negative, in which case the part is selected from the end, counting backwards from -1. For example, an index of -1 selects the last part of the message, -2 selects the second-to-last part, and so on.

Some processors, such as [`dedupe`](../dedupe/), act across an entire batch, when instead we might like to perform them on individual messages of a batch. In this case the [`for_each`](../for_each/) processor can be used.

You can read more about batching [in this document](../../../configuration/batching/).

---

# Page 204: archive

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/archive.md

---

# archive

---
title: archive
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/archive
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/archive.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/archive.adoc
categories: "[\"Parsing\",\"Utility\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/archive/ "View the Self-Managed version of this component")

Archives all the messages of a batch into a single message according to the selected archive format.

```yml
# Config fields, showing default values
label: ""
archive:
  format: "" # No default (required)
  path: ""
```

Some archive formats (such as tar and zip) treat each archive item (message part) as a file with a path. Since message parts only contain raw data, a unique path must be generated for each part.
This can be done by using function interpolations on the `path` field, as described in [Bloblang queries](../../../configuration/interpolation/#bloblang-queries). For types that aren’t file based (such as binary), the `path` field is ignored.

The resulting archived message adopts the metadata of the _first_ message part of the batch.

The functionality of this processor depends on being applied across messages that are batched. You can find out more about batching [in this doc](../../../configuration/batching/).

## [](#fields)Fields

### [](#format)`format`

The archiving format to apply.

**Type**: `string`

| Option | Summary |
| --- | --- |
| binary | Archive messages to a binary blob format. |
| concatenate | Join the raw contents of each message into a single binary message. |
| json_array | Attempt to parse each message as a JSON document and append the result to an array, which becomes the contents of the resulting message. |
| lines | Join the raw contents of each message and insert a line break between each one. |
| tar | Archive messages to a unix standard tape archive. |
| zip | Archive messages to a zip file. |

### [](#path)`path`

The path to set for each message in the archive (when applicable). This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).
**Type**: `string` **Default**: `""` ```yaml # Examples: path: ${!count("files")}-${!timestamp_unix_nano()}.txt # --- path: ${!meta("kafka_key")}-${!json("id")}.json ``` ## [](#examples)Examples ### [](#tar-archive)Tar Archive If we had JSON messages in a batch each of the form: ```json {"doc":{"id":"foo","body":"hello world 1"}} ``` And we wished to tar archive them, setting their filenames to their respective unique IDs (with the extension `.json`), our config might look like this: ```yaml pipeline: processors: - archive: format: tar path: ${!json("doc.id")}.json ``` --- # Page 205: avro **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/avro.md --- # avro --- title: avro latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/avro page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/avro.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/avro.adoc categories: "[\"Parsing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/avro/)[Scanner](/redpanda-cloud/develop/connect/components/scanners/avro/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/avro/ "View the Self-Managed version of this component") Performs Avro based operations on messages based on a schema. 
```yml
# Config fields, showing default values
label: ""
avro:
  operator: "" # No default (required)
  encoding: textual
  schema: ""
  schema_path: ""
```

> ⚠️ **WARNING**
>
> If you are consuming or generating messages using a schema registry service then it is likely this processor will fail, as those services require messages to be prefixed with the identifier of the schema version being used. Instead, try the [`schema_registry_encode`](../schema_registry_encode/) and [`schema_registry_decode`](../schema_registry_decode/) processors.

## [](#operators)Operators

### [](#to_json)`to_json`

Converts Avro documents into a JSON structure. This makes it easier to manipulate the contents of the document within Redpanda Connect. The `encoding` field specifies how the source documents are encoded.

### [](#from_json)`from_json`

Attempts to convert JSON documents into Avro documents according to the specified encoding.

## [](#fields)Fields

### [](#encoding)`encoding`

An Avro encoding format to use for conversions to and from a schema.

**Type**: `string`

**Default**: `textual`

**Options**: `textual`, `binary`, `single`

### [](#operator)`operator`

The [operator](#operators) to execute.

**Type**: `string`

**Options**: `to_json`, `from_json`

### [](#schema)`schema`

A full Avro schema to use.

**Type**: `string`

**Default**: `""`

### [](#schema_path)`schema_path`

The path of a schema document to apply. Use either this or the `schema` field. URLs must begin with `file://` or `http://`. Note that `file://` URLs must use absolute paths (for example, `file:///absolute/path/to/spec.avsc`); relative paths are not supported.
**Type**: `string` **Default**: `""` ```yaml # Examples: schema_path: file:///path/to/spec.avsc # --- schema_path: http://localhost:8081/path/to/spec/versions/1 ``` --- # Page 206: aws_bedrock_chat **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/aws_bedrock_chat.md --- # aws\_bedrock\_chat --- title: aws_bedrock_chat latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/aws_bedrock_chat page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/aws_bedrock_chat.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/aws_bedrock_chat.adoc categories: "[\"AI\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/aws_bedrock_chat/ "View the Self-Managed version of this component") Generates responses to messages in a chat conversation, using the [AWS Bedrock API](https://aws.amazon.com/bedrock/). 
#### Common ```yml processors: label: "" aws_bedrock_chat: model: "" # No default (required) prompt: "" # No default (optional) system_prompt: "" # No default (optional) max_tokens: "" # No default (optional) temperature: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" aws_bedrock_chat: region: "" # No default (optional) endpoint: "" # No default (optional) tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s credentials: profile: "" # No default (optional) id: "" # No default (optional) secret: "" # No default (optional) token: "" # No default (optional) from_ec2_role: "" # No default (optional) role: "" # No default (optional) role_external_id: "" # No default (optional) model: "" # No default (required) prompt: "" # No default (optional) system_prompt: "" # No default (optional) max_tokens: "" # No default (optional) temperature: "" # No default (optional) stop: [] # No default (optional) top_p: "" # No default (optional) ``` This processor sends prompts to your chosen large language model (LLM) and generates text from the responses, using the AWS Bedrock API. For more information, see the [AWS Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide). ## [](#fields)Fields ### [](#credentials)`credentials` Configure which AWS credentials to use (optional). For more information, see [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` The profile from `~/.aws/credentials` to use. 
**Type**: `string` ### [](#credentials-role)`credentials.role` The role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` The external ID to use when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials you want to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials you want to use. You must enter this value when using short-term credentials. **Type**: `string` ### [](#endpoint)`endpoint` A custom endpoint URL for AWS API requests. Use this to connect to AWS-compatible services or local testing environments instead of the standard AWS endpoints. **Type**: `string` ### [](#max_tokens)`max_tokens` The maximum number of tokens to allow in the generated response. **Type**: `int` ### [](#model)`model` The model ID to use. For a full list, see the [AWS Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html). **Type**: `string` ```yaml # Examples: model: amazon.titan-text-express-v1 # --- model: anthropic.claude-3-5-sonnet-20240620-v1:0 # --- model: cohere.command-text-v14 # --- model: meta.llama3-1-70b-instruct-v1:0 # --- model: mistral.mistral-large-2402-v1:0 ``` ### [](#prompt)`prompt` The prompt you want to generate a response for. By default, the processor submits the entire payload as a string. **Type**: `string` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#stop)`stop[]` A list of stop sequences. A stop sequence is a sequence of characters that causes the model to stop generating the response. 
**Type**: `array` ### [](#system_prompt)`system_prompt` The system prompt to submit to the AWS Bedrock LLM. **Type**: `string` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. 
When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#temperature)`temperature` The likelihood of the model selecting higher-probability options while generating a response. A lower value makes the model more likely to choose higher-probability options. A higher value makes the model more likely to choose lower-probability options. **Type**: `float` ### [](#top_p)`top_p` The percentage of most-likely candidates that the model considers for the next token. For example, if you choose a value of `0.8`, the model selects from the top 80% of the probability distribution of tokens that could be next in the sequence. **Type**: `float` --- # Page 207: aws_bedrock_embeddings **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/aws_bedrock_embeddings.md --- # aws\_bedrock\_embeddings --- title: aws_bedrock_embeddings page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/aws_bedrock_embeddings page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/aws_bedrock_embeddings.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/aws_bedrock_embeddings.adoc # Beta release status page-beta: "true" page-git-created-date: "2024-10-16" page-git-modified-date: "2024-10-16" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. 
--- beta **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/aws_bedrock_embeddings/ "View the Self-Managed version of this component") Generates vector embeddings from text prompts, using the [AWS Bedrock API](https://aws.amazon.com/bedrock/). #### Common ```yaml # Common config fields, showing default values label: "" aws_bedrock_embeddings: model: amazon.titan-embed-text-v1 # No default (required) text: "" # No default (optional) ``` #### Advanced ```yaml # All config fields, showing default values label: "" aws_bedrock_embeddings: region: "" endpoint: "" credentials: from_ec2_role: false role: "" role_external_id: "" model: amazon.titan-embed-text-v1 # No default (required) text: "" # No default (optional) ``` This processor sends text prompts to your chosen large language model (LLM), which generates vector embeddings for them using the AWS Bedrock API. For more information, see the [AWS Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide). ## [](#fields)Fields ### [](#credentials)`credentials` Manually configure the AWS credentials to use (optional). For more information, see the [Amazon Web Services guide](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of the AWS credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` The profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` The role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to use when assuming a role. 
**Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the AWS credentials in use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the AWS credentials in use. This is a required value for short-term credentials. **Type**: `string` ### [](#endpoint)`endpoint` A custom endpoint URL for AWS API requests. Use this to connect to AWS-compatible services or local testing environments instead of the standard AWS endpoints. **Type**: `string` ### [](#model)`model` The ID of the LLM that you want to use to generate vector embeddings. For a full list, see the [AWS Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html). **Type**: `string` ```yaml # Examples: model: amazon.titan-embed-text-v1 # --- model: amazon.titan-embed-text-v2:0 # --- model: cohere.embed-english-v3 # --- model: cohere.embed-multilingual-v3 ``` ### [](#region)`region` The region in which your AWS resources are hosted. **Type**: `string` ### [](#tcp)`tcp` Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. 
Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#text)`text` The prompt you want to generate a vector embedding for. The processor submits the entire payload as a string. 
**Type**: `string`

---

# Page 208: aws_dynamodb_partiql

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/aws_dynamodb_partiql.md

---

# aws\_dynamodb\_partiql

---
title: aws_dynamodb_partiql
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/aws_dynamodb_partiql
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/aws_dynamodb_partiql.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/aws_dynamodb_partiql.adoc
categories: "[\"Integration\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/aws_dynamodb_partiql/ "View the Self-Managed version of this component")

Executes a PartiQL expression against a DynamoDB table for each message.

#### Common

```yml
processors:
  label: ""
  aws_dynamodb_partiql:
    query: "" # No default (required)
    args_mapping: ""
```

#### Advanced

```yml
processors:
  label: ""
  aws_dynamodb_partiql:
    query: "" # No default (required)
    unsafe_dynamic_query: false
    args_mapping: ""
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
```

Both writes and reads are supported. When the query is a read, the contents of the message are replaced with the result.
This processor is more efficient when messages are pre-batched as the whole batch will be executed in a single call. ## [](#examples)Examples ### [](#insert)Insert The following example inserts rows into the table footable with the columns foo, bar and baz populated with values extracted from messages: ```yaml pipeline: processors: - aws_dynamodb_partiql: query: "INSERT INTO footable VALUE {'foo':'?','bar':'?','baz':'?'}" args_mapping: | root = [ { "S": this.foo }, { "S": meta("kafka_topic") }, { "S": this.document.content }, ] ``` ## [](#fields)Fields ### [](#args_mapping)`args_mapping` A [Bloblang mapping](../../../guides/bloblang/about/) that, for each message, creates a list of arguments to use with the query. **Type**: `string` **Default**: `""` ### [](#credentials)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#query)`query` A PartiQL query to execute for each message. **Type**: `string` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#unsafe_dynamic_query)`unsafe_dynamic_query` Whether to enable dynamic queries that support interpolation functions. 
**Type**: `bool`

**Default**: `false`

---

# Page 209: aws_lambda

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/aws_lambda.md

---

# aws\_lambda

---
title: aws_lambda
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/aws_lambda
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/aws_lambda.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/aws_lambda.adoc
categories: "[\"Integration\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/aws_lambda/ "View the Self-Managed version of this component")

Invokes an AWS Lambda function for each message. The contents of the message are the payload of the request, and the result of the invocation becomes the new contents of the message.

#### Common

```yml
processors:
  label: ""
  aws_lambda:
    parallel: false
    function: "" # No default (required)
```

#### Advanced

```yml
processors:
  label: ""
  aws_lambda:
    parallel: false
    function: "" # No default (required)
    rate_limit: ""
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
    timeout: 5s
    retries: 3
```

The `rate_limit` field can be used to specify a rate limit [resource](../../rate_limits/about/) to cap the rate of requests across parallel components service wide.
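
As a rough illustration of the `rate_limit` field described above, the following sketch points an `aws_lambda` processor at a shared rate limit resource. The resource label `lambda_limit`, the function name `foo`, and the limit values are hypothetical, and the sketch assumes a `local` rate limit is available in your environment; adjust it to whichever rate limit type you actually use.

```yaml
pipeline:
  processors:
    - aws_lambda:
        function: foo            # hypothetical function name
        parallel: true
        rate_limit: lambda_limit # must match the resource label below

rate_limit_resources:
  - label: lambda_limit          # hypothetical label
    local:
      count: 50                  # at most 50 invocations...
      interval: 1s               # ...per one-second interval
```

Because the limit lives in a named resource, any other component that references `lambda_limit` shares the same budget.
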
To map or encode the payload to a specific request body, and map the response back into the original payload instead of replacing it entirely, you can use the [`branch` processor](../branch/).

## [](#error-handling)Error handling

When Redpanda Connect is unable to connect to the AWS endpoint or is otherwise unable to invoke the target Lambda function, it retries the request according to the configured number of retries. Once these attempts have been exhausted, the failed message continues through the pipeline with its contents unchanged, but flagged as having failed, allowing you to use [standard processor error handling patterns](../../../configuration/error_handling/).

However, if the invocation of the function is successful but the function itself throws an error, then the message has its contents updated with a JSON payload describing the reason for the failure, and a metadata field `lambda_function_error` is added to the message, allowing you to detect and handle function errors with a [`branch`](../branch/):

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - aws_lambda:
              function: foo
        result_map: |
          root = if meta().exists("lambda_function_error") {
            throw("Invocation failed due to %v: %v".format(this.errorType, this.errorMessage))
          } else {
            this
          }
output:
  switch:
    retry_until_success: false
    cases:
      - check: errored()
        output:
          reject: ${! error() }
      - output:
          resource: somewhere_else
```

## [](#credentials)Credentials

By default, Redpanda Connect uses a shared credentials file when connecting to AWS services. It’s also possible to set credentials explicitly at the component level, allowing you to transfer data across accounts. You can find out more in [Amazon Web Services](../../../guides/cloud/aws/).
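
To illustrate setting credentials explicitly at the component level, here is a minimal sketch that assumes a cross-account IAM role. The function name, region, role ARN, and external ID are all hypothetical placeholders; only the field names come from the field reference for this processor.

```yaml
pipeline:
  processors:
    - aws_lambda:
        function: trigger_user_update # hypothetical function name
        region: us-east-1             # hypothetical region
        credentials:
          # Hypothetical role in another AWS account
          role: arn:aws:iam::123456789012:role/example-lambda-invoker
          # Hypothetical external ID agreed with the role's owner
          role_external_id: example-external-id
```

Assuming a role this way avoids placing a long-lived `secret` directly in the configuration.
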
## [](#examples)Examples ### [](#branched-invoke)Branched Invoke This example uses a [`branch` processor](../branch/) to map a new payload for triggering a lambda function with an ID and username from the original message, and the result of the lambda is discarded, meaning the original message is unchanged. ```yaml pipeline: processors: - branch: request_map: '{"id":this.doc.id,"username":this.user.name}' processors: - aws_lambda: function: trigger_user_update ``` ## [](#fields)Fields ### [](#credentials-2)`credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#credentials-from_ec2_role)`credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#credentials-id)`credentials.id` The ID of credentials to use. **Type**: `string` ### [](#credentials-profile)`credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#credentials-role)`credentials.role` A role ARN to assume. **Type**: `string` ### [](#credentials-role_external_id)`credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#credentials-secret)`credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#credentials-token)`credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#endpoint)`endpoint` Allows you to specify a custom endpoint for the AWS API. 
**Type**: `string` ### [](#function)`function` The function to invoke. **Type**: `string` ### [](#parallel)`parallel` Whether messages of a batch should be dispatched in parallel. **Type**: `bool` **Default**: `false` ### [](#rate_limit)`rate_limit` An optional [`rate_limit`](../../rate_limits/about/) to throttle invocations by. **Type**: `string` **Default**: `""` ### [](#region)`region` The AWS region to target. **Type**: `string` ### [](#retries)`retries` The maximum number of retry attempts for each message. **Type**: `int` **Default**: `3` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timeout)`timeout` The maximum period of time to wait before abandoning an invocation. 
**Type**: `string` **Default**: `5s` --- # Page 210: azure_cosmosdb **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/azure_cosmosdb.md --- # azure\_cosmosdb --- title: azure_cosmosdb latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/azure_cosmosdb page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/azure_cosmosdb.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/azure_cosmosdb.adoc categories: "[\"Azure\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/azure_cosmosdb/)[Input](/redpanda-cloud/develop/connect/components/inputs/azure_cosmosdb/)[Output](/redpanda-cloud/develop/connect/components/outputs/azure_cosmosdb/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/azure_cosmosdb/ "View the Self-Managed version of this component") Creates or updates messages as JSON documents in [Azure CosmosDB](https://learn.microsoft.com/en-us/azure/cosmos-db/introduction). 
#### Common ```yml processors: label: "" azure_cosmosdb: endpoint: "" # No default (optional) account_key: "" # No default (optional) connection_string: "" # No default (optional) database: "" # No default (required) container: "" # No default (required) partition_keys_map: "" # No default (required) operation: Create item_id: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" azure_cosmosdb: endpoint: "" # No default (optional) account_key: "" # No default (optional) connection_string: "" # No default (optional) database: "" # No default (required) container: "" # No default (required) partition_keys_map: "" # No default (required) operation: Create patch_operations: [] # No default (optional) patch_condition: "" # No default (optional) auto_id: true item_id: "" # No default (optional) enable_content_response_on_write: true ``` When creating documents, each message must have the `id` property (case-sensitive) set (or use `auto_id: true`). It is the unique name that identifies the document, that is, no two documents share the same `id` within a logical partition. The `id` field must not exceed 255 characters. [See details](https://learn.microsoft.com/en-us/rest/api/cosmos-db/documents). The `partition_keys_map` field must resolve to the same value(s) across the entire message batch. ## [](#credentials)Credentials You can use one of the following authentication mechanisms: - Set the `endpoint` field and the `account_key` field - Set only the `endpoint` field to use [DefaultAzureCredential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#DefaultAzureCredential) - Set the `connection_string` field ## [](#metadata)Metadata This component adds the following metadata fields to each message: ```none - activity_id - request_charge ``` You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries).
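As an illustration of reading these metadata fields, the sketch below copies them into the message body with a `mapping` processor placed after the write. It assumes the message content is JSON after the write (the default `enable_content_response_on_write: true`); the endpoint is a hypothetical placeholder, the `cosmos` target field is an arbitrary name chosen for this example, and the database and container names are reused from the field examples on this page.

```yaml
pipeline:
  processors:
    - azure_cosmosdb:
        # Hypothetical endpoint; with only `endpoint` set, DefaultAzureCredential is used
        endpoint: https://example-account.documents.azure.com:443
        database: testdb
        container: testcontainer
        partition_keys_map: root = json("habitat")
        operation: Create
    - mapping: |
        # Keep the document returned by the write and attach the metadata
        root = this
        root.cosmos.activity_id = metadata("activity_id")
        root.cosmos.request_charge = metadata("request_charge")
```

The same values can be referenced from interpolated string fields of other components, for example `${! metadata("request_charge") }`.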
## [](#batching)Batching CosmosDB limits the maximum batch size to 100 messages and the payload must not exceed 2MB ([details here](https://learn.microsoft.com/en-us/azure/cosmos-db/concepts-limits#per-request-limits)). ## [](#examples)Examples ### [](#patch-documents)Patch documents Query documents from a container and patch them. ```yaml input: azure_cosmosdb: endpoint: http://localhost:8080 account_key: C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw== database: blobbase container: blobfish partition_keys_map: root = "AbyssalPlain" query: SELECT * FROM blobfish processors: - mapping: | root = "" meta habitat = json("habitat") meta id = this.id - azure_cosmosdb: endpoint: http://localhost:8080 account_key: C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw== database: testdb container: blobfish partition_keys_map: root = json("habitat") item_id: ${! meta("id") } operation: Patch patch_operations: # Add a new /diet field - operation: Add path: /diet value_map: root = json("diet") # Remove the first location from the /locations array field - operation: Remove path: /locations/0 # Add new location at the end of the /locations array field - operation: Add path: /locations/- value_map: root = "Challenger Deep" # Return the updated document enable_content_response_on_write: true ``` ## [](#fields)Fields ### [](#account_key)`account_key` Account key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: account_key: C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw== ``` ### [](#auto_id)`auto_id` Automatically set the item `id` field to a random UUID v4. If the `id` field is already set, then it will not be overwritten. 
Setting this to `false` can improve performance, since the messages will not have to be parsed. **Type**: `bool` **Default**: `true` ### [](#connection_string)`connection_string` Connection string. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: connection_string: AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==; ``` ### [](#container)`container` Container. **Type**: `string` ```yaml # Examples: container: testcontainer ``` ### [](#database)`database` Database. **Type**: `string` ```yaml # Examples: database: testdb ``` ### [](#enable_content_response_on_write)`enable_content_response_on_write` Enable content response on write operations. To save some bandwidth, set this to `false` if you don’t need to receive the updated message(s) from the server, in which case the processor will not modify the content of the messages which are fed into it. Applies to every operation except Read. **Type**: `bool` **Default**: `true` ### [](#endpoint)`endpoint` CosmosDB endpoint. **Type**: `string` ```yaml # Examples: endpoint: https://localhost:8081 ``` ### [](#item_id)`item_id` ID of item to replace or delete. Only used by the Replace and Delete operations. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: item_id: ${! json("id") } ``` ### [](#operation)`operation` Operation. **Type**: `string` **Default**: `Create` | Option | Summary | | --- | --- | | Create | Create operation. | | Delete | Delete operation. | | Patch | Patch operation. | | Read | Read operation. | | Replace | Replace operation. | | Upsert | Upsert operation.
| ### [](#partition_keys_map)`partition_keys_map` A [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to a single partition key value or an array of partition key values of type string, integer or boolean. Currently, hierarchical partition keys are not supported so only one value may be provided. **Type**: `string` ```yaml # Examples: partition_keys_map: root = "blobfish" # --- partition_keys_map: root = 41 # --- partition_keys_map: root = true # --- partition_keys_map: root = null # --- partition_keys_map: root = json("blobfish").depth ``` ### [](#patch_condition)`patch_condition` Patch operation condition. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: patch_condition: from c where not is_defined(c.blobfish) ``` ### [](#patch_operations)`patch_operations[]` Patch operations to be performed when `operation: Patch` . **Type**: `object` ### [](#patch_operations-operation)`patch_operations[].operation` Operation. **Type**: `string` **Default**: `Add` | Option | Summary | | --- | --- | | Add | Add patch operation. | | Increment | Increment patch operation. | | Remove | Remove patch operation. | | Replace | Replace patch operation. | | Set | Set patch operation. | ### [](#patch_operations-path)`patch_operations[].path` Path. **Type**: `string` ```yaml # Examples: path: /foo/bar/baz ``` ### [](#patch_operations-value_map)`patch_operations[].value_map` A [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to a value of any type that is supported by CosmosDB. 
**Type**: `string` ```yaml # Examples: value_map: root = "blobfish" # --- value_map: root = 41 # --- value_map: root = true # --- value_map: root = json("blobfish").depth # --- value_map: root = [1, 2, 3] ``` ## [](#cosmosdb-emulator)CosmosDB emulator To run the [CosmosDB emulator](https://learn.microsoft.com/en-us/azure/cosmos-db/linux-emulator) that the examples on this page refer to, use the following Docker command: ```bash docker run --rm -it -p 8081:8081 --name=cosmosdb -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=10 -e AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=false mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator ``` Note: `AZURE_COSMOS_EMULATOR_PARTITION_COUNT` controls the number of partitions that will be supported by the emulator. The bigger the value, the longer it takes for the container to start up. Additionally, instead of installing the container's self-signed certificate, which is exposed at `https://localhost:8081/_explorer/emulator.pem`, you can run [mitmproxy](https://mitmproxy.org/) as a reverse proxy: ```bash mitmproxy -k --mode "reverse:https://localhost:8081" ``` Then you can access the CosmosDB UI at `http://localhost:8080/_explorer/index.html` and use `http://localhost:8080` as the CosmosDB endpoint.
--- # Page 211: benchmark **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/benchmark.md --- # benchmark --- title: benchmark latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/benchmark page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/benchmark.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/benchmark.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-12-16" page-git-modified-date: "2024-12-16" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/benchmark/ "View the Self-Managed version of this component") Logs throughput statistics for processed messages, and provides a summary of those statistics over the lifetime of the processor. ```yml # Configuration fields, showing default values label: "" benchmark: interval: 5s count_bytes: true ``` ## [](#throughput-statistics)Throughput statistics This processor logs the following rolling statistics at a [configurable interval](#interval) to help you to understand the current performance of your pipeline: - The number of messages processed per second. - The number of bytes processed per second (optional). For example: ```bash INFO rolling stats: 1 msg/sec, 407 B/sec ``` When the processor shuts down, it also logs a summary of the number and size of messages processed during its lifetime. For example: ```bash INFO total stats: 1.00186 msg/sec, 425 B/sec ``` ## [](#fields)Fields ### [](#count_bytes)`count_bytes` Whether to measure the number of bytes per second of throughput. 
If set to `true`, Redpanda Connect must serialize structured data to count the number of bytes processed, which can unnecessarily degrade performance if serialization is not required elsewhere in your pipeline. **Type**: `bool` **Default**: `true` ### [](#interval)`interval` How often to emit rolling statistics. Set to `0` if you only want to log summary statistics when the processor shuts down. **Type**: `string` **Default**: `5s` --- # Page 212: bloblang **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/bloblang.md --- # bloblang --- title: bloblang latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/bloblang page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/bloblang.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/bloblang.adoc categories: "[\"Mapping\",\"Parsing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/bloblang/ "View the Self-Managed version of this component") Executes a [Bloblang](../../../guides/bloblang/about/) mapping on messages. ```yml # Config fields, showing default values label: "" bloblang: "" ``` Bloblang is a powerful language that enables a wide range of mapping, transformation and filtering tasks. For more information see [Bloblang](../../../guides/bloblang/about/). If your mapping is large and you’d prefer for it to live in a separate file, you can execute a mapping directly from a file with the expression `from "<path>"`, where the path must be absolute, or relative to the location that Redpanda Connect is executed from.
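A minimal sketch of loading a mapping from a file is shown here. The path `./mappings/filter_fans.blobl` is a hypothetical example, and the sketch assumes the file is readable from the location where Redpanda Connect is executed, which may not hold in every deployment.

```yaml
pipeline:
  processors:
    # The whole mapping lives in the referenced file (hypothetical path)
    - bloblang: from "./mappings/filter_fans.blobl"
```

Keeping the mapping in its own file makes it easier to review and reuse than a long inline block.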
## [](#component-rename)Component rename This processor was recently renamed to the [`mapping` processor](../mapping/) in order to make the purpose of the processor more prominent. It is still valid to use the existing `bloblang` name but eventually it will be deprecated and replaced by the new name in example configs. ## [](#examples)Examples ### [](#mapping)Mapping Given JSON documents containing an array of fans: ```json { "id":"foo", "description":"a show about foo", "fans":[ {"name":"bev","obsession":0.57}, {"name":"grace","obsession":0.21}, {"name":"ali","obsession":0.89}, {"name":"vic","obsession":0.43} ] } ``` We can reduce the fans to only those with an obsession score above 0.5, giving us: ```json { "id":"foo", "description":"a show about foo", "fans":[ {"name":"bev","obsession":0.57}, {"name":"ali","obsession":0.89} ] } ``` With the following config: ```yaml pipeline: processors: - bloblang: | root = this root.fans = this.fans.filter(fan -> fan.obsession > 0.5) ``` ### [](#more-mapping)More Mapping When receiving JSON documents of the form: ```json { "locations": [ {"name": "Seattle", "state": "WA"}, {"name": "New York", "state": "NY"}, {"name": "Bellevue", "state": "WA"}, {"name": "Olympia", "state": "WA"} ] } ``` We could collapse the location names from the state of Washington into a field `Cities`: ```json {"Cities": "Bellevue, Olympia, Seattle"} ``` With the following config: ```yaml pipeline: processors: - bloblang: | root.Cities = this.locations. filter(loc -> loc.state == "WA"). map_each(loc -> loc.name). sort().join(", ") ``` ## [](#error-handling)Error handling Bloblang mappings can fail, in which case the message remains unchanged, errors are logged, and the message is flagged as having failed, allowing you to use [standard processor error handling patterns](../../../configuration/error_handling/). 
However, Bloblang itself also provides powerful ways of ensuring your mappings do not fail by specifying desired fallback behavior, which you can read about in [Error handling](../../../guides/bloblang/about/#error-handling.adoc). --- # Page 213: bounds_check **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/bounds_check.md --- # bounds\_check --- title: bounds_check latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/bounds_check page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/bounds_check.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/bounds_check.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/bounds_check/ "View the Self-Managed version of this component") Removes messages (and batches) that do not fit within certain size boundaries. 
#### Common ```yml processors: label: "" bounds_check: max_part_size: 1073741824 min_part_size: 1 ``` #### Advanced ```yml processors: label: "" bounds_check: max_part_size: 1073741824 min_part_size: 1 max_parts: 100 min_parts: 1 ``` ## [](#fields)Fields ### [](#max_part_size)`max_part_size` The maximum size of a message to allow (in bytes) **Type**: `int` **Default**: `1073741824` ### [](#max_parts)`max_parts` The maximum size of message batches to allow (in message count) **Type**: `int` **Default**: `100` ### [](#min_part_size)`min_part_size` The minimum size of a message to allow (in bytes) **Type**: `int` **Default**: `1` ### [](#min_parts)`min_parts` The minimum size of message batches to allow (in message count) **Type**: `int` **Default**: `1` --- # Page 214: branch **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/branch.md --- # branch --- title: branch latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/branch page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/branch.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/branch.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/branch/ "View the Self-Managed version of this component") The `branch` processor allows you to create a new request message via a [Bloblang mapping](../../../guides/bloblang/about/), execute a list of processors on the request messages, and, finally, map the result back into the source message using another mapping. 
```yml # Config fields, showing default values label: "" branch: request_map: "" processors: [] # No default (required) result_map: "" ``` This is useful for preserving the original message contents when using processors that would otherwise replace the entire contents. ## [](#metadata)Metadata Metadata fields that are added to messages during branch processing will not be automatically copied into the resulting message. To copy them, explicitly declare in your `result_map` either a wholesale copy with `meta = metadata()`, or selective copies with `meta foo = metadata("bar")` and so on. It is also possible to reference the metadata of the origin message in the `result_map` using the [`@` operator](../../../guides/bloblang/about/#metadata). ## [](#error-handling)Error handling If the `request_map` fails, the child processors will not be executed. If the child processors themselves result in an (uncaught) error, then the `result_map` will not be executed. If the `result_map` fails, the message will remain unchanged. Under any of these conditions, standard [error handling methods](../../../configuration/error_handling/) can be used to filter, DLQ or recover the failed messages. ## [](#conditional-branching)Conditional branching If the root of your request map is set to `deleted()`, then the branch processors are skipped for the given message, which allows you to conditionally branch messages. ## [](#fields)Fields ### [](#processors)`processors[]` A list of processors to apply to mapped requests. When processing message batches, the resulting batch must match the size and ordering of the input batch, so filtering and grouping should not be performed within these processors. **Type**: `processor` ### [](#request_map)`request_map` A [Bloblang mapping](../../../guides/bloblang/about/) that describes how to create a request payload suitable for the child processors of this branch.
If left empty then the branch will begin with an exact copy of the origin message (including metadata). **Type**: `string` **Default**: `""` ```yaml # Examples: request_map: |- root = { "id": this.doc.id, "content": this.doc.body.text } # --- request_map: |- root = if this.type == "foo" { this.foo.request } else { deleted() } ``` ### [](#result_map)`result_map` A [Bloblang mapping](../../../guides/bloblang/about/) that describes how the resulting messages from branched processing should be mapped back into the original payload. If left empty the origin message will remain unchanged (including metadata). **Type**: `string` **Default**: `""` ```yaml # Examples: result_map: |- meta foo_code = metadata("code") root.foo_result = this # --- result_map: |- meta = metadata() root.bar.body = this.body root.bar.id = this.user.id # --- result_map: root.raw_result = content().string() # --- result_map: |- root.enrichments.foo = if metadata("request_failed") != null { throw(metadata("request_failed")) } else { this } # --- result_map: |- # Retain only the updated metadata fields which were present in the origin message meta = metadata().filter(v -> @.get(v.key) != null) ``` ## [](#examples)Examples ### [](#http-request)HTTP Request This example strips the request message into an empty body, grabs an HTTP payload, and places the result back into the original message at the path `image.pull_count`: ```yaml pipeline: processors: - branch: request_map: 'root = ""' processors: - http: url: https://hub.docker.com/v2/repositories/jeffail/benthos verb: GET headers: Content-Type: application/json result_map: root.image.pull_count = this.pull_count # Example input: {"id":"foo","some":"pre-existing data"} # Example output: {"id":"foo","some":"pre-existing data","image":{"pull_count":1234}} ``` ### [](#non-structured-results)Non Structured Results When the result of your branch processors is unstructured and you wish to simply set a resulting field to the raw output use the content 
function to obtain the raw bytes of the resulting message and then coerce it into your value type of choice: ```yaml pipeline: processors: - branch: request_map: 'root = this.document.id' processors: - cache: resource: descriptions_cache key: ${! content() } operator: get result_map: root.document.description = content().string() # Example input: {"document":{"id":"foo","content":"hello world"}} # Example output: {"document":{"id":"foo","content":"hello world","description":"this is a cool doc"}} ``` ### [](#lambda-function)Lambda Function This example maps a new payload for triggering a lambda function with an ID and username from the original message, and the result of the lambda is discarded, meaning the original message is unchanged. ```yaml pipeline: processors: - branch: request_map: '{"id":this.doc.id,"username":this.user.name}' processors: - aws_lambda: function: trigger_user_update # Example input: {"doc":{"id":"foo","body":"hello world"},"user":{"name":"fooey"}} # Output matches the input, which is unchanged ``` ### [](#conditional-caching)Conditional Caching This example caches a document by a message ID only when the type of the document is a foo: ```yaml pipeline: processors: - branch: request_map: | meta id = this.id root = if this.type == "foo" { this.document } else { deleted() } processors: - cache: resource: TODO operator: set key: ${! @id } value: ${! 
content() } ``` --- # Page 215: cache **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/cache.md --- # cache --- title: cache latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/cache page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/cache.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/cache.adoc categories: "[\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/cache/)[Output](/redpanda-cloud/develop/connect/components/outputs/cache/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/cache/ "View the Self-Managed version of this component") Performs operations against a [cache resource](../../caches/about/) for each message, allowing you to store or retrieve data within message payloads. #### Common ```yml processors: label: "" cache: resource: "" # No default (required) operator: "" # No default (required) key: "" # No default (required) value: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" cache: resource: "" # No default (required) operator: "" # No default (required) key: "" # No default (required) value: "" # No default (optional) ttl: "" # No default (optional) ``` For use cases where you wish to cache the result of processors, consider using the [`cached` processor](../cached/) instead. This processor will interpolate functions within the `key` and `value` fields individually for each message. This allows you to specify dynamic keys and values based on the contents of the message payloads and metadata. 
You can find a list of functions in [Bloblang queries](../../../configuration/interpolation/#bloblang-queries). ## [](#examples)Examples ### [](#deduplication)Deduplication Deduplication can be done using the `add` operator with a key extracted from the message payload. Because the operation fails when a key already exists, we can remove the duplicates using a [`mapping` processor](../mapping/): ```yaml pipeline: processors: - cache: resource: foocache operator: add key: '${! json("message.id") }' value: "storeme" - mapping: root = if errored() { deleted() } cache_resources: - label: foocache redis: url: tcp://TODO:6379 ``` ### [](#deduplication-batch-wide)Deduplication Batch-Wide Sometimes it’s necessary to deduplicate a batch of messages (also known as a window) by a single identifying value. This can be done by introducing a [`branch` processor](../branch/), which executes the cache only once on behalf of the batch, in this case with a value made from a field extracted from the first and last messages of the batch: ```yaml pipeline: processors: # Try and add one message to a cache that identifies the whole batch - branch: request_map: | root = if batch_index() == 0 { json("id").from(0) + json("meta.tail_id").from(-1) } else { deleted() } processors: - cache: resource: foocache operator: add key: ${! content() } value: t # Delete all messages if we failed - mapping: | root = if errored().from(0) { deleted() } ``` ### [](#hydration)Hydration It’s possible to enrich payloads with content previously stored in a cache by using the [`branch`](../branch/) processor: ```yaml pipeline: processors: - branch: processors: - cache: resource: foocache operator: get key: '${!
json("message.document_id") }' result_map: 'root.message.document = this' # NOTE: If the data stored in the cache is not valid JSON then use # something like this instead: # result_map: 'root.message.document = content().string()' cache_resources: - label: foocache memcached: addresses: [ "TODO:11211" ] ``` ## [](#fields)Fields ### [](#key)`key` A key to use with the cache. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#operator)`operator` The [operation](#operators) to perform with the cache. **Type**: `string` **Options**: `set`, `add`, `get`, `delete`, `exists` ### [](#resource)`resource` The [`cache` resource](../../caches/about/) to target with this processor. **Type**: `string` ### [](#ttl)`ttl` The time to live (TTL) of each individual item as a duration string. After this period an item will be eligible for removal during the next compaction. Not all caches support per-key TTLs, those that do will have a configuration field `default_ttl`, and those that do not will fall back to their generally configured TTL setting. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: ttl: 60s # --- ttl: 5m # --- ttl: 36h ``` ### [](#value)`value` A value to use with the cache (when applicable). This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ## [](#operators)Operators ### [](#set)`set` Set a key in the cache to a value. If the key already exists the contents are overridden. ### [](#add)`add` Set a key in the cache to a value. If the key already exists the action fails with a 'key already exists' error, which can be detected with [processor error handling](../../../configuration/error_handling/). ### [](#get)`get` Retrieve the contents of a cached key and replace the original message payload with the result. 
If the key does not exist the action fails with an error, which can be detected with [processor error handling](../../../configuration/error_handling/). ### [](#exists)`exists` Check whether a specific key is in the cache and replace the original message payload with `true` if the key exists, or `false` if it doesn’t. ### [](#delete)`delete` Delete a key and its contents from the cache. If the key does not exist the action is a no-op and will not fail with an error. --- # Page 216: cached **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/cached.md --- # cached --- title: cached latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/cached page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/cached.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/cached.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/cached/ "View the Self-Managed version of this component") Cache the result of applying one or more processors to messages identified by a key. If the key already exists within the cache the contents of the message will be replaced with the cached result instead of applying the processors. This component is therefore useful in situations where an expensive set of processors need only be executed periodically. 
```yml
# Config fields, showing default values
label: ""
cached:
  cache: "" # No default (required)
  skip_on: errored() # No default (optional)
  key: my_foo_result # No default (required)
  ttl: "" # No default (optional)
  processors: [] # No default (required)
```

The format of the data when stored within the cache is a custom and versioned schema chosen to balance performance and storage space. It is therefore not possible to point this processor to a cache that is pre-populated with data that this processor has not created itself.

## [](#examples)Examples

### [](#cached-enrichment)Cached Enrichment

In the following example we want to enrich messages consumed from Kafka with data specific to the origin topic partition. We do this by placing an `http` processor within a `branch`, where the HTTP URL contains interpolation functions with the topic and partition in the path.

However, it would be inefficient to make this HTTP request for every single message as the result is consistent for all data of a given topic partition. We can solve this by placing our enrichment call within a `cached` processor where the key contains the topic and partition, so that messages that originate from the same topic/partition combination reuse the cached result of an earlier request.

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - cached:
              key: '${! meta("kafka_topic") }-${! meta("kafka_partition") }'
              cache: foo_cache
              processors:
                - mapping: 'root = ""'
                - http:
                    url: http://example.com/enrichment/${! meta("kafka_topic") }/${! meta("kafka_partition") }
                    verb: GET
        result_map: 'root.enrichment = this'

cache_resources:
  - label: foo_cache
    memory:
      # Disable compaction so that cached items never expire
      compaction_interval: ""
```

### [](#periodic-global-enrichment)Periodic Global Enrichment

In the following example we enrich all messages with the same data obtained from a static URL with an `http` processor within a `branch`.
However, we expect the data from this URL to change roughly every 10 minutes, so we configure a `cached` processor with a static key (since this request is consistent for all messages) and a TTL of `10m`. ```yaml pipeline: processors: - branch: request_map: 'root = ""' processors: - cached: key: static_foo cache: foo_cache ttl: 10m processors: - http: url: http://example.com/get/foo.json verb: GET result_map: 'root.foo = this' cache_resources: - label: foo_cache memory: {} ``` ## [](#fields)Fields ### [](#cache)`cache` The cache resource to read and write processor results from. **Type**: `string` ### [](#key)`key` A key to be resolved for each message, if the key already exists in the cache then the cached result is used, otherwise the processors are applied and the result is cached under this key. The key could be static and therefore apply generally to all messages or it could be an interpolated expression that is potentially unique for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: key: my_foo_result # --- key: ${! this.document.id } # --- key: ${! meta("kafka_key") } # --- key: ${! meta("kafka_topic") } ``` ### [](#processors)`processors[]` The list of processors whose result will be cached. **Type**: `processor` ### [](#skip_on)`skip_on` A condition that can be used to skip caching the results from the processors. **Type**: `string` ```yaml # Examples: skip_on: errored() ``` ### [](#ttl)`ttl` An optional expiry period to set for each cache entry. Some caches only have a general TTL and will therefore ignore this setting. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). 
**Type**: `string` --- # Page 217: catch **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/catch.md --- # catch --- title: catch latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/catch page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/catch.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/catch.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/catch/ "View the Self-Managed version of this component") Applies a list of child processors _only_ when a previous processing step has failed. ```yml # Config fields, showing default values label: "" catch: [] ``` Behaves similarly to the [`for_each`](../for_each/) processor, where a list of child processors are applied to individual messages of a batch. However, processors are only applied to messages that failed a processing step prior to the catch. For example, with the following config: ```yaml pipeline: processors: - resource: foo - catch: - resource: bar - resource: baz ``` If the processor `foo` fails for a particular message, that message will be fed into the processors `bar` and `baz`. Messages that do not fail for the processor `foo` will skip these processors. When messages leave the catch block their fail flags are cleared. This processor is useful for when it’s possible to recover failed messages, or when special actions (such as logging/metrics) are required before dropping them. More information about error handling can be found in [Error Handling](../../../configuration/error_handling/). 
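As a rough illustration of the recovery use case described above, the following sketch logs the failure and then replaces the failed message with a small error document. The first `mapping` step and the field names `doc`, `payload`, `raw`, and `reason` are hypothetical and only stand in for whatever step might fail in your pipeline. It also assumes that the `log` processor and the Bloblang `error()` function are available in your environment.

```yaml
pipeline:
  processors:
    # Hypothetical step that fails when `payload` is not a valid JSON string
    - mapping: 'root.doc = this.payload.parse_json()'
    - catch:
        # Only messages that failed the step above reach these processors
        - log:
            level: ERROR
            message: 'Processing failed: ${! error() }'
        # Keep the raw content and the failure reason for later inspection
        - mapping: |
            root = {
              "raw": content().string(),
              "reason": error()
            }
```

Because fail flags are cleared when messages leave the catch block, any processors placed after it treat the recovered messages like any other message.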
--- # Page 218: cohere_chat **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/cohere_chat.md --- # cohere\_chat --- title: cohere_chat page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/cohere_chat page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/cohere_chat.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/cohere_chat.adoc # Beta release status page-beta: "true" page-git-created-date: "2024-10-16" page-git-modified-date: "2024-10-16" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/cohere_chat/ "View the Self-Managed version of this component") Generates responses to messages in a chat conversation, using the [Cohere API](https://docs.cohere.com/docs/chat-api) and external tools. 
#### Common ```yml processors: label: "" cohere_chat: base_url: https://api.cohere.com api_key: "" # No default (required) model: "" # No default (required) prompt: "" # No default (optional) system_prompt: "" # No default (optional) max_tokens: "" # No default (optional) temperature: "" # No default (optional) response_format: text json_schema: "" # No default (optional) max_tool_calls: 10 tools: [] ``` #### Advanced ```yml processors: label: "" cohere_chat: base_url: https://api.cohere.com api_key: "" # No default (required) model: "" # No default (required) prompt: "" # No default (optional) system_prompt: "" # No default (optional) max_tokens: "" # No default (optional) temperature: "" # No default (optional) response_format: text json_schema: "" # No default (optional) schema_registry: url: "" # No default (required) subject: "" # No default (required) refresh_interval: "" # No default (optional) tls: skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} top_p: "" # No default (optional) frequency_penalty: "" # No default (optional) presence_penalty: "" # No default (optional) seed: "" # No default (optional) stop: [] # No default (optional) max_tool_calls: 10 tools: [] ``` This processor sends the contents of user prompts to the Cohere API, which generates responses using all available context, including supplementary data provided by external tools. By default, the processor submits the entire payload of each message as a string, unless you use the `prompt` field to customize it. To learn more about chat completion, see the [Cohere API documentation](https://docs.cohere.com/docs/chat-api). ## [](#fields)Fields ### [](#api_key)`api_key` The API key for the Cohere API. 
> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#base_url)`base_url`

The base URL to use for API requests.

**Type**: `string`

**Default**: `https://api.cohere.com`

### [](#frequency_penalty)`frequency_penalty`

A number between `-2.0` and `2.0`. Positive values penalize new tokens based on the frequency of their appearance in the text so far. This decreases the model’s likelihood to repeat the same line verbatim.

**Type**: `float`

### [](#json_schema)`json_schema`

The JSON schema to use when responding in `json_schema` format. To learn more about the JSON schema features supported, see the [Cohere documentation](https://docs.cohere.com/docs/structured-outputs-json).

**Type**: `string`

### [](#max_tokens)`max_tokens`

The maximum number of tokens to allow in the chat completion.

**Type**: `int`

### [](#max_tool_calls)`max_tool_calls`

The maximum number of tool calls the model can perform.

**Type**: `int`

**Default**: `10`

### [](#model)`model`

The name of the Cohere large language model (LLM) you want to use.

**Type**: `string`

```yaml
# Examples:
model: command-r-plus

# ---

model: command-r

# ---

model: command

# ---

model: command-light
```

### [](#presence_penalty)`presence_penalty`

A number between `-2.0` and `2.0`. Positive values penalize new tokens based on whether they have already appeared in the text so far. This increases the model’s likelihood to talk about new topics.

**Type**: `float`

### [](#prompt)`prompt`

The user prompt you want to generate a response for. By default, the processor submits the entire payload as a string. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).
**Type**: `string` ### [](#response_format)`response_format` Choose the model’s output format. If `json_schema` is specified, then you must also configure a `json_schema` or `schema_registry`. **Type**: `string` **Default**: `text` **Options**: `text`, `json`, `json_schema` ### [](#schema_registry)`schema_registry` The schema registry to dynamically load schemas from when responding in `json_schema` format. Schemas themselves must be in JSON format. To learn more about the JSON schema features supported, see the [Cohere documentation](https://docs.cohere.com/docs/structured-outputs-json). **Type**: `object` ### [](#schema_registry-basic_auth)`schema_registry.basic_auth` Configure basic authentication for requests from this component to your schema registry. **Type**: `object` ### [](#schema_registry-basic_auth-enabled)`schema_registry.basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-basic_auth-password)`schema_registry.basic_auth.password` The password to use for authentication. Used together with `username` for basic authentication or with encrypted private keys for secure access. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-basic_auth-username)`schema_registry.basic_auth.username` The username of the account credentials to authenticate as. Used together with `password` for basic authentication. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt)`schema_registry.jwt` Beta Configure JSON Web Token (JWT) authentication for secure data transmission from your schema registry to this component. This feature is in beta and may change in future releases. 
**Type**: `object` ### [](#schema_registry-jwt-claims)`schema_registry.jwt.claims` Values used to pass the identity of the authenticated entity to the service provider. In this case, between this component and the schema registry. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-enabled)`schema_registry.jwt.enabled` Whether to use JWT authentication in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-jwt-headers)`schema_registry.jwt.headers` The key/value pairs that identify the type of token and signing algorithm. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-private_key_file)`schema_registry.jwt.private_key_file` Path to a file containing the PEM-encoded private key using PKCS#1 or PKCS#8 format. The private key must be compatible with the algorithm specified in the `signing_method` field. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt-signing_method)`schema_registry.jwt.signing_method` The cryptographic algorithm used to sign the JWT token. Supported algorithms include RS256, RS384, RS512, and EdDSA. This algorithm must be compatible with the private key specified in the `private_key_file` field. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth)`schema_registry.oauth` Configure OAuth version 1.0 to give this component authorized access to your schema registry. **Type**: `object` ### [](#schema_registry-oauth-access_token)`schema_registry.oauth.access_token` The value this component can use to gain access to the data in the schema registry. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-access_token_secret)`schema_registry.oauth.access_token_secret` The secret that establishes ownership of the `oauth.access_token` in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_key)`schema_registry.oauth.consumer_key` The value used to identify this component or client to your schema registry. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_secret)`schema_registry.oauth.consumer_secret` The secret that establishes ownership of the consumer key in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-enabled)`schema_registry.oauth.enabled` Whether to enable OAuth version 1.0 authentication for requests to the schema registry. **Type**: `bool` **Default**: `false` ### [](#schema_registry-refresh_interval)`schema_registry.refresh_interval` The refresh rate for fetching the latest schema. If not specified the schema does not refresh. **Type**: `string` ### [](#schema_registry-subject)`schema_registry.subject` The subject name to fetch the schema for. **Type**: `string` ### [](#schema_registry-tls)`schema_registry.tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#schema_registry-tls-client_certs)`schema_registry.tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. 
Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#schema_registry-tls-client_certs-cert)`schema_registry.tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-cert_file)`schema_registry.tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key)`schema_registry.tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key_file)`schema_registry.tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-password)`schema_registry.tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#schema_registry-tls-enable_renegotiation)`schema_registry.tls.enable_renegotiation` Whether to allow the remote server to request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-root_cas)`schema_registry.tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#schema_registry-tls-root_cas_file)`schema_registry.tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. 
**Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#schema_registry-tls-skip_cert_verify)`schema_registry.tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#schema_registry-url)`schema_registry.url` The base URL of the schema registry service. **Type**: `string` ### [](#seed)`seed` If specified, Redpanda Connect makes a best effort to sample deterministically. Repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. **Type**: `int` ### [](#stop)`stop[]` Specify up to four sequences to stop the API from generating further tokens. **Type**: `array` ### [](#system_prompt)`system_prompt` The system prompt to submit along with the user prompt. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#temperature)`temperature` Choose a sampling temperature between `0` and `2`: - Higher values, such as `0.8` make the output more random. - Lower values, such as `0.2` make the output more focused and deterministic. Redpanda recommends adding a value for this field or `top_p`, but not both. **Type**: `float` ### [](#tools)`tools[]` External tools that the model can invoke, such as functions, APIs, or web browsing. You can define a series of processors that describe these tools, enabling the model to use agent-like behavior to decide when and how to invoke them to enhance response generation. 
**Type**: `object`

**Default**: `[]`

### [](#tools-description)`tools[].description`

A description of this tool. The LLM uses this description to decide whether the tool should be used.

**Type**: `string`

### [](#tools-name)`tools[].name`

The name of this tool.

**Type**: `string`

### [](#tools-parameters)`tools[].parameters`

The parameters the LLM needs to provide to invoke this tool.

**Type**: `object`

### [](#tools-parameters-properties)`tools[].parameters.properties`

The properties for the processor’s input data.

**Type**: `object`

### [](#tools-parameters-properties-description)`tools[].parameters.properties.description`

A description of this parameter.

**Type**: `string`

### [](#tools-parameters-properties-enum)`tools[].parameters.properties.enum[]`

Specifies that this parameter is an enum and only these specific values should be used.

**Type**: `array`

**Default**: `[]`

### [](#tools-parameters-properties-type)`tools[].parameters.properties.type`

The type of this parameter.

**Type**: `string`

### [](#tools-parameters-required)`tools[].parameters.required[]`

The required parameters for this pipeline.

**Type**: `array`

**Default**: `[]`

### [](#tools-processors)`tools[].processors[]`

The pipeline to execute when the LLM uses this tool.

**Type**: `processor`

### [](#top_p)`top_p`

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. For example, a `top_p` of `0.1` means only the tokens comprising the top 10% probability mass are sampled.

Redpanda recommends adding a value for this field or `temperature`, but not both.

**Type**: `float`

## [](#example)Example

In this pipeline configuration, the Command R+ model executes a number of processors, which make a tool call to retrieve weather data for a specific city.

```yaml
input:
  generate:
    count: 1
    mapping: |
      root = "What is the weather like in Chicago?"
pipeline:
  processors:
    - cohere_chat:
        api_key: my_cohere_api_token
        model: command-r-plus
        prompt: "${!content().string()}"
        tools:
          - name: GetWeather
            description: "Retrieve the weather for a specific city"
            parameters:
              required: ["city"]
              properties:
                city:
                  type: string
                  description: the city to look up the weather for
            processors:
              - http:
                  verb: GET
                  url: 'https://wttr.in/${!this.city}?T'
                  headers:
                    User-Agent: curl/8.11.1 # Returns a text string from the weather website
output:
  stdout: {}
```

---

# Page 219: cohere_embeddings

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/cohere_embeddings.md

---

# cohere\_embeddings

---
title: cohere_embeddings
page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments.
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/cohere_embeddings
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/cohere_embeddings.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/cohere_embeddings.adoc
# Beta release status
page-beta: "true"
page-git-created-date: "2024-10-16"
page-git-modified-date: "2024-10-16"
release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments.
---

beta

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/cohere_embeddings/ "View the Self-Managed version of this component")

Generates vector embeddings to represent input text, using the [Cohere API](https://docs.cohere.com/docs/embeddings).
```yml
# Configuration fields, showing default values
label: ""
cohere_embeddings:
  base_url: https://api.cohere.com
  api_key: "" # No default (required)
  model: embed-english-v3.0 # No default (required)
  text_mapping: "" # No default (optional)
  input_type: search_document
  dimensions: "" # No default (optional)
```

This processor sends text strings to your chosen large language model (LLM), which generates vector embeddings for them using the Cohere API. By default, the processor submits the entire payload of each message as a string, unless you use the `text_mapping` field to customize it.

To learn more about vector embeddings, see the [Cohere API documentation](https://docs.cohere.com/docs/embeddings).

## [](#examples)Examples

### [](#store-embedding-vectors-in-qdrant)Store embedding vectors in Qdrant

Compute embeddings for some generated data and store them in Qdrant using the `qdrant` output.

```yaml
input:
  generate:
    interval: 1s
    mapping: |
      root = {"text": fake("paragraph")}
pipeline:
  processors:
    - cohere_embeddings:
        model: embed-english-v3.0
        api_key: "${COHERE_API_KEY}"
        text_mapping: "root = this.text"
output:
  qdrant:
    grpc_host: localhost:6334
    collection_name: "example_collection"
    id: "root = uuid_v4()"
    vector_mapping: "root = this"
```

## [](#fields)Fields

### [](#api_key)`api_key`

The API key for the Cohere API.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#base_url)`base_url`

The base URL to use for API requests.

**Type**: `string`

**Default**: `https://api.cohere.com`

### [](#dimensions)`dimensions`

The number of dimensions (numerical values) in each vector embedding generated by this processor.
This parameter is supported only by [`embed-v4.0`](https://docs.cohere.com/v2/docs/embeddings) and newer models.

**Type**: `int`

### [](#input_type)`input_type`

The type of text input passed to the model.

**Type**: `string`

**Default**: `search_document`

| Option | Summary |
| --- | --- |
| classification | Used for embeddings passed through a text classifier. |
| clustering | Used for embeddings run through a clustering algorithm. |
| search_document | Used for embeddings stored in a vector database for search use cases. |
| search_query | Used for embeddings of search queries run against a vector database to find relevant documents. |

### [](#model)`model`

The name of the Cohere LLM you want to use.

**Type**: `string`

```yaml
# Examples:
model: embed-english-v3.0

# ---

model: embed-english-light-v3.0

# ---

model: embed-multilingual-v3.0

# ---

model: embed-multilingual-light-v3.0
```

### [](#text_mapping)`text_mapping`

The text you want to generate a vector embedding for. By default, the processor submits the entire payload as a string.
**Type**: `string` --- # Page 220: cohere_rerank **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/cohere_rerank.md --- # cohere\_rerank --- title: cohere_rerank latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/cohere_rerank page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/cohere_rerank.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/cohere_rerank.adoc page-git-created-date: "2025-05-19" page-git-modified-date: "2025-05-19" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/cohere_rerank/ "View the Self-Managed version of this component") Sends document strings to the [Cohere API](https://docs.cohere.com/reference/rerank), which returns them [ranked by their relevance to a specified query](https://docs.cohere.com/docs/rerank-2). The output of this processor is an array of strings, ordered by their relevance to the query. ```yml # Configuration fields, showing default values label: "" cohere_rerank: base_url: https://api.cohere.com api_key: "" # No default (required) model: rerank-v3.5 # No default (required) query: "" # No default (required) documents: "" # No default (required) top_n: 0 max_tokens_per_doc: 4096 ``` ## [](#metadata)Metadata - `relevance_scores`: An array of scores for each input document that indicates how relevant it is to the query. The scores are in the same order as the documents in the input. The higher the score, the more relevant the document. 
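As a rough sketch of how the `relevance_scores` metadata might be consumed, the following pipeline fragment adds a `mapping` step after the reranker that copies the scores into the message body. The input field names `query` and `docs` and the output field names `documents` and `relevance_scores` are hypothetical. The sketch also assumes that the metadata value can be read with the Bloblang `@` metadata reference.

```yaml
pipeline:
  processors:
    - cohere_rerank:
        model: rerank-v3.5
        api_key: "${COHERE_API_KEY}"
        query: "${! this.query }"
        documents: "root = this.docs"
    # After the rerank step the payload is the array of ranked strings.
    # Wrap it in an object together with the scores taken from metadata.
    - mapping: |
        root = {
          "documents": this,
          "relevance_scores": @relevance_scores
        }
```

Note that the scores follow the order of the input documents, while the payload is ordered by relevance, so the two arrays do not necessarily line up index by index.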
## [](#examples)Examples

### [](#rerank-some-documents-based-on-a-query)Rerank some documents based on a query

This example generates a random query and three random documents every second, then uses the `rerank-v3.5` model to order the documents by their relevance to the query.

```yaml
input:
  generate:
    interval: 1s
    mapping: |
      root = {
        "query": fake("sentence"),
        "docs": [fake("paragraph"), fake("paragraph"), fake("paragraph")],
      }
pipeline:
  processors:
    - cohere_rerank:
        model: rerank-v3.5
        api_key: "${COHERE_API_KEY}"
        query: "${!this.query}"
        documents: "root = this.docs"
output:
  stdout: {}
```

## [](#fields)Fields

### [](#api_key)`api_key`

Your API key for the Cohere API.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#base_url)`base_url`

The base URL to use for API requests.

**Type**: `string`

**Default**: `https://api.cohere.com`

### [](#documents)`documents`

A list of text strings that are compared to the specified query.

For optimal performance:

- Send fewer than 1000 documents in a single request
- Send structured data in YAML format

**Type**: `string`

### [](#max_tokens_per_doc)`max_tokens_per_doc`

This processor automatically truncates long documents to the specified number of tokens.

**Type**: `int`

**Default**: `4096`

### [](#model)`model`

The name of the Cohere LLM you want to use.

**Type**: `string`

```yaml
# Examples:
model: rerank-v3.5
```

### [](#query)`query`

The search query you want to execute. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#top_n)`top_n`

The number of documents to return when the query is executed. If set to `0`, all documents are returned. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).
**Type**: `string` **Default**: `0` --- # Page 221: compress **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/compress.md --- # compress --- title: compress latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/compress page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/compress.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/compress.adoc categories: "[\"Parsing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/compress/ "View the Self-Managed version of this component") Compresses messages according to the selected algorithm. Supported compression algorithms are: \[flate gzip lz4 pgzip snappy zlib\] ```yml # Config fields, showing default values label: "" compress: algorithm: "" # No default (required) level: -1 ``` The 'level' field might not apply to all algorithms. ## [](#fields)Fields ### [](#algorithm)`algorithm` The compression algorithm to use. **Type**: `string` **Options**: `flate`, `gzip`, `lz4`, `pgzip`, `snappy`, `zlib` ### [](#level)`level` The level of compression to use. May not be applicable to all algorithms. 
**Type**: `int` **Default**: `-1` --- # Page 222: decompress **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/decompress.md --- # decompress --- title: decompress latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/decompress page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/decompress.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/decompress.adoc categories: "[\"Parsing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/decompress/)[Scanner](/redpanda-cloud/develop/connect/components/scanners/decompress/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/decompress/ "View the Self-Managed version of this component") Decompresses messages according to the selected algorithm. Supported decompression algorithms are: \[bzip2 flate gzip lz4 pgzip snappy zlib\] ```yml # Config fields, showing default values label: "" decompress: algorithm: "" # No default (required) ``` ## [](#fields)Fields ### [](#algorithm)`algorithm` The decompression algorithm to use. 
**Type**: `string`

**Options**: `bzip2`, `flate`, `gzip`, `lz4`, `pgzip`, `snappy`, `zlib`

---

# Page 223: dedupe

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/dedupe.md

---

# dedupe

---
title: dedupe
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/dedupe
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/dedupe.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/dedupe.adoc
categories: "[\"Utility\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/dedupe/ "View the Self-Managed version of this component")

Deduplicates messages by storing a key value in a cache using the `add` operator. If the key already exists in the cache, the message is dropped.

```yml
# Config fields, showing default values
label: ""
dedupe:
  cache: "" # No default (required)
  key: ${! meta("kafka_key") } # No default (required)
  drop_on_err: true
```

Caches must be configured as resources. For more information, see the [cache documentation](../../caches/about/).

When using this processor with an output target that might fail, always wrap the output within an indefinite [`retry`](../../outputs/retry/) block. This ensures that during outages your messages aren’t reprocessed after failures, which would cause the processor to drop them as duplicates.

## [](#batch-deduplication)Batch deduplication

This processor acts on individual messages only. To deduplicate on behalf of a batch (or window) of messages, use the [`cache` processor](../cache/#examples) instead.
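The earlier advice about wrapping a fallible output in a `retry` block can be sketched as follows. This is a minimal example under stated assumptions: the `http_client` output and its URL are placeholders chosen for illustration, and `max_retries: 0` is assumed to mean that the `retry` output has no discrete retry limit.

```yaml
pipeline:
  processors:
    - dedupe:
        cache: keycache
        key: ${! meta("kafka_key") }

cache_resources:
  - label: keycache
    memory:
      default_ttl: 60s

output:
  retry:
    # Assumption: zero means "no discrete limit", so delivery is retried indefinitely.
    max_retries: 0
    output:
      # Hypothetical downstream target; substitute the output you actually use.
      http_client:
        url: https://example.com/ingest
        verb: POST
```

With this layout, a failed delivery is retried by the output itself instead of sending the message back through the pipeline, where the `dedupe` processor would already have recorded its key.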
## [](#delivery-guarantees)Delivery guarantees

Performing deduplication on a stream using a distributed cache voids any at-least-once guarantees that the stream previously had. This is because the cache preserves message signatures even if the message fails to leave the Redpanda Connect pipeline, which would cause message loss if an outage at the output sink is followed by a restart of the Redpanda Connect instance (or a server crash, for example).

This problem can be mitigated by using an in-memory cache and distributing messages to horizontally scaled Redpanda Connect pipelines partitioned by the deduplication key. However, in situations where at-least-once delivery guarantees are important, it is worth avoiding deduplication in favor of implementing idempotent behavior at the edge of your stream pipelines.

## [](#fields)Fields

### [](#cache)`cache`

The [`cache` resource](../../caches/about/) to target with this processor.

**Type**: `string`

### [](#drop_on_err)`drop_on_err`

Whether messages should be dropped when the cache returns a general error such as a network issue.

**Type**: `bool`

**Default**: `true`

### [](#key)`key`

An interpolated string yielding the key to deduplicate by for each message. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

```yaml
# Examples:
key: ${! meta("kafka_key") }

# ---

key: ${! content().hash("xxhash64") }
```

## [](#examples)Examples

### [](#deduplicate-based-on-kafka-key)Deduplicate based on Kafka key

The following configuration demonstrates a pipeline that deduplicates messages based on the Kafka key. ```yaml pipeline: processors: - dedupe: cache: keycache key: ${!
meta("kafka_key") } cache_resources: - label: keycache memory: default_ttl: 60s ``` --- # Page 224: for_each **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/for_each.md --- # for\_each --- title: for_each latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/for_each page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/for_each.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/for_each.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/for_each/ "View the Self-Managed version of this component") A processor that applies a list of child processors to messages of a batch as though they were each a batch of one message. ```yml # Config fields, showing default values label: "" for_each: [] ``` This is useful for forcing batch wide processors such as [`dedupe`](../dedupe/) or interpolations such as the `value` field of the `metadata` processor to execute on individual message parts of a batch instead. Please note that most processors already process per message of a batch, and this processor is not needed in those cases. 
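As a minimal sketch of the `for_each` behavior described above, the following configuration runs a `mapping` processor against each message of a batch individually. The field name `batch_size_seen` is hypothetical, and the expectation that `batch_size()` reports `1` inside `for_each` is an assumption based on each message being treated as a batch of one.

```yaml
pipeline:
  processors:
    - for_each:
        # Each child processor in this list runs against one message at a time.
        - mapping: |
            root = this
            # Assumption: inside for_each this should evaluate to 1.
            root.batch_size_seen = batch_size()
```

Running the same `mapping` outside of `for_each` would evaluate `batch_size()` against the whole batch, which is one way to observe the difference.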
--- # Page 225: gcp_bigquery_select **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/gcp_bigquery_select.md --- # gcp\_bigquery\_select --- title: gcp_bigquery_select latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/gcp_bigquery_select page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/gcp_bigquery_select.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/gcp_bigquery_select.adoc categories: "[\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/gcp_bigquery_select/)[Input](/redpanda-cloud/develop/connect/components/inputs/gcp_bigquery_select/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/gcp_bigquery_select/ "View the Self-Managed version of this component") Executes a `SELECT` query against BigQuery and replaces messages with the rows returned. ```yml # Config fields, showing default values label: "" gcp_bigquery_select: project: "" # No default (required) credentials_json: "" # No default (optional) table: bigquery-public-data.samples.shakespeare # No default (required) columns: [] # No default (required) where: type = ? and created_at > ? 
# No default (optional) job_labels: {} args_mapping: root = [ "article", now().ts_format("2006-01-02") ] # No default (optional) prefix: "" # No default (optional) suffix: "" # No default (optional) ``` ## [](#examples)Examples ### [](#word-count)Word count Given a stream of English terms, enrich the messages with the word count from Shakespeare’s public works: ```yaml pipeline: processors: - branch: processors: - gcp_bigquery_select: project: test-project table: bigquery-public-data.samples.shakespeare columns: - word - sum(word_count) as total_count where: word = ? suffix: | GROUP BY word ORDER BY total_count DESC LIMIT 10 args_mapping: root = [ this.term ] result_map: | root.count = this.get("0.total_count") ``` ## [](#fields)Fields ### [](#args_mapping)`args_mapping` An optional [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of values matching in size to the number of placeholder arguments in the field `where`. **Type**: `string` ```yaml # Examples: args_mapping: root = [ "article", now().ts_format("2006-01-02") ] ``` ### [](#columns)`columns[]` A list of columns to query. **Type**: `array` ### [](#credentials_json)`credentials_json` Base64-encoded Google Service Account credentials in JSON format (optional). Use this field to authenticate with Google Cloud services. For more information about creating service account credentials, see [Google’s service account documentation](https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#job_labels)`job_labels` A list of labels to add to the query job. 
**Type**: `string`

**Default**: `{}`

### [](#prefix)`prefix`

An optional prefix to prepend to the select query (before `SELECT`).

**Type**: `string`

### [](#project)`project`

GCP project where the query job will execute.

**Type**: `string`

### [](#suffix)`suffix`

An optional suffix to append to the select query.

**Type**: `string`

### [](#table)`table`

Fully qualified BigQuery table name to query.

**Type**: `string`

```yaml
# Examples:
table: bigquery-public-data.samples.shakespeare
```

### [](#where)`where`

An optional `WHERE` clause to add. Placeholder arguments are populated with the `args_mapping` field. Placeholders should always be question marks (`?`).

**Type**: `string`

```yaml
# Examples:
where: type = ? and created_at > ?

# ---

where: user_id = ?
```

---

# Page 226: gcp_vertex_ai_chat

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/gcp_vertex_ai_chat.md

---

# gcp\_vertex\_ai\_chat

---
title: gcp_vertex_ai_chat
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/gcp_vertex_ai_chat
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/gcp_vertex_ai_chat.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/gcp_vertex_ai_chat.adoc
categories: "[\"AI\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/gcp_vertex_ai_chat/ "View the Self-Managed version of this component")

Generates responses to messages in a chat conversation, using the [Vertex AI API](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform).
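To make the processor's role concrete, here is a minimal sketch of a pipeline step that asks a model to summarize each message. The project ID `my-gcp-project` and the prompt wording are hypothetical; the model name and location are taken from the examples in the field reference for this processor.

```yaml
pipeline:
  processors:
    - gcp_vertex_ai_chat:
        project: my-gcp-project # hypothetical project ID
        location: us-central1
        model: gemini-1.5-flash-001
        # The prompt interpolates the raw message payload as a string.
        prompt: "Summarize the following text in one sentence: ${! content().string() }"
        response_format: text
```

If `credentials_json` is omitted, as it is here, the processor needs some other source of Google credentials in its environment; that part is outside the scope of this sketch.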
#### Common ```yml processors: label: "" gcp_vertex_ai_chat: project: "" # No default (required) credentials_json: "" # No default (optional) location: "" # No default (required) model: "" # No default (required) prompt: "" # No default (optional) history: "" # No default (optional) attachment: "" # No default (optional) temperature: "" # No default (optional) max_tokens: "" # No default (optional) response_format: text tools: [] ``` #### Advanced ```yml processors: label: "" gcp_vertex_ai_chat: project: "" # No default (required) credentials_json: "" # No default (optional) location: "" # No default (required) model: "" # No default (required) prompt: "" # No default (optional) system_prompt: "" # No default (optional) history: "" # No default (optional) attachment: "" # No default (optional) temperature: "" # No default (optional) max_tokens: "" # No default (optional) response_format: text top_p: "" # No default (optional) top_k: "" # No default (optional) stop: [] # No default (optional) presence_penalty: "" # No default (optional) frequency_penalty: "" # No default (optional) max_tool_calls: 10 tools: [] ``` This processor sends prompts to your chosen large language model (LLM) and generates text from the responses, using the Vertex AI API. For more information, see the [Vertex AI documentation](https://cloud.google.com/vertex-ai/docs). ## [](#fields)Fields ### [](#attachment)`attachment` Additional data like an image to send with the prompt to the model. The result of the mapping must be a byte array, and the content type is automatically detected. **Type**: `string` ```yaml # Examples: attachment: root = this.image.decode("base64") # decode base64 encoded image ``` ### [](#credentials_json)`credentials_json` An optional field to set a Google Service Account Credentials JSON. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#frequency_penalty)`frequency_penalty` Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. **Type**: `float` ### [](#history)`history` Historical messages to include in the chat request. The result of the bloblang query should be an array of objects of the form of \[{"role": "", "content":""}\], where role is "user" or "model". **Type**: `string` ### [](#location)`location` Specify the location of a fine tuned model. For base models, you can omit this field. **Type**: `string` ```yaml # Examples: location: us-central1 ``` ### [](#max_tokens)`max_tokens` The maximum number of output tokens to generate per message. **Type**: `int` ### [](#max_tool_calls)`max_tool_calls` The maximum number of sequential tool calls. **Type**: `int` **Default**: `10` ### [](#model)`model` The name of the LLM to use. For a full list of models, see the [Vertex AI Model Garden](https://console.cloud.google.com/vertex-ai/model-garden). **Type**: `string` ```yaml # Examples: model: gemini-1.5-pro-001 # --- model: gemini-1.5-flash-001 ``` ### [](#presence_penalty)`presence_penalty` Positive values penalize new tokens if they appear in the text already, increasing the model’s likelihood to include new topics. **Type**: `float` ### [](#project)`project` The GCP project ID to use. **Type**: `string` ### [](#prompt)`prompt` The prompt you want to generate a response for. By default, the processor submits the entire payload as a string. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#response_format)`response_format` The format of the generated response. You must also prompt the model to output the appropriate response type. 
**Type**: `string`

**Default**: `text`

**Options**: `text`, `json`

### [](#stop)`stop[]`

Sets the stop sequences to use. When the LLM encounters one of these sequences, it stops generating text and returns the final response.

**Type**: `array`

### [](#system_prompt)`system_prompt`

The system prompt to submit to the Vertex AI LLM. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#temperature)`temperature`

Controls the randomness of predictions.

**Type**: `float`

### [](#tools)`tools[]`

The tools to allow the LLM to invoke. This allows building subpipelines that the LLM can choose to invoke to execute agentic-like actions.

**Type**: `object`

**Default**: `[]`

### [](#tools-description)`tools[].description`

A description of this tool. The LLM uses this description to decide whether the tool should be used.

**Type**: `string`

### [](#tools-name)`tools[].name`

The name of this tool.

**Type**: `string`

### [](#tools-parameters)`tools[].parameters`

The parameters the LLM needs to provide to invoke this tool.

**Type**: `object`

### [](#tools-parameters-properties)`tools[].parameters.properties`

The properties for the processor’s input data.

**Type**: `object`

### [](#tools-parameters-properties-description)`tools[].parameters.properties.description`

A description of this parameter.

**Type**: `string`

### [](#tools-parameters-properties-enum)`tools[].parameters.properties.enum[]`

Specifies that this parameter is an enum and only these specific values should be used.

**Type**: `array`

**Default**: `[]`

### [](#tools-parameters-properties-type)`tools[].parameters.properties.type`

The type of this parameter.

**Type**: `string`

### [](#tools-parameters-required)`tools[].parameters.required[]`

The required parameters for this pipeline.

**Type**: `array`

**Default**: `[]`

### [](#tools-processors)`tools[].processors[]`

The pipeline to execute when the LLM uses this tool.
**Type**: `processor` ### [](#top_k)`top_k` Enables top-k sampling (optional). **Type**: `float` ### [](#top_p)`top_p` Enables nucleus sampling (optional). **Type**: `float` --- # Page 227: gcp_vertex_ai_embeddings **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/gcp_vertex_ai_embeddings.md --- # gcp\_vertex\_ai\_embeddings --- title: gcp_vertex_ai_embeddings page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/gcp_vertex_ai_embeddings page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/gcp_vertex_ai_embeddings.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/gcp_vertex_ai_embeddings.adoc # Beta release status page-beta: "true" page-git-created-date: "2024-10-16" page-git-modified-date: "2024-10-16" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/gcp_vertex_ai_embeddings/ "View the Self-Managed version of this component") Generates vector embeddings to represent a text string, using the [Vertex AI API](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings). 
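One way this processor might be used is inside a `branch`, so that the generated embedding is attached to the original message instead of replacing it. The sketch below assumes each message is a JSON document with a hypothetical `body` field, and it writes the result to a hypothetical `embedding` field; the project ID is also a placeholder.

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - gcp_vertex_ai_embeddings:
              project: my-gcp-project # hypothetical project ID
              location: us-central1
              model: text-embedding-004
              task_type: RETRIEVAL_DOCUMENT
              # Embed only the `body` field rather than the whole payload.
              text: "${! this.body }"
        # Copy the branch result (the embedding) onto the original message.
        result_map: |
          root.embedding = this
```

Leaving out the `text` field would submit the entire message payload as the string to embed, which is the default behavior.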
```yml # Configuration fields, showing default values label: "" gcp_vertex_ai_embeddings: project: "" # No default (required) credentials_json: "" # No default (optional) location: us-central1 model: text-embedding-004 # No default (required) task_type: RETRIEVAL_DOCUMENT text: "" # No default (optional) output_dimensions: 0 # No default (optional) ``` This processor sends text strings to the Vertex AI API, which generates vector embeddings for them. By default, the processor submits the entire payload of each message as a string, unless you use the `text` field to customize it. For more information, see the [Vertex AI documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings). ## [](#fields)Fields ### [](#credentials_json)`credentials_json` Set your Google Service Account Credentials as JSON. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#location)`location` The location of the Vertex AI large language model (LLM) that you want to use. **Type**: `string` **Default**: `us-central1` ### [](#model)`model` The name of the LLM to use. For a full list of models, see the [Vertex AI Model Garden](https://console.cloud.google.com/vertex-ai/model-garden). **Type**: `string` ```yaml # Examples: model: text-embedding-004 # --- model: text-multilingual-embedding-002 ``` ### [](#output_dimensions)`output_dimensions` The maximum length of a generated vector embedding. If this value is set, generated embeddings are truncated to this size. **Type**: `int` ### [](#project)`project` The ID of your Google Cloud project. **Type**: `string` ### [](#task_type)`task_type` Use the following options to optimize embeddings that the model generates for specific use cases. 
**Type**: `string`

**Default**: `RETRIEVAL_DOCUMENT`

| Option | Summary |
| --- | --- |
| CLASSIFICATION | optimize for being able to classify texts according to preset labels |
| CLUSTERING | optimize for clustering texts based on their similarities |
| FACT_VERIFICATION | optimize for queries that are proving or disproving a fact such as "apples grow underground" |
| QUESTION_ANSWERING | optimize for search queries phrased as proper questions such as "Why is the sky blue?" |
| RETRIEVAL_DOCUMENT | optimize for documents that will be searched (also known as a corpus) |
| RETRIEVAL_QUERY | optimize for queries such as "What is the best fish recipe?" or "best restaurant in Chicago" |
| SEMANTIC_SIMILARITY | optimize for text similarity |

### [](#text)`text`

The text you want to generate vector embeddings for. By default, the processor submits the entire payload as a string. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

---

# Page 228: google_drive_download

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/google_drive_download.md

---

# google\_drive\_download

---
title: google_drive_download
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/google_drive_download
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/google_drive_download.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/google_drive_download.adoc
page-git-created-date: "2025-05-19"
page-git-modified-date: "2025-05-19"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/google_drive_download/ "View the Self-Managed version of this component")

Downloads files from Google Drive that
contain matching file IDs. Try out the [example pipeline on this page](#example), which downloads all files from your Google Drive. #### Common ```yml processors: label: "" google_drive_download: credentials_json: "" # No default (optional) file_id: "" # No default (required) mime_type: "" # No default (required) shared_drives: false ``` #### Advanced ```yml processors: label: "" google_drive_download: credentials_json: "" # No default (optional) file_id: "" # No default (required) mime_type: "" # No default (required) export_mime_types: application/vnd.google-apps.document: "text/markdown" application/vnd.google-apps.drawing: "image/png" application/vnd.google-apps.presentation: "application/pdf" application/vnd.google-apps.script: "application/vnd.google-apps.script+json" application/vnd.google-apps.spreadsheet: "text/csv" shared_drives: false ``` ## [](#authentication)Authentication By default, this processor uses [Google Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) to authenticate with Google APIs. To set up local ADC authentication, use the following `gcloud` commands: - Authenticate using Application Default Credentials and grant read-only access to your Google Drive. ```bash gcloud auth application-default login --scopes='openid,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive.readonly' ``` - Assign a quota project to the Application Default Credentials when using a user account. ```bash gcloud auth application-default set-quota-project ``` Replace the `` placeholder with your Google Cloud project ID To use a service account instead, create a JSON key for the account and add it to the [`credentials_json`](#credentials_json) field. 
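The service account option above might look like the following sketch, which supplies the key through a secret reference rather than pasting it into the configuration. The `${secrets.…}` reference syntax and the secret name `GOOGLE_DRIVE_SA_KEY` are assumptions for illustration; check the Manage Secrets documentation for the exact form your environment supports.

```yaml
pipeline:
  processors:
    - google_drive_download:
        # Hypothetical secret holding the service account JSON key.
        credentials_json: "${secrets.GOOGLE_DRIVE_SA_KEY}"
        # Assumes upstream messages carry Drive file metadata with these fields.
        file_id: "${!this.id}"
        mime_type: "${!this.mimeType}"
```

Keeping the key in a secret avoids storing sensitive credentials in the pipeline definition itself.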
To access Google Drive files using a service account, either: - Explicitly share files with the service account’s email account - Use [domain-wide delegation](https://support.google.com/a/answer/162106) to share all files within a Google Workspace ## [](#fields)Fields ### [](#credentials_json)`credentials_json` The JSON key for your service account (optional). If left empty, Application Default Credentials are used. For more details, see [Authentication](#authentication). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#export_mime_types)`export_mime_types` Maps Google Drive MIME types to [supported file export formats](https://developers.google.com/workspace/drive/api/guides/ref-export-formats). The MIME type is the key, and the export format is the value. **Type**: `string` **Default**: ```yaml application/vnd.google-apps.document: "text/markdown" application/vnd.google-apps.drawing: "image/png" application/vnd.google-apps.presentation: "application/pdf" application/vnd.google-apps.script: "application/vnd.google-apps.script+json" application/vnd.google-apps.spreadsheet: "text/csv" ``` ```yaml # Examples: export_mime_types: application/vnd.google-apps.document: application/pdf application/vnd.google-apps.drawing: application/pdf application/vnd.google-apps.presentation: application/pdf application/vnd.google-apps.spreadsheet: application/pdf # --- export_mime_types: application/vnd.google-apps.document: application/vnd.openxmlformats-officedocument.wordprocessingml.document application/vnd.google-apps.drawing: image/svg+xml application/vnd.google-apps.presentation: application/vnd.openxmlformats-officedocument.presentationml.presentation application/vnd.google-apps.spreadsheet: 
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ``` ### [](#file_id)`file_id` The ID of the file to download from Google Drive. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#mime_type)`mime_type` The [MIME type](https://developers.google.com/workspace/drive/api/guides/mime-types) of the file for download. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#shared_drives)`shared_drives` Whether or not to include shared drives. **Type**: `bool` **Default**: `false` ## [](#example)Example This example downloads all files from a Google Drive. ```yaml input: stdin: {} pipeline: processors: - google_drive_search: query: "${!content().string()}" - mutation: 'meta path = this.name' - google_drive_download: file_id: "${!this.id}" mime_type: "${!this.mimeType}" output: file: path: "${!@path}" codec: all-bytes ``` --- # Page 229: google_drive_list_labels **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/google_drive_list_labels.md --- # google\_drive\_list\_labels --- title: google_drive_list_labels latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/google_drive_list_labels page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/google_drive_list_labels.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/google_drive_list_labels.adoc categories: "[\"AI\"]" page-git-created-date: "2025-05-19" page-git-modified-date: "2025-05-19" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/google_drive_list_labels/ "View the Self-Managed version of 
this component") Lists [labels](https://developers.google.com/workspace/drive/api/guides/about-labels) for files on a Google Drive. ```yml # Configuration fields, showing default values label: "" google_drive_list_labels: credentials_json: "" # No default (optional) ``` ## [](#authentication)Authentication By default, this processor uses [Google Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) to authenticate with Google APIs. To set up local ADC authentication, use the following `gcloud` commands: - Authenticate using Application Default Credentials and grant read-only access to your Google Drive. ```bash gcloud auth application-default login --scopes='openid,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive.readonly' ``` - Assign a quota project to the Application Default Credentials when using a user account. ```bash gcloud auth application-default set-quota-project ``` Replace the `` placeholder with your Google Cloud project ID To use a service account instead, create a JSON key for the account and add it to the [`credentials_json`](#credentials_json) field. To access Google Drive files using a service account, either: - Explicitly share files with the service account’s email account - Use [domain-wide delegation](https://support.google.com/a/answer/162106) to share all files within a Google Workspace ## [](#fields)Fields ### [](#credentials_json)`credentials_json` The JSON key for your service account (optional). If left empty, Application Default Credentials are used. For more details, see [Authentication](#authentication). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` --- # Page 230: google_drive_search **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/google_drive_search.md --- # google\_drive\_search --- title: google_drive_search latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/google_drive_search page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/google_drive_search.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/google_drive_search.adoc page-git-created-date: "2025-05-19" page-git-modified-date: "2025-05-19" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/google_drive_search/ "View the Self-Managed version of this component") Searches Google Drive for files that match a specified query and emits the results as a batch of messages. Each message contains the [metadata of a Google Drive file](https://developers.google.com/workspace/drive/api/reference/rest/v3/files#File). Try out the [example pipeline on this page](#example), which searches for and downloads all Google Drive files that match the specified query. ```yml # Configuration fields, showing default values label: "" google_drive_search: credentials_json: "" # No default (optional) query: "" # No default (required) projection: - id - name - mimeType - size - labelInfo include_label_ids: "" # No default (optional) max_results: 64 ``` ## [](#authentication)Authentication By default, this processor uses [Google Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) to authenticate with Google APIs. 
To set up local ADC authentication, use the following `gcloud` commands: - Authenticate using Application Default Credentials and grant read-only access to your Google Drive. ```bash gcloud auth application-default login --scopes='openid,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive.readonly' ``` - Assign a quota project to the Application Default Credentials when using a user account. ```bash gcloud auth application-default set-quota-project ``` Replace the `` placeholder with your Google Cloud project ID To use a service account instead, create a JSON key for the account and add it to the [`credentials_json`](#credentials_json) field. To access Google Drive files using a service account, either: - Explicitly share files with the service account’s email account - Use [domain-wide delegation](https://support.google.com/a/answer/162106) to share all files within a Google Workspace ## [](#fields)Fields ### [](#credentials_json)`credentials_json` The JSON key for your service account (optional). If left empty, Application Default Credentials are used. For more details, see [Authentication](#authentication). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#include_label_ids)`include_label_ids` A comma delimited list of label IDs to include in the Google Drive search result. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#max_results)`max_results` The maximum number of search results to return. **Type**: `int` **Default**: `64` ### [](#projection)`projection[]` Partial fields to include in the Google Drive search result. 
**Type**: `array` **Default**: ```yaml - "id" - "name" - "mimeType" - "size" - "labelInfo" ``` ### [](#query)`query` Specify a search query to locate matching files in Google Drive. This field supports: - The same query syntax as the Google Drive UI - [Bloblang interpolation functions](../../../configuration/interpolation/#bloblang-queries) for dynamic query generation **Type**: `string` ### [](#shared_drives)`shared_drives` Whether or not to include shared drives in the result. **Type**: `bool` **Default**: `false` ## [](#example)Example This example searches Google Drive for files matching a query and downloads each file to a specified location. It uses the `google_drive_search` processor to perform the search and the [`google_drive_download` processor](../google_drive_download/) to retrieve the files. ```yaml input: stdin: {} pipeline: processors: - google_drive_search: query: "${!content().string()}" - mutation: 'meta path = this.name' - google_drive_download: file_id: "${!this.id}" mime_type: "${!this.mimeType}" output: file: path: "${!@path}" codec: all-bytes ``` --- # Page 231: group_by_value **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/group_by_value.md --- # group\_by\_value --- title: group_by_value latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/group_by_value page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/group_by_value.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/group_by_value.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/group_by_value/ "View the 
Self-Managed version of this component") Splits a batch of messages into N batches, where each resulting batch contains a group of messages determined by a [function interpolated string](../../../configuration/interpolation/#bloblang-queries) evaluated per message. ```yml # Config fields, showing default values label: "" group_by_value: value: ${! meta("kafka_key") } # No default (required) ``` This allows you to group messages using arbitrary fields within their content or metadata, process them individually, and send them to unique locations as per their group. The functionality of this processor depends on being applied across messages that are batched. You can find out more about batching [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#value)`value` The interpolated string to group based on. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: value: ${! meta("kafka_key") } # --- value: ${! json("foo.bar") }-${! meta("baz") } ``` ## [](#examples)Examples If we were consuming Kafka messages and needed to group them by their key, archive the groups, and send them to S3 with the key as part of the path we could achieve that with the following: ```yaml pipeline: processors: - group_by_value: value: ${! meta("kafka_key") } - archive: format: tar - compress: algorithm: gzip output: aws_s3: bucket: TODO path: docs/${! meta("kafka_key") }/${! count("files") }-${! 
timestamp_unix_nano() }.tar.gz ``` --- # Page 232: group_by **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/group_by.md --- # group\_by --- title: group_by latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/group_by page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/group_by.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/group_by.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/group_by/ "View the Self-Managed version of this component") Splits a [batch of messages](../../../configuration/batching/) into N batches, where each resulting batch contains a group of messages determined by a [Bloblang query](../../../guides/bloblang/about/). ```yml # Config fields, showing default values label: "" group_by: [] # No default (required) ``` Once the groups are established, a list of processors is applied to each grouped batch, which you can use to label the batch according to its grouping. Messages that do not pass the check of any specified group are placed in their own group. This processor only has an effect when it is applied to messages that are batched. You can find out more about batching [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#check)`check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message belongs to a given group.
**Type**: `string` ```yaml # Examples: check: this.type == "foo" # --- check: this.contents.urls.contains("https://benthos.dev/") # --- check: true ``` ### [](#processors)`processors[]` A list of [processors](../about/) to execute on the newly formed group. **Type**: `processor` **Default**: `[]` ## [](#examples)Examples ### [](#grouped-processing)Grouped Processing Imagine we have a batch of messages that we wish to split into a group of foos and everything else, which should be sent to different output destinations based on those groupings. We also need to send the foos as a tar gzip archive. For this purpose, we can use the `group_by` processor with a [`switch`](../../outputs/switch/) output: ```yaml pipeline: processors: - group_by: - check: content().contains("this is a foo") processors: - archive: format: tar - compress: algorithm: gzip - mapping: 'meta grouping = "foo"' output: switch: cases: - check: meta("grouping") == "foo" output: gcp_pubsub: project: foo_prod topic: only_the_foos - output: gcp_pubsub: project: somewhere_else topic: no_foos_here ``` --- # Page 233: http **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/http.md --- # http --- title: http latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/http page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/http.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/http.adoc page-git-created-date: "2025-03-04" page-git-modified-date: "2025-03-04" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/http/ "View the Self-Managed version of this component") Performs an HTTP request using a message batch as the request body, and replaces the
original message parts with the body of the response. #### Common ```yml processors: label: "" http: url: "" # No default (required) verb: POST headers: {} rate_limit: "" # No default (optional) timeout: 5s parallel: false ``` #### Advanced ```yml processors: label: "" http: url: "" # No default (required) verb: POST headers: {} metadata: include_prefixes: [] include_patterns: [] dump_request_log_level: "" oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" oauth2: enabled: false client_key: "" client_secret: "" token_url: "" scopes: [] endpoint_params: {} basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] extract_headers: include_prefixes: [] include_patterns: [] rate_limit: "" # No default (optional) timeout: 5s retry_period: 1s max_retry_backoff: 300s retries: 3 follow_redirects: true backoff_on: - 429 drop_on: [] successful_on: [] proxy_url: "" # No default (optional) disable_http2: false batch_as_multipart: false parallel: false ``` ## [](#rate-limit-requests)Rate limit requests You can use the `rate_limit` field to specify a [rate limit resource](../../rate_limits/about/), which restricts the number of requests processed service-wide, regardless of how many components you run in parallel. ## [](#dynamic-url-and-header-settings)Dynamic URL and header settings You can set the [`url`](#url) and [`headers`](#headers) values dynamically using [function interpolations](../../../configuration/interpolation/#bloblang-queries). ## [](#map-payloads-with-the-branch-processor)Map payloads with the branch processor You can use the [`branch` processor](../branch/) to transform or encode the payload into a specific request body format, and map the response back into the original payload instead of replacing it entirely. 
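
As a rough illustration of the pattern described above, the following sketch builds a small request body from one field of each message, posts it, and stores the response under a new key rather than overwriting the message. The endpoint `https://example.com/enrich` and the field names `user_id` and `enrichment` are hypothetical and only serve the illustration. The sketch also assumes that the endpoint returns a JSON body.

```yaml
pipeline:
  processors:
    - branch:
        # Build the request body from a single field of the original message.
        # `user_id` is a hypothetical field name.
        request_map: 'root.id = this.user_id'
        processors:
          - http:
              url: https://example.com/enrich # Hypothetical endpoint
              verb: POST
              headers:
                Content-Type: application/json
        # Keep the original message and add the (assumed JSON) response
        # under a hypothetical `enrichment` key.
        result_map: 'root.enrichment = this'
```

The example that follows applies the same pattern with an empty request body.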
This example uses a [`branch` processor](../branch/) to strip the request message into an empty body (`request_map: 'root = ""'`), grab an HTTP payload, and place the result back into the original message at the path `repo.status`: ```yaml pipeline: processors: - branch: request_map: 'root = ""' processors: - http: url: https://hub.docker.com/v2/repositories/jeffail/benthos verb: GET headers: Content-Type: application/json result_map: 'root.repo.status = this' ``` ## [](#response-codes)Response codes HTTP response codes in the 200-299 range indicate a successful response. You can use the [`successful_on`](#successful_on) field to add more success status codes. HTTP status codes in the 300-399 range are redirects. The [`follow_redirects` field](#follow_redirects) determines how these responses are handled. If a request returns a response code that matches an entry in: - The [`backoff_on` field](#backoff_on), the request is retried after increasing intervals. - The [`drop_on` field](#drop_on), the request is immediately treated as a failure. ## [](#add-metadata-to-errors)Add metadata to errors If a request returns an error response code, this processor sets a `http_status_code` metadata field in the resulting message. > 💡 **TIP** > > You can use the [`extract_headers`](#extract_headers) field to define rules for copying headers into messages generated from the response. ## [](#error-handling)Error handling When all retry attempts for a message are exhausted, this processor cancels the attempt. By default, the failed message continues through the pipeline unchanged unless you configure other error-handling. For example, you might want to drop failed messages or route them to a dead letter queue. For more information, see [Error Handling](../../../configuration/error_handling/). ## [](#fields)Fields ### [](#backoff_on)`backoff_on[]` A list of status codes that indicate a request failure, and trigger retries with an increasing backoff period between attempts. 
**Type**: `int` **Default**: ```yaml - 429 ``` ### [](#basic_auth)`basic_auth` Allows you to specify basic authentication. **Type**: `object` ### [](#basic_auth-enabled)`basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#basic_auth-password)`basic_auth.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#basic_auth-username)`basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#batch_as_multipart)`batch_as_multipart` When set to `true`, sends all messages in a batch as a single multipart request, as described in [RFC1341](https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html). When set to `false`, sends messages in a batch as individual requests. **Type**: `bool` **Default**: `false` ### [](#disable_http2)`disable_http2` Whether to disable HTTP/2. By default, HTTP/2 is enabled. **Type**: `bool` **Default**: `false` ### [](#drop_on)`drop_on[]` A list of status codes that indicate a request failure for which the processor should not attempt retries. This helps avoid unnecessary retries for requests that are unlikely to succeed. > 📝 **NOTE** > > In these cases, the _request_ is dropped, but the _message_ that triggered the request is retained. **Type**: `int` **Default**: `[]` ### [](#dump_request_log_level)`dump_request_log_level` EXPERIMENTAL: Set the logging level for the request and response payloads of each HTTP request. **Type**: `string` **Default**: `""` **Options**: `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR`, `FATAL`, `""` ### [](#extract_headers)`extract_headers` Specify which response headers to add to the resulting messages as metadata.
Header keys are automatically converted to lowercase before matching, so make sure that your patterns target the lowercase versions of the expected header keys. **Type**: `object` ### [](#extract_headers-include_patterns)`extract_headers.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#extract_headers-include_prefixes)`extract_headers.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#follow_redirects)`follow_redirects` Whether to follow redirects, including all responses with HTTP status codes in the 300-399 range. If set to `false`, the response message includes only the body, status, and headers from the redirect response, and this processor does not make a request to the URL specified in the `Location` header. **Type**: `bool` **Default**: `true` ### [](#headers)`headers` A map of headers to add to the request. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ```yaml # Examples: headers: Content-Type: application/octet-stream traceparent: ${! tracing_span().traceparent } ``` ### [](#jwt)`jwt` Beta Configure JSON Web Token (JWT) authentication. This feature is in beta and may change in future releases. JWT tokens provide secure, stateless authentication between services. **Type**: `object` ### [](#jwt-claims)`jwt.claims` A value used to identify the claims that issued the JWT. **Type**: `object` **Default**: `{}` ### [](#jwt-enabled)`jwt.enabled` Whether to use JWT authentication in requests. 
**Type**: `bool` **Default**: `false` ### [](#jwt-headers)`jwt.headers` Additional key-value pairs to include in the JWT header (optional). These headers provide extra metadata for JWT processing. **Type**: `object` **Default**: `{}` ### [](#jwt-private_key_file)`jwt.private_key_file` Path to a file containing the PEM-encoded private key using PKCS#1 or PKCS#8 format. The private key must be compatible with the algorithm specified in the `signing_method` field. **Type**: `string` **Default**: `""` ### [](#jwt-signing_method)`jwt.signing_method` The cryptographic algorithm used to sign the JWT token. Supported algorithms include RS256, RS384, RS512, and EdDSA. This algorithm must be compatible with the private key specified in the `private_key_file` field. **Type**: `string` **Default**: `""` ### [](#max_retry_backoff)`max_retry_backoff` The maximum period to wait between failed requests. **Type**: `string` **Default**: `300s` ### [](#metadata)`metadata` Specify matching rules that determine which metadata keys should be added to the HTTP request as headers. **Type**: `object` ### [](#metadata-include_patterns)`metadata.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#metadata-include_prefixes)`metadata.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#oauth)`oauth` Configure OAuth version 1.0 authentication for secure API access. **Type**: `object` ### [](#oauth-access_token)`oauth.access_token` The value used to gain access to the protected resources on behalf of the user. 
**Type**: `string` **Default**: `""` ### [](#oauth-access_token_secret)`oauth.access_token_secret` The secret that establishes ownership of the `oauth.access_token` in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_key)`oauth.consumer_key` A value used to identify the client to the service provider. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_secret)`oauth.consumer_secret` A secret used to establish ownership of the consumer key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-enabled)`oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#oauth2)`oauth2` Allows you to specify open authentication using OAuth version 2 and the client credentials token flow. **Type**: `object` ### [](#oauth2-client_key)`oauth2.client_key` A value used to identify the client to the token provider. **Type**: `string` **Default**: `""` ### [](#oauth2-client_secret)`oauth2.client_secret` The secret used to establish ownership of the client key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth2-enabled)`oauth2.enabled` Whether to use OAuth version 2 in requests. 
**Type**: `bool` **Default**: `false` ### [](#oauth2-endpoint_params)`oauth2.endpoint_params` A map of endpoint parameters, each specified as an array of strings (optional). **Type**: `object` **Default**: `{}` ```yaml # Examples: endpoint_params: bar: - woof foo: - meow - quack ``` ### [](#oauth2-scopes)`oauth2.scopes[]` A list of requested permissions (optional). **Type**: `array` **Default**: `[]` ### [](#oauth2-token_url)`oauth2.token_url` The URL of the token provider. **Type**: `string` **Default**: `""` ### [](#parallel)`parallel` When processing batched messages, this field determines whether messages in the batch are sent in parallel. If set to `false`, messages are sent serially. **Type**: `bool` **Default**: `false` ### [](#proxy_url)`proxy_url` An HTTP proxy URL (optional). **Type**: `string` ### [](#rate_limit)`rate_limit` A [rate limit](../../rate_limits/about/) to throttle requests by (optional). **Type**: `string` ### [](#retries)`retries` The maximum number of retry attempts to make. **Type**: `int` **Default**: `3` ### [](#retry_period)`retry_period` The initial period to wait between failed requests before retrying. **Type**: `string` **Default**: `1s` ### [](#successful_on)`successful_on[]` A list of HTTP status codes that should be considered successful, even if they are not 2XX codes. This is useful for handling cases where non-2XX codes indicate that the request was processed successfully, such as `303 See Other` or `409 Conflict`. By default, all 2XX codes are considered successful unless they are specified in the `backoff_on` or `drop_on` fields. **Type**: `int` **Default**: `[]` ### [](#timeout)`timeout` A static timeout to apply to requests. **Type**: `string` **Default**: `5s` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates.
Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. 
Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. 
Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL to connect to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#verb)`verb` A verb to connect with. **Type**: `string` **Default**: `POST` ```yaml # Examples: verb: POST # --- verb: GET # --- verb: DELETE ``` --- # Page 234: insert_part **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/insert_part.md --- # insert\_part --- title: insert_part latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/insert_part page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/insert_part.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/insert_part.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/insert_part/ "View the Self-Managed version of this component") Insert a new message into a batch at an index. 
If the specified index is greater than the length of the existing batch, the new message is appended to the end. ```yml # Config fields, showing default values label: "" insert_part: index: -1 content: "" ``` The index can be negative, in which case the message is inserted counting backwards from the end, starting at -1. For example, if `index` is -1, the new message becomes the last of the batch; if `index` is -2, the new message is inserted before the last message, and so on. If the magnitude of a negative index is greater than the length of the existing batch, the message is inserted at the beginning. The new message has metadata copied from the first pre-existing message of the batch. This processor interpolates functions within the `content` field. For a list of functions, see [interpolation functions](../../../configuration/interpolation/#bloblang-queries). ## [](#fields)Fields ### [](#content)`content` The content of the message being inserted. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ### [](#index)`index` The index within the batch to insert the message at. **Type**: `int` **Default**: `-1` --- # Page 235: jira **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/jira.md --- # jira --- title: jira latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/jira page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/jira.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/jira.adoc categories: "[Services]" description: Queries Jira resources and returns structured data.
page-git-created-date: "2025-11-03" page-git-modified-date: "2025-11-03" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/jira/ "View the Self-Managed version of this component") Queries Jira resources and returns structured data. #### Common ```yaml processors: label: "" jira: username: "" # No default (required) api_token: "" # No default (required) max_results_per_page: 50 base_url: "" # No default (required) timeout: 5s ``` #### Advanced ```yaml processors: label: "" jira: username: "" # No default (required) api_token: "" # No default (required) max_results_per_page: 50 base_url: "" # No default (required) timeout: 5s tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] proxy_url: "" disable_http2: false tps_limit: 0 tps_burst: 1 backoff: initial_interval: 1s max_interval: 30s max_retries: 3 tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s http: max_idle_conns: 100 max_idle_conns_per_host: 0 max_conns_per_host: 64 idle_conn_timeout: 1m30s tls_handshake_timeout: 10s expect_continue_timeout: 1s response_header_timeout: 0s disable_keep_alives: false disable_compression: false max_response_header_bytes: 1048576 max_response_body_bytes: 10485760 write_buffer_size: 4096 read_buffer_size: 4096 h2: strict_max_concurrent_requests: false max_decoder_header_table_size: 4096 max_encoder_header_table_size: 4096 max_read_frame_size: 16384 max_receive_buffer_per_connection: 1048576 max_receive_buffer_per_stream: 1048576 send_ping_timeout: 0s ping_timeout: 15s write_byte_timeout: 0s access_log_level: "" access_log_body_limit: 0 ``` Executes Jira API queries based on input messages and returns structured results. The processor handles pagination, retries, and field expansion automatically. 
Supports querying the following Jira resources: - Issues (JQL queries) - Issue transitions - Users - Roles - Project versions - Project categories - Project types - Projects The processor authenticates using basic authentication with a username and API token. Input messages should contain valid Jira queries in JSON format. ## [](#fields)Fields ### [](#access_log_body_limit)`access_log_body_limit` Maximum bytes of request/response body to include in logs. Set to 0 to skip body logging. **Type**: `int` **Default**: `0` ### [](#access_log_level)`access_log_level` Log level for HTTP request/response logging. Empty disables logging. **Type**: `string` **Default**: `""` **Options**: `""`, `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR` ### [](#api_token)`api_token` The Jira API token for the specified account. You can generate an API token from your [Atlassian account settings](https://id.atlassian.com/manage-profile/security/api-tokens). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#backoff)`backoff` Adaptive backoff configuration for 429 (Too Many Requests) responses. Always active. **Type**: `object` ### [](#backoff-initial_interval)`backoff.initial_interval` Initial interval between retries on 429 responses. **Type**: `string` **Default**: `1s` ### [](#backoff-max_interval)`backoff.max_interval` Maximum interval between retries on 429 responses. **Type**: `string` **Default**: `30s` ### [](#backoff-max_retries)`backoff.max_retries` Maximum number of retries on 429 responses. **Type**: `int` **Default**: `3` ### [](#base_url)`base_url` The base URL of the Jira instance (for example, `https://your-domain.atlassian.net`). **Type**: `string` ### [](#disable_http2)`disable_http2` Disable HTTP/2 and force HTTP/1.1.
**Type**: `bool` **Default**: `false` ### [](#http)`http` HTTP transport settings controlling connection pooling, timeouts, and HTTP/2. **Type**: `object` ### [](#http-disable_compression)`http.disable_compression` Disable automatic decompression of gzip responses. **Type**: `bool` **Default**: `false` ### [](#http-disable_keep_alives)`http.disable_keep_alives` Disable HTTP keep-alive connections; each request uses a new connection. **Type**: `bool` **Default**: `false` ### [](#http-expect_continue_timeout)`http.expect_continue_timeout` Maximum time to wait for a server’s 100-continue response before sending the body. 0 means the body is sent immediately. **Type**: `string` **Default**: `1s` ### [](#http-h2)`http.h2` HTTP/2-specific transport settings. Only applied when HTTP/2 is enabled. **Type**: `object` ### [](#http-h2-max_decoder_header_table_size)`http.h2.max_decoder_header_table_size` Upper limit in bytes for the HPACK header table used to decode headers from the peer. Must be less than 4 MiB. **Type**: `int` **Default**: `4096` ### [](#http-h2-max_encoder_header_table_size)`http.h2.max_encoder_header_table_size` Upper limit in bytes for the HPACK header table used to encode headers sent to the peer. Must be less than 4 MiB. **Type**: `int` **Default**: `4096` ### [](#http-h2-max_read_frame_size)`http.h2.max_read_frame_size` Largest HTTP/2 frame this endpoint will read. Valid range: 16 KiB to 16 MiB. **Type**: `int` **Default**: `16384` ### [](#http-h2-max_receive_buffer_per_connection)`http.h2.max_receive_buffer_per_connection` Maximum flow-control window size in bytes for data received on a connection. Must be at least 64 KiB and less than 4 MiB. **Type**: `int` **Default**: `1048576` ### [](#http-h2-max_receive_buffer_per_stream)`http.h2.max_receive_buffer_per_stream` Maximum flow-control window size in bytes for data received on a single stream. Must be less than 4 MiB. 
**Type**: `int` **Default**: `1048576` ### [](#http-h2-ping_timeout)`http.h2.ping_timeout` Timeout waiting for a PING response before closing the connection. **Type**: `string` **Default**: `15s` ### [](#http-h2-send_ping_timeout)`http.h2.send_ping_timeout` Idle timeout after which a PING frame is sent to verify connection health. 0 disables health checks. **Type**: `string` **Default**: `0s` ### [](#http-h2-strict_max_concurrent_requests)`http.h2.strict_max_concurrent_requests` When true, new requests block when a connection’s concurrency limit is reached instead of opening a new connection. **Type**: `bool` **Default**: `false` ### [](#http-h2-write_byte_timeout)`http.h2.write_byte_timeout` Timeout for writing data to a connection. The timer resets whenever bytes are written. 0 disables the timeout. **Type**: `string` **Default**: `0s` ### [](#http-idle_conn_timeout)`http.idle_conn_timeout` How long an idle connection remains in the pool before being closed. 0 disables the timeout. **Type**: `string` **Default**: `1m30s` ### [](#http-max_conns_per_host)`http.max_conns_per_host` Maximum total connections (active + idle) per host. 0 means unlimited. **Type**: `int` **Default**: `64` ### [](#http-max_idle_conns)`http.max_idle_conns` Maximum total number of idle (keep-alive) connections across all hosts. 0 means unlimited. **Type**: `int` **Default**: `100` ### [](#http-max_idle_conns_per_host)`http.max_idle_conns_per_host` Maximum idle connections to keep per host. 0 (the default) uses GOMAXPROCS+1. **Type**: `int` **Default**: `0` ### [](#http-max_response_body_bytes)`http.max_response_body_bytes` Maximum bytes of response body the client will read. The response body is wrapped with a limit reader; reads beyond this cap return EOF. 0 disables the limit. **Type**: `int` **Default**: `10485760` ### [](#http-max_response_header_bytes)`http.max_response_header_bytes` Maximum bytes of response headers to allow. 
**Type**: `int` **Default**: `1048576` ### [](#http-read_buffer_size)`http.read_buffer_size` Size in bytes of the per-connection read buffer. **Type**: `int` **Default**: `4096` ### [](#http-response_header_timeout)`http.response_header_timeout` Maximum time to wait for response headers after writing the full request. 0 disables the timeout. **Type**: `string` **Default**: `0s` ### [](#http-tls_handshake_timeout)`http.tls_handshake_timeout` Maximum time to wait for a TLS handshake to complete. 0 disables the timeout. **Type**: `string` **Default**: `10s` ### [](#http-write_buffer_size)`http.write_buffer_size` Size in bytes of the per-connection write buffer. **Type**: `int` **Default**: `4096` ### [](#max_results_per_page)`max_results_per_page` The maximum number of results to return per page when calling the Jira API. [Pagination](https://docs.atlassian.com/software/jira/docs/api/REST/9.17.0/#pagination) in the Jira API is zero-based, so the first page starts at `0`. **Type**: `int` **Default**: `50` ### [](#proxy_url)`proxy_url` HTTP proxy URL. Empty string disables proxying. **Type**: `string` **Default**: `""` ### [](#tcp)`tcp` TCP socket configuration. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. 
**Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timeout)`timeout` HTTP request timeout. **Type**: `string` **Default**: `5s` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. 
Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. 
**Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#tps_burst)`tps_burst` Maximum burst size for rate limiting. **Type**: `int` **Default**: `1` ### [](#tps_limit)`tps_limit` Rate limit in requests per second. 0 disables rate limiting. **Type**: `float` **Default**: `0` ### [](#username)`username` The username or email address of the Jira account. **Type**: `string` ## [](#examples)Examples ### [](#minimal-configuration)Minimal configuration Basic Jira processor setup with required fields only ```yaml pipeline: processors: - jira: base_url: "https://your-domain.atlassian.net" username: "${JIRA_USERNAME}" api_token: "${JIRA_API_TOKEN}" ``` ### [](#full-configuration-with-tuning)Full configuration with tuning Complete configuration with pagination and timeout settings ```yaml pipeline: processors: - jira: base_url: "https://your-domain.atlassian.net" username: "${JIRA_USERNAME}" api_token: "${JIRA_API_TOKEN}" max_results_per_page: 200 timeout: "30s" ``` --- # Page 236: jmespath **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/jmespath.md --- # jmespath --- title: jmespath latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/jmespath page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/jmespath.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/jmespath.adoc categories: "[\"Mapping\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/jmespath/ 
"View the Self-Managed version of this component") Executes a [JMESPath query](http://jmespath.org/) on JSON documents and replaces the message with the resulting document. ```yml # Config fields, showing default values label: "" jmespath: query: "" # No default (required) ``` > 💡 **TIP: Try out Bloblang** > > Try out Bloblang > > For better performance and improved capabilities try native Redpanda Connect mapping with the [`mapping` processor](../mapping/). ## [](#fields)Fields ### [](#query)`query` The JMESPath query to apply to messages. **Type**: `string` nclude::redpanda-connect:components:partial$examples/processors/jmespath.adoc\[\] --- # Page 237: jq **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/jq.md --- # jq --- title: jq latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/jq page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/jq.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/jq.adoc categories: "[\"Mapping\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/jq/ "View the Self-Managed version of this component") Transforms and filters messages using jq queries. #### Common ```yml processors: label: "" jq: query: "" # No default (required) ``` #### Advanced ```yml processors: label: "" jq: query: "" # No default (required) raw: false output_raw: false ``` > 💡 **TIP: Try out Bloblang** > > Try out Bloblang > > For better performance and improved capabilities try out native Redpanda Connect mapping with the [`mapping` processor](../mapping/). 
The provided query is executed on each message, targeting either the contents as a structured JSON value or as a raw string using the field `raw`, and the message is replaced with the query result. Message metadata is also accessible within the query from the variable `$metadata`. This processor uses the [gojq library](https://github.com/itchyny/gojq), and therefore does not require jq to be installed as a dependency. However, this also means there are some [differences in how these queries are executed](https://github.com/itchyny/gojq#difference-to-jq) versus the jq cli. If the query does not emit any value then the message is filtered, if the query returns multiple values then the resulting message will be an array containing all values. The full query syntax is described in [jq’s documentation](https://stedolan.github.io/jq/manual/). ## [](#error-handling)Error handling Queries can fail, in which case the message remains unchanged, errors are logged, and the message is flagged as having failed, allowing you to use [standard processor error handling patterns](../../../configuration/error_handling/). ## [](#fields)Fields ### [](#output_raw)`output_raw` Whether to output raw text (unquoted) instead of JSON strings when the emitted values are string types. **Type**: `bool` **Default**: `false` ### [](#query)`query` The jq query to filter and transform messages with. **Type**: `string` ### [](#raw)`raw` Whether to process the input as a raw string instead of as JSON. 
**Type**: `bool` **Default**: `false` ## [](#examples)Examples ### [](#mapping)Mapping When receiving JSON documents of the form: ```json { "locations": [ {"name": "Seattle", "state": "WA"}, {"name": "New York", "state": "NY"}, {"name": "Bellevue", "state": "WA"}, {"name": "Olympia", "state": "WA"} ] } ``` We could collapse the location names from the state of Washington into a field `Cities`: ```json {"Cities": "Bellevue, Olympia, Seattle"} ``` With the following config: ```yaml pipeline: processors: - jq: query: '{Cities: .locations | map(select(.state == "WA").name) | sort | join(", ") }' ``` --- # Page 238: json_schema **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/json_schema.md --- # json\_schema --- title: json_schema latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/json_schema page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/json_schema.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/json_schema.adoc categories: "[\"Mapping\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/json_schema/ "View the Self-Managed version of this component") Checks messages against a provided JSONSchema definition but does not change the payload under any circumstances. If a message does not match the schema it can be caught using [error handling methods](../../../configuration/error_handling/). 
```yml # Config fields, showing default values label: "" json_schema: schema: "" # No default (optional) schema_path: "" # No default (optional) ``` Please refer to the [JSON Schema website](https://json-schema.org/) for information and tutorials regarding the syntax of the schema. ## [](#fields)Fields ### [](#schema)`schema` A schema to apply. Use either this or the `schema_path` field. **Type**: `string` ### [](#schema_path)`schema_path` The path of a schema document to apply. Use either this or the `schema` field. **Type**: `string` ## [](#examples)Examples With the following JSONSchema document: ```json { "$id": "https://example.com/person.schema.json", "$schema": "http://json-schema.org/draft-07/schema#", "title": "Person", "type": "object", "properties": { "firstName": { "type": "string", "description": "The person's first name." }, "lastName": { "type": "string", "description": "The person's last name." }, "age": { "description": "Age in years which must be equal to or greater than zero.", "type": "integer", "minimum": 0 } } } ``` And the following Redpanda Connect configuration: ```yaml pipeline: processors: - json_schema: schema_path: "file://path_to_schema.json" - catch: - log: level: ERROR message: "Schema validation failed due to: ${!error()}" - mapping: 'root = deleted()' # Drop messages that fail ``` If a payload being processed looked like: ```json {"firstName":"John","lastName":"Doe","age":-21} ``` Then a log message would appear explaining the fault and the payload would be dropped. 
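
The same kind of check can also be sketched with the inline `schema` field instead of `schema_path`. This is a minimal illustration only: the schema below is a trimmed-down, hypothetical variant of the Person schema above, and the `catch` block logs failures without dropping the message.

```yaml
pipeline:
  processors:
    - json_schema:
        # Inline schema (hypothetical, reduced to a single property).
        schema: |
          {
            "type": "object",
            "properties": {
              "age": { "type": "integer", "minimum": 0 }
            }
          }
    # Runs only for messages that failed the processor above.
    - catch:
        - log:
            level: ERROR
            message: "Schema validation failed due to: ${!error()}"
```

Use either `schema` or `schema_path` in a given processor, as noted in the field descriptions.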
--- # Page 239: log **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/log.md --- # log --- title: log latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/log page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/log.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/log.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/log/ "View the Self-Managed version of this component") Prints a log event for each message. Messages always remain unchanged. The log message can be set using function interpolations described in [Bloblang queries](../../../configuration/interpolation/#bloblang-queries) which allows you to log the contents and metadata of messages. ```yml # Config fields, showing default values label: "" log: level: INFO fields_mapping: |- # No default (optional) root.reason = "cus I wana" root.id = this.id root.age = this.user.age.number() root.kafka_topic = meta("kafka_topic") message: "" ``` The `level` field determines the log level of the printed events and can be any of the following values: TRACE, DEBUG, INFO, WARN, ERROR. 
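
As a minimal sketch of an interpolated log message, the following configuration assumes that messages are JSON documents with an `id` field and that they carry `kafka_topic` metadata. Both are assumptions made for illustration.

```yaml
pipeline:
  processors:
    - log:
        level: INFO
        # `id` is a hypothetical field of the message payload.
        message: 'Processed document ${! json("id") } from topic ${! meta("kafka_topic") }'
```
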
## [](#structured-fields)Structured fields It’s also possible to add custom fields to logs when the format is set to a structured form such as `json` or `logfmt` with the config field [`fields_mapping`](#fields_mapping): ```yaml pipeline: processors: - log: level: DEBUG message: hello world fields_mapping: | root.reason = "cus I wana" root.id = this.id root.age = this.user.age root.kafka_topic = meta("kafka_topic") ``` ## [](#fields)Fields ### [](#fields_mapping)`fields_mapping` An optional [Bloblang mapping](../../../guides/bloblang/about/) that can be used to specify extra fields to add to the log. If log fields are also added with the `fields` field then those values will override matching keys from this mapping. **Type**: `string` ```yaml # Examples: fields_mapping: |- root.reason = "cus I wana" root.id = this.id root.age = this.user.age.number() root.kafka_topic = meta("kafka_topic") ``` ### [](#level)`level` The log level to use. **Type**: `string` **Default**: `INFO` **Options**: `ERROR`, `WARN`, `INFO`, `DEBUG`, `TRACE` ### [](#message)`message` The message to print. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).
**Type**: `string` **Default**: `""` --- # Page 240: mapping **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/mapping.md --- # mapping --- title: mapping latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/mapping page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/mapping.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/mapping.adoc categories: "[\"Mapping\",\"Parsing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/mapping/ "View the Self-Managed version of this component") Executes a [Bloblang](../../../guides/bloblang/about/) mapping on messages, creating a new document that replaces (or filters) the original message. ```yml # Config fields, showing default values label: "" mapping: "" # No default (required) ``` Bloblang is a powerful language that enables a wide range of mapping, transformation and filtering tasks. For more information, see [Bloblang](../../../guides/bloblang/about/). If your mapping is large and you’d prefer for it to live in a separate file, you can execute a mapping directly from a file with the expression `from "<path>"`, where the path must be absolute, or relative to the location that Redpanda Connect is executed from. Note: This processor is equivalent to the [Bloblang](../bloblang/#component-rename) one. The latter will be deprecated in a future release.
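
As a rough sketch of loading a mapping from a file, the configuration below assumes a mapping file exists at the hypothetical path `./mappings/enrich.blobl` and that the runtime can read it. The path is illustrative and not taken from this documentation.

```yaml
pipeline:
  processors:
    # The file path is hypothetical; it must be absolute, or relative to
    # the location that Redpanda Connect is executed from.
    - mapping: 'from "./mappings/enrich.blobl"'
```
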
## [](#input-document-immutability)Input document immutability Mapping operates by creating an entirely new object during assignments, this has the advantage of treating the original referenced document as immutable and therefore queryable at any stage of your mapping. For example, with the following mapping: ```bloblang root.id = this.id root.invitees = this.invitees.filter(i -> i.mood >= 0.5) root.rejected = this.invitees.filter(i -> i.mood < 0.5) # In: {"id":"party-2024","invitees":[{"name":"Alice","mood":0.8},{"name":"Bob","mood":0.3},{"name":"Carol","mood":0.9}]} ``` Notice that we mutate the value of `invitees` in the resulting document by filtering out objects with a lower mood. However, even after doing so we’re still able to reference the unchanged original contents of this value from the input document in order to populate a second field. Within this mapping we also have the flexibility to reference the mutable mapped document by using the keyword `root` (i.e. `root.invitees`) on the right-hand side instead. Mapping documents is advantageous in situations where the result is a document with a dramatically different shape to the input document, since we are effectively rebuilding the document in its entirety and might as well keep a reference to the unchanged input document throughout. However, in situations where we are only performing minor alterations to the input document, the rest of which is unchanged, it might be more efficient to use the [`mutation` processor](../mutation/) instead. ## [](#error-handling)Error handling Bloblang mappings can fail, in which case the message remains unchanged, errors are logged, and the message is flagged as having failed, allowing you to use [standard processor error handling patterns](../../../configuration/error_handling/). However, Bloblang itself also provides powerful ways of ensuring your mappings do not fail by specifying desired [fallback behavior](../../../guides/bloblang/about/#error-handling). 
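
A minimal sketch of fallback behavior inside a mapping, using the Bloblang `catch` and `or` methods. The field names `user.age` and `user.name` are assumptions chosen for illustration.

```yaml
pipeline:
  processors:
    - mapping: |
        root.id = this.id
        # Fall back to 0 if `user.age` is missing or cannot be parsed as a number.
        root.age = this.user.age.number().catch(0)
        # Fall back to "unknown" if `user.name` is null or absent.
        root.name = this.user.name.or("unknown")
```

With fallbacks like these, a malformed field produces a default value instead of flagging the whole message as failed.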
## [](#examples)Examples ### [](#mapping)Mapping Given JSON documents containing an array of fans: ```json { "id":"foo", "description":"a show about foo", "fans":[ {"name":"bev","obsession":0.57}, {"name":"grace","obsession":0.21}, {"name":"ali","obsession":0.89}, {"name":"vic","obsession":0.43} ] } ``` We can reduce the documents down to just the ID and only those fans with an obsession score above 0.5, giving us: ```json { "id":"foo", "fans":[ {"name":"bev","obsession":0.57}, {"name":"ali","obsession":0.89} ] } ``` With the following config: ```yaml pipeline: processors: - mapping: | root.id = this.id root.fans = this.fans.filter(fan -> fan.obsession > 0.5) ``` ### [](#more-mapping)More Mapping When receiving JSON documents of the form: ```json { "locations": [ {"name": "Seattle", "state": "WA"}, {"name": "New York", "state": "NY"}, {"name": "Bellevue", "state": "WA"}, {"name": "Olympia", "state": "WA"} ] } ``` We could collapse the location names from the state of Washington into a field `Cities`: ```json {"Cities": "Bellevue, Olympia, Seattle"} ``` With the following config: ```yaml pipeline: processors: - mapping: | root.Cities = this.locations. filter(loc -> loc.state == "WA"). map_each(loc -> loc.name). 
sort().join(", ") ``` --- # Page 241: metric **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/metric.md --- # metric --- title: metric latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/metric page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/metric.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/metric.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/metric/ "View the Self-Managed version of this component") Emit custom metrics by extracting values from messages. ```yml # Config fields, showing default values label: "" metric: type: "" # No default (required) name: "" # No default (required) labels: {} # No default (optional) value: "" ``` This processor works by evaluating an [interpolated field `value`](../../../configuration/interpolation/#bloblang-queries) for each message and updating an emitted metric according to the [type](#types). Custom metrics such as these are emitted along with Redpanda Connect internal metrics, where you can customize where metrics are sent, which metric names are emitted, and rename them as appropriate. For more information, see the [metrics docs](../../metrics/about/). ## [](#fields)Fields ### [](#labels)`labels` A map of label names and values that can be used to enrich metrics. Labels are not supported by some metric destinations, in which case the metrics series are combined. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: labels: topic: ${! 
meta("kafka_topic") } type: ${! json("doc.type") } ``` ### [](#name)`name` The name of the metric to create. This must be unique across all Redpanda Connect components, otherwise it will overwrite those other metrics. **Type**: `string` ### [](#type)`type` The metric [type](#types) to create. **Type**: `string` **Options**: `counter`, `counter_by`, `gauge`, `timing` ### [](#value)`value` For some metric types, specifies the value to set or increment by. Certain metrics exporters such as Prometheus support floating point values, but those that do not will cast a floating point value into an integer. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `""` ## [](#examples)Examples ### [](#counter)Counter In this example we emit a counter metric called `Foos`, which increments for every message processed, and we label the metric with some metadata about where the message came from and a field from the document that states what type it is. We also configure our metrics to emit to CloudWatch, and explicitly only allow our custom metric and some internal Redpanda Connect metrics to emit. ```yaml pipeline: processors: - metric: name: Foos type: counter labels: topic: ${! meta("kafka_topic") } partition: ${! meta("kafka_partition") } type: ${! json("document.type").or("unknown") } metrics: mapping: | root = if ![ "Foos", "input_received", "output_sent" ].contains(this) { deleted() } aws_cloudwatch: namespace: ProdConsumer ``` ### [](#gauge)Gauge In this example we emit a gauge metric called `FooSize`, which is given a value extracted from JSON messages at the path `foo.size`. We then also configure our Prometheus metric exporter to only emit this custom metric and nothing else. We also label the metric with some metadata. ```yaml pipeline: processors: - metric: name: FooSize type: gauge labels: topic: ${! meta("kafka_topic") } value: ${! 
json("foo.size") } metrics: mapping: 'if this != "FooSize" { deleted() }' prometheus: {} ``` ## [](#types)Types ### [](#counter-2)`counter` Increments a counter by exactly 1. The contents of `value` are ignored by this type. ### [](#counter_by)`counter_by` If the contents of `value` can be parsed as a positive integer value then the counter is incremented by this value. For example, the following configuration will increment the value of the `CountCustomField` metric by the contents of `field.some.value`: ```yaml pipeline: processors: - metric: type: counter_by name: CountCustomField value: ${!json("field.some.value")} ``` ### [](#gauge-2)`gauge` If the contents of `value` can be parsed as a positive integer value then the gauge is set to this value. For example, the following configuration will set the value of the `GaugeCustomField` metric to the contents of `field.some.value`: ```yaml pipeline: processors: - metric: type: gauge name: GaugeCustomField value: ${!json("field.some.value")} ``` ### [](#timing)`timing` Equivalent to `gauge`, except that the metric is a timing. It is recommended that timing values are recorded in nanoseconds in order to be consistent with standard Redpanda Connect timing metrics, because in some cases these values are automatically converted into other units, such as when exporting timings as histograms with Prometheus metrics. 
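
As a minimal sketch of the `timing` type, the following configuration records a duration taken from each message. The metric name `FooLatency` and the field path `latency_ns` are hypothetical, and the value is assumed to already be expressed in nanoseconds, in line with the recommendation above.

```yaml
pipeline:
  processors:
    - metric:
        type: timing
        name: FooLatency  # hypothetical metric name
        # Assumes each message carries a duration in nanoseconds
        # at this (hypothetical) JSON path.
        value: ${! json("latency_ns") }
```

If the source data uses another unit, it would need converting to nanoseconds in the `value` interpolation before the metric is updated.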
--- # Page 242: mongodb **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/mongodb.md --- # mongodb --- title: mongodb latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/mongodb page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/mongodb.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/mongodb.adoc categories: "[\"Services\"]" page-git-created-date: "2025-06-25" page-git-modified-date: "2025-06-25" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/mongodb/)[Cache](/redpanda-cloud/develop/connect/components/caches/mongodb/)[Input](/redpanda-cloud/develop/connect/components/inputs/mongodb/)[Output](/redpanda-cloud/develop/connect/components/outputs/mongodb/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/mongodb/ "View the Self-Managed version of this component") Performs operations against MongoDB for each message, allowing you to store or retrieve data within message payloads. 
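
As a rough sketch of how this processor might be used to retrieve data, the following configuration runs a `find-one` lookup for each message. It uses only fields documented on this page; the database name, collection name, and the `user_id` field are hypothetical placeholders.

```yaml
pipeline:
  processors:
    - mongodb:
        url: mongodb://localhost:27017
        database: example_db   # hypothetical database name
        collection: users      # hypothetical collection name
        operation: find-one
        # Build the query filter from a (hypothetical) field of the message.
        filter_map: |-
          root.user_id = this.user_id
```

For write operations such as `insert-one`, a `document_map` would be supplied instead of, or alongside, the `filter_map`.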
#### Common ```yml processors: label: "" mongodb: url: "" # No default (required) database: "" # No default (required) username: "" password: "" collection: "" # No default (required) operation: insert-one write_concern: w: majority j: false w_timeout: "" document_map: "" filter_map: "" hint_map: "" upsert: false ``` #### Advanced ```yml processors: label: "" mongodb: url: "" # No default (required) database: "" # No default (required) username: "" password: "" app_name: benthos collection: "" # No default (required) operation: insert-one write_concern: w: majority j: false w_timeout: "" document_map: "" filter_map: "" hint_map: "" upsert: false json_marshal_mode: canonical ``` ## [](#fields)Fields ### [](#app_name)`app_name` The client application name. **Type**: `string` **Default**: `benthos` ### [](#collection)`collection` The name of the target collection. **Type**: `string` ### [](#database)`database` The name of the target MongoDB database. **Type**: `string` ### [](#document_map)`document_map` A Bloblang map that represents a document to store in MongoDB, expressed as [extended JSON in canonical form](https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/). The `document_map` parameter is required for the following database operations: `insert-one`, `replace-one`, `update-one`, and `aggregate`. **Type**: `string` **Default**: `""` ```yaml # Examples: document_map: |- root.a = this.foo root.b = this.bar ``` ### [](#filter_map)`filter_map` A Bloblang map that represents a filter for a MongoDB command, expressed as [extended JSON in canonical form](https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/). The `filter_map` parameter is required for all database operations except `insert-one`. This processor uses `filter_map` to find documents for the specified operation. For example, for a `delete-one` operation, the filter map should include the fields required to locate the document for deletion. 
**Type**: `string` **Default**: `""` ```yaml # Examples: filter_map: |- root.a = this.foo root.b = this.bar ``` ### [](#hint_map)`hint_map` A Bloblang map that represents a hint or index for a MongoDB command to use, expressed as [extended JSON in canonical form](https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/). This map is optional, and is used with all operations except `insert-one`. Define a `hint_map` to improve performance when finding documents in the MongoDB database. **Type**: `string` **Default**: `""` ```yaml # Examples: hint_map: |- root.a = this.foo root.b = this.bar ``` ### [](#json_marshal_mode)`json_marshal_mode` Controls the format of the output message (optional). **Type**: `string` **Default**: `canonical` | Option | Summary | | --- | --- | | canonical | A string format that emphasizes type preservation at the expense of readability and interoperability. That is, conversion from canonical to BSON will generally preserve type information except in certain specific cases. | | relaxed | A string format that emphasizes readability and interoperability at the expense of type preservation. That is, conversion from relaxed format to BSON can lose type information. | ### [](#operation)`operation` The MongoDB database operation to perform. **Type**: `string` **Default**: `insert-one` **Options**: `insert-one`, `delete-one`, `delete-many`, `replace-one`, `update-one`, `find-one`, `aggregate` ### [](#password)`password` The password to use for authentication. Used together with `username` for basic authentication or with encrypted private keys for secure access. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ### [](#upsert)`upsert` The `upsert` parameter is optional, and only applies to `update-one` and `replace-one` operations. If the filter specified in `filter_map` matches an existing document, this operation updates or replaces the document; otherwise, a new document is created. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target MongoDB server. **Type**: `string` ```yaml # Examples: url: mongodb://localhost:27017 ``` ### [](#username)`username` The username required to connect to the database. **Type**: `string` **Default**: `""` ### [](#write_concern)`write_concern` The [write concern settings](https://www.mongodb.com/docs/manual/reference/write-concern/) for the MongoDB connection. **Type**: `object` ### [](#write_concern-j)`write_concern.j` The `j` option requests acknowledgement from MongoDB that write operations have been written to the journal. **Type**: `bool` **Default**: `false` ### [](#write_concern-w)`write_concern.w` The `w` option requests acknowledgement that write operations have propagated to the specified number of MongoDB instances. **Type**: `string` **Default**: `majority` ### [](#write_concern-w_timeout)`write_concern.w_timeout` The write concern timeout. 
**Type**: `string` **Default**: `""` --- # Page 243: mutation **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/mutation.md --- # mutation --- title: mutation latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/mutation page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/mutation.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/mutation.adoc categories: "[\"Mapping\",\"Parsing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/mutation/ "View the Self-Managed version of this component") Executes a [Bloblang](../../../guides/bloblang/about/) mapping and directly transforms the contents of messages, mutating (or deleting) them. ```yml # Config fields, showing default values label: "" mutation: "" # No default (required) ``` Bloblang is a powerful language that enables a wide range of mapping, transformation and filtering tasks. For more information, see [Bloblang](../../../guides/bloblang/about/). If your mapping is large and you’d prefer for it to live in a separate file then you can execute a mapping directly from a file with the expression `from "<filepath>"`, where the path must be absolute, or relative from the location that Redpanda Connect is executed from. ## [](#input-document-mutability)Input document mutability A mutation is a mapping that transforms input documents directly. This has the advantage of reducing the need to copy the data fed into the mapping. However, this also means that the referenced document is mutable and therefore changes throughout the mapping. 
For example, with the following Bloblang: ```bloblang root.rejected = this.invitees.filter(i -> i.mood < 0.5) root.invitees = this.invitees.filter(i -> i.mood >= 0.5) # In: {"invitees":[{"name":"Alice","mood":0.8},{"name":"Bob","mood":0.3},{"name":"Carol","mood":0.9}]} ``` Notice that we create a field `rejected` by copying the array field `invitees` and filtering out objects with a high mood. We then overwrite the field `invitees` by filtering out objects with a low mood, resulting in two array fields that are each a subset of the original. If we were to reverse the ordering of these assignments like so: ```bloblang root.invitees = this.invitees.filter(i -> i.mood >= 0.5) root.rejected = this.invitees.filter(i -> i.mood < 0.5) # In: {"invitees":[{"name":"Alice","mood":0.8},{"name":"Bob","mood":0.3},{"name":"Carol","mood":0.9}]} ``` Then the new field `rejected` would be empty as we have already mutated `invitees` to exclude the objects that it would be populated by. We can solve this problem either by carefully ordering our assignments or by capturing the original array using a variable (`let invitees = this.invitees`). Mutations are advantageous over a standard mapping in situations where the result is a document with mostly the same shape as the input document, since we can avoid unnecessarily copying data from the referenced input document. However, in situations where we are creating an entirely new document shape it can be more convenient to use the traditional [`mapping` processor](../mapping/) instead. ## [](#error-handling)Error handling Bloblang mappings can fail, in which case the error is logged and the message is flagged as having failed, allowing you to use [standard processor error handling patterns](../../../configuration/error_handling/). However, Bloblang itself also provides powerful ways of ensuring your mappings do not fail by specifying desired [fallback behavior](../../../guides/bloblang/about/#error-handling). 
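
The two safeguards described on this page, capturing the original value in a variable and specifying a fallback, can be combined. The following is a minimal sketch that reuses the `invitees` example; the choice of an empty array as the fallback value is an assumption made for illustration.

```yaml
pipeline:
  processors:
    - mutation: |
        # Capture the original array before any assignment mutates it.
        let invitees = this.invitees

        # Both filters read from the captured variable, so the order of
        # these assignments no longer matters. If a filter fails (for
        # example because invitees is not an array), catch() substitutes
        # an empty array instead of flagging the message as failed.
        root.invitees = $invitees.filter(i -> i.mood >= 0.5).catch([])
        root.rejected = $invitees.filter(i -> i.mood < 0.5).catch([])
```

Whether a silent fallback is preferable to a flagged failure depends on the pipeline; if malformed documents should be routed elsewhere, omit the `catch` calls and rely on the processor error handling patterns instead.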
## [](#examples)Examples ### [](#mapping)Mapping Given JSON documents containing an array of fans: ```json { "id":"foo", "description":"a show about foo", "fans":[ {"name":"bev","obsession":0.57}, {"name":"grace","obsession":0.21}, {"name":"ali","obsession":0.89}, {"name":"vic","obsession":0.43} ] } ``` We can reduce the documents down to just the ID and only those fans with an obsession score above 0.5, giving us: ```json { "id":"foo", "fans":[ {"name":"bev","obsession":0.57}, {"name":"ali","obsession":0.89} ] } ``` With the following config: ```yaml pipeline: processors: - mutation: | root.description = deleted() root.fans = this.fans.filter(fan -> fan.obsession > 0.5) ``` ### [](#more-mapping)More Mapping When receiving JSON documents of the form: ```json { "locations": [ {"name": "Seattle", "state": "WA"}, {"name": "New York", "state": "NY"}, {"name": "Bellevue", "state": "WA"}, {"name": "Olympia", "state": "WA"} ] } ``` We could collapse the location names from the state of Washington into a field `Cities`: ```json {"Cities": "Bellevue, Olympia, Seattle"} ``` With the following config: ```yaml pipeline: processors: - mutation: | root.Cities = this.locations. filter(loc -> loc.state == "WA"). map_each(loc -> loc.name). 
sort().join(", ") ``` --- # Page 244: nats_kv **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/nats_kv.md --- # nats\_kv --- title: nats_kv latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/nats_kv page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/nats_kv.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/nats_kv.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/nats_kv/)[Cache](/redpanda-cloud/develop/connect/components/caches/nats_kv/)[Input](/redpanda-cloud/develop/connect/components/inputs/nats_kv/)[Output](/redpanda-cloud/develop/connect/components/outputs/nats_kv/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/nats_kv/ "View the Self-Managed version of this component") Perform operations on a NATS key-value bucket. 
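
As a minimal sketch, the following configuration performs a `get` for each message, using only fields documented on this page. The server URL and bucket name are taken from the field examples, and the `id` field used to build the key is a hypothetical placeholder.

```yaml
pipeline:
  processors:
    - nats_kv:
        urls:
          - "nats://127.0.0.1:4222"
        bucket: my_kv_bucket
        operation: get
        # Hypothetical: builds the key from an `id` field in each message.
        key: ${! json("id") }
```

For an `update`, the `revision` field could then be set from the `nats_kv_revision` metadata that a preceding `get` adds to the message.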
#### Common ```yml processors: label: "" nats_kv: urls: [] # No default (required) bucket: "" # No default (required) operation: "" # No default (required) key: "" # No default (required) ``` #### Advanced ```yml processors: label: "" nats_kv: urls: [] # No default (required) max_reconnects: "" # No default (optional) bucket: "" # No default (required) operation: "" # No default (required) key: "" # No default (required) revision: "" # No default (optional) timeout: 5s tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] tls_handshake_first: false auth: nkey_file: "" # No default (optional) nkey: "" # No default (optional) user_credentials_file: "" # No default (optional) user_jwt: "" # No default (optional) user_nkey_seed: "" # No default (optional) user: "" # No default (optional) password: "" # No default (optional) token: "" # No default (optional) ``` ## [](#kv-operations)KV operations The NATS KV processor supports many KV operations using the [`operation`](#operation) field. Along with `get`, `put`, and `delete`, this processor supports atomic operations like `update` and `create`, as well as utility operations like `purge`, `history`, and `keys`. ## [](#metadata)Metadata This processor adds the following metadata fields to each message, depending on the chosen `operation`: ### [](#get-get_revision)get, get\_revision ```text - nats_kv_key - nats_kv_bucket - nats_kv_revision - nats_kv_delta - nats_kv_operation - nats_kv_created ``` ### [](#create-update-delete-purge)create, update, delete, purge ```text - nats_kv_key - nats_kv_bucket - nats_kv_revision - nats_kv_operation ``` ### [](#keys)keys ```text - nats_kv_bucket ``` ## [](#connection-name)Connection name When monitoring and managing a production [NATS system](https://docs.nats.io/nats-concepts/overview), it is often useful to know which connection a message was sent or received from. 
To achieve this, set the connection name option when creating a NATS connection. Redpanda Connect can then automatically set the connection name to the NATS component label, so that monitoring tools between NATS and Redpanda Connect can stay in sync. ## [](#authentication)Authentication A number of Redpanda Connect components use NATS services. Each of these components supports optional, advanced authentication parameters for [NKeys](https://docs.nats.io/nats-server/configuration/securing_nats/auth_intro/nkey_auth) and [user credentials](https://docs.nats.io/using-nats/developer/connecting/creds). For an in-depth guide, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/nats_admin/security/jwt). ### [](#nkeys)NKeys NATS server can use NKeys in several ways for authentication. The simplest approach is to configure the server with a list of users’ public keys. The server can then generate a challenge for each connection request from a client, and the client must respond to the challenge by signing it with its private NKey, configured in the `nkey_file` or `nkey` field. For more details, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth). ### [](#user-credentials)User credentials NATS server also supports decentralized authentication based on JSON Web Tokens (JWTs). When a server is configured to use this authentication scheme, clients need a [user JWT](https://docs.nats.io/nats-server/configuration/securing_nats/jwt#json-web-tokens) and a corresponding [NKey secret](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth) to connect. You can use either of the following methods to supply the user JWT and NKey secret: - In the `user_credentials_file` field, enter the path to a file containing both the private key and the JWT. You can generate the file using the [nsc tool](https://docs.nats.io/nats-tools/nsc). 
- In the `user_jwt` field, enter a plain text JWT, and in the `user_nkey_seed` field, enter the plain text NKey seed or private key. For more details about authentication using JWTs, see the [NATS documentation](https://docs.nats.io/using-nats/developer/connecting/creds). ## [](#fields)Fields ### [](#auth)`auth` Optional configuration of NATS authentication parameters. **Type**: `object` ### [](#auth-nkey)`auth.nkey` Your NKey seed or private key for NATS authentication. NKeys provide secure, cryptographic authentication without passwords. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: nkey: UDXU4RCSJNZOIQHZNWXHXORDPRTGNJAHAHFRGZNEEJCPQTT2M7NLCNF4 ``` ### [](#auth-nkey_file)`auth.nkey_file` An optional file containing a NKey seed. **Type**: `string` ```yaml # Examples: nkey_file: ./seed.nk ``` ### [](#auth-password)`auth.password` An optional plain text password (given along with the corresponding user name). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-token)`auth.token` An optional plain text token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user)`auth.user` An optional plain text user name (given along with the corresponding user password). 
**Type**: `string` ### [](#auth-user_credentials_file)`auth.user_credentials_file` An optional file containing user credentials which consist of a user JWT and corresponding NKey seed. **Type**: `string` ```yaml # Examples: user_credentials_file: ./user.creds ``` ### [](#auth-user_jwt)`auth.user_jwt` An optional plaintext user JWT to use along with the corresponding user NKey seed. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user_nkey_seed)`auth.user_nkey_seed` An optional plaintext user NKey seed to use along with the corresponding user JWT. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#bucket)`bucket` The name of the KV bucket. **Type**: `string` ```yaml # Examples: bucket: my_kv_bucket ``` ### [](#key)`key` The key for each message. Supports [wildcards](https://docs.nats.io/nats-concepts/subjects#wildcards) for the `history` and `keys` operations. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: key: foo # --- key: foo.bar.baz # --- key: foo.* # --- key: foo.> # --- key: foo.${! json("meta.type") } ``` ### [](#max_reconnects)`max_reconnects` The maximum number of times to attempt to reconnect to the server. If negative, it will never stop trying to reconnect. **Type**: `int` ### [](#operation)`operation` The operation to perform on the KV bucket. **Type**: `string` | Option | Summary | | --- | --- | | create | Adds the key/value pair if it does not exist. Returns an error if it already exists. 
| | delete | Deletes the key/value pair, but keeps historical values. | | get | Returns the latest value for key. | | get_revision | Returns the value of key for the specified revision. | | history | Returns historical values of key as an array of objects containing the following fields: key, value, bucket, revision, delta, operation, created. | | keys | Returns the keys in the bucket which match the keys_filter as an array of strings. | | purge | Deletes the key/value pair and all historical values. | | put | Places a new value for the key into the store. | | update | Updates the value for key only if the revision matches the latest revision. | ### [](#revision)`revision` The revision of the key to operate on. Used for `get_revision` and `update` operations. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: revision: 42 # --- revision: ${! @nats_kv_revision } ``` ### [](#timeout)`timeout` The maximum period to wait on an operation before aborting and returning an error. **Type**: `string` **Default**: `5s` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#tls_handshake_first)`tls_handshake_first` Whether to perform the initial TLS handshake before sending the NATS INFO protocol message. This is required when connecting to some NATS servers that expect TLS to be established immediately after connection, before any protocol negotiation. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. If a list item contains commas, it will be expanded into multiple URLs. 
**Type**: `array` ```yaml # Examples: urls: - "nats://127.0.0.1:4222" # --- urls: - "nats://username:password@127.0.0.1:4222" ``` --- # Page 245: nats_request_reply **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/nats_request_reply.md --- # nats\_request\_reply --- title: nats_request_reply latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/nats_request_reply page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/nats_request_reply.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/nats_request_reply.adoc categories: "[\"Services\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/nats_request_reply/ "View the Self-Managed version of this component") Sends a message to a NATS subject and expects a reply back from a NATS subscriber acting as a responder. 
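
As a minimal sketch, the following configuration sends each message as a request and waits for a reply, using only fields documented on this page. The subject name `example.requests` is a hypothetical placeholder, and the timeout simply restates the documented default.

```yaml
pipeline:
  processors:
    - nats_request_reply:
        urls:
          - "nats://127.0.0.1:4222"
        subject: example.requests  # hypothetical subject name
        timeout: 3s
```

A responder must be subscribed to the subject for the request to succeed; otherwise the request is expected to fail once the timeout elapses.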
#### Common ```yml processors: label: "" nats_request_reply: urls: [] # No default (required) subject: "" # No default (required) headers: {} metadata: include_prefixes: [] include_patterns: [] timeout: 3s ``` #### Advanced ```yml processors: label: "" nats_request_reply: urls: [] # No default (required) max_reconnects: "" # No default (optional) subject: "" # No default (required) inbox_prefix: "" # No default (optional) headers: {} metadata: include_prefixes: [] include_patterns: [] timeout: 3s tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] tls_handshake_first: false auth: nkey_file: "" # No default (optional) nkey: "" # No default (optional) user_credentials_file: "" # No default (optional) user_jwt: "" # No default (optional) user_nkey_seed: "" # No default (optional) user: "" # No default (optional) password: "" # No default (optional) token: "" # No default (optional) ``` ## [](#metadata)Metadata This processor adds the following metadata fields to each message: ```text - nats_subject - nats_sequence_stream - nats_sequence_consumer - nats_num_delivered - nats_num_pending - nats_domain - nats_timestamp_unix_nano ``` You can access these metadata fields using [function interpolation](../../../configuration/interpolation/#bloblang-queries). ## [](#connection-name)Connection name When monitoring and managing a production [NATS system](https://docs.nats.io/nats-concepts/overview), it is often useful to know which connection a message was sent or received from. To achieve this, set the connection name option when creating a NATS connection. Redpanda Connect can then automatically set the connection name to the NATS component label, so that monitoring tools between NATS and Redpanda Connect can stay in sync. ## [](#authentication)Authentication A number of Redpanda Connect components use NATS services. 
Each of these components supports optional, advanced authentication parameters for [NKeys](https://docs.nats.io/nats-server/configuration/securing_nats/auth_intro/nkey_auth) and [user credentials](https://docs.nats.io/using-nats/developer/connecting/creds). For an in-depth guide, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/nats_admin/security/jwt). ### [](#nkeys)NKeys NATS server can use NKeys in several ways for authentication. The simplest approach is to configure the server with a list of users’ public keys. The server can then generate a challenge for each connection request from a client, and the client must respond to the challenge by signing it with its private NKey, configured in the `nkey_file` or `nkey` field. For more details, see the [NATS documentation](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth). ### [](#user-credentials)User credentials NATS server also supports decentralized authentication based on JSON Web Tokens (JWTs). When a server is configured to use this authentication scheme, clients need a [user JWT](https://docs.nats.io/nats-server/configuration/securing_nats/jwt#json-web-tokens) and a corresponding [NKey secret](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth) to connect. You can use either of the following methods to supply the user JWT and NKey secret: - In the `user_credentials_file` field, enter the path to a file containing both the private key and the JWT. You can generate the file using the [nsc tool](https://docs.nats.io/nats-tools/nsc). - In the `user_jwt` field, enter a plain text JWT, and in the `user_nkey_seed` field, enter the plain text NKey seed or private key. For more details about authentication using JWTs, see the [NATS documentation](https://docs.nats.io/using-nats/developer/connecting/creds). ## [](#fields)Fields ### [](#auth)`auth` Optional configuration of NATS authentication parameters. 
**Type**: `object` ### [](#auth-nkey)`auth.nkey` Your NKey seed or private key for NATS authentication. NKeys provide secure, cryptographic authentication without passwords. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ```yaml # Examples: nkey: UDXU4RCSJNZOIQHZNWXHXORDPRTGNJAHAHFRGZNEEJCPQTT2M7NLCNF4 ``` ### [](#auth-nkey_file)`auth.nkey_file` An optional file containing a NKey seed. **Type**: `string` ```yaml # Examples: nkey_file: ./seed.nk ``` ### [](#auth-password)`auth.password` An optional plain text password (given along with the corresponding user name). > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-token)`auth.token` An optional plain text token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user)`auth.user` An optional plain text user name (given along with the corresponding user password). **Type**: `string` ### [](#auth-user_credentials_file)`auth.user_credentials_file` An optional file containing user credentials which consist of a user JWT and corresponding NKey seed. **Type**: `string` ```yaml # Examples: user_credentials_file: ./user.creds ``` ### [](#auth-user_jwt)`auth.user_jwt` An optional plaintext user JWT to use along with the corresponding user NKey seed. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#auth-user_nkey_seed)`auth.user_nkey_seed` An optional plaintext user NKey seed to use along with the corresponding user JWT. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#headers)`headers` Explicit message headers to add to messages. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` **Default**: `{}` ```yaml # Examples: headers: Content-Type: application/json Timestamp: ${!meta("Timestamp")} ``` ### [](#inbox_prefix)`inbox_prefix` Set an explicit inbox prefix for the response subject **Type**: `string` ```yaml # Examples: inbox_prefix: _INBOX_joe ``` ### [](#max_reconnects)`max_reconnects` The maximum number of times to attempt to reconnect to the server. If negative, it will never stop trying to reconnect. **Type**: `int` ### [](#metadata-2)`metadata` Determine which (if any) metadata values should be added to messages as headers. **Type**: `object` ### [](#metadata-include_patterns)`metadata.include_patterns[]` Provide a list of explicit metadata key regular expression (re2) patterns to match against. **Type**: `array` **Default**: `[]` ```yaml # Examples: include_patterns: - .* # --- include_patterns: - _timestamp_unix$ ``` ### [](#metadata-include_prefixes)`metadata.include_prefixes[]` Provide a list of explicit metadata key prefixes to match against. 
**Type**: `array` **Default**: `[]` ```yaml # Examples: include_prefixes: - foo_ - bar_ # --- include_prefixes: - kafka_ # --- include_prefixes: - content- ``` ### [](#subject)`subject` A subject to write to. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: subject: foo.bar.baz # --- subject: ${! meta("kafka_topic") } # --- subject: foo.${! json("meta.type") } ``` ### [](#timeout)`timeout` A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as 300ms, -1.5h or 2h45m. Valid time units are ns, us (or µs), ms, s, m, h. **Type**: `string` **Default**: `3s` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. 
This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#tls_handshake_first)`tls_handshake_first` Whether to perform the initial TLS handshake before sending the NATS INFO protocol message. This is required when connecting to some NATS servers that expect TLS to be established immediately after connection, before any protocol negotiation. **Type**: `bool` **Default**: `false` ### [](#urls)`urls[]` A list of URLs to connect to. If a list item contains commas, it will be expanded into multiple URLs. **Type**: `array` ```yaml # Examples: urls: - "nats://127.0.0.1:4222" # --- urls: - "nats://username:password@127.0.0.1:4222" ``` --- # Page 246: noop **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/noop.md --- # noop --- title: noop latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/noop page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/noop.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/noop.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/noop/)[Cache](/redpanda-cloud/develop/connect/components/caches/noop/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/noop/ "View the Self-Managed version of this 
component") Noop is a processor that does nothing: the message passes through unchanged. Why? Sometimes doing nothing is the braver option. ```yml # Config fields, showing default values label: "" noop: {} ``` --- # Page 247: ollama_chat **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/ollama_chat.md --- # ollama\_chat --- title: ollama_chat latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/ollama_chat page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/ollama_chat.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/ollama_chat.adoc categories: "[\"AI\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Self-Managed > 📝 **NOTE** > > Ollama connectors are currently only available on BYOC GCP clusters. > ⚠️ **CAUTION** > > When Redpanda Connect runs a data pipeline with an Ollama processor in it, Redpanda Cloud deploys a GPU-powered instance for the exclusive use of that pipeline. As pricing is based on resource consumption, this can have cost implications. Generates responses to messages in a chat conversation using the Ollama API and external tools.
#### Common ```yml processors: label: "" ollama_chat: model: "" # No default (required) prompt: "" # No default (optional) image: "" # No default (optional) response_format: text max_tokens: "" # No default (optional) temperature: "" # No default (optional) save_prompt_metadata: false history: "" # No default (optional) tools: [] runner: context_size: "" # No default (optional) batch_size: "" # No default (optional) gpu_layers: "" # No default (optional) threads: "" # No default (optional) use_mmap: "" # No default (optional) server_address: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" ollama_chat: model: "" # No default (required) prompt: "" # No default (optional) system_prompt: "" # No default (optional) image: "" # No default (optional) response_format: text max_tokens: "" # No default (optional) temperature: "" # No default (optional) num_keep: "" # No default (optional) seed: "" # No default (optional) top_k: "" # No default (optional) top_p: "" # No default (optional) repeat_penalty: "" # No default (optional) presence_penalty: "" # No default (optional) frequency_penalty: "" # No default (optional) stop: [] # No default (optional) save_prompt_metadata: false history: "" # No default (optional) max_tool_calls: 3 tools: [] runner: context_size: "" # No default (optional) batch_size: "" # No default (optional) gpu_layers: "" # No default (optional) threads: "" # No default (optional) use_mmap: "" # No default (optional) server_address: "" # No default (optional) cache_directory: "" # No default (optional) download_url: "" # No default (optional) ``` This processor sends prompts to your chosen Ollama large language model (LLM) and generates text from the responses using the Ollama API and external tools. By default, the processor starts and runs a locally-installed Ollama server. Alternatively, to use an already running Ollama server, add your server details to the `server_address` field. 
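As a minimal sketch of the second option, the following fragment points the processor at an Ollama server that is already running. The address is an assumption (it reuses the value from the `server_address` field example on this page), and `llama3.1` is only a placeholder model name, so substitute values that match your deployment.

```yaml
pipeline:
  processors:
    - ollama_chat:
        # Assumed address of an existing Ollama server; replace with your own.
        server_address: http://127.0.0.1:11434
        # Placeholder model name taken from the `model` field examples.
        model: llama3.1
        # Submit the message payload as the prompt.
        prompt: "${!content().string()}"
```

Omitting `server_address` leaves the default behavior in place, where the processor starts and runs its own local Ollama server.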
You can [download and install Ollama from the Ollama website](https://ollama.com/download). For more information, see the [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs) and [examples](#examples). ## [](#fields)Fields ### [](#cache_directory)`cache_directory` If `server_address` is not set, the directory to download the Ollama binary to and to use as a model cache. **Type**: `string` ```yaml # Examples: cache_directory: /opt/cache/connect/ollama ``` ### [](#download_url)`download_url` If `server_address` is not set, the URL to download the Ollama binary from. Defaults to the official Ollama GitHub release for this platform. **Type**: `string` ### [](#frequency_penalty)`frequency_penalty` Positive values penalize new tokens based on the frequency of their appearance in the text so far. This decreases the model’s likelihood to repeat the same line verbatim. **Type**: `float` ### [](#history)`history` Include historical messages in a chat request. You must use a Bloblang query to create an array of objects in the form of `[{"role": "", "content":""}]` where: - `role` is the sender of the original messages, either `system`, `user`, `assistant`, or `tool`. - `content` is the text of the original messages. **Type**: `string` ### [](#image)`image` An optional image to submit along with the [`prompt`](#prompt) value. The result of the mapping should be a byte array. **Type**: `string` ```yaml # Examples: image: root = this.image.decode("base64") # decode base64 encoded image ``` ### [](#max_tokens)`max_tokens` The maximum number of tokens to predict and output. Limiting the amount of output means that requests are processed faster and have a fixed limit on the cost. **Type**: `int` ### [](#max_tool_calls)`max_tool_calls` The maximum number of sequential calls the LLM can make to external tools to retrieve additional information to answer a prompt. **Type**: `int` **Default**: `3` ### [](#model)`model` The name of the Ollama LLM to use.
For a full list of models, see the [Ollama website](https://ollama.com/models). **Type**: `string` ```yaml # Examples: model: llama3.1 # --- model: gemma2 # --- model: qwen2 # --- model: phi3 ``` ### [](#num_keep)`num_keep` Specify the number of tokens from the initial prompt to retain when the model resets its internal context. By default, this value is set to `4`. Use `-1` to retain all tokens from the initial prompt. **Type**: `int` ### [](#presence_penalty)`presence_penalty` Positive values penalize new tokens if they have appeared in the text so far. This increases the model’s likelihood to talk about new topics. **Type**: `float` ### [](#prompt)`prompt` The prompt you want to generate a response for. By default, the processor submits the entire payload as a string. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#repeat_penalty)`repeat_penalty` Sets how strongly to penalize repetitions. A higher value, for example 1.5, will penalize repetitions more strongly. A lower value, for example 0.9, will be more lenient. **Type**: `float` ### [](#response_format)`response_format` The format of the response the Ollama model generates. If specifying JSON output, then the `prompt` should specify that the output should be in JSON as well. **Type**: `string` **Default**: `text` **Options**: `text`, `json` ### [](#runner)`runner` Options for the model runner that are used when the model is first loaded into memory. **Type**: `object` ### [](#runner-batch_size)`runner.batch_size` The maximum number of requests to process in parallel. **Type**: `int` ### [](#runner-context_size)`runner.context_size` Sets the size of the context window used to generate the next token. Using a larger context window uses more memory and takes longer to process. **Type**: `int` ### [](#runner-gpu_layers)`runner.gpu_layers` This option allows offloading some layers to the GPU for computation. 
This generally results in increased performance. By default, the runtime decides the number of layers dynamically. **Type**: `int` ### [](#runner-threads)`runner.threads` Set the number of threads to use during generation. For optimal performance, set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads. **Type**: `int` ### [](#runner-use_mmap)`runner.use_mmap` Map the model into memory. This is only supported on Unix systems and allows loading only the necessary parts of the model as needed. **Type**: `bool` ### [](#save_prompt_metadata)`save_prompt_metadata` Set to `true` to save the prompt value to a metadata field (`@prompt`) on the corresponding output message. If you use the `system_prompt` field, its value is also saved to an `@system_prompt` metadata field on each output message. **Type**: `bool` **Default**: `false` ### [](#seed)`seed` Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. **Type**: `int` ```yaml # Examples: seed: 42 ``` ### [](#server_address)`server_address` The address of the Ollama server to use. Leave the field blank to have the processor start and run a local Ollama server, or specify the address of your own local or remote server. **Type**: `string` ```yaml # Examples: server_address: http://127.0.0.1:11434 ``` ### [](#stop)`stop[]` Sets the stop sequences to use. When one of these sequences is encountered, the LLM stops generating text and returns the final response. **Type**: `array` ### [](#system_prompt)`system_prompt` The system prompt to submit to the Ollama LLM. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#temperature)`temperature` The temperature of the model. Increasing the temperature makes the model answer more creatively.
**Type**: `int` ### [](#tools)`tools[]` The external tools the LLM can invoke, such as functions, APIs, or web browsing. You can build a series of processors that include definitions of these tools, and the specified LLM can choose when to invoke them to help answer a prompt. For more information, see [examples](#examples). **Type**: `object` **Default**: `[]` ### [](#tools-description)`tools[].description` A description of this tool. The LLM uses this description to decide whether to use the tool. **Type**: `string` ### [](#tools-name)`tools[].name` The name of this tool. **Type**: `string` ### [](#tools-parameters)`tools[].parameters` The parameters the LLM needs to provide to invoke this tool. **Type**: `object` ### [](#tools-parameters-properties)`tools[].parameters.properties` The properties for the processor’s input data. **Type**: `object` ### [](#tools-parameters-properties-description)`tools[].parameters.properties.description` A description of this parameter. **Type**: `string` ### [](#tools-parameters-properties-enum)`tools[].parameters.properties.enum[]` Specifies that this parameter is an enum and only these specific values should be used. **Type**: `array` **Default**: `[]` ### [](#tools-parameters-properties-type)`tools[].parameters.properties.type` The type of this parameter. **Type**: `string` ### [](#tools-parameters-required)`tools[].parameters.required[]` The required parameters for this pipeline. **Type**: `array` **Default**: `[]` ### [](#tools-processors)`tools[].processors[]` The pipeline to execute when the LLM uses this tool. **Type**: `processor` ### [](#top_k)`top_k` Reduces the probability of generating nonsense. A higher value, for example `100`, will give more diverse answers. A lower value, for example `10`, will be more conservative. **Type**: `int` ### [](#top_p)`top_p` Works together with `top_k`. A higher value, for example 0.95, will lead to more diverse text.
A lower value, for example 0.5, will generate more focused and conservative text.

**Type**: `float`

## [](#examples)Examples

### [](#use-llava-to-analyze-an-image)Use Llava to analyze an image

This example fetches image URLs from stdin and has a multimodal LLM describe the image.

```yaml
input:
  stdin:
    scanner:
      lines: {}
pipeline:
  processors:
    - http:
        verb: GET
        url: "${!content().string()}"
    - ollama_chat:
        model: llava
        prompt: "Describe the following image"
        image: "root = content()"
output:
  stdout:
    codec: lines
```

### [](#use-subpipelines-as-tool-calls)Use subpipelines as tool calls

This example allows llama3.2 to execute a subpipeline as a tool call to get more data.

```yaml
input:
  generate:
    count: 1
    mapping: |
      root = "What is the weather like in Chicago?"
pipeline:
  processors:
    - ollama_chat:
        model: llama3.2
        prompt: "${!content().string()}"
        tools:
          - name: GetWeather
            description: "Retrieve the weather for a specific city"
            parameters:
              required: ["city"]
              properties:
                city:
                  type: string
                  description: the city to lookup the weather for
            processors:
              - http:
                  verb: GET
                  url: 'https://wttr.in/${!this.city}?T'
                  headers:
                    # Spoof the curl user agent to get a plain text response
                    User-Agent: curl/8.11.1
output:
  stdout: {}
```

---

# Page 248: ollama_embeddings

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/ollama_embeddings.md

---

# ollama\_embeddings

---
title: ollama_embeddings
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/ollama_embeddings
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/ollama_embeddings.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/ollama_embeddings.adoc
categories: "[\"AI\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09" --- **Available in:** Self-Managed > 📝 **NOTE** > > Ollama connectors are currently only available on BYOC GCP clusters. > ⚠️ **CAUTION** > > When Redpanda Connect runs a data pipeline with an Ollama processor in it, Redpanda Cloud deploys a GPU-powered instance for the exclusive use of that pipeline. As pricing is based on resource consumption, this can have cost implications. Generates vector embeddings from text, using the Ollama API. #### Common ```yml processors: label: "" ollama_embeddings: model: "" # No default (required) text: "" # No default (optional) runner: context_size: "" # No default (optional) batch_size: "" # No default (optional) gpu_layers: "" # No default (optional) threads: "" # No default (optional) use_mmap: "" # No default (optional) server_address: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" ollama_embeddings: model: "" # No default (required) text: "" # No default (optional) runner: context_size: "" # No default (optional) batch_size: "" # No default (optional) gpu_layers: "" # No default (optional) threads: "" # No default (optional) use_mmap: "" # No default (optional) server_address: "" # No default (optional) cache_directory: "" # No default (optional) download_url: "" # No default (optional) ``` This processor sends text to your chosen Ollama large language model (LLM) and creates vector embeddings, using the Ollama API. Vector embeddings are long arrays of numbers that represent values or objects, in this case text. By default, the processor starts and runs a locally installed Ollama server. Alternatively, to use an already running Ollama server, add your server details to the `server_address` field. You can [download and install Ollama from the Ollama website](https://ollama.com/download). For more information, see the [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs).
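
The following sketch shows one way to attach embeddings to a JSON document instead of replacing it. It rests on several assumptions: each input message is a JSON object with a hypothetical `text` field, the processor's output is the embedding vector itself, and the `branch` processor is available to merge that output back under a hypothetical `embeddings` key. Treat the field names and the model choice as placeholders.

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - ollama_embeddings:
              # Placeholder embedding model from the `model` field examples.
              model: nomic-embed-text
              # Hypothetical input field that holds the text to embed.
              text: '${! json("text") }'
        # Assumes the branch result is the embedding array; store it
        # alongside the original document under a hypothetical key.
        result_map: |
          root.embeddings = this
```

Without the `branch` wrapper, the processor submits the entire payload as a string by default, so a plain-text input needs no `text` mapping.
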
## [](#fields)Fields ### [](#cache_directory)`cache_directory` If `server_address` is not set, the directory to download the Ollama binary to and to use as a model cache. **Type**: `string` ```yaml # Examples: cache_directory: /opt/cache/connect/ollama ``` ### [](#download_url)`download_url` If `server_address` is not set, the URL to download the Ollama binary from. Defaults to the official Ollama GitHub release for this platform. **Type**: `string` ### [](#model)`model` The name of the Ollama LLM to use. For a full list of models, see the [Ollama website](https://ollama.com/models). **Type**: `string` ```yaml # Examples: model: nomic-embed-text # --- model: mxbai-embed-large # --- model: snowflake-arctic-embed # --- model: all-minilm ``` ### [](#runner)`runner` Options for the model runner that are used when the model is first loaded into memory. **Type**: `object` ### [](#runner-batch_size)`runner.batch_size` The maximum number of requests to process in parallel. **Type**: `int` ### [](#runner-context_size)`runner.context_size` Sets the size of the context window used to generate the next token. Using a larger context window uses more memory and takes longer to process. **Type**: `int` ### [](#runner-gpu_layers)`runner.gpu_layers` This option allows offloading some layers to the GPU for computation. This generally results in increased performance. By default, the runtime decides the number of layers dynamically. **Type**: `int` ### [](#runner-threads)`runner.threads` Set the number of threads to use during generation. For optimal performance, set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads. **Type**: `int` ### [](#runner-use_mmap)`runner.use_mmap` Map the model into memory. This is only supported on Unix systems and allows loading only the necessary parts of the model as needed.
**Type**: `bool` ### [](#server_address)`server_address` The address of the Ollama server to use. Leave the field blank and the processor starts and runs a local Ollama server or specify the address of your own local or remote server. **Type**: `string` ```yaml # Examples: server_address: http://127.0.0.1:11434 ``` ### [](#text)`text` The text you want to create vector embeddings for. By default, the processor submits the entire payload as a string. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` --- # Page 249: ollama_moderation **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/ollama_moderation.md --- # ollama\_moderation --- title: ollama_moderation page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/ollama_moderation page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/ollama_moderation.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/ollama_moderation.adoc # Beta release status page-beta: "true" page-git-created-date: "2025-01-28" page-git-modified-date: "2025-01-28" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta **Available in:** Self-Managed > 📝 **NOTE** > > Ollama connectors are currently only available on BYOC GCP clusters. 
> ⚠️ **CAUTION** > > When Redpanda Connect runs a data pipeline with an Ollama processor in it, Redpanda Cloud deploys a GPU-powered instance for the exclusive use of that pipeline. As pricing is based on resource consumption, this can have cost implications. Generates responses to messages in a chat conversation using the Ollama API, and checks the responses to make sure they do not violate [safety or security standards](https://mlcommons.org/2024/04/mlc-aisafety-v0-5-poc/). #### Common ```yml processors: label: "" ollama_moderation: model: "" # No default (required) prompt: "" # No default (required) response: "" # No default (required) runner: context_size: "" # No default (optional) batch_size: "" # No default (optional) gpu_layers: "" # No default (optional) threads: "" # No default (optional) use_mmap: "" # No default (optional) server_address: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" ollama_moderation: model: "" # No default (required) prompt: "" # No default (required) response: "" # No default (required) runner: context_size: "" # No default (optional) batch_size: "" # No default (optional) gpu_layers: "" # No default (optional) threads: "" # No default (optional) use_mmap: "" # No default (optional) server_address: "" # No default (optional) cache_directory: "" # No default (optional) download_url: "" # No default (optional) ``` This processor checks the safety of responses from your chosen large language model (LLM) using either [Llama Guard 3](https://ollama.com/library/llama-guard3) or [ShieldGemma](https://ollama.com/library/shieldgemma). By default, the processor starts and runs a locally-installed Ollama server. Alternatively, to use an already running Ollama server, add your server details to the `server_address` field. You can [download and install Ollama from the Ollama website](https://ollama.com/download).
For more information, see the [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs) and [Examples](#examples). To check the safety of your prompts, see the [`ollama_chat` processor](../ollama_chat/#examples) documentation. ## [](#fields)Fields ### [](#cache_directory)`cache_directory` If `server_address` is not set, download the Ollama binary to this directory and use it as a model cache. **Type**: `string` ```yaml # Examples: cache_directory: /opt/cache/connect/ollama ``` ### [](#download_url)`download_url` If `server_address` is not set, download the Ollama binary from this URL. The default value is the official Ollama GitHub release for this platform. **Type**: `string` ### [](#model)`model` The name of the Ollama LLM to use. **Type**: `string`

| Option | Summary |
| --- | --- |
| `llama-guard3` | When using `llama-guard3`, two pieces of metadata are added: `@safe`, with a value of `yes` or `no`, and `@category`, which identifies the safety category that was violated. For more information, see the Llama Guard 3 Model Card. |
| `shieldgemma` | When using `shieldgemma`, the model output is a single piece of metadata, `@safe`, with a value of `yes` or `no` that indicates whether the response complies with its defined safety policies. |

```yaml # Examples: model: llama-guard3 # --- model: shieldgemma ``` ### [](#prompt)`prompt` The prompt you used to generate a response from an LLM. If you’re using the `ollama_chat` processor, you can set the `save_prompt_metadata` field to save the contents of your prompts. You can then run them through the `ollama_moderation` processor to check the model responses for safety. For more details, see [Examples](#examples). You can also check the safety of your prompts. For more information, see the [`ollama_chat` processor](../ollama_chat/#examples) documentation. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).
**Type**: `string`

### [](#response)`response`

The LLM’s response that you want to check for safety. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#runner)`runner`

Options for the model runner that are used when the model is first loaded into memory.

**Type**: `object`

### [](#runner-batch_size)`runner.batch_size`

The maximum number of requests to process in parallel.

**Type**: `int`

### [](#runner-context_size)`runner.context_size`

Sets the size of the context window used to generate the next token. Using a larger context window uses more memory and takes longer to process.

**Type**: `int`

### [](#runner-gpu_layers)`runner.gpu_layers`

Sets the number of layers to offload to the GPU for computation. This generally results in increased performance. By default, the runtime decides the number of layers dynamically.

**Type**: `int`

### [](#runner-threads)`runner.threads`

Sets the number of threads to use during response generation. For optimal performance, set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads.

**Type**: `int`

### [](#runner-use_mmap)`runner.use_mmap`

Map the model into memory. Set to `true` to load only the necessary parts of the model into memory. This setting is only supported on Unix systems.

**Type**: `bool`

### [](#server_address)`server_address`

The address of the Ollama server to use. Leave this field blank to have the processor start and run a local Ollama server, or specify the address of your own local or remote server.

**Type**: `string`

```yaml
# Examples:
server_address: http://127.0.0.1:11434
```

## [](#examples)Examples

### [](#use-llama-guard-3-classify-a-llm-response)Use Llama Guard 3 to classify an LLM response

This example uses Llama Guard 3 to check whether another model responded with safe or unsafe content.
```yaml
input:
  stdin:
    scanner:
      lines: {}
pipeline:
  processors:
    - ollama_chat:
        model: llava
        prompt: "${!content().string()}"
        save_prompt_metadata: true
    - ollama_moderation:
        model: llama-guard3
        prompt: "${!@prompt}"
        response: "${!content().string()}"
    - mapping: |
        root.response = content().string()
        root.is_safe = @safe
output:
  stdout:
    codec: lines
```

---

# Page 250: openai_chat_completion

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/openai_chat_completion.md

---

# openai\_chat\_completion

---
title: openai_chat_completion
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/openai_chat_completion
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/openai_chat_completion.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/openai_chat_completion.adoc
categories: "[\"AI\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/openai_chat_completion/ "View the Self-Managed version of this component")

Generates responses to messages in a chat conversation, using the OpenAI API and external tools.
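For orientation, a minimal pipeline using this processor might look like the following sketch. The secret name `OPENAI_API_KEY` and the `${secrets.…}` reference style are assumptions about how your secrets are stored, and the model, system prompt, and sampling values are illustrative rather than recommended settings.

```yaml
pipeline:
  processors:
    - openai_chat_completion:
        server_address: https://api.openai.com/v1
        api_key: "${secrets.OPENAI_API_KEY}" # hypothetical secret name
        model: gpt-4o
        system_prompt: "You are a concise assistant."
        # Send the whole message payload as the user prompt.
        prompt: "${!content().string()}"
        temperature: 0.2
        max_tokens: 256
        # No external tools in this sketch.
        tools: []
```

The `prompt` line mirrors the processor's default behavior of submitting the entire payload as a string; replace it with your own interpolation to send only part of the message.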
#### Common ```yml processors: label: "" openai_chat_completion: server_address: https://api.openai.com/v1 api_key: "" # No default (required) model: "" # No default (required) prompt: "" # No default (optional) system_prompt: "" # No default (optional) history: "" # No default (optional) image: "" # No default (optional) max_tokens: "" # No default (optional) temperature: "" # No default (optional) user: "" # No default (optional) response_format: text json_schema: name: "" # No default (required) description: "" # No default (optional) schema: "" # No default (required) tools: [] # No default (required) ``` #### Advanced ```yml processors: label: "" openai_chat_completion: server_address: https://api.openai.com/v1 api_key: "" # No default (required) model: "" # No default (required) prompt: "" # No default (optional) system_prompt: "" # No default (optional) history: "" # No default (optional) image: "" # No default (optional) max_tokens: "" # No default (optional) temperature: "" # No default (optional) user: "" # No default (optional) response_format: text json_schema: name: "" # No default (required) description: "" # No default (optional) schema: "" # No default (required) schema_registry: url: "" # No default (required) name_prefix: schema_registry_id_ subject: "" # No default (required) refresh_interval: "" # No default (optional) tls: skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} top_p: "" # No default (optional) frequency_penalty: "" # No default (optional) presence_penalty: "" # No default (optional) seed: "" # No default (optional) stop: [] # No default (optional) tools: [] # No default (required) ``` This processor sends user prompts to the OpenAI API, and the 
specified large language model (LLM) generates responses using all available context, including supplementary data provided by [external tools](#tools). By default, the processor submits the entire payload of each message as a string, unless you use the `prompt` configuration field to customize it. To learn more about chat completion, see the [OpenAI API documentation](https://platform.openai.com/docs/guides/chat-completions), and [Examples](#Examples). ## [](#fields)Fields ### [](#api_key)`api_key` The API secret key for OpenAI API. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#frequency_penalty)`frequency_penalty` Specify a number between `-2.0` and `2.0`. Positive values penalize new tokens based on the frequency of their appearance in the text so far. This decreases the model’s likelihood to repeat the same line verbatim. **Type**: `float` ### [](#history)`history` Include messages from a prior conversation. You must use a Bloblang query to create an array of objects in the form of `[{"role": "user", "content": ""}, {"role":"assistant", "content":""}]` where: - `role` is the sender of the original messages, either `system`, `user`, or `assistant`. - `content` is the text of the original messages. For more information, see [Examples](#Examples). **Type**: `string` ### [](#image)`image` An optional image to submit along with the prompt. The result of the Bloblang mapping must be a byte array. **Type**: `string` ```yaml # Examples: image: root = this.image.decode("base64") # decode base64 encoded image ``` ### [](#json_schema)`json_schema` The JSON schema used by the model when generating responses in `json_schema` format. 
To learn more about supported JSON schema features, see the [OpenAI documentation](https://platform.openai.com/docs/guides/structured-outputs/supported-schemas).

**Type**: `object`

### [](#json_schema-description)`json_schema.description`

An optional description, which helps the model understand the schema’s purpose.

**Type**: `string`

### [](#json_schema-name)`json_schema.name`

The name of the JSON schema to use.

**Type**: `string`

### [](#json_schema-schema)`json_schema.schema`

The JSON schema for the model to use when generating the output.

**Type**: `string`

### [](#max_tokens)`max_tokens`

The maximum number of tokens to generate for chat completion.

**Type**: `int`

### [](#model)`model`

The name of the OpenAI model to use.

**Type**: `string`

```yaml
# Examples:
model: gpt-4o

# ---

model: gpt-4o-mini

# ---

model: gpt-4

# ---

model: gpt-4-turbo
```

### [](#presence_penalty)`presence_penalty`

Specify a number between `-2.0` and `2.0`. Positive values penalize new tokens if they have appeared in the text so far. This increases the model’s likelihood to talk about new topics.

**Type**: `float`

### [](#prompt)`prompt`

The user prompt for which a response is generated. By default, the processor sends the entire payload as a string unless customized using this field.

This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#response_format)`response_format`

Specify the configured [model’s](#model) output format. If you choose the `json_schema` option, you must also configure a `json_schema` or `schema_registry`.

**Type**: `string`

**Default**: `text`

**Options**: `text`, `json`, `json_schema`

### [](#schema_registry)`schema_registry`

The schema registry to dynamically load schemas for model responses in `json_schema` format. Schemas must be in JSON format.
To learn more about supported JSON schema features, see the [OpenAI documentation](https://platform.openai.com/docs/guides/structured-outputs/supported-schemas). **Type**: `object` ### [](#schema_registry-basic_auth)`schema_registry.basic_auth` Configure basic authentication for requests from this component to your schema registry. **Type**: `object` ### [](#schema_registry-basic_auth-enabled)`schema_registry.basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-basic_auth-password)`schema_registry.basic_auth.password` The password to use for authentication. Used together with `username` for basic authentication or with encrypted private keys for secure access. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-basic_auth-username)`schema_registry.basic_auth.username` The username of the account credentials to authenticate as. Used together with `password` for basic authentication. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt)`schema_registry.jwt` Beta Allows you to specify JWT authentication. **Type**: `object` ### [](#schema_registry-jwt-claims)`schema_registry.jwt.claims` Values used to pass the identity of the authenticated entity to the service provider. In this case, between this component and the schema registry. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-enabled)`schema_registry.jwt.enabled` Whether to use JWT authentication in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-jwt-headers)`schema_registry.jwt.headers` The key/value pairs that identify the type of token and signing algorithm (optional). 
**Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-private_key_file)`schema_registry.jwt.private_key_file` Path to a file containing the PEM-encoded private key using PKCS#1 or PKCS#8 format. The private key must be compatible with the algorithm specified in the `signing_method` field. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt-signing_method)`schema_registry.jwt.signing_method` The cryptographic algorithm used to sign the JWT token. Supported algorithms include RS256, RS384, RS512, and EdDSA. This algorithm must be compatible with the private key specified in the `private_key_file` field. **Type**: `string` **Default**: `""` ### [](#schema_registry-name_prefix)`schema_registry.name_prefix` A prefix to add to the schema registry name. To form the complete schema registry name, the schema ID is appended as a suffix. **Type**: `string` **Default**: `schema_registry_id_` ### [](#schema_registry-oauth)`schema_registry.oauth` Configure OAuth version 1.0 to give this component authorized access to your schema registry. **Type**: `object` ### [](#schema_registry-oauth-access_token)`schema_registry.oauth.access_token` The value this component can use to gain access to the data in the schema registry. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-access_token_secret)`schema_registry.oauth.access_token_secret` The secret that establishes ownership of the `oauth.access_token` in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_key)`schema_registry.oauth.consumer_key` The value used to identify this component or client to your schema registry. 
**Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_secret)`schema_registry.oauth.consumer_secret` The secret that establishes ownership of the consumer key in OAuth 1.0 authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-enabled)`schema_registry.oauth.enabled` Whether to enable OAuth version 1.0 authentication for requests to the schema registry. **Type**: `bool` **Default**: `false` ### [](#schema_registry-refresh_interval)`schema_registry.refresh_interval` How frequently to poll the schema registry for updates. If not specified, the schema does not refresh automatically. **Type**: `string` ### [](#schema_registry-subject)`schema_registry.subject` The subject name used to fetch the schema from the schema registry. **Type**: `string` ### [](#schema_registry-tls)`schema_registry.tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#schema_registry-tls-client_certs)`schema_registry.tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. 
**Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#schema_registry-tls-client_certs-cert)`schema_registry.tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-cert_file)`schema_registry.tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key)`schema_registry.tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key_file)`schema_registry.tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-password)`schema_registry.tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#schema_registry-tls-enable_renegotiation)`schema_registry.tls.enable_renegotiation` Whether to allow the remote server to request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-root_cas)`schema_registry.tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#schema_registry-tls-root_cas_file)`schema_registry.tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#schema_registry-tls-skip_cert_verify)`schema_registry.tls.skip_cert_verify` Whether to skip server-side certificate verification. 
Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely.

**Type**: `bool`

**Default**: `false`

### [](#schema_registry-url)`schema_registry.url`

The base URL of the schema registry service.

**Type**: `string`

### [](#seed)`seed`

When set to a specific number, Redpanda Connect attempts to generate consistent responses for requests that use the same prompt, seed, and parameters.

**Type**: `int`

### [](#server_address)`server_address`

The OpenAI API endpoint to which the processor sends requests. Update the default value to use a different OpenAI-compatible service.

**Type**: `string`

**Default**: `https://api.openai.com/v1`

### [](#stop)`stop[]`

Specify up to four stop sequences to use. When the model encounters a stop pattern, it stops generating text and returns the final response.

**Type**: `array`

### [](#system_prompt)`system_prompt`

The system prompt to submit along with the user prompt.

This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#temperature)`temperature`

Choose a sampling temperature between `0` and `2`:

- Higher values, such as `0.8`, make the output more random.
- Lower values, such as `0.2`, make the output more focused and deterministic.

Redpanda recommends adding a value for this field or [`top_p`](#top_p), but not both.

**Type**: `float`

### [](#tools)`tools[]`

External tools the model can invoke, such as functions, APIs, or web browsing. You can build a series of processors that include definitions of these tools, and the specified model can choose when to invoke them to help answer a prompt. For more information, see [Examples](#Examples).
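As a rough illustration of how a tool definition fits together, the sketch below declares a single tool whose pipeline calls an HTTP endpoint. The tool name `GetWeather`, the `city` parameter, the `example.com` URL, and the secret name `OPENAI_API_KEY` are all hypothetical. The sketch also assumes that the arguments chosen by the model reach the tool's processors as a structured message (so that `this.city` resolves) and that the `http` processor is available in your environment.

```yaml
pipeline:
  processors:
    - openai_chat_completion:
        api_key: "${secrets.OPENAI_API_KEY}" # hypothetical secret name
        model: gpt-4o
        prompt: "What is the weather like in Chicago?"
        tools:
          - name: GetWeather # hypothetical tool name
            description: Retrieve the current weather for a specific city
            parameters:
              required: ["city"]
              properties:
                city:
                  type: string
                  description: The city to look up the weather for
            processors:
              # Runs only when the model decides to invoke this tool.
              - http:
                  verb: GET
                  url: 'https://example.com/weather?city=${! this.city }' # placeholder endpoint
```

A clear `description` matters here, because the model relies on it to decide whether the tool is relevant to a given prompt.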
> 📝 **NOTE**
>
> If you don’t want to use external tools, enter an empty array: `tools: []`.

**Type**: `object`

### [](#tools-description)`tools[].description`

A description of this tool. The LLM uses this description to decide whether the tool should be used.

**Type**: `string`

### [](#tools-name)`tools[].name`

The name of this tool.

**Type**: `string`

### [](#tools-parameters)`tools[].parameters`

The parameters the LLM needs to provide to invoke this tool.

**Type**: `object`

**Default**: `[]`

### [](#tools-parameters-properties)`tools[].parameters.properties`

The properties for the processor’s input data.

**Type**: `object`

### [](#tools-parameters-properties-description)`tools[].parameters.properties.description`

A description of this parameter.

**Type**: `string`

### [](#tools-parameters-properties-enum)`tools[].parameters.properties.enum[]`

Specifies that this parameter is an enum and only these specific values should be used.

**Type**: `array`

**Default**: `[]`

### [](#tools-parameters-properties-type)`tools[].parameters.properties.type`

The type of this parameter.

**Type**: `string`

### [](#tools-parameters-required)`tools[].parameters.required[]`

The required parameters for this pipeline.

**Type**: `array`

**Default**: `[]`

### [](#tools-processors)`tools[].processors[]`

The pipeline to execute when the LLM uses this tool.

**Type**: `processor`

### [](#top_p)`top_p`

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. For example, a `top_p` of `0.1` means only the tokens comprising the top 10% probability mass are sampled.

Redpanda recommends adding a value for this field or `temperature`, but not both.

**Type**: `float`

### [](#user)`user`

A unique identifier that represents the end-user generating the prompt. This value can help OpenAI monitor and detect [platform abuse](https://openai.com/policies/usage-policies/).
This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

---

# Page 251: openai_embeddings

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/openai_embeddings.md

---

# openai\_embeddings

---
title: openai_embeddings
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/openai_embeddings
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/openai_embeddings.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/openai_embeddings.adoc
categories: "[\"AI\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/openai_embeddings/ "View the Self-Managed version of this component")

Generates vector embeddings to represent input text, using the OpenAI API.

```yml
# Config fields, showing default values
label: ""
openai_embeddings:
  server_address: https://api.openai.com/v1
  api_key: "" # No default (required)
  model: text-embedding-3-large # No default (required)
  text_mapping: "" # No default (optional)
```

This processor sends text strings to the OpenAI API, which generates vector embeddings. By default, the processor submits the entire payload of each message as a string, unless you use the `text_mapping` configuration field to customize it.

To learn more about vector embeddings, see the [OpenAI API documentation](https://platform.openai.com/docs/guides/embeddings).

## [](#fields)Fields

### [](#api_key)`api_key`

The API key for the OpenAI API.
> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#dimensions)`dimensions`

The number of dimensions the resulting output embeddings should have. Only supported in `text-embedding-3` and later models.

**Type**: `int`

### [](#model)`model`

The name of the OpenAI model to use.

**Type**: `string`

```yaml
# Examples:
model: text-embedding-3-large

# ---

model: text-embedding-3-small

# ---

model: text-embedding-ada-002
```

### [](#server_address)`server_address`

The OpenAI API endpoint that the processor sends requests to. Update the default value to use another OpenAI-compatible service.

**Type**: `string`

**Default**: `https://api.openai.com/v1`

### [](#text_mapping)`text_mapping`

The text you want to generate a vector embedding for. By default, the processor submits the entire payload as a string.
**Type**: `string` --- # Page 252: openai_image_generation **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/openai_image_generation.md --- # openai\_image\_generation --- title: openai_image_generation latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/openai_image_generation page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/openai_image_generation.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/openai_image_generation.adoc categories: "[\"AI\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/openai_image_generation/ "View the Self-Managed version of this component") Generates an image from a text description and other attributes, using OpenAI API. #### Common ```yml processors: label: "" openai_image_generation: server_address: https://api.openai.com/v1 api_key: "" # No default (required) model: "" # No default (required) prompt: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" openai_image_generation: server_address: https://api.openai.com/v1 api_key: "" # No default (required) model: "" # No default (required) prompt: "" # No default (optional) quality: "" # No default (optional) size: "" # No default (optional) style: "" # No default (optional) ``` This processor sends an image description and other attributes, such as image size and quality to the OpenAI API, which generates an image. By default, the processor submits the entire payload of each message as a string, unless you use the `prompt` configuration field to customize it. 
To learn more about image generation, see the [OpenAI API documentation](https://platform.openai.com/docs/guides/images).

## [](#fields)Fields

### [](#api_key)`api_key`

The API key for the OpenAI API.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#model)`model`

The name of the OpenAI model to use.

**Type**: `string`

```yaml
# Examples:
model: dall-e-3

# ---

model: dall-e-2
```

### [](#prompt)`prompt`

A text description of the image you want to generate. The `prompt` field accepts a maximum of 1000 characters for `dall-e-2` and 4000 characters for `dall-e-3`.

**Type**: `string`

### [](#quality)`quality`

The quality of the image to generate. Use `hd` to create images with finer details and greater consistency across the image. This parameter is only supported for `dall-e-3` models.

This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

```yaml
# Examples:
quality: standard

# ---

quality: hd
```

### [](#server_address)`server_address`

The OpenAI API endpoint that the processor sends requests to. Update the default value to use another OpenAI-compatible service.

**Type**: `string`

**Default**: `https://api.openai.com/v1`

### [](#size)`size`

The size of the generated image. Choose from `256x256`, `512x512`, or `1024x1024` for `dall-e-2`. Choose from `1024x1024`, `1792x1024`, or `1024x1792` for `dall-e-3` models.

This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).

**Type**: `string`

```yaml
# Examples:
size: 1024x1024

# ---

size: 512x512

# ---

size: 1792x1024

# ---

size: 1024x1792
```

### [](#style)`style`

The style of the generated image. Choose from `vivid` or `natural`.
Vivid causes the model to lean towards generating hyperreal and dramatic images. Natural causes the model to produce more natural, less hyperreal looking images. This parameter is only supported for `dall-e-3`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: style: vivid # --- style: natural ``` --- # Page 253: openai_speech **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/openai_speech.md --- # openai\_speech --- title: openai_speech latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/openai_speech page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/openai_speech.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/openai_speech.adoc categories: "[\"AI\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/openai_speech/ "View the Self-Managed version of this component") Generates audio from a text description and other attributes, using OpenAI API. 
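A minimal sketch of a pipeline step that uses this processor is shown below. The secret name `OPENAI_API_KEY` and the `${secrets.…}` reference style are assumptions about how your secrets are stored; the model, voice, and format values are taken from the examples in the field reference. The `input` field is deliberately omitted so that the processor falls back to submitting the message payload.

```yaml
pipeline:
  processors:
    - openai_speech:
        server_address: https://api.openai.com/v1
        api_key: "${secrets.OPENAI_API_KEY}" # hypothetical secret name
        model: tts-1
        voice: alloy
        # Optional; mp3 is the documented default format.
        response_format: mp3
```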
#### Common ```yml processors: label: "" openai_speech: server_address: https://api.openai.com/v1 api_key: "" # No default (required) model: "" # No default (required) input: "" # No default (optional) voice: "" # No default (required) ``` #### Advanced ```yml processors: label: "" openai_speech: server_address: https://api.openai.com/v1 api_key: "" # No default (required) model: "" # No default (required) input: "" # No default (optional) voice: "" # No default (required) response_format: "" # No default (optional) ``` This processor sends a text description and other attributes, such as a voice type and format to the OpenAI API, which generates audio. By default, the processor submits the entire payload of each message as a string, unless you use the `input` configuration field to customize it. To learn more about turning text into spoken audio, see the [OpenAI API documentation](https://platform.openai.com/docs/guides/text-to-speech). ## [](#fields)Fields ### [](#api_key)`api_key` The API key for OpenAI API. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#input)`input` A text description of the audio you want to generate. The `input` field accepts a maximum of 4096 characters. **Type**: `string` ### [](#model)`model` The name of the OpenAI model to use. **Type**: `string` ```yaml # Examples: model: tts-1 # --- model: tts-1-hd ``` ### [](#response_format)`response_format` The format to generate audio in. Default is `mp3`. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). 
**Type**: `string` ```yaml # Examples: response_format: mp3 # --- response_format: opus # --- response_format: aac # --- response_format: flac # --- response_format: wav # --- response_format: pcm ``` ### [](#server_address)`server_address` The OpenAI API endpoint that the processor sends requests to. Update the default value to use another OpenAI-compatible service. **Type**: `string` **Default**: `https://api.openai.com/v1` ### [](#voice)`voice` The type of voice to use when generating the audio. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: voice: alloy # --- voice: echo # --- voice: fable # --- voice: onyx # --- voice: nova # --- voice: shimmer ``` --- # Page 254: openai_transcription **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/openai_transcription.md --- # openai\_transcription --- title: openai_transcription latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/openai_transcription page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/openai_transcription.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/openai_transcription.adoc categories: "[\"AI\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/openai_transcription/ "View the Self-Managed version of this component") Generates a transcription of spoken audio in the input language, using the OpenAI API.
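A hedged sketch of a transcription step, using only fields documented on this page. Two things here are assumptions rather than documented behavior: the `${secrets.OPENAI_API_KEY}` reference (a hypothetical secret name), and the value given to `file`, which is written as a Bloblang mapping returning the message payload. Check the `file` field reference before relying on that form.

```yaml
pipeline:
  processors:
    - openai_transcription:
        # Hypothetical secret name; avoid inlining the real key.
        api_key: "${secrets.OPENAI_API_KEY}"
        model: whisper-1
        # Assumption: a Bloblang mapping that returns the raw audio bytes
        # carried in the message payload.
        file: root = content()
        # Optional ISO-639-1 hint for the spoken language.
        language: en
```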
#### Common ```yml processors: label: "" openai_transcription: server_address: https://api.openai.com/v1 api_key: "" # No default (required) model: "" # No default (required) file: "" # No default (required) ``` #### Advanced ```yml processors: label: "" openai_transcription: server_address: https://api.openai.com/v1 api_key: "" # No default (required) model: "" # No default (required) file: "" # No default (required) language: "" # No default (optional) prompt: "" # No default (optional) ``` This processor sends an audio file object along with the input language to OpenAI API to generate a transcription. By default, the processor submits the entire payload of each message as a string, unless you use the `file` configuration field to customize it. To learn more about audio transcription, see the [OpenAI API documentation](https://platform.openai.com/docs/guides/speech-to-text). ## [](#fields)Fields ### [](#api_key)`api_key` The API key for OpenAI API. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#file)`file` The audio file object (not file name) to transcribe, in one of the following formats: `flac`, `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, or `webm`. **Type**: `string` ### [](#language)`language` The language of the input audio. Supplying the input language in ISO-639-1 format improves accuracy and latency. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: language: en # --- language: fr # --- language: de # --- language: zh ``` ### [](#model)`model` The name of the OpenAI model to use.
**Type**: `string` ```yaml # Examples: model: whisper-1 ``` ### [](#prompt)`prompt` Optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#server_address)`server_address` The OpenAI API endpoint that the processor sends requests to. Update the default value to use another OpenAI-compatible service. **Type**: `string` **Default**: `https://api.openai.com/v1` --- # Page 255: openai_translation **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/openai_translation.md --- # openai\_translation --- title: openai_translation latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/openai_translation page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/openai_translation.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/openai_translation.adoc categories: "[\"AI\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/openai_translation/ "View the Self-Managed version of this component") Translates spoken audio into English, using the OpenAI API.
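A minimal sketch of a translation step, assuming each message payload already holds the audio to translate, so the optional `file` field is left unset. The `${secrets.OPENAI_API_KEY}` reference uses a hypothetical secret name and is an assumption for illustration.

```yaml
pipeline:
  processors:
    - openai_translation:
        # Hypothetical secret name; avoid inlining the real key.
        api_key: "${secrets.OPENAI_API_KEY}"
        model: whisper-1
```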
#### Common ```yml processors: label: "" openai_translation: server_address: https://api.openai.com/v1 api_key: "" # No default (required) model: "" # No default (required) file: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" openai_translation: server_address: https://api.openai.com/v1 api_key: "" # No default (required) model: "" # No default (required) file: "" # No default (optional) prompt: "" # No default (optional) ``` This processor sends an audio file object to OpenAI API to generate a translation. By default, the processor submits the entire payload of each message as a string, unless you use the `file` configuration field to customize it. To learn more about translation, see the [OpenAI API documentation](https://platform.openai.com/docs/guides/speech-to-text). ## [](#fields)Fields ### [](#api_key)`api_key` The API key for OpenAI API. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#file)`file` The audio file object (not file name) to translate, in one of the following formats: `flac`, `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, or `webm`. **Type**: `string` ### [](#model)`model` The name of the OpenAI model to use. **Type**: `string` ```yaml # Examples: model: whisper-1 ``` ### [](#prompt)`prompt` Optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#server_address)`server_address` The OpenAI API endpoint that the processor sends requests to. Update the default value to use another OpenAI-compatible service.
**Type**: `string` **Default**: `https://api.openai.com/v1` --- # Page 256: parallel **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/parallel.md --- # parallel --- title: parallel latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/parallel page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/parallel.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/parallel.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/parallel/ "View the Self-Managed version of this component") A processor that applies a list of child processors to messages of a batch as though they were each a batch of one message (similar to the [`for_each`](../for_each/) processor), but where each message is processed in parallel. ```yml # Config fields, showing default values label: "" parallel: cap: 0 processors: [] # No default (required) ``` The field `cap`, if greater than zero, caps the maximum number of parallel processing threads. The functionality of this processor depends on being applied across messages that are batched. You can find out more about batching in [Message Batching](../../../configuration/batching/). ## [](#fields)Fields ### [](#cap)`cap` The maximum number of messages to have processing at a given time. **Type**: `int` **Default**: `0` ### [](#processors)`processors[]` A list of child processors to apply.
**Type**: `processor` --- # Page 257: parquet_decode **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/parquet_decode.md --- # parquet\_decode --- title: parquet_decode latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/parquet_decode page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/parquet_decode.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/parquet_decode.adoc categories: "[\"Parsing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/parquet_decode/ "View the Self-Managed version of this component") Decodes [Parquet files](https://parquet.apache.org/docs/) into a batch of structured messages. ```yml # Configuration fields, showing default values label: "" parquet_decode: handle_logical_types: v1 ``` ## [](#fields)Fields ### [](#handle_logical_types)`handle_logical_types` Set to `v2` to enable enhanced decoding of logical types, or keep the default value (`v1`) to ignore logical type metadata when decoding values. In Parquet format, logical types are represented using standard physical types along with metadata that provides additional context. For example, UUIDs are stored as a `FIXED_LEN_BYTE_ARRAY` physical type, but the schema metadata identifies them as UUIDs. By enabling `v2`, this processor uses the metadata descriptions of logical types to produce more meaningful values during decoding. > 📝 **NOTE** > > For backward compatibility, this field enables logical-type handling for the specified Parquet format version, and all earlier versions. 
When creating new pipelines, Redpanda recommends that you use the newest documented version. **Type**: `string` **Default**: `v1` | Option | Summary | | --- | --- | | v1 | No special handling of logical types | | v2 | `TIMESTAMP`: decodes as an RFC3339 string describing the time. If the `isAdjustedToUTC` flag is set to `true` in the Parquet file, the time zone is set to UTC. If it is set to `false`, the time zone is set to local time. `UUID`: decodes as a string, for example `00112233-4455-6677-8899-aabbccddeeff`. | ```yaml # Examples: handle_logical_types: v2 ``` ## [](#examples)Examples ### [](#reading-parquet-files-from-aws-s3)Reading Parquet Files from AWS S3 In this example we consume files from AWS S3 as they’re written by listening on an SQS queue for upload events. We use the `to_the_end` scanner so that each file is read into memory in full, which allows a `parquet_decode` processor to expand each file into a batch of messages. Finally, we write the data out to local files as newline-delimited JSON. ```yaml input: aws_s3: bucket: TODO prefix: foos/ scanner: to_the_end: {} sqs: url: TODO processors: - parquet_decode: {} output: file: codec: lines path: './foos/${! 
meta("s3_key") }.jsonl' ``` --- # Page 258: parquet_encode **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/parquet_encode.md --- # parquet\_encode --- title: parquet_encode latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/parquet_encode page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/parquet_encode.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/parquet_encode.adoc categories: "[\"Parsing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/parquet_encode/ "View the Self-Managed version of this component") Encodes [Parquet files](https://parquet.apache.org/docs/) from a batch of structured messages. #### Common ```yml processors: label: "" parquet_encode: schema: [] # No default (optional) schema_metadata: "" default_compression: uncompressed ``` #### Advanced ```yml processors: label: "" parquet_encode: schema: [] # No default (optional) schema_metadata: "" default_compression: uncompressed default_encoding: DELTA_LENGTH_BYTE_ARRAY ``` ## [](#fields)Fields ### [](#default_compression)`default_compression` The default compression type to use for fields. **Type**: `string` **Default**: `uncompressed` **Options**: `uncompressed`, `snappy`, `gzip`, `brotli`, `zstd`, `lz4raw` ### [](#default_encoding)`default_encoding` The default encoding type to use for fields. A custom default encoding is only necessary when consuming data with libraries that do not support `DELTA_LENGTH_BYTE_ARRAY`. 
**Type**: `string` **Default**: `DELTA_LENGTH_BYTE_ARRAY` **Options**: `DELTA_LENGTH_BYTE_ARRAY`, `PLAIN` ### [](#schema)`schema[]` Parquet schema. **Type**: `object` ### [](#schema-fields)`schema[].fields[]` A list of child fields. **Type**: `array` ```yaml # Examples: fields: - name: foo type: INT64 - name: bar type: BYTE_ARRAY ``` ### [](#schema-name)`schema[].name` The name of the column. **Type**: `string` ### [](#schema-optional)`schema[].optional` Whether the field is optional. **Type**: `bool` **Default**: `false` ### [](#schema-repeated)`schema[].repeated` Whether the field is repeated. **Type**: `bool` **Default**: `false` ### [](#schema-type)`schema[].type` The type of the column, only applicable for leaf columns with no child fields. Some logical types can be specified here such as UTF8. **Type**: `string` **Options**: `BOOLEAN`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BYTE_ARRAY`, `UTF8`, `TIMESTAMP`, `BSON`, `ENUM`, `JSON`, `UUID` ### [](#schema_metadata)`schema_metadata` Optionally specify a metadata field containing a schema definition to use for encoding instead of a statically defined schema. For batches of messages, the first message’s schema will be applied to all subsequent messages of the batch. **Type**: `string` **Default**: `""` ## [](#examples)Examples ### [](#writing-parquet-files-to-aws-s3)Writing Parquet Files to AWS S3 In this example we use the batching mechanism of an `aws_s3` output to collect a batch of messages in memory, which then converts it to a parquet file and uploads it. ```yaml output: aws_s3: bucket: TODO path: 'stuff/${! timestamp_unix() }-${! 
uuid_v4() }.parquet' batching: count: 1000 period: 10s processors: - parquet_encode: schema: - name: id type: INT64 - name: weight type: DOUBLE - name: content type: BYTE_ARRAY default_compression: zstd ``` --- # Page 259: parse_log **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/parse_log.md --- # parse\_log --- title: parse_log latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/parse_log page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/parse_log.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/parse_log.adoc categories: "[\"Parsing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/parse_log/ "View the Self-Managed version of this component") Parses common log [Formats](#formats) into [structured data](#codecs). #### Common ```yml processors: label: "" parse_log: format: "" # No default (required) ``` #### Advanced ```yml processors: label: "" parse_log: format: "" # No default (required) best_effort: true allow_rfc3339: true default_year: current default_timezone: UTC ``` ## [](#fields)Fields ### [](#allow_rfc3339)`allow_rfc3339` Also accept timestamps in rfc3339 format while parsing. Applicable to format `syslog_rfc3164`. **Type**: `bool` **Default**: `true` ### [](#best_effort)`best_effort` Still returns partially parsed messages even if an error occurs. **Type**: `bool` **Default**: `true` ### [](#default_timezone)`default_timezone` Sets the strategy to decide the timezone for rfc3164 timestamps. Applicable to format `syslog_rfc3164`. 
This value should follow the [time.LoadLocation](https://golang.org/pkg/time/#LoadLocation) format. **Type**: `string` **Default**: `UTC` ### [](#default_year)`default_year` Sets the strategy used to set the year for rfc3164 timestamps. Applicable to format `syslog_rfc3164`. When set to `current`, the current year is used. When set to an integer, that value is used. Leave this field empty to not set a default year at all. **Type**: `string` **Default**: `current` ### [](#format)`format` A common log [format](#formats) to parse. **Type**: `string` **Options**: `syslog_rfc5424`, `syslog_rfc3164` ## [](#codecs)Codecs Currently the only supported structured data codec is `json`. ## [](#formats)Formats ### [](#syslog_rfc5424)`syslog_rfc5424` Attempts to parse a log following the [Syslog RFC5424](https://tools.ietf.org/html/rfc5424) spec. The resulting structured document may contain any of the following fields: - `message` (string) - `timestamp` (string, RFC3339) - `facility` (int) - `severity` (int) - `priority` (int) - `version` (int) - `hostname` (string) - `procid` (string) - `appname` (string) - `msgid` (string) - `structureddata` (object) ### [](#syslog_rfc3164)`syslog_rfc3164` Attempts to parse a log following the [Syslog RFC3164](https://tools.ietf.org/html/rfc3164) spec.
The resulting structured document may contain any of the following fields: - `message` (string) - `timestamp` (string, RFC3339) - `facility` (int) - `severity` (int) - `priority` (int) - `hostname` (string) - `procid` (string) - `appname` (string) - `msgid` (string) --- # Page 260: processors **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/processors.md --- # processors --- title: processors latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/processors page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/processors.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/processors.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/processors/ "View the Self-Managed version of this component") A processor grouping several sub-processors. ```yml # Config fields, showing default values label: "" processors: [] ``` This processor is useful in situations where you want to collect several processors under a single resource identifier, whether it is for making your configuration easier to read and navigate, or for improving the testability of your configuration. The behavior of child processors will match exactly the behavior they would have under any other processors block. ## [](#examples)Examples ### [](#grouped-processing)Grouped Processing Imagine we have a collection of processors that cover a specific piece of functionality.
We could use this processor to group them together and make it easier to read and mock during testing by giving the whole block a label: ```yaml pipeline: processors: - label: my_super_feature processors: - log: message: "Let's do something cool" - archive: format: json_array - mapping: root.items = this ``` --- # Page 261: protobuf **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/protobuf.md --- # protobuf --- title: protobuf latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/protobuf page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/protobuf.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/protobuf.adoc categories: "[\"Parsing\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Self-Managed Handles conversions between JSON documents and protobuf messages using reflection, which allows you to make conversions from or to the target `.proto` files. For more information about JSON mapping of protobuf messages, see [ProtoJSON Format](https://protobuf.dev/programming-guides/json/) and [Examples](#examples). ```yml # Configuration fields, showing default values label: "" protobuf: operator: "" # No default (required) message: "" # No default (required) discard_unknown: false use_proto_names: false import_paths: [] use_enum_numbers: false ``` ## [](#performance-considerations)Performance considerations Processing protobuf messages using reflection is less performant than using generated native code. For scenarios where performance is critical, consider using [Redpanda Connect plugins](https://github.com/benthosdev/benthos-plugin-example). 
## [](#operators)Operators ### [](#to_json)`to_json` Converts protobuf messages into a generic JSON structure, which makes it easier to manipulate the contents of the JSON document within Redpanda Connect. ### [](#from_json)`from_json` Attempts to create a target protobuf message from a generic JSON structure. ## [](#fields)Fields ### [](#bsr)`bsr[]` Buf Schema Registry configuration. Either this field or `import_paths` must be populated. Note that this field is an array, and multiple BSR configurations can be provided. **Type**: `object` **Default**: `[]` ### [](#bsr-api_key)`bsr[].api_key` Buf Schema Registry server API key, can be left blank for a public registry. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#bsr-module)`bsr[].module` Module to fetch from a Buf Schema Registry e.g. 'buf.build/exampleco/mymodule'. **Type**: `string` ### [](#bsr-url)`bsr[].url` Buf Schema Registry URL, leave blank to extract from module. **Type**: `string` **Default**: `""` ### [](#bsr-version)`bsr[].version` Version to retrieve from the Buf Schema Registry, leave blank for latest. **Type**: `string` **Default**: `""` ### [](#discard_unknown)`discard_unknown` When set to `true`, the `from_json` operator discards fields that are unknown to the schema. **Type**: `bool` **Default**: `false` ### [](#import_paths)`import_paths[]` A list of directories that contain `.proto` files, including all definitions required for parsing the target message. If left empty, the current directory is used. This processor imports all `.proto` files listed within specified or default directories. **Type**: `array` **Default**: `[]` ### [](#message)`message` The fully-qualified name of the protobuf message to convert from or to JSON. 
**Type**: `string` ### [](#operator)`operator` The [operator](#operators) to execute. **Type**: `string` **Options**: `to_json`, `from_json`, `decode` ### [](#use_enum_numbers)`use_enum_numbers` When set to `true`, the `to_json` operator deserializes enumeration fields as their numerical values instead of their string names. For example, an enum field with a value of `ENUM_VALUE_ONE` is represented as `1` in the JSON output. **Type**: `bool` **Default**: `false` ### [](#use_proto_names)`use_proto_names` When set to `true`, the `to_json` operator deserializes fields exactly as named in schema file. **Type**: `bool` **Default**: `false` ## [](#examples)Examples ### [](#json-to-protobuf-using-schema-from-disk)JSON to Protobuf using Schema from Disk If we have the following protobuf definition within a directory called `testing/schema`: ```protobuf syntax = "proto3"; package testing; import "google/protobuf/timestamp.proto"; message Person { string first_name = 1; string last_name = 2; string full_name = 3; int32 age = 4; int32 id = 5; // Unique ID number for this person. string email = 6; google.protobuf.Timestamp last_updated = 7; } ``` And a stream of JSON documents of the form: ```json { "firstName": "caleb", "lastName": "quaye", "email": "caleb@myspace.com" } ``` We can convert the documents into protobuf messages with the following config: ```yaml pipeline: processors: - protobuf: operator: from_json message: testing.Person import_paths: [ testing/schema ] ``` ### [](#protobuf-to-json-using-schema-from-disk)Protobuf to JSON using Schema from Disk If we have the following protobuf definition within a directory called `testing/schema`: ```protobuf syntax = "proto3"; package testing; import "google/protobuf/timestamp.proto"; message Person { string first_name = 1; string last_name = 2; string full_name = 3; int32 age = 4; int32 id = 5; // Unique ID number for this person. 
string email = 6; google.protobuf.Timestamp last_updated = 7; } ``` And a stream of protobuf messages of the type `Person`, we could convert them into JSON documents of the format: ```json { "firstName": "caleb", "lastName": "quaye", "email": "caleb@myspace.com" } ``` With the following config: ```yaml pipeline: processors: - protobuf: operator: to_json message: testing.Person import_paths: [ testing/schema ] ``` ### [](#json-to-protobuf-using-buf-schema-registry)JSON to Protobuf using Buf Schema Registry If we have the following protobuf definition within a BSR module hosted at `buf.build/exampleco/mymodule`: ```protobuf syntax = "proto3"; package testing; import "google/protobuf/timestamp.proto"; message Person { string first_name = 1; string last_name = 2; string full_name = 3; int32 age = 4; int32 id = 5; // Unique ID number for this person. string email = 6; google.protobuf.Timestamp last_updated = 7; } ``` And a stream of JSON documents of the form: ```json { "firstName": "caleb", "lastName": "quaye", "email": "caleb@myspace.com" } ``` We can convert the documents into protobuf messages with the following config: ```yaml pipeline: processors: - protobuf: operator: from_json message: testing.Person bsr: - module: buf.build/exampleco/mymodule api_key: xxx ``` ### [](#protobuf-to-json-using-buf-schema-registry)Protobuf to JSON using Buf Schema Registry If we have the following protobuf definition within a BSR module hosted at `buf.build/exampleco/mymodule`: ```protobuf syntax = "proto3"; package testing; import "google/protobuf/timestamp.proto"; message Person { string first_name = 1; string last_name = 2; string full_name = 3; int32 age = 4; int32 id = 5; // Unique ID number for this person. 
string email = 6; google.protobuf.Timestamp last_updated = 7; } ``` And a stream of protobuf messages of the type `Person`, we could convert them into JSON documents of the format: ```json { "firstName": "caleb", "lastName": "quaye", "email": "caleb@myspace.com" } ``` With the following config: ```yaml pipeline: processors: - protobuf: operator: to_json message: testing.Person bsr: - module: buf.build/exampleco/mymodule api_key: xxxx ``` --- # Page 262: qdrant **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/qdrant.md --- # qdrant --- title: qdrant latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/qdrant page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/qdrant.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/qdrant.adoc page-git-created-date: "2025-05-19" page-git-modified-date: "2025-05-19" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/qdrant/)[Output](/redpanda-cloud/develop/connect/components/outputs/qdrant/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/qdrant/ "View the Self-Managed version of this component") Query items within a [Qdrant collection](https://qdrant.tech/documentation/concepts/collections/) and filter the returned results. 
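A hedged sketch of a query step, using only fields documented on this page. The collection name `articles`, the payload field names, and the `embedding` field read by `vector_mapping` are hypothetical. The sketch assumes each incoming message is a JSON document that already carries its search vector as an array under `embedding`.

```yaml
pipeline:
  processors:
    - qdrant:
        grpc_host: localhost:6334
        # Hypothetical collection name.
        collection_name: articles
        # Assumption: the message holds its search vector in `embedding`.
        vector_mapping: root = this.embedding
        # Return only these (hypothetical) payload fields.
        payload_fields: [ title, city ]
        payload_filter: include
        # Return at most five points instead of the default ten.
        limit: 5
```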
#### Common ```yml processors: label: "" qdrant: grpc_host: "" # No default (required) api_token: "" collection_name: "" # No default (required) vector_mapping: "" # No default (required) filter: "" # No default (optional) payload_fields: [] payload_filter: include limit: 10 ``` #### Advanced ```yml processors: label: "" qdrant: grpc_host: "" # No default (required) api_token: "" tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] collection_name: "" # No default (required) vector_mapping: "" # No default (required) filter: "" # No default (optional) payload_fields: [] payload_filter: include limit: 10 ``` ## [](#fields)Fields ### [](#api_token)`api_token` The Qdrant API token to use for authentication, which defaults to an empty string. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#collection_name)`collection_name` The name of the Qdrant collection you want to query. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#filter)`filter` Specify additional filtering to perform on returned results. Mappings must return [a valid filter](https://qdrant.tech/documentation/concepts/filtering/) using the proto3-encoded form. **Type**: `string` ```yaml # Examples: filter: |- root.must = [ {"has_id":{"has_id":[{"num": 8}, { "uuid":"1234-5678-90ab-cdef" }]}}, {"field":{"key": "city", "match": {"text": "London"}}}, ] # --- filter: |- root.must = [ {"field":{"key": "city", "match": {"text": "London"}}}, ] root.must_not = [ {"field":{"key": "color", "match": {"text": "red"}}}, ] ``` ### [](#grpc_host)`grpc_host` The gRPC host of the Qdrant server.
**Type**: `string` ```yaml # Examples: grpc_host: localhost:6334 # --- grpc_host: xyz-example.eu-central.aws.cloud.qdrant.io:6334 ``` ### [](#limit)`limit` The maximum number of points to return from the collection. **Type**: `int` **Default**: `10` ### [](#payload_fields)`payload_fields[]` The fields to include or exclude in returned results. Use this field in combination with `payload_filter`. **Type**: `array` **Default**: `[]` ### [](#payload_filter)`payload_filter` Whether to include or exclude the fields specified in `payload_fields` from the returned results. **Type**: `string` **Default**: `include` | Option | Summary | | --- | --- | | exclude | Exclude the payload fields specified in payload_fields. | | include | Include the payload fields specified in payload_fields. | ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. 
**Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. 
**Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely. **Type**: `bool` **Default**: `false` ### [](#vector_mapping)`vector_mapping` A mapping to extract search vectors from the returned document. 
**Type**: `string` ```yaml # Examples: vector_mapping: root = [1.2, 0.5, 0.76] # --- vector_mapping: root = this.vector # --- vector_mapping: root = [[0.352,0.532,0.532,0.234],[0.352,0.532,0.532,0.234]] # --- vector_mapping: root = {"some_sparse": {"indices":[23,325,532],"values":[0.352,0.532,0.532]}} # --- vector_mapping: root = {"some_multi": [[0.352,0.532,0.532,0.234],[0.352,0.532,0.532,0.234]]} # --- vector_mapping: root = {"some_dense": [0.352,0.532,0.532,0.234]} ``` --- # Page 263: rate_limit **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/rate_limit.md --- # rate\_limit --- title: rate_limit latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/rate_limit page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/rate_limit.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/rate_limit.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/rate_limit/ "View the Self-Managed version of this component") Throttles the throughput of a pipeline according to a specified [`rate_limit`](../../rate_limits/about/) resource. Rate limits are shared across components and therefore apply globally to all processing pipelines. ```yml # Config fields, showing default values label: "" rate_limit: resource: "" # No default (required) ``` ## [](#fields)Fields ### [](#resource)`resource` The target [`rate_limit` resource](../../rate_limits/about/). 
**Type**: `string` --- # Page 264: redis_script **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/redis_script.md --- # redis\_script --- title: redis_script latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/redis_script page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/redis_script.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/redis_script.adoc categories: "[\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/redis_script/ "View the Self-Managed version of this component") Performs actions against Redis using [LUA scripts](https://redis.io/docs/latest/develop/programmability/eval-intro/). #### Common ```yml processors: label: "" redis_script: url: "" # No default (required) script: "" # No default (required) args_mapping: "" # No default (required) keys_mapping: "" # No default (required) ``` #### Advanced ```yml processors: label: "" redis_script: url: "" # No default (required) kind: simple master: "" client_name: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] script: "" # No default (required) args_mapping: "" # No default (required) keys_mapping: "" # No default (required) retries: 3 retry_period: 500ms ``` Actions are performed for each message and the message contents are replaced with the result. In order to merge the result into the original message compose this processor within a [`branch` processor](../branch/). 
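
As a rough sketch of the composition described above, the following config wraps `redis_script` in a `branch` processor so that the script result is written to a new field instead of replacing the message. The Redis address, the `counter_key` field, and the `redis_result` target field are placeholders chosen for illustration and do not come from this page.

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - redis_script:
              url: redis://localhost:6379 # placeholder address
              # Increment the counter stored at KEYS[1] by ARGV[1].
              script: return redis.call('incrby', KEYS[1], ARGV[1])
              keys_mapping: 'root = [ this.counter_key ]' # hypothetical input field
              args_mapping: 'root = [ 1 ]'
        # Merge the script result into the original message.
        result_map: 'root.redis_result = this' # hypothetical target field
```

Without the surrounding `branch`, the message body would be replaced by the value the script returns.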
## [](#examples)Examples

### [](#running-a-script)Running a script

The following example uses a script to get the next element from a sorted set and update its score to the current Unix timestamp in nanoseconds.

```yaml
pipeline:
  processors:
    - redis_script:
        url: TODO
        script: |
          -- Fetch the lowest-scored member of the sorted set.
          local value = redis.call("ZRANGE", KEYS[1], '0', '0')

          -- Return an empty string when the set is empty.
          if next(value) == nil then
            return ''
          end

          -- Update the score of the existing member only (XX).
          redis.call("ZADD", KEYS[1], "XX", ARGV[1], value[1])

          return value
        keys_mapping: 'root = [ meta("key") ]'
        args_mapping: 'root = [ timestamp_unix_nano() ]'
```

## [](#fields)Fields

### [](#args_mapping)`args_mapping`

A [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of values matching in size to the number of arguments required for the specified Redis script.

**Type**: `string`

```yaml
# Examples:
args_mapping: root = [ this.key ]

# ---

args_mapping: root = [ meta("kafka_key"), "hardcoded_value" ]
```

### [](#client_name)`client_name`

Set the client name for the Redis connection.

**Type**: `string`

**Default**: `redpanda-connect`

### [](#keys_mapping)`keys_mapping`

A [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of keys matching in size to the number of keys required for the specified Redis script.

**Type**: `string`

```yaml
# Examples:
keys_mapping: root = [ this.key ]

# ---

keys_mapping: root = [ meta("kafka_key"), this.count ]
```

### [](#kind)`kind`

Specifies a simple, cluster-aware, or failover-aware Redis client.

**Type**: `string`

**Default**: `simple`

**Options**: `simple`, `cluster`, `failover`

### [](#master)`master`

The name of the Redis master when `kind` is `failover`.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
master: mymaster
```

### [](#retries)`retries`

The maximum number of retries before abandoning a request.

**Type**: `int`

**Default**: `3`

### [](#retry_period)`retry_period`

The time to wait before consecutive retry attempts.
**Type**: `string` **Default**: `500ms` ### [](#script)`script` A script to use for the target operator. It has precedence over the 'command' field. **Type**: `string` ```yaml # Examples: script: return redis.call('set', KEYS[1], ARGV[1]) ``` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Troubleshooting** Some cloud hosted instances of Redis (such as Azure Cache) might need some hand holding in order to establish stable connections. Unfortunately, it is often the case that TLS issues will manifest as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems consider setting `enable_renegotiation` to `true`, and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. 
This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target Redis server. Database is optional and is supplied as the URL path. **Type**: `string` ```yaml # Examples: url: redis://:6379 # --- url: redis://localhost:6379 # --- url: redis://foousername:foopassword@redisplace:6379 # --- url: redis://:foopassword@redisplace:6379 # --- url: redis://localhost:6379/1 # --- url: redis://localhost:6379/1,redis://localhost:6380/1 ``` --- # Page 265: redis **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/redis.md --- # redis --- title: redis latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/redis page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/redis.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/redis.adoc categories: "[\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/redis/)[Cache](/redpanda-cloud/develop/connect/components/caches/redis/)[Rate\_limit](/redpanda-cloud/develop/connect/components/rate_limits/redis/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/redis/ "View the Self-Managed version of this component") Performs actions against Redis that aren’t possible 
using a [`cache`](../cache/) processor. Actions are performed for each message and the message contents are replaced with the result. In order to merge the result into the original message compose this processor within a [`branch` processor](../branch/). #### Common ```yml processors: label: "" redis: url: "" # No default (required) command: "" # No default (optional) args_mapping: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" redis: url: "" # No default (required) kind: simple master: "" client_name: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] command: "" # No default (optional) args_mapping: "" # No default (optional) retries: 3 retry_period: 500ms ``` ## [](#examples)Examples ### [](#querying-cardinality)Querying Cardinality If given payloads containing a metadata field `set_key` it’s possible to query and store the cardinality of the set for each message using a [`branch` processor](../branch/) in order to augment rather than replace the message contents: ```yaml pipeline: processors: - branch: processors: - redis: url: TODO command: scard args_mapping: 'root = [ meta("set_key") ]' result_map: 'root.cardinality = this' ``` ### [](#running-total)Running Total If we have JSON data containing number of friends visited during covid 19: ```json {"name":"ash","month":"feb","year":2019,"friends_visited":10} {"name":"ash","month":"apr","year":2019,"friends_visited":-2} {"name":"bob","month":"feb","year":2019,"friends_visited":3} {"name":"bob","month":"apr","year":2019,"friends_visited":1} ``` We can add a field that contains the running total number of friends visited: ```json {"name":"ash","month":"feb","year":2019,"friends_visited":10,"total":10} {"name":"ash","month":"apr","year":2019,"friends_visited":-2,"total":8} {"name":"bob","month":"feb","year":2019,"friends_visited":3,"total":3} 
{"name":"bob","month":"apr","year":2019,"friends_visited":1,"total":4} ``` Using the `incrby` command: ```yaml pipeline: processors: - branch: processors: - redis: url: TODO command: incrby args_mapping: 'root = [ this.name, this.friends_visited ]' result_map: 'root.total = this' ``` ## [](#fields)Fields ### [](#args_mapping)`args_mapping` A [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of values matching in size to the number of arguments required for the specified Redis command. **Type**: `string` ```yaml # Examples: args_mapping: root = [ this.key ] # --- args_mapping: root = [ meta("kafka_key"), this.count ] ``` ### [](#client_name)`client_name` Set the client name for the Redis connection. **Type**: `string` **Default**: `redpanda-connect` ### [](#command)`command` The command to execute. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: command: scard # --- command: incrby # --- command: ${! meta("command") } ``` ### [](#kind)`kind` Specifies a simple, cluster-aware, or failover-aware redis client. **Type**: `string` **Default**: `simple` **Options**: `simple`, `cluster`, `failover` ### [](#master)`master` Name of the redis master when `kind` is `failover` **Type**: `string` **Default**: `""` ```yaml # Examples: master: mymaster ``` ### [](#retries)`retries` The maximum number of retries before abandoning a request. **Type**: `int` **Default**: `3` ### [](#retry_period)`retry_period` The time to wait before consecutive retry attempts. **Type**: `string` **Default**: `500ms` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Troubleshooting** Some cloud hosted instances of Redis (such as Azure Cache) might need some hand holding in order to establish stable connections. Unfortunately, it is often the case that TLS issues will manifest as generic error messages such as "i/o timeout". 
If you’re using TLS and are seeing connectivity problems consider setting `enable_renegotiation` to `true`, and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The URL of the target Redis server. Database is optional and is supplied as the URL path. 
**Type**: `string` ```yaml # Examples: url: redis://:6379 # --- url: redis://localhost:6379 # --- url: redis://foousername:foopassword@redisplace:6379 # --- url: redis://:foopassword@redisplace:6379 # --- url: redis://localhost:6379/1 # --- url: redis://localhost:6379/1,redis://localhost:6380/1 ``` --- # Page 266: resource **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/resource.md --- # resource --- title: resource latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/resource page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/resource.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/resource.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/resource/)[Input](/redpanda-cloud/develop/connect/components/inputs/resource/)[Output](/redpanda-cloud/develop/connect/components/outputs/resource/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/resource/ "View the Self-Managed version of this component") Resource is a processor type that runs a processor resource identified by its label. ```yml # Config fields, showing default values resource: "" ``` This processor allows you to reference the same configured processor resource in multiple places, and can also tidy up large nested configs. 
For example, the config: ```yaml pipeline: processors: - mapping: | root.message = this root.meta.link_count = this.links.length() root.user.age = this.user.age.number() ``` Is equivalent to: ```yaml pipeline: processors: - resource: foo_proc processor_resources: - label: foo_proc mapping: | root.message = this root.meta.link_count = this.links.length() root.user.age = this.user.age.number() ``` --- # Page 267: retry **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/retry.md --- # retry --- title: retry latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/retry page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/retry.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/retry.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/retry/)[Output](/redpanda-cloud/develop/connect/components/outputs/retry/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/retry/ "View the Self-Managed version of this component") Attempts to execute a series of child processors until success. ```yml # Config fields, showing default values label: "" retry: backoff: initial_interval: 500ms max_interval: 10s max_elapsed_time: 1m processors: [] # No default (required) parallel: false max_retries: 0 ``` Executes child processors and if a resulting message is errored then, after a specified backoff period, the same original message will be attempted again through those same processors. 
If the child processors result in more than one message then the retry mechanism will kick in if _any_ of the resulting messages are errored.

It is important to note that any mutations performed on the message during these child processors will be discarded for the next retry, and therefore it is safe to assume that each execution of the child processors will always be performed on the data as it was when it first reached the retry processor.

By default the retry backoff has a specified [`max_elapsed_time`](#backoff-max_elapsed_time). If this time period is reached during retries and an error still occurs, these errored messages proceed to the next processor after the retry (or to your outputs). Normal [error handling patterns](../../../configuration/error_handling/) can be used on these messages.

In order to avoid permanent loops, any error associated with messages as they first enter a retry processor will be cleared.

## [](#metadata)Metadata

This processor adds the following metadata fields to each message:

```text
- retry_count - The number of retry attempts.
- backoff_duration - The total time elapsed while performing retries.
```

> ⚠️ **CAUTION: Batching**
>
> If you wish to wrap a batch-aware series of processors then take a look at the [batching section](#batching).

## [](#examples)Examples

### [](#stop-ignoring-me-taz)Stop ignoring me Taz

Here we have a config where I generate animal noises and send them to Taz via HTTP. Taz has a tendency to stop his servers whenever I dispatch my animals upon him, and therefore these HTTP requests sometimes fail. However, I have the retry processor and with this superpower I can specify a backoff policy and it will ensure that for each animal noise the HTTP processor is attempted until either it succeeds or my Redpanda Connect instance is stopped.
I even go as far as to zero-out the maximum elapsed time field, which means that for each animal noise I will wait indefinitely, because I really really want Taz to receive every single animal noise that he is entitled to.

```yaml
input:
  generate:
    interval: 1s
    mapping: 'root.noise = [ "woof", "meow", "moo", "quack" ].index(random_int(min: 0, max: 3))'

pipeline:
  processors:
    - retry:
        backoff:
          initial_interval: 100ms
          max_interval: 5s
          max_elapsed_time: 0s
        processors:
          - http:
              url: 'http://example.com/try/not/to/dox/taz'
              verb: POST

output:
  # Drop everything because it's junk data, I don't want it lol
  drop: {}
```

## [](#fields)Fields

### [](#backoff)`backoff`

Determine time intervals and cutoffs for retry attempts.

**Type**: `object`

### [](#backoff-initial_interval)`backoff.initial_interval`

The initial period to wait between retry attempts. The retry interval increases for each failed attempt, up to the `backoff.max_interval` value.

This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

**Default**: `500ms`

```yaml
# Examples:
initial_interval: 50ms

# ---

initial_interval: 1s
```

### [](#backoff-max_elapsed_time)`backoff.max_elapsed_time`

The maximum overall period of time to spend on retry attempts before the request is aborted. Setting this value to a zeroed duration (such as `0s`) will result in unbounded retries.

**Type**: `string`

**Default**: `1m`

```yaml
# Examples:
max_elapsed_time: 1m

# ---

max_elapsed_time: 1h
```

### [](#backoff-max_interval)`backoff.max_interval`

The maximum period to wait between retry attempts.

**Type**: `string`

**Default**: `10s`

```yaml
# Examples:
max_interval: 5s

# ---

max_interval: 1m
```

### [](#max_retries)`max_retries`

The maximum number of retry attempts before the request is aborted. Setting this value to `0` will result in an unbounded number of retries.
**Type**: `int`

**Default**: `0`

### [](#parallel)`parallel`

When processing batches of messages these batches are ignored and the processors apply to each message sequentially. However, when this field is set to `true` each message will be processed in parallel. Take care to ensure that batch sizes do not grow to a point where this causes resource (CPU, memory, API limits) contention.

**Type**: `bool`

**Default**: `false`

### [](#processors)`processors[]`

A list of [processors](../about/) to execute on each message.

**Type**: `processor`

## [](#batching)Batching

When messages are batched, the child processors of a retry are executed for each individual message in isolation, performed serially by default but in parallel when the field [`parallel`](#parallel) is set to `true`. This is an intentional limitation of the retry processor and is done in order to ensure that errors are correctly associated with a given input message. Otherwise, the archiving, expansion, grouping, filtering and so on of the child processors could obfuscate this relationship.

If the target behavior of your retried processors is "batch aware", in that you wish to perform some processing across the entire batch of messages and repeat it in the event of errors, you can use an [`archive` processor](../archive/) to collapse the batch into an individual message. Then, within these child processors either perform your batch aware processing on the archive, or use an [`unarchive` processor](../unarchive/) in order to expand the single message back out into a batch.
For example, if the retry processor were being used to wrap an HTTP request where the payload data is a batch archived into a JSON array it should look something like this: ```yaml pipeline: processors: - archive: format: json_array - retry: processors: - http: url: example.com/nope verb: POST - unarchive: format: json_array ``` --- # Page 268: schema_registry_decode **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/schema_registry_decode.md --- # schema\_registry\_decode --- title: schema_registry_decode latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/schema_registry_decode page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/schema_registry_decode.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/schema_registry_decode.adoc categories: "[\"Parsing\",\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/schema_registry_decode/ "View the Self-Managed version of this component") Automatically decodes and validates messages with schemas from a Confluent Schema Registry service. This processor uses the [Franz Kafka Schema Registry client](https://github.com/twmb/franz-go/tree/master/pkg/sr). 
#### Common ```yml processors: label: "" schema_registry_decode: avro: raw_unions: "" # No default (optional) preserve_logical_types: false translate_kafka_connect_types: false mapping: "" # No default (optional) store_schema_metadata: "" # No default (optional) protobuf: use_proto_names: false use_enum_numbers: false emit_unpopulated: false emit_default_values: false serialize_to_json: true cache_duration: 10m url: "" # No default (required) default_schema_id: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" schema_registry_decode: avro: raw_unions: "" # No default (optional) preserve_logical_types: false translate_kafka_connect_types: false mapping: "" # No default (optional) store_schema_metadata: "" # No default (optional) protobuf: use_proto_names: false use_enum_numbers: false emit_unpopulated: false emit_default_values: false serialize_to_json: true cache_duration: 10m url: "" # No default (required) default_schema_id: "" # No default (optional) oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} tls: skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] ``` Decodes messages automatically from a schema stored within a [Confluent Schema Registry service](https://docs.confluent.io/platform/current/schema-registry/index.html) by extracting a schema ID from the message and obtaining the associated schema from the registry. If a message fails to match against the schema then it will remain unchanged and the error can be caught using [error-handling methods](../../../configuration/error_handling/). Avro, Protobuf and JSON schemas are supported, all are capable of expanding from schema references as of v4.22.0. 
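The error-handling behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than a definitive configuration: the registry URL is a placeholder, and the `catch` and `log` processors are just one possible way to handle a failed decode.

```yaml
pipeline:
  processors:
    - schema_registry_decode:
        # Placeholder URL (assumption): replace with your schema registry endpoint.
        url: http://localhost:8081
        avro:
          # Decode Avro unions as standard JSON instead of Avro JSON.
          raw_unions: true
    # The child processors of catch run only for messages flagged as failed by an
    # earlier processor. A message that failed to decode is still unchanged here.
    - catch:
        - log:
            level: ERROR
            message: 'Schema decode failed: ${! error() }'
```

In this sketch, messages that decode successfully skip the `catch` block entirely, while failed messages are logged with the error text and then continue through the pipeline in their original, undecoded form.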
## [](#avro-json-format)Avro JSON format By default, this processor expects documents formatted as [Avro JSON](https://avro.apache.org/docs/current/specification/) when decoding with Avro schemas. In this format, the value of a union is encoded in JSON as follows: - If the union’s type is `null`, it is encoded as a JSON `null`. - Otherwise, the union is encoded as a JSON object with one name/value pair. The name is the type’s name, and the value is the recursively-encoded value. The user-specified name is used for Avro’s named types (record, fixed, or enum). For other types, the type name is used. For example, the union schema `["null","string","Transaction"]`, where `Transaction` is a record name, would encode: - `null` as a JSON `null` - The string `"a"` as `{"string": "a"}` - A `Transaction` instance as `{"Transaction": {…​}}`, where `{…​}` indicates the JSON encoding of a `Transaction` instance Alternatively, you can create documents in [standard/raw JSON format](https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull) by setting the field [`avro.raw_unions`](#avro-raw_unions) to `true`. ## [](#protobuf-format)Protobuf format This processor decodes Protobuf messages to JSON documents. For more information about the JSON mapping of Protobuf messages, see the [Protocol Buffers documentation](https://developers.google.com/protocol-buffers/docs/proto3#json). ## [](#metadata)Metadata This processor adds the following metadata to processed messages: - `schema_id`: The ID of the schema in the schema registry associated with the message. ## [](#fields)Fields ### [](#avro)`avro` Configuration for how to decode schemas that are of type AVRO. **Type**: `object` ### [](#avro-mapping)`avro.mapping` Define a custom mapping to apply to the JSON representation of Avro schemas. You can use mappings to convert custom types emitted by other tools, such as Debezium, into standard Avro types. 
**Type**: `string` ```yaml # Examples: mapping: |- map isDebeziumTimestampType { root = this.type == "long" && this."connect.name" == "io.debezium.time.Timestamp" && !this.exists("logicalType") } map debeziumTimestampToAvroTimestamp { let mapped_fields = this.fields.or([]).map_each(item -> item.apply("debeziumTimestampToAvroTimestamp")) root = match { this.type == "record" => this.assign({"fields": $mapped_fields}) this.type.type() == "array" => this.assign({"type": this.type.map_each(item -> item.apply("debeziumTimestampToAvroTimestamp"))}) # Add a logical type so that it's decoded as a timestamp instead of a long. this.type.type() == "object" && this.type.apply("isDebeziumTimestampType") => this.merge({"type":{"logicalType": "timestamp-millis"}}) _ => this } } root = this.apply("debeziumTimestampToAvroTimestamp") ``` ### [](#avro-preserve_logical_types)`avro.preserve_logical_types` Choose whether to: - Transform logical types into their primitive type (default). For example, decimals become raw bytes and timestamps become plain integers. - Preserve logical types. Set to `true` to preserve logical types. **Type**: `bool` **Default**: `false` ### [](#avro-raw_unions)`avro.raw_unions` Whether Avro messages should be decoded into normal JSON (JSON that meets the expectations of regular internet JSON) rather than [Avro JSON](https://avro.apache.org/docs/current/specification/). If set to `false`, Avro messages are decoded as [Avro JSON](https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodec). For example, the union schema `["null","string","Transaction"]`, where `Transaction` is a record name, would be decoded as: - A `null` as a JSON `null` - The string `"a"` as `{"string": "a"}` - A `Transaction` instance as `{"Transaction": {…​}}`, where `{…​}` indicates the JSON encoding of a `Transaction` instance. If set to `true`, Avro messages are decoded as [standard JSON](https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull). 
For example, the same union schema `["null","string","Transaction"]` is decoded as: - A `null` as JSON `null` - The string `"a"` as `"a"` - A `Transaction` instance as `{…​}`, where `{…​}` indicates the JSON encoding of a `Transaction` instance. For more details on the difference between standard JSON and Avro JSON, see the [comment in Goavro](https://github.com/linkedin/goavro/blob/5ec5a5ee7ec82e16e6e2b438d610e1cab2588393/union.go#L224-L249) and the [underlying library used for Avro serialization](https://github.com/linkedin/goavro). **Type**: `bool` ### [](#avro-store_schema_metadata)`avro.store_schema_metadata` Optionally store the schema used to decode messages as a metadata field under the given name. This field can later be referenced in other components such as a `parquet_encode` processor in order to automatically infer their schema. **Type**: `string` ### [](#avro-translate_kafka_connect_types)`avro.translate_kafka_connect_types` Only valid if preserve\_logical\_types is true. This decodes various Kafka Connect types into their bloblang equivalents when not representable by standard logical types according to the Avro standard. 
Types that are currently translated: | Type Name | Bloblang Type | Description | | --- | --- | --- | | io.debezium.time.Date | timestamp | Date without time (days since epoch) | | io.debezium.time.Timestamp | timestamp | Timestamp without timezone (milliseconds since epoch) | | io.debezium.time.MicroTimestamp | timestamp | Timestamp with microsecond precision | | io.debezium.time.NanoTimestamp | timestamp | Timestamp with nanosecond precision | | io.debezium.time.ZonedTimestamp | timestamp | Timestamp with timezone (ISO-8601 format) | | io.debezium.time.Year | timestamp at January 1st at 00:00:00 | Year value | | io.debezium.time.Time | timestamp at the unix epoch | Time without date (milliseconds past midnight) | | io.debezium.time.MicroTime | timestamp at the unix epoch | Time with microsecond precision | | io.debezium.time.NanoTime | timestamp at the unix epoch | Time with nanosecond precision | **Type**: `bool` **Default**: `false` ### [](#basic_auth)`basic_auth` Allows you to specify basic authentication. **Type**: `object` ### [](#basic_auth-enabled)`basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#basic_auth-password)`basic_auth.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#basic_auth-username)`basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#cache_duration)`cache_duration` The duration after which a cached schema is considered stale and is removed from the cache. 
**Type**: `string` **Default**: `10m` ```yaml # Examples: cache_duration: 1h # --- cache_duration: 5m ``` ### [](#default_schema_id)`default_schema_id` This schema ID is used when a message’s schema header cannot be read (`ErrBadHeader`). If this value is not set, schema header errors are returned. This configuration does not work with protobuf schemas. > 💡 **TIP** > > You can also use the [`with_schema_registry_header`](../../../guides/bloblang/functions/#with_schema_registry_header) bloblang function to add a schema ID to messages. **Type**: `int` ### [](#jwt)`jwt` Beta Configure JSON Web Token (JWT) authentication. This feature is in beta and may change in future releases. JWT tokens provide secure, stateless authentication between services. **Type**: `object` ### [](#jwt-claims)`jwt.claims` A value used to identify the claims that issued the JWT. **Type**: `object` **Default**: `{}` ### [](#jwt-enabled)`jwt.enabled` Whether to use JWT authentication in requests. **Type**: `bool` **Default**: `false` ### [](#jwt-headers)`jwt.headers` Additional key-value pairs to include in the JWT header (optional). These headers provide extra metadata for JWT processing. **Type**: `object` **Default**: `{}` ### [](#jwt-private_key_file)`jwt.private_key_file` Path to a file containing the PEM-encoded private key using PKCS#1 or PKCS#8 format. The private key must be compatible with the algorithm specified in the `signing_method` field. **Type**: `string` **Default**: `""` ### [](#jwt-signing_method)`jwt.signing_method` The cryptographic algorithm used to sign the JWT token. Supported algorithms include RS256, RS384, RS512, and EdDSA. This algorithm must be compatible with the private key specified in the `private_key_file` field. **Type**: `string` **Default**: `""` ### [](#oauth)`oauth` Configure OAuth version 1.0 authentication for secure API access. 
**Type**: `object` ### [](#oauth-access_token)`oauth.access_token` A value used to gain access to the protected resources on behalf of the user. **Type**: `string` **Default**: `""` ### [](#oauth-access_token_secret)`oauth.access_token_secret` A secret provided in order to establish ownership of a given access token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_key)`oauth.consumer_key` A value used to identify the client to the service provider. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_secret)`oauth.consumer_secret` A secret used to establish ownership of the consumer key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-enabled)`oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#protobuf)`protobuf` Configuration for how to decode schemas that are of type PROTOBUF. **Type**: `object` ### [](#protobuf-emit_default_values)`protobuf.emit_default_values` Whether to emit default-valued primitive fields, empty lists, and empty maps. `emit_unpopulated` takes precedence over `emit_default_values`. **Type**: `bool` **Default**: `false` ### [](#protobuf-emit_unpopulated)`protobuf.emit_unpopulated` Whether to emit unpopulated fields. It does not emit unpopulated oneof fields or unpopulated extension fields. **Type**: `bool` **Default**: `false` ### [](#protobuf-serialize_to_json)`protobuf.serialize_to_json` Whether messages should be serialized to JSON bytes.
If false then the message is kept in decoded form, which means that 64 bit integers are not converted to strings and types for bytes and google.protobuf.Timestamp are preserved (as they are not serialized to JSON strings). **Type**: `bool` **Default**: `true` ### [](#protobuf-use_enum_numbers)`protobuf.use_enum_numbers` Emits enum values as numbers. **Type**: `bool` **Default**: `false` ### [](#protobuf-use_proto_names)`protobuf.use_proto_names` Use proto field name instead of lowerCamelCase name. **Type**: `bool` **Default**: `false` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. 
The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. 
**Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The base URL of the schema registry service. **Type**: `string` --- # Page 269: schema_registry_encode **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/schema_registry_encode.md --- # schema\_registry\_encode --- title: schema_registry_encode latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/schema_registry_encode page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/schema_registry_encode.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/schema_registry_encode.adoc categories: "[\"Parsing\",\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/schema_registry_encode/ "View the Self-Managed version of this component") Automatically encodes and validates messages with schemas from a Confluent Schema Registry service. This processor uses the [Franz Kafka Schema Registry client](https://github.com/twmb/franz-go/tree/master/pkg/sr). 
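As a rough sketch of where this processor might sit in a pipeline, the following fragment encodes each message against the latest schema registered for a subject. The registry URL is a placeholder assumption, and the subject interpolation presumes the messages carry a `kafka_topic` metadata key (for example, because they were read from a Kafka-compatible input).

```yaml
pipeline:
  processors:
    - schema_registry_encode:
        # Placeholder URL (assumption): replace with your schema registry endpoint.
        url: http://localhost:8081
        # Resolve the subject per message from metadata (assumes kafka_topic is set).
        subject: '${! meta("kafka_topic") }'
        # Poll the registry for a newer schema version every 10 minutes (the default).
        refresh_period: 10m
```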
#### Common ```yml processors: label: "" schema_registry_encode: url: "" # No default (required) subject: "" # No default (required) refresh_period: 10m schema_metadata: "" format: "" # No default (optional) avro: raw_json: "" # No default (optional) record_name: "" namespace: "" ``` #### Advanced ```yml processors: label: "" schema_registry_encode: url: "" # No default (required) subject: "" # No default (required) refresh_period: 10m schema_metadata: "" format: "" # No default (optional) normalize: true avro: raw_json: "" # No default (optional) record_name: "" namespace: "" oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} tls: skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] ``` Encodes messages automatically from schemas obtained from a [Confluent Schema Registry service](https://docs.confluent.io/platform/current/schema-registry/index.html) by polling the service for the latest schema version for target subjects. If a message fails to encode under the schema then it will remain unchanged and the error can be caught using [error-handling methods](../../../configuration/error_handling/). Avro, Protobuf and JSON schemas are supported, all are capable of expanding from schema references as of v4.22.0. ## [](#avro-json-format)Avro JSON format By default, this processor expects documents formatted as [Avro JSON](https://avro.apache.org/docs/current/specification/) when encoding with Avro schemas. In this format, the value of a union is encoded in JSON as follows: - If the union’s type is `null`, it is encoded as a JSON `null`. - Otherwise, the union is encoded as a JSON object with one name/value pair. The name is the type’s name, and the value is the recursively-encoded value.
The user-specified name is used for Avro’s named types (record, fixed, or enum). For other types, the type name is used. For example, the union schema `["null","string","Transaction"]`, where `Transaction` is a record name, would encode: - A `null` as a JSON `null` - The string `"a"` as `{"string": "a"}` - A `Transaction` instance as `{"Transaction": {…​}}`, where `{…​}` indicates the JSON encoding of a `Transaction` instance Alternatively, you can consume documents in [standard/raw JSON format](https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull) by setting the field [`avro.raw_json`](#avro-raw_json) to `true`. ### [](#known-issues)Known issues There is an outstanding issue in the [Avro serialization library](https://github.com/linkedin/goavro) that Redpanda Connect uses, which means it [doesn’t encode logical types correctly](https://github.com/linkedin/goavro/issues/252). It’s still possible to encode logical types in line with the spec if `avro.raw_json` is set to `true`, although non-logical types will then no longer be in line with the spec. ## [](#protobuf-format)Protobuf format This processor encodes Protobuf messages either from any format parsed within Redpanda Connect (encoded as JSON by default), or from raw JSON documents. For more information about the JSON mapping of Protobuf messages, see the [Protocol Buffers documentation](https://developers.google.com/protocol-buffers/docs/proto3#json). ### [](#multiple-message-support)Multiple message support When a target subject presents a Protobuf schema that contains multiple messages, it becomes ambiguous which message definition a given input should be encoded against. In such scenarios Redpanda Connect attempts to encode the data against each of them and selects the first that successfully matches the data. This process currently **ignores all nested message definitions**.
In order to speed up this exhaustive search the last known successful message will be attempted first for each subsequent input. We will be considering alternative approaches in future so please [get in touch](https://redpanda.com/slack) with thoughts and feedback. ## [](#fields)Fields ### [](#avro)`avro` Configuration for Avro encoding. **Type**: `object` ### [](#avro-namespace)`avro.namespace` The Avro namespace for the root record type when encoding from a common schema (schema\_metadata mode). **Type**: `string` **Default**: `""` ### [](#avro-raw_json)`avro.raw_json` Whether messages encoded in Avro format should be parsed as normal JSON rather than Avro JSON. Overrides the deprecated top-level `avro_raw_json` when set. **Type**: `bool` ### [](#avro-record_name)`avro.record_name` The name to use for the root Avro record type when encoding from a common schema (schema\_metadata mode). If empty, derived from the subject. **Type**: `string` **Default**: `""` ### [](#basic_auth)`basic_auth` Allows you to specify basic authentication. **Type**: `object` ### [](#basic_auth-enabled)`basic_auth.enabled` Whether to use basic authentication in requests. **Type**: `bool` **Default**: `false` ### [](#basic_auth-password)`basic_auth.password` A password to authenticate with. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#basic_auth-username)`basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#format)`format` The encoding format to use when converting a common schema from metadata. Required when `schema_metadata` is set. **Type**: `string` **Options**: `avro`, `json_schema` ### [](#jwt)`jwt` Beta Configure JSON Web Token (JWT) authentication. 
This feature is in beta and may change in future releases. JWT tokens provide secure, stateless authentication between services. **Type**: `object` ### [](#jwt-claims)`jwt.claims` A value used to identify the claims that issued the JWT. **Type**: `object` **Default**: `{}` ### [](#jwt-enabled)`jwt.enabled` Whether to use JWT authentication in requests. **Type**: `bool` **Default**: `false` ### [](#jwt-headers)`jwt.headers` Additional key-value pairs to include in the JWT header (optional). These headers provide extra metadata for JWT processing. **Type**: `object` **Default**: `{}` ### [](#jwt-private_key_file)`jwt.private_key_file` Path to a file containing the PEM-encoded private key using PKCS#1 or PKCS#8 format. The private key must be compatible with the algorithm specified in the `signing_method` field. **Type**: `string` **Default**: `""` ### [](#jwt-signing_method)`jwt.signing_method` The cryptographic algorithm used to sign the JWT token. Supported algorithms include RS256, RS384, RS512, and EdDSA. This algorithm must be compatible with the private key specified in the `private_key_file` field. **Type**: `string` **Default**: `""` ### [](#normalize)`normalize` Whether to normalize the schema before registering with the schema registry (schema\_metadata mode only). **Type**: `bool` **Default**: `true` ### [](#oauth)`oauth` Configure OAuth version 1.0 authentication for secure API access. **Type**: `object` ### [](#oauth-access_token)`oauth.access_token` A value used to gain access to the protected resources on behalf of the user. **Type**: `string` **Default**: `""` ### [](#oauth-access_token_secret)`oauth.access_token_secret` A secret provided in order to establish ownership of a given access token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ### [](#oauth-consumer_key)`oauth.consumer_key` A value used to identify the client to the service provider. **Type**: `string` **Default**: `""` ### [](#oauth-consumer_secret)`oauth.consumer_secret` A secret used to establish ownership of the consumer key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#oauth-enabled)`oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#refresh_period)`refresh_period` The period after which a schema is refreshed for each subject, this is done by polling the schema registry service. **Type**: `string` **Default**: `10m` ```yaml # Examples: refresh_period: 60s # --- refresh_period: 1h ``` ### [](#schema_metadata)`schema_metadata` When set, the processor reads a schema in benthos common schema format from this metadata key on each message, converts it to the format specified by `format`, registers it with the schema registry under the configured subject, and encodes the message. When empty (the default), the processor pulls the latest schema from the registry instead. **Type**: `string` **Default**: `""` ### [](#subject)`subject` The schema subject to derive schemas from. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ```yaml # Examples: subject: foo # --- subject: ${! meta("kafka_topic") } ``` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. 
**Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. 
**Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas_file: ./root_cas.pem ``` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. **Type**: `bool` **Default**: `false` ### [](#url)`url` The base URL of the schema registry service. 
**Type**: `string` --- # Page 270: select_parts **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/select_parts.md --- # select\_parts --- title: select_parts latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/select_parts page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/select_parts.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/select_parts.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/select_parts/ "View the Self-Managed version of this component") Cherry-picks a set of messages from a batch by their index. Indexes larger than the number of messages are ignored. ```yml # Config fields, showing default values label: "" select_parts: parts: [] ``` The selected parts are added to the new message batch in the same order as the selection array. For example, with `parts` set to \[ 2, 0, 1 \] and the message parts \[ '0', '1', '2', '3' \], the output is \[ '2', '0', '1' \]. If none of the selected parts exist in the input batch (resulting in an empty output message), the batch is dropped entirely. Message indexes can be negative, in which case the part is selected from the end of the batch, counting backwards from -1. For example, if index = -1 then the last part of the batch is selected, if index = -2 then the part before the last part is selected, and so on. This processor is only applicable to [batched messages](../../../configuration/batching/). ## [](#fields)Fields ### [](#parts)`parts[]` An array of message indexes of a batch.
Indexes can be negative, and if so the part will be selected from the end counting backwards starting from -1. **Type**: `int` **Default**: `[]` --- # Page 271: slack_thread **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/slack_thread.md --- # slack\_thread --- title: slack_thread latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/slack_thread page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/slack_thread.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/slack_thread.adoc page-git-created-date: "2025-05-02" page-git-modified-date: "2025-05-02" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/slack_thread/ "View the Self-Managed version of this component") Reads a Slack thread using the Slack API method [conversations.replies](https://api.slack.com/methods/conversations.replies). ```yml # Common configuration fields, showing default values label: "" slack_thread: bot_token: "" # No default (required) channel_id: "" # No default (required) thread_ts: "" # No default (required) ``` ## [](#fields)Fields ### [](#bot_token)`bot_token` Your Slack bot user’s OAuth token, which must have the correct permissions to read messages from the Slack channel specified in `channel_id`. **Type**: `string` ### [](#channel_id)`channel_id` The encoded ID of the Slack channel from which to read threads. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` ### [](#thread_ts)`thread_ts` The timestamp of the parent message of the thread you want to read. 
This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries). **Type**: `string` --- # Page 272: sleep **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/sleep.md --- # sleep --- title: sleep latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/sleep page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/sleep.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/sleep.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/sleep/ "View the Self-Managed version of this component") Sleeps for a period of time, specified as a duration string, for each message. This processor interpolates functions within the `duration` field. For a list of available functions, see [interpolation functions](../../../configuration/interpolation/#bloblang-queries). ```yml # Config fields, showing default values label: "" sleep: duration: "" # No default (required) ``` ## [](#fields)Fields ### [](#duration)`duration` The duration of time to sleep for each execution. This field supports [interpolation functions](../../../configuration/interpolation/#bloblang-queries).
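As a minimal sketch of how `duration` might be set, the following pipeline pauses for a fixed half second per message and then for a per-message delay read from metadata. The metadata key `backoff` is hypothetical and is assumed to hold a duration string such as `2s`.

```yaml
pipeline:
  processors:
    # Fixed pause of 500 milliseconds for every message.
    - sleep:
        duration: 500ms
    # Per-message pause taken from a hypothetical metadata key named "backoff",
    # which is assumed to contain a duration string such as "2s".
    - sleep:
        duration: '${! meta("backoff") }'
```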
**Type**: `string` --- # Page 273: split **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/split.md --- # split --- title: split latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/split page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/split.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/split.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/split/ "View the Self-Managed version of this component") Breaks message batches (synonymous with multiple part messages) into smaller batches. The size of the resulting batches is determined either by a discrete number of messages or, if the field `byte_size` is non-zero, by total size in bytes (whichever limit is reached first). ```yml # Config fields, showing default values label: "" split: size: 1 byte_size: 0 ``` This processor is for breaking batches down into smaller ones. To break a single message out into multiple messages, use the [`unarchive` processor](../unarchive/). If there is a remainder of messages after splitting a batch, the remainder is also sent as a single batch. For example, if your target size is 10 and the processor receives a batch of 95 message parts, the result is 9 batches of 10 messages followed by a batch of 5 messages. ## [](#fields)Fields ### [](#byte_size)`byte_size` An optional target of total message bytes. **Type**: `int` **Default**: `0` ### [](#size)`size` The target number of messages.
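The 95-message example on this page can be sketched as the following configuration. This is an illustrative fragment only: with `size` set to 10 and `byte_size` left at 0, a batch of 95 messages would be expected to leave the processor as 9 batches of 10 followed by 1 batch of 5.

```yaml
pipeline:
  processors:
    - split:
        size: 10      # Target number of messages per output batch.
        byte_size: 0  # 0 leaves the byte-based limit disabled.
```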
**Type**: `int` **Default**: `1` --- # Page 274: sql_insert **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/sql_insert.md --- # sql\_insert --- title: sql_insert latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/sql_insert page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/sql_insert.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/sql_insert.adoc categories: "[\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/sql_insert/)[Output](/redpanda-cloud/develop/connect/components/outputs/sql_insert/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/sql_insert/ "View the Self-Managed version of this component") Inserts rows into an SQL database for each message, and leaves the message unchanged. 
#### Common ```yml processors: label: "" sql_insert: driver: "" # No default (required) dsn: "" # No default (required) table: "" # No default (required) columns: [] # No default (required) args_mapping: "" # No default (required) ``` #### Advanced ```yml processors: label: "" sql_insert: driver: "" # No default (required) dsn: "" # No default (required) table: "" # No default (required) columns: [] # No default (required) args_mapping: "" # No default (required) prefix: "" # No default (optional) suffix: "" # No default (optional) options: [] # No default (optional) init_files: [] # No default (optional) init_statement: "" # No default (optional) conn_max_idle_time: "" # No default (optional) conn_max_life_time: "" # No default (optional) conn_max_idle: 2 conn_max_open: "" # No default (optional) ``` If the insert fails to execute then the message will still remain unchanged and the error can be caught using [error handling methods](../../../configuration/error_handling/). ## [](#examples)Examples ### [](#table-insert-mysql)Table Insert (MySQL) Here we insert rows into a database by populating the columns id, name and topic with values extracted from messages and metadata: ```yaml pipeline: processors: - sql_insert: driver: mysql dsn: foouser:foopassword@tcp(localhost:3306)/foodb table: footable columns: [ id, name, topic ] args_mapping: | root = [ this.user.id, this.user.name, meta("kafka_topic"), ] ``` ## [](#dynamic-sql-operations)Dynamic SQL operations The `table` and `columns` fields are static strings that do not support Bloblang interpolation. For dynamic table names, dynamic column lists, DELETE operations, or any other SQL that `sql_insert` cannot express, use the [`sql_raw` processor](../sql_raw/) instead. To use Bloblang interpolation inside the `sql_raw` processor’s `query` field, you must enable `unsafe_dynamic_query: true`. > ⚠️ **CAUTION** > > Interpolating unsanitized values into a query can introduce SQL injection risks.
Always validate or sanitize the interpolated value beforehand. ## [](#fields)Fields ### [](#args_mapping)`args_mapping` A [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of values matching in size to the number of columns specified. **Type**: `string` ```yaml # Examples: args_mapping: root = [ this.cat.meow, this.doc.woofs[0] ] # --- args_mapping: root = [ meta("user.id") ] ``` ### [](#columns)`columns[]` A list of columns to insert. **Type**: `array` ```yaml # Examples: columns: - foo - bar - baz ``` ### [](#conn_max_idle)`conn_max_idle` An optional maximum number of connections in the idle connection pool. If conn\_max\_open is greater than 0 but less than the new conn\_max\_idle, then the new conn\_max\_idle will be reduced to match the conn\_max\_open limit. If `value <= 0`, no idle connections are retained. The default max idle connections is currently 2. This may change in a future release. **Type**: `int` **Default**: `2` ### [](#conn_max_idle_time)`conn_max_idle_time` An optional maximum amount of time a connection may be idle. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection’s idle time. **Type**: `string` ### [](#conn_max_life_time)`conn_max_life_time` An optional maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection’s age. **Type**: `string` ### [](#conn_max_open)`conn_max_open` An optional maximum number of open connections to the database. If conn\_max\_idle is greater than 0 and the new conn\_max\_open is less than conn\_max\_idle, then conn\_max\_idle will be reduced to match the new conn\_max\_open limit. If `value <= 0`, then there is no limit on the number of open connections. The default is 0 (unlimited). **Type**: `int` ### [](#driver)`driver` A database [driver](#drivers) to use.
**Type**: `string` **Options**: `mysql`, `postgres`, `pgx`, `clickhouse`, `mssql`, `sqlite`, `oracle`, `snowflake`, `trino`, `gocosmos`, `spanner`, `databricks` ### [](#dsn)`dsn` A Data Source Name to identify the target database. #### [](#drivers)Drivers The following is a list of supported drivers, their placeholder style, and their respective DSN formats: | Driver | Data Source Name Format | | --- | --- | | clickhouse | clickhouse://[username[:password]@][netloc][:port]/dbname[?param1=value1&…&paramN=valueN] | | mysql | [username[:password]@][protocol[(address)]]/dbname[?param1=value1&…&paramN=valueN] | | postgres and pgx | postgres://[user[:password]@][netloc][:port][/dbname][?param1=value1&…] | | mssql | sqlserver://[user[:password]@][netloc][:port][?database=dbname&param1=value1&…] | | sqlite | file:/path/to/filename.db[?param&=value1&…] | | oracle | oracle://[username[:password]@][netloc][:port]/service_name?server=server2&server=server3 | | snowflake | username[:password]@account_identifier/dbname/schemaname[?param1=value&…&paramN=valueN] | | trino | http[s]://user[:pass]@host[:port][?parameters] | | gocosmos | AccountEndpoint=;AccountKey=[;TimeoutMs=][;Version=][;DefaultDb/Db=][;AutoId=][;InsecureSkipVerify=] | | spanner | projects/[PROJECT]/instances/[INSTANCE]/databases/[DATABASE] | | databricks | token:@:/ | The `postgres` and `pgx` drivers enforce SSL by default. You can override this with the parameter `sslmode=disable` if required. The `pgx` driver is an alternative to the standard `postgres` (pq) driver and comes with extra functionality such as support for array insertion. The `snowflake` driver supports multiple DSN formats. Please consult [the docs](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Connection_String) for more details.
For [key pair authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth.html#configuring-key-pair-authentication), the DSN has the following format: `@//?warehouse=&role=&authenticator=snowflake_jwt&privateKey=`, where the value for the `privateKey` parameter can be constructed from an unencrypted RSA private key file `rsa_key.p8` using `openssl enc -d -base64 -in rsa_key.p8 | basenc --base64url -w0` (you can use `gbasenc` instead of `basenc` on OSX if you install `coreutils` via Homebrew). If you have a password-encrypted private key, you can decrypt it using `openssl pkcs8 -in rsa_key_encrypted.p8 -out rsa_key.p8`. Also, make sure fields such as the username are URL-encoded. The [`gocosmos`](https://pkg.go.dev/github.com/microsoft/gocosmos) driver is still experimental, but it has support for [hierarchical partition keys](https://learn.microsoft.com/en-us/azure/cosmos-db/hierarchical-partition-keys) as well as [cross-partition queries](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-query-container#cross-partition-query). Please refer to the [SQL notes](https://github.com/microsoft/gocosmos/blob/main/SQL.md) for details. **Type**: `string` ```yaml # Examples: dsn: clickhouse://username:password@host1:9000,host2:9000/database?dial_timeout=200ms&max_execution_time=60 # --- dsn: foouser:foopassword@tcp(localhost:3306)/foodb # --- dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable # --- dsn: oracle://foouser:foopass@localhost:1521/service_name # --- dsn: token:dapi1234567890ab@dbc-a1b2345c-d6e7.cloud.databricks.com:443/sql/1.0/warehouses/abc123def456 ``` ### [](#init_files)`init_files[]` An optional list of file paths containing SQL statements to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Glob patterns are supported, including super globs (double star). 
Care should be taken to ensure that the statements are idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If a statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `array` ```yaml # Examples: init_files: - ./init/*.sql # --- init_files: - ./foo.sql - ./bar.sql ``` ### [](#init_statement)`init_statement` An optional SQL statement to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Care should be taken to ensure that the statement is idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If the statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `string` ```yaml # Examples: init_statement: |- CREATE TABLE IF NOT EXISTS some_table ( foo varchar(50) not null, bar integer, baz varchar(50), primary key (foo) ) WITHOUT ROWID; ``` ### [](#options)`options[]` A list of keyword options to add before the INTO clause of the query. **Type**: `array` ```yaml # Examples: options: - DELAYED - IGNORE ``` ### [](#prefix)`prefix` An optional prefix to prepend to the insert query (before INSERT). **Type**: `string` ### [](#suffix)`suffix` An optional suffix to append to the insert query. **Type**: `string` ```yaml # Examples: suffix: ON CONFLICT (name) DO NOTHING ``` ### [](#table)`table` The table to insert to. 
**Type**: `string` ```yaml # Examples: table: foo ``` --- # Page 275: sql_raw **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/sql_raw.md --- # sql\_raw --- title: sql_raw latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/sql_raw page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/sql_raw.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/sql_raw.adoc categories: "[\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/sql_raw/)[Input](/redpanda-cloud/develop/connect/components/inputs/sql_raw/)[Output](/redpanda-cloud/develop/connect/components/outputs/sql_raw/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/sql_raw/ "View the Self-Managed version of this component") Runs an arbitrary SQL query against a database and (optionally) returns the result as an array of objects, one for each row returned. 
#### Common ```yml processors: label: "" sql_raw: driver: "" # No default (required) dsn: "" # No default (required) query: "" # No default (optional) args_mapping: "" # No default (optional) exec_only: "" # No default (optional) queries: [] # No default (optional) ``` #### Advanced ```yml processors: label: "" sql_raw: driver: "" # No default (required) dsn: "" # No default (required) query: "" # No default (optional) unsafe_dynamic_query: false args_mapping: "" # No default (optional) exec_only: "" # No default (optional) queries: [] # No default (optional) init_files: [] # No default (optional) init_statement: "" # No default (optional) conn_max_idle_time: "" # No default (optional) conn_max_life_time: "" # No default (optional) conn_max_idle: 2 conn_max_open: "" # No default (optional) ``` If the query fails to execute then the message will remain unchanged and the error can be caught using [error handling methods](../../../configuration/error_handling/). For some scenarios where you might use this processor, see [Examples](#examples). ## [](#fields)Fields ### [](#args_mapping)`args_mapping` An optional [Bloblang mapping](../../../guides/bloblang/about/) that includes the same number of values in an array as the placeholder arguments in the [`query`](#query) field. **Type**: `string` ```yaml # Examples: args_mapping: root = [ this.cat.meow, this.doc.woofs[0] ] # --- args_mapping: root = [ meta("user.id") ] ``` ### [](#conn_max_idle)`conn_max_idle` An optional maximum number of connections in the idle connection pool. If conn\_max\_open is greater than 0 but less than the new conn\_max\_idle, then the new conn\_max\_idle will be reduced to match the conn\_max\_open limit. If `value <= 0`, no idle connections are retained. The default max idle connections is currently 2. This may change in a future release. **Type**: `int` **Default**: `2` ### [](#conn_max_idle_time)`conn_max_idle_time` An optional maximum amount of time a connection may be idle.
Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection’s idle time. **Type**: `string` ### [](#conn_max_life_time)`conn_max_life_time` An optional maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection’s age. **Type**: `string` ### [](#conn_max_open)`conn_max_open` An optional maximum number of open connections to the database. If conn\_max\_idle is greater than 0 and the new conn\_max\_open is less than conn\_max\_idle, then conn\_max\_idle will be reduced to match the new conn\_max\_open limit. If `value <= 0`, then there is no limit on the number of open connections. The default is 0 (unlimited). **Type**: `int` ### [](#driver)`driver` A database [driver](#drivers) to use. **Type**: `string` **Options**: `mysql`, `postgres`, `pgx`, `clickhouse`, `mssql`, `sqlite`, `oracle`, `snowflake`, `trino`, `gocosmos`, `spanner`, `databricks` ### [](#dsn)`dsn` A Data Source Name to identify the target database.
#### [](#drivers)Drivers The following is a list of supported drivers, their placeholder style, and their respective DSN formats: | Driver | Data Source Name Format | | --- | --- | | clickhouse | clickhouse://[username[:password]@][netloc][:port]/dbname[?param1=value1&…&paramN=valueN] | | mysql | [username[:password]@][protocol[(address)]]/dbname[?param1=value1&…&paramN=valueN] | | postgres and pgx | postgres://[user[:password]@][netloc][:port][/dbname][?param1=value1&…] | | mssql | sqlserver://[user[:password]@][netloc][:port][?database=dbname&param1=value1&…] | | sqlite | file:/path/to/filename.db[?param&=value1&…] | | oracle | oracle://[username[:password]@][netloc][:port]/service_name?server=server2&server=server3 | | snowflake | username[:password]@account_identifier/dbname/schemaname[?param1=value&…&paramN=valueN] | | trino | http[s]://user[:pass]@host[:port][?parameters] | | gocosmos | AccountEndpoint=;AccountKey=[;TimeoutMs=][;Version=][;DefaultDb/Db=][;AutoId=][;InsecureSkipVerify=] | | spanner | projects/[PROJECT]/instances/[INSTANCE]/databases/[DATABASE] | | databricks | token:@:/ | The `postgres` and `pgx` drivers enforce SSL by default. You can override this with the parameter `sslmode=disable` if required. The `pgx` driver is an alternative to the standard `postgres` (pq) driver and comes with extra functionality such as support for array insertion. The `snowflake` driver supports multiple DSN formats. Please consult [the docs](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Connection_String) for more details.
For [key pair authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth.html#configuring-key-pair-authentication), the DSN has the following format: `@//?warehouse=&role=&authenticator=snowflake_jwt&privateKey=`, where the value for the `privateKey` parameter can be constructed from an unencrypted RSA private key file `rsa_key.p8` using `openssl enc -d -base64 -in rsa_key.p8 | basenc --base64url -w0` (you can use `gbasenc` instead of `basenc` on OSX if you install `coreutils` via Homebrew). If you have a password-encrypted private key, you can decrypt it using `openssl pkcs8 -in rsa_key_encrypted.p8 -out rsa_key.p8`. Also, make sure fields such as the username are URL-encoded. The [`gocosmos`](https://pkg.go.dev/github.com/microsoft/gocosmos) driver is still experimental, but it has support for [hierarchical partition keys](https://learn.microsoft.com/en-us/azure/cosmos-db/hierarchical-partition-keys) as well as [cross-partition queries](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-query-container#cross-partition-query). Please refer to the [SQL notes](https://github.com/microsoft/gocosmos/blob/main/SQL.md) for details. **Type**: `string` ```yaml # Examples: dsn: clickhouse://username:password@host1:9000,host2:9000/database?dial_timeout=200ms&max_execution_time=60 # --- dsn: foouser:foopassword@tcp(localhost:3306)/foodb # --- dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable # --- dsn: oracle://foouser:foopass@localhost:1521/service_name # --- dsn: token:dapi1234567890ab@dbc-a1b2345c-d6e7.cloud.databricks.com:443/sql/1.0/warehouses/abc123def456 ``` ### [](#exec_only)`exec_only` Whether to discard the [`query`](#query) result. Set to `true` to leave the message contents unchanged, which is useful when you are executing inserts, updates, and so on. By default, the message contents are kept for the last query executed, and previous queries don’t change the results. 
**Type**: `bool` ### [](#init_files)`init_files[]` An optional list of file paths containing SQL statements to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Glob patterns are supported, including super globs (double star). Care should be taken to ensure that the statements are idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If a statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `array` ```yaml # Examples: init_files: - ./init/*.sql # --- init_files: - ./foo.sql - ./bar.sql ``` ### [](#init_statement)`init_statement` An optional SQL statement to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Care should be taken to ensure that the statement is idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If the statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `string` ```yaml # Examples: init_statement: |- CREATE TABLE IF NOT EXISTS some_table ( foo varchar(50) not null, bar integer, baz varchar(50), primary key (foo) ) WITHOUT ROWID; ``` ### [](#queries)`queries[]` A list of database statements to run in addition to your main [`query`](#query). If you specify multiple queries, they are executed within a single transaction. For more information, see [Examples](#examples). 
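As a rough sketch of how multiple statements might be combined under `queries`, the following fragment runs an insert followed by a select. The table and column names (`orders`, `id`, `total`, `customer_id`) and the message fields they read from are hypothetical, and `exec_only` is set explicitly on each statement so the example does not rely on default behavior.

```yaml
pipeline:
  processors:
    - sql_raw:
        driver: postgres
        dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable
        queries:
          # First statement: write a row and discard the result so the
          # message is still intact for the next args_mapping.
          - query: "INSERT INTO orders (id, total) VALUES ($1, $2);"
            args_mapping: 'root = [ this.id, this.total ]'
            exec_only: true
          # Last statement: keep the result, which replaces the message
          # contents with an array of returned rows.
          - query: "SELECT count(*) AS order_count FROM orders WHERE customer_id = $1;"
            args_mapping: 'root = [ this.customer_id ]'
            exec_only: false
```

Wrapping the processor in a `branch` processor, as the examples on this page do, is one way to keep the original message and merge the query result into it.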
**Type**: `object` ### [](#queries-args_mapping)`queries[].args_mapping` An optional [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of values matching in size to the number of placeholder arguments in the field `query`. **Type**: `string` ```yaml # Examples: args_mapping: root = [ this.cat.meow, this.doc.woofs[0] ] # --- args_mapping: root = [ meta("user.id") ] ``` ### [](#queries-exec_only)`queries[].exec_only` Whether the query result should be discarded. When set to `true` the message contents will remain unchanged, which is useful in cases where you are executing inserts, updates, etc. By default this is true for the last query, and previous queries don’t change the results. If set to true for any query but the last one, the subsequent `args_mappings` input is overwritten. **Type**: `bool` ### [](#queries-query)`queries[].query` The query to execute. The style of placeholder to use depends on the driver, some drivers require question marks (`?`) whereas others expect incrementing dollar signs (`$1`, `$2`, and so on) or colons (`:1`, `:2` and so on). The style to use is outlined in this table: | Driver | Placeholder Style | |---|---| | `clickhouse` | Dollar sign | | `mysql` | Question mark | | `postgres` | Dollar sign | | `pgx` | Dollar sign | | `mssql` | Question mark | | `sqlite` | Question mark | | `oracle` | Colon | | `snowflake` | Question mark | | `trino` | Question mark | | `gocosmos` | Colon | **Type**: `string` ### [](#query)`query` The query to execute. You must include the correct placeholders for the specified database driver. Some drivers use question marks (`?`), whereas others expect incrementing dollar signs (`$1`, `$2`, and so on) or colons (`:1`, `:2`, and so on). | Driver | Placeholder Style | | --- | --- | | clickhouse | Dollar sign ($) | | gocosmos | Colon (:) | | mysql | Question mark (?) | | mssql | Question mark (?) 
| | oracle | Colon (:) | | postgres | Dollar sign ($) | | snowflake | Question mark (?) | | spanner | Question mark (?) | | sqlite | Question mark (?) | | trino | Question mark (?) | **Type**: `string` ```yaml # Examples: query: INSERT INTO footable (foo, bar, baz) VALUES (?, ?, ?); # --- query: SELECT * FROM footable WHERE user_id = $1; ``` ### [](#unsafe_dynamic_query)`unsafe_dynamic_query` Whether to enable [interpolation functions](../../../configuration/interpolation/#bloblang-queries) in the query. Great care should be taken to ensure your queries are defended against injection attacks. **Type**: `bool` **Default**: `false` ## [](#examples)Examples ### [](#table-insert-mysql)Table Insert (MySQL) The following example inserts rows into the table footable with the columns foo, bar and baz populated with values extracted from messages. ```yaml pipeline: processors: - sql_raw: driver: mysql dsn: foouser:foopassword@tcp(localhost:3306)/foodb query: "INSERT INTO footable (foo, bar, baz) VALUES (?, ?, ?);" args_mapping: '[ document.foo, document.bar, meta("kafka_topic") ]' exec_only: true ``` ### [](#table-query-postgresql)Table Query (PostgreSQL) Here we query a database for columns of footable that share a `user_id` with the message field `user.id`. A [`branch` processor](../branch/) is used in order to insert the resulting array into the original message at the path `foo_rows`. ```yaml pipeline: processors: - branch: processors: - sql_raw: driver: postgres dsn: postgres://foouser:foopass@localhost:5432/testdb?sslmode=disable query: "SELECT * FROM footable WHERE user_id = $1;" args_mapping: '[ this.user.id ]' result_map: 'root.foo_rows = this' ``` ### [](#dynamically-creating-tables-postgresql)Dynamically Creating Tables (PostgreSQL) Here we create a table named after the message metadata value `table_name`, if it does not already exist, and then insert or update a record in that table.
A `mapping` processor first quotes and escapes the table name, because `unsafe_dynamic_query` interpolates it directly into both queries. ```yaml pipeline: processors: - mapping: | root = this # Prevent SQL injection when using unsafe_dynamic_query meta table_name = "\"" + metadata("table_name").replace_all("\"", "\"\"") + "\"" - sql_raw: driver: postgres dsn: postgres://localhost/postgres unsafe_dynamic_query: true queries: - query: | CREATE TABLE IF NOT EXISTS ${!metadata("table_name")} (id varchar primary key, document jsonb); - query: | INSERT INTO ${!metadata("table_name")} (id, document) VALUES ($1, $2) ON CONFLICT (id) DO UPDATE SET document = EXCLUDED.document; args_mapping: | root = [ this.id, this.document.string() ] ``` --- # Page 276: sql_select **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/sql_select.md --- # sql\_select --- title: sql_select latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/sql_select page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/sql_select.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/sql_select.adoc categories: "[\"Integration\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/sql_select/)[Input](/redpanda-cloud/develop/connect/components/inputs/sql_select/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/sql_select/ "View the Self-Managed version of this component") Runs an SQL select query against a database and returns the result as an array of objects, one for each row returned, containing a key for each column queried and its value.
#### Common ```yml processors: label: "" sql_select: driver: "" # No default (required) dsn: "" # No default (required) table: "" # No default (required) columns: [] # No default (required) where: "" # No default (optional) args_mapping: "" # No default (optional) ``` #### Advanced ```yml processors: label: "" sql_select: driver: "" # No default (required) dsn: "" # No default (required) table: "" # No default (required) columns: [] # No default (required) where: "" # No default (optional) args_mapping: "" # No default (optional) prefix: "" # No default (optional) suffix: "" # No default (optional) init_files: [] # No default (optional) init_statement: "" # No default (optional) conn_max_idle_time: "" # No default (optional) conn_max_life_time: "" # No default (optional) conn_max_idle: 2 conn_max_open: "" # No default (optional) ``` If the query fails to execute then the message will remain unchanged and the error can be caught using [error handling methods](../../../configuration/error_handling/). ## [](#examples)Examples ### [](#table-query-postgresql)Table Query (PostgreSQL) Here we query a database for columns of footable that share a `user_id` with the message `user.id`. A [`branch` processor](../branch/) is used in order to insert the resulting array into the original message at the path `foo_rows`: ```yaml pipeline: processors: - branch: processors: - sql_select: driver: postgres dsn: postgres://foouser:foopass@localhost:5432/testdb?sslmode=disable table: footable columns: [ '*' ] where: user_id = ? args_mapping: '[ this.user.id ]' result_map: 'root.foo_rows = this' ``` ## [](#fields)Fields ### [](#args_mapping)`args_mapping` An optional [Bloblang mapping](../../../guides/bloblang/about/) which should evaluate to an array of values matching in size to the number of placeholder arguments in the field `where`. 
**Type**: `string`

```yaml
# Examples:
args_mapping: root = [ this.cat.meow, this.doc.woofs[0] ]

# ---

args_mapping: root = [ meta("user.id") ]
```

### [](#columns)`columns[]`

A list of columns to query.

**Type**: `array`

```yaml
# Examples:
columns:
  - "*"

# ---

columns:
  - foo
  - bar
  - baz
```

### [](#conn_max_idle)`conn_max_idle`

An optional maximum number of connections in the idle connection pool. If `conn_max_open` is greater than 0 but less than the new `conn_max_idle`, then the new `conn_max_idle` will be reduced to match the `conn_max_open` limit. If `value <= 0`, no idle connections are retained. The default max idle connections is currently 2. This may change in a future release.

**Type**: `int`

**Default**: `2`

### [](#conn_max_idle_time)`conn_max_idle_time`

An optional maximum amount of time a connection may be idle. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection's idle time.

**Type**: `string`

### [](#conn_max_life_time)`conn_max_life_time`

An optional maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse. If `value <= 0`, connections are not closed due to a connection's age.

**Type**: `string`

### [](#conn_max_open)`conn_max_open`

An optional maximum number of open connections to the database. If `conn_max_idle` is greater than 0 and the new `conn_max_open` is less than `conn_max_idle`, then `conn_max_idle` will be reduced to match the new `conn_max_open` limit. If `value <= 0`, then there is no limit on the number of open connections. The default is 0 (unlimited).

**Type**: `int`

### [](#driver)`driver`

A database [driver](#drivers) to use.

**Type**: `string`

**Options**: `mysql`, `postgres`, `pgx`, `clickhouse`, `mssql`, `sqlite`, `oracle`, `snowflake`, `trino`, `gocosmos`, `spanner`, `databricks`

### [](#dsn)`dsn`

A Data Source Name to identify the target database.
#### [](#drivers)Drivers

The following is a list of supported drivers and their respective DSN formats:

| Driver | Data Source Name Format |
| --- | --- |
| clickhouse | `clickhouse://[username[:password]@][netloc][:port]/dbname[?param1=value1&...&paramN=valueN]` |
| mysql | `[username[:password]@][protocol[(address)]]/dbname[?param1=value1&...&paramN=valueN]` |
| postgres and pgx | `postgres://[user[:password]@][netloc][:port][/dbname][?param1=value1&...]` |
| mssql | `sqlserver://[user[:password]@][netloc][:port][?database=dbname&param1=value1&...]` |
| sqlite | `file:/path/to/filename.db[?param&=value1&...]` |
| oracle | `oracle://[username[:password]@][netloc][:port]/service_name?server=server2&server=server3` |
| snowflake | `username[:password]@account_identifier/dbname/schemaname[?param1=value&...&paramN=valueN]` |
| trino | `http[s]://user[:pass]@host[:port][?parameters]` |
| gocosmos | `AccountEndpoint=;AccountKey=[;TimeoutMs=][;Version=][;DefaultDb/Db=][;AutoId=][;InsecureSkipVerify=]` |
| spanner | `projects/[PROJECT]/instances/[INSTANCE]/databases/[DATABASE]` |
| databricks | `token:@:/` |

The `postgres` and `pgx` drivers enforce SSL by default. You can override this with the parameter `sslmode=disable` if required.

The `pgx` driver is an alternative to the standard `postgres` (pq) driver and comes with extra functionality such as support for array insertion.

The `snowflake` driver supports multiple DSN formats. Please consult [the docs](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Connection_String) for more details.
For [key pair authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth.html#configuring-key-pair-authentication), the DSN has the following format: `@//?warehouse=&role=&authenticator=snowflake_jwt&privateKey=`, where the value for the `privateKey` parameter can be constructed from an unencrypted RSA private key file `rsa_key.p8` using `openssl enc -d -base64 -in rsa_key.p8 | basenc --base64url -w0` (you can use `gbasenc` instead of `basenc` on OSX if you install `coreutils` via Homebrew). If you have a password-encrypted private key, you can decrypt it using `openssl pkcs8 -in rsa_key_encrypted.p8 -out rsa_key.p8`. Also, make sure fields such as the username are URL-encoded. The [`gocosmos`](https://pkg.go.dev/github.com/microsoft/gocosmos) driver is still experimental, but it has support for [hierarchical partition keys](https://learn.microsoft.com/en-us/azure/cosmos-db/hierarchical-partition-keys) as well as [cross-partition queries](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-query-container#cross-partition-query). Please refer to the [SQL notes](https://github.com/microsoft/gocosmos/blob/main/SQL.md) for details. **Type**: `string` ```yaml # Examples: dsn: clickhouse://username:password@host1:9000,host2:9000/database?dial_timeout=200ms&max_execution_time=60 # --- dsn: foouser:foopassword@tcp(localhost:3306)/foodb # --- dsn: postgres://foouser:foopass@localhost:5432/foodb?sslmode=disable # --- dsn: oracle://foouser:foopass@localhost:1521/service_name # --- dsn: token:dapi1234567890ab@dbc-a1b2345c-d6e7.cloud.databricks.com:443/sql/1.0/warehouses/abc123def456 ``` ### [](#init_files)`init_files[]` An optional list of file paths containing SQL statements to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Glob patterns are supported, including super globs (double star). 
Care should be taken to ensure that the statements are idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If a statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `array` ```yaml # Examples: init_files: - ./init/*.sql # --- init_files: - ./foo.sql - ./bar.sql ``` ### [](#init_statement)`init_statement` An optional SQL statement to execute immediately upon the first connection to the target database. This is a useful way to initialise tables before processing data. Care should be taken to ensure that the statement is idempotent, and therefore would not cause issues when run multiple times after service restarts. If both `init_statement` and `init_files` are specified the `init_statement` is executed _after_ the `init_files`. If the statement fails for any reason a warning log will be emitted but the operation of this component will not be stopped. **Type**: `string` ```yaml # Examples: init_statement: |- CREATE TABLE IF NOT EXISTS some_table ( foo varchar(50) not null, bar integer, baz varchar(50), primary key (foo) ) WITHOUT ROWID; ``` ### [](#prefix)`prefix` An optional prefix to prepend to the query (before SELECT). **Type**: `string` ### [](#suffix)`suffix` An optional suffix to append to the select query. **Type**: `string` ### [](#table)`table` The table to query. **Type**: `string` ```yaml # Examples: table: foo ``` ### [](#where)`where` An optional where clause to add. Placeholder arguments are populated with the `args_mapping` field. Placeholders should always be question marks, and will automatically be converted to dollar syntax when the postgres or clickhouse drivers are used. **Type**: `string` ```yaml # Examples: where: meow = ? and woof = ? # --- where: user_id = ? 
``` --- # Page 277: string_split **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/string_split.md --- # string\_split --- title: string_split latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/string_split page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/string_split.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/string_split.adoc page-git-created-date: "2026-04-08" page-git-modified-date: "2026-04-08" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/string_split/ "View the Self-Managed version of this component") Splits a string by a delimiter into an array. Generally, using bloblang’s `split` method is preferred. In some high performance use cases this processor can be faster than the equivalent bloblang if there is no additional logic. #### Common ```yml processors: label: "" string_split: delimiter: empty_as_null: false ``` #### Advanced ```yml processors: label: "" string_split: delimiter: emit_bytes: false empty_as_null: false ``` ## [](#fields)Fields ### [](#delimiter)`delimiter` The delimiter to split the string by. **Type**: `string` **Default**: \` \` ### [](#emit_bytes)`emit_bytes` When true, the output will be bloblang bytes instead of strings. **Type**: `bool` **Default**: `false` ### [](#empty_as_null)`empty_as_null` When true, empty strings resulting from the split are converted to null. 
**Type**: `bool` **Default**: `false` --- # Page 278: switch **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/switch.md --- # switch --- title: switch latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/switch page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/switch.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/switch.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/switch/)[Output](/redpanda-cloud/develop/connect/components/outputs/switch/)[Scanner](/redpanda-cloud/develop/connect/components/scanners/switch/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/switch/ "View the Self-Managed version of this component") Conditionally processes messages based on their contents. ```yml # Config fields, showing default values label: "" switch: [] # No default (required) ``` For each switch case a [Bloblang query](../../../guides/bloblang/about/) is checked and, if the result is true (or the check is empty) the child processors are executed on the message. ## [](#fields)Fields ### [](#check)`check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether a message should have the processors of this case executed on it. If left empty the case always passes. If the check mapping throws an error the message will be flagged [as having failed](../../../configuration/error_handling/) and will not be tested against any other cases. 
**Type**: `string` **Default**: `""` ```yaml # Examples: check: this.type == "foo" # --- check: this.contents.urls.contains("https://benthos.dev/") ``` ### [](#continue)`continue` Indicates whether, if this case passes for a message, the next case should also be tested. Unlike `fallthrough`, which skips the next case’s check, `continue` will evaluate the next case’s condition before executing. **Type**: `bool` **Default**: `false` ### [](#fallthrough)`fallthrough` Indicates whether, if this case passes for a message, the next case should also be executed without checking its condition. **Type**: `bool` **Default**: `false` ### [](#processors)`processors[]` A list of [processors](../about/) to execute on a message. **Type**: `processor` **Default**: `[]` ## [](#examples)Examples ### [](#ignore-george)Ignore George We have a system where we’re counting a metric for all messages that pass through our system. However, occasionally we get messages from George that we don’t care about. For George’s messages we want to instead emit a metric that gauges how angry he is about being ignored and then we drop it. ```yaml pipeline: processors: - switch: - check: this.user.name.first != "George" processors: - metric: type: counter name: MessagesWeCareAbout - processors: - metric: type: gauge name: GeorgesAnger value: ${! json("user.anger") } - mapping: root = deleted() ``` ## [](#batching)Batching When a switch processor executes on a [batch of messages](../../../configuration/batching/) they are checked individually and can be matched independently against cases. During processing the messages matched against a case are processed as a batch, although the ordering of messages during case processing cannot be guaranteed to match the order as received. At the end of switch processing the resulting batch will follow the same ordering as the batch was received. 
If any child processors have split or otherwise grouped messages this grouping will be lost as the result of a switch is always a single batch. In order to perform conditional grouping and/or splitting use the [`group_by` processor](../group_by/). --- # Page 279: sync_response **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/sync_response.md --- # sync\_response --- title: sync_response latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/sync_response page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/sync_response.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/sync_response.adoc categories: "[\"Utility\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Processor ▼ [Processor](/redpanda-cloud/develop/connect/components/processors/sync_response/)[Output](/redpanda-cloud/develop/connect/components/outputs/sync_response/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/sync_response/ "View the Self-Managed version of this component") Adds the payload in its current state as a synchronous response to the input source, where it is dealt with according to that specific input type. ```yml # Config fields, showing default values label: "" sync_response: {} ``` For most inputs this mechanism is ignored entirely, in which case the sync response is dropped without penalty. It is therefore safe to use this processor even when combining input types that might not have support for sync responses. 
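
A minimal sketch of how this might look in practice, assuming an input that supports synchronous responses. The `http_server` input, its path, the mappings and the `drop` output are all illustrative assumptions, and `http_server` may not be available in every deployment. The request body is uppercased and stored as the response, while later processors continue to change only what reaches the output.

```yaml
input:
  http_server:
    path: /post # hypothetical endpoint path

pipeline:
  processors:
    # Build the payload that should be returned to the caller.
    - mapping: root = content().string().uppercase()
    # Record the payload in its current state as the synchronous response.
    - sync_response: {}
    # Changes made after this point affect only the message sent to the output.
    - mapping: root = content().string().lowercase()

output:
  drop: {}
```

Placing `sync_response` part-way through the pipeline is what lets the response differ from the message that is eventually written to the output.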
--- # Page 280: text_chunker **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/text_chunker.md --- # text\_chunker --- title: text_chunker latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/text_chunker page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/text_chunker.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/text_chunker.adoc page-git-created-date: "2025-05-02" page-git-modified-date: "2025-05-02" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/text_chunker/ "View the Self-Managed version of this component") Breaks down text-based message content into manageable chunks using a configurable strategy. This processor is ideal for creating vector embeddings of large text documents. #### Common ```yml processors: label: "" text_chunker: strategy: "" # No default (required) chunk_size: 512 chunk_overlap: 100 separators: - "\n\n" - "\n" - " " - "" length_measure: runes include_code_blocks: false keep_reference_links: false ``` #### Advanced ```yml processors: label: "" text_chunker: strategy: "" # No default (required) chunk_size: 512 chunk_overlap: 100 separators: - "\n\n" - "\n" - " " - "" length_measure: runes token_encoding: "" # No default (optional) allowed_special: [] disallowed_special: - "all" include_code_blocks: false keep_reference_links: false ``` ## [](#fields)Fields ### [](#allowed_special)`allowed_special[]` A list of special tokens to include in the output from this processor. **Type**: `array` **Default**: `[]` ### [](#chunk_overlap)`chunk_overlap` The number of characters duplicated in adjacent chunks of text. 
**Type**: `int`

**Default**: `100`

### [](#chunk_size)`chunk_size`

The maximum size of each chunk, using the selected [`length_measure`](#length_measure).

**Type**: `int`

**Default**: `512`

### [](#disallowed_special)`disallowed_special[]`

A list of special tokens to exclude from the output of this processor.

**Type**: `array`

**Default**:

```yaml
- "all"
```

### [](#include_code_blocks)`include_code_blocks`

When set to `true`, this processor includes code blocks in the output.

**Type**: `bool`

**Default**: `false`

### [](#keep_reference_links)`keep_reference_links`

When set to `true`, this processor includes reference links in the output.

**Type**: `bool`

**Default**: `false`

### [](#length_measure)`length_measure`

Choose a method to measure the length of a string.

**Type**: `string`

**Default**: `runes`

| Option | Summary |
| --- | --- |
| graphemes | Use Unicode graphemes to determine the length of a string. |
| runes | Use the number of codepoints to determine the length of a string. |
| token | Use the number of tokens (using the `token_encoding` tokenizer) to determine the length of a string. |
| utf8 | Determine the length of text using the number of UTF-8 bytes. |

### [](#separators)`separators[]`

A list of strings to use as separators between chunks when the [`recursive_character` strategy option](#strategy) is specified.

By default, the following separators are tried in turn until one is successful:

- Double newlines (`\n\n`)
- Single newlines (`\n`)
- Spaces (`" "`)
- Empty strings (`""`)

**Type**: `array`

**Default**:

```yaml
- "\n\n"
- "\n"
- " "
- ""
```

### [](#strategy)`strategy`

Choose a strategy for breaking content down into chunks.

**Type**: `string`

| Option | Summary |
| --- | --- |
| markdown | Split text by markdown headers. |
| recursive_character | Split text recursively by characters (defined in `separators`). |
| token | Split text by tokens. |

### [](#token_encoding)`token_encoding`

The type of encoding to use for tokenization.
**Type**: `string` ```yaml # Examples: token_encoding: cl100k_base # --- token_encoding: r50k_base ``` --- # Page 281: try **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/try.md --- # try --- title: try latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/try page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/try.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/try.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/try/ "View the Self-Managed version of this component") Executes a list of child processors on messages only if no prior processors have failed (or the errors have been cleared). ```yml # Config fields, showing default values label: "" try: [] ``` This processor behaves similarly to the [`for_each`](../for_each/) processor, where a list of child processors are applied to individual messages of a batch. However, if a message has failed any prior processor (before or during the try block) then that message will skip all following processors. For example, with the following config: ```yaml pipeline: processors: - resource: foo - try: - resource: bar - resource: baz - resource: buz ``` If the processor `bar` fails for a particular message, that message will skip the processors `baz` and `buz`. Similarly, if `bar` succeeds but `baz` does not then `buz` will be skipped. If the processor `foo` fails for a message then none of `bar`, `baz` or `buz` are executed on that message. This processor is useful for when child processors depend on the successful output of previous processors. 
This processor can be followed with a [catch](../catch/) processor for defining child processors to be applied only to failed messages.

More information about error handling can be found in [Error Handling](../../../configuration/error_handling/).

## [](#nest-within-a-catch-block)Nest within a catch block

In some cases it might be useful to nest a try block within a catch block. Because the [`catch` processor](../catch/) only clears errors _after_ executing its child processors, a nested try processor will not execute unless the errors are explicitly cleared beforehand. This can be done by inserting an empty catch block before the try block, as follows:

```yaml
pipeline:
  processors:
    - resource: foo
    - catch:
        - log:
            level: ERROR
            message: "Foo failed due to: ${! error() }"
        - catch: [] # Clear prior error
        - try:
            - resource: bar
            - resource: baz
```

---

# Page 282: unarchive

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/unarchive.md

---

# unarchive

---
title: unarchive
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/unarchive
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/unarchive.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/unarchive.adoc
categories: "[\"Parsing\",\"Utility\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/unarchive/ "View the Self-Managed version of this component")

Unarchives messages according to the selected archive format into multiple messages within a [batch](../../../configuration/batching/).
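
As a small sketch of the idea, assume each incoming message holds a JSON array. With the `json_array` format, one of the options of the `format` field, every element would be expanded into its own message within the batch. The sample payload in the comment is invented for illustration.

```yaml
pipeline:
  processors:
    # A message containing [{"id":1},{"id":2}] is replaced by two messages,
    # {"id":1} and {"id":2}, within the same batch.
    - unarchive:
        format: json_array
```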
```yml # Config fields, showing default values label: "" unarchive: format: "" # No default (required) ``` When a message is unarchived the new messages replace the original message in the batch. Messages that are selected but fail to unarchive (invalid format) will remain unchanged in the message batch but will be flagged as having failed, allowing you to [error handle them](../../../configuration/error_handling/). ## [](#metadata)Metadata The metadata found on the messages handled by this processor will be copied into the resulting messages. For the unarchive formats that contain file information (tar, zip), a metadata field is also added to each message called `archive_filename` with the extracted filename. ## [](#fields)Fields ### [](#format)`format` The unarchiving format to apply. **Type**: `string` | Option | Summary | | --- | --- | | binary | Extract messages from a binary blob format. | | csv | Attempt to parse the message as a csv file (header required) and for each row in the file expands its contents into a json object in a new message. | | csv:x | Attempt to parse the message as a csv file (header required) and for each row in the file expands its contents into a json object in a new message using a custom delimiter. The custom delimiter must be a single character, e.g. the format "csv:\t" would consume a tab delimited file. | | json_array | Attempt to parse a message as a JSON array, and extract each element into its own message. | | json_documents | Attempt to parse a message as a stream of concatenated JSON documents. Each parsed document is expanded into a new message. | | json_map | Attempt to parse the message as a JSON map and for each element of the map expands its contents into a new message. A metadata field is added to each message called archive_key with the relevant key from the top-level map. | | lines | Extract the lines of a message each into their own message. | | tar | Extract messages from a unix standard tape archive. 
| | zip | Extract messages from a zip file. | --- # Page 283: while **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/while.md --- # while --- title: while latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/while page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/while.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/while.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/while/ "View the Self-Managed version of this component") A processor that checks a [Bloblang query](../../../guides/bloblang/about/) against each batch of messages and executes child processors on them for as long as the query resolves to true. #### Common ```yml processors: label: "" while: at_least_once: false check: "" processors: [] # No default (required) ``` #### Advanced ```yml processors: label: "" while: at_least_once: false max_loops: 0 check: "" processors: [] # No default (required) ``` The field `at_least_once`, if true, ensures that the child processors are always executed at least one time (like a do .. while loop.) The field `max_loops`, if greater than zero, caps the number of loops for a message batch to this value. If following a loop execution the number of messages in a batch is reduced to zero the loop is exited regardless of the condition result. If following a loop execution there are more than 1 message batches the query is checked against the first batch only. The conditions of this processor are applied across entire message batches. 
You can find out more about batching [in this doc](../../../configuration/batching/). ## [](#fields)Fields ### [](#at_least_once)`at_least_once` Whether to always run the child processors at least one time. **Type**: `bool` **Default**: `false` ### [](#check)`check` A [Bloblang query](../../../guides/bloblang/about/) that should return a boolean value indicating whether the while loop should execute again. **Type**: `string` **Default**: `""` ```yaml # Examples: check: errored() # --- check: this.urls.unprocessed.length() > 0 ``` ### [](#max_loops)`max_loops` An optional maximum number of loops to execute. Helps protect against accidentally creating infinite loops. **Type**: `int` **Default**: `0` ### [](#processors)`processors[]` A list of child processors to execute on each loop. **Type**: `processor` --- # Page 284: workflow **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/workflow.md --- # workflow --- title: workflow latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/processors/workflow page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/processors/workflow.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/workflow.adoc categories: "[\"Composition\"]" page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/workflow/ "View the Self-Managed version of this component") Executes a topology of [`branch` processors](../branch/), performing them in parallel where possible. 
#### Common ```yml processors: label: "" workflow: meta_path: meta.workflow order: [] branches: request_map: "" processors: [] # No default (required) result_map: "" ``` #### Advanced ```yml processors: label: "" workflow: meta_path: meta.workflow order: [] branch_resources: [] branches: request_map: "" processors: [] # No default (required) result_map: "" ``` ## [](#why-use-a-workflow)Why use a workflow ### [](#performance)Performance Most of the time the best way to compose processors is also the simplest, just configure them in series. This is because processors are often CPU bound, low-latency, and you can gain vertical scaling by increasing the number of processor pipeline threads, allowing Redpanda Connect to process [multiple messages in parallel](../../../configuration/processing_pipelines/). However, some processors, such as [`aws_lambda`](../aws_lambda/) and [`cache`](../cache/), interact with external services and therefore spend most of their time waiting for a response. These processors tend to be high-latency and low CPU activity, which causes messages to process slowly. When a processing pipeline contains multiple network processors that aren’t dependent on each other we can benefit from performing these processors in parallel for each individual message, reducing the overall message processing latency. ### [](#simplifying-processor-topology)Simplifying processor topology A workflow is often expressed as a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of processing stages, where each stage can result in N possible next stages, until finally the flow ends at an exit node. 
For example, if we had processing stages A, B, C and D, where stage A could result in either stage B or C being next, always followed by D, it might look something like this: ```text /--> B --\ A --| |--> D \--> C --/ ``` This flow would be easy to express in a standard Redpanda Connect config, we could simply use a [`switch` processor](../switch/) to route to either B or C depending on a condition on the result of A. However, this method of flow control quickly becomes unfeasible as the DAG gets more complicated, imagine expressing this flow using switch processors: ```text /--> B -------------|--> D / / A --| /--> E --| \--> C --| \ \----------|--> F ``` And imagine doing so knowing that the diagram is subject to change over time. Yikes! Instead, with a workflow we can either trust it to automatically resolve the DAG or express it manually as simply as `order: [ [ A ], [ B, C ], [ E ], [ D, F ] ]`, and the conditional logic for determining if a stage is executed is defined as part of the branch itself. ## [](#examples)Examples ### [](#automatic-ordering)Automatic Ordering When the field `order` is omitted a best attempt is made to determine a dependency tree between branches based on their request and result mappings. In the following example the branches foo and bar will be executed first in parallel, and afterwards the branch baz will be executed. ```yaml pipeline: processors: - workflow: meta_path: meta.workflow branches: foo: request_map: 'root = ""' processors: - http: url: TODO result_map: 'root.foo = this' bar: request_map: 'root = this.body' processors: - aws_lambda: function: TODO result_map: 'root.bar = this' baz: request_map: | root.fooid = this.foo.id root.barstuff = this.bar.content processors: - cache: resource: TODO operator: set key: ${! json("fooid") } value: ${! json("barstuff") } ``` ### [](#conditional-branches)Conditional Branches Branches of a workflow are skipped when the `request_map` assigns `deleted()` to the root. 
In this example the branch A is executed when the document type is "foo", and branch B otherwise. Branch C is executed afterwards and is skipped if either A or B has already provided a result at `tmp.result`.

```yaml
pipeline:
  processors:
    - workflow:
        branches:
          A:
            request_map: |
              root = if this.document.type != "foo" {
                deleted()
              }
            processors:
              - http:
                  url: TODO
            result_map: 'root.tmp.result = this'

          B:
            request_map: |
              root = if this.document.type == "foo" {
                deleted()
              }
            processors:
              - aws_lambda:
                  function: TODO
            result_map: 'root.tmp.result = this'

          C:
            request_map: |
              root = if this.tmp.result != null {
                deleted()
              }
            processors:
              - http:
                  url: TODO_SOMEWHERE_ELSE
            result_map: 'root.tmp.result = this'
```

### [](#resources)Resources

The `order` field can be used to refer to [branch processor resources](#resources). This can sometimes make your pipeline configuration cleaner, and it allows you to reuse branch configurations in other places. It’s also possible to mix and match branches configured within the workflow and configured as resources.

```yaml
pipeline:
  processors:
    - workflow:
        order: [ [ foo, bar ], [ baz ] ]
        branches:
          bar:
            request_map: 'root = this.body'
            processors:
              - aws_lambda:
                  function: TODO
            result_map: 'root.bar = this'

processor_resources:
  - label: foo
    branch:
      request_map: 'root = ""'
      processors:
        - http:
            url: TODO
      result_map: 'root.foo = this'

  - label: baz
    branch:
      request_map: |
        root.fooid = this.foo.id
        root.barstuff = this.bar.content
      processors:
        - cache:
            resource: TODO
            operator: set
            key: ${! json("fooid") }
            value: ${! json("barstuff") }
```

## [](#fields)Fields

### [](#branch_resources)`branch_resources[]`

An optional list of [`branch` processor](../branch/) names that are configured as [Resources](#resources). These resources will be included in the workflow with any branches configured inline within the [`branches`](#branches) field.
The order and parallelism in which branches are executed is automatically resolved based on the mappings of each branch. When using resources with an explicit order it is not necessary to list resources in this field. **Type**: `array` **Default**: `[]` ### [](#branches)`branches` An object of named [`branch` processors](../branch/) that make up the workflow. The order and parallelism in which branches are executed can either be made explicit with the field `order`, or if omitted an attempt is made to automatically resolve an ordering based on the mappings of each branch. **Type**: `object` **Default**: `{}` ### [](#branches-processors)`branches.processors[]` A list of processors to apply to mapped requests. When processing message batches the resulting batch must match the size and ordering of the input batch, therefore filtering, grouping should not be performed within these processors. **Type**: `processor` ### [](#branches-request_map)`branches.request_map` A [Bloblang mapping](../../../guides/bloblang/about/) that describes how to create a request payload suitable for the child processors of this branch. If left empty then the branch will begin with an exact copy of the origin message (including metadata). **Type**: `string` **Default**: `""` ```yaml # Examples: request_map: |- root = { "id": this.doc.id, "content": this.doc.body.text } # --- request_map: |- root = if this.type == "foo" { this.foo.request } else { deleted() } ``` ### [](#branches-result_map)`branches.result_map` A [Bloblang mapping](../../../guides/bloblang/about/) that describes how the resulting messages from branched processing should be mapped back into the original payload. If left empty the origin message will remain unchanged (including metadata). 
**Type**: `string`

**Default**: `""`

```yaml
# Examples:
result_map: |-
  meta foo_code = metadata("code")
  root.foo_result = this

# ---

result_map: |-
  meta = metadata()
  root.bar.body = this.body
  root.bar.id = this.user.id

# ---

result_map: root.raw_result = content().string()

# ---

result_map: |-
  root.enrichments.foo = if metadata("request_failed") != null {
    throw(metadata("request_failed"))
  } else {
    this
  }

# ---

result_map: |-
  # Retain only the updated metadata fields which were present in the origin message
  meta = metadata().filter(v -> @.get(v.key) != null)
```

### [](#meta_path)`meta_path`

A [dot path](../../../configuration/field_paths/) indicating where to store and reference [structured metadata](#structured-metadata) about the workflow execution.

**Type**: `string`

**Default**: `meta.workflow`

### [](#order)`order`

An explicit declaration of branch ordered tiers, which describes the order in which parallel tiers of branches should be executed. Branches should be identified by the name as they are configured in the field `branches`. It’s also possible to specify branch processors configured [as a resource](#resources).

**Type**: `array`

**Default**: `[]`

```yaml
# Examples:
order:
  - - foo
    - bar
  - - baz

# ---

order:
  - - foo
  - - bar
  - - baz
```

## [](#structured-metadata)Structured metadata

When the field `meta_path` is non-empty the workflow processor creates an object describing which branches succeeded, were skipped, or failed for each message and stores the object within the message at the end.

The object is of the following form:

```json
{
  "succeeded": [ "foo" ],
  "skipped": [ "bar" ],
  "failed": {
    "baz": "the error message from the branch"
  }
}
```

If a message already has a meta object at the given path when it is processed then the object is used in order to determine which branches have already been performed on the message (or skipped) and can therefore be skipped on this run.
This is a useful pattern when replaying messages that have failed some branches previously. For example, given the above example object the branches foo and bar would automatically be skipped, and baz would be reattempted. The previous meta object will also be preserved in the field `.previous` when the new meta object is written, preserving a full record of all workflow executions. If a field `.apply` exists in the meta object for a message and is an array then it will be used as an explicit list of stages to apply, all other stages will be skipped. ## [](#error-handling)Error handling The recommended approach to handle failures within a workflow is to query against the [structured metadata](#structured-metadata) it provides, as it provides granular information about exactly which branches failed and which ones succeeded and therefore aren’t necessary to perform again. For example, if our meta object is stored at the path `meta.workflow` and we wanted to check whether a message has failed for any branch we can do that using a [Bloblang query](../../../guides/bloblang/about/) like `this.meta.workflow.failed.length() | 0 > 0`, or to check whether a specific branch failed we can use `this.exists("meta.workflow.failed.foo")`. However, if structured metadata is disabled by setting the field `meta_path` to empty then the workflow processor instead adds a general error flag to messages when any executed branch fails. In this case it’s possible to handle failures using [standard error handling patterns](../../../configuration/error_handling/). 
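
The metadata query described above can be sketched as a follow-up processor placed after the workflow. This is a minimal illustration rather than a recommended configuration: the branch name `foo`, the `TODO` URL, and the choice of a `switch` with a `log` processor are assumptions made for the example, while the `check` expression is the query quoted in the text.

```yaml
pipeline:
  processors:
    - workflow:
        meta_path: meta.workflow
        branches:
          foo: # hypothetical branch name
            request_map: 'root = this.body'
            processors:
              - http:
                  url: TODO
            result_map: 'root.foo = this'

    # Runs after the workflow: inspect the structured metadata it wrote.
    - switch:
        - check: this.meta.workflow.failed.length() | 0 > 0
          processors:
            - log:
                level: WARN
                message: 'workflow branches failed: ${! json("meta.workflow.failed") }'
```

Messages that match the check still carry the `meta.workflow` object, so routing them to a retry path lets the workflow skip the branches that already succeeded.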
---

# Page 285: xml

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/processors/xml.md

---

# xml

---
title: xml
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/processors/xml
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/xml.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/xml.adoc
categories: "[\"Parsing\"]"
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/processors/xml/ "View the Self-Managed version of this component")

Parses messages as an XML document, performs a mutation on the data, and then overwrites the previous contents with the new value.

```yml
# Config fields, showing default values
label: ""
xml:
  operator: ""
  cast: false
```

## [](#operators)Operators

### [](#to_json)`to_json`

Converts an XML document into a JSON structure, where elements appear as keys of an object according to the following rules:

- If an element contains attributes they are parsed by prefixing a hyphen, `-`, to the attribute label.
- If the element is a simple element and has attributes, the element value is given the key `#text`.
- XML comments, directives, and processing instructions are ignored.
- When elements are repeated the resulting JSON value is an array.
- XML namespaces are stripped from element and attribute names, and namespace declarations (`xmlns`) are omitted.

For example, given the following XML:

```xml
<root>
  <title>This is a title</title>
  <description tone="boring">This is a description</description>
  <elements id="1">foo1</elements>
  <elements id="2">foo2</elements>
  <elements>foo3</elements>
</root>
```

The resulting JSON structure would look like this:

```json
{
  "root": {
    "title": "This is a title",
    "description": {
      "#text": "This is a description",
      "-tone": "boring"
    },
    "elements": [
      {"#text": "foo1", "-id": "1"},
      {"#text": "foo2", "-id": "2"},
      "foo3"
    ]
  }
}
```

With `cast` set to `true`, the resulting JSON structure would look like this:

```json
{
  "root": {
    "title": "This is a title",
    "description": {
      "#text": "This is a description",
      "-tone": "boring"
    },
    "elements": [
      {"#text": "foo1", "-id": 1},
      {"#text": "foo2", "-id": 2},
      "foo3"
    ]
  }
}
```

## [](#fields)Fields

### [](#cast)`cast`

Whether to try to cast values that are numbers and booleans to the right type. Default: all values are strings.

**Type**: `bool`

**Default**: `false`

### [](#operator)`operator`

An XML [operation](#operators) to apply to messages.

**Type**: `string`

**Default**: `""`

**Options**: `to_json`

---

# Page 286: Rate Limits

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/rate_limits/about.md

---

# Rate Limits

---
title: Rate Limits
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/rate_limits/about
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/rate_limits/about.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/rate_limits/about.adoc
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

A rate limit is a strategy for limiting the usage of a shared resource across parallel components in a Redpanda Connect instance, or potentially across multiple instances.
They are configured as a resource:

```yaml
rate_limit_resources:
  - label: foobar
    local:
      count: 500
      interval: 1s
```

And most components that hit external services have a field `rate_limit` for specifying a rate limit resource to use, identified by the `label` field. For example, if we wanted to use our `foobar` rate limit with a `http_client` input it would look like this:

```yaml
input:
  http_client:
    url: TODO
    verb: GET
    rate_limit: foobar
```

By using a rate limit in this way we can guarantee that our input will poll our HTTP source at a rate of at most 500 requests per second.

Some components don’t have a `rate_limit` field but we might still wish to throttle them by a rate limit, in which case we can use the [`rate_limit` processor](../../processors/rate_limit/) that applies back pressure to a processing pipeline when the limit is reached.

---

# Page 287: local

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/rate_limits/local.md

---

# local

---
title: local
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/rate_limits/local
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/rate_limits/local.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/rate_limits/local.adoc
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

**Available in:** Cloud, [Self-Managed](/redpanda-connect/components/rate_limits/local/ "View the Self-Managed version of this component")

The local rate limit is a simple X every Y type rate limit that can be shared across any number of components within the pipeline but does not support distributed rate limits across multiple running instances of Redpanda Connect.
```yml # Config fields, showing default values label: "" local: count: 1000 interval: 1s ``` ## [](#fields)Fields ### [](#count)`count` The maximum number of requests to allow for a given period of time. **Type**: `int` **Default**: `1000` ### [](#interval)`interval` The time window to limit requests by. **Type**: `string` **Default**: `"1s"` --- # Page 288: redis **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/rate_limits/redis.md --- # redis --- title: redis latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/rate_limits/redis page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/rate_limits/redis.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/rate_limits/redis.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Rate\_limit ▼ [Rate\_limit](/redpanda-cloud/develop/connect/components/rate_limits/redis/)[Cache](/redpanda-cloud/develop/connect/components/caches/redis/)[Processor](/redpanda-cloud/develop/connect/components/processors/redis/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/rate_limits/redis/ "View the Self-Managed version of this component") A rate limit implementation using Redis. It works by using a simple token bucket algorithm to limit the number of requests to a given count within a given time period. The rate limit is shared across all instances of Redpanda Connect that use the same Redis instance, which must all have a consistent count and interval. 
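
As a rough sketch of how such a shared limit might be wired up, the following declares a Redis-backed rate limit resource and references it from an input. The label `shared_limit`, the key name, and the Redis address are hypothetical values chosen for illustration; only the field names come from this page.

```yaml
rate_limit_resources:
  - label: shared_limit # hypothetical label
    redis:
      url: redis://localhost:6379 # assumed local Redis instance
      count: 500
      interval: 1s
      key: connect-shared-limit # hypothetical key name

input:
  http_client:
    url: TODO
    verb: GET
    rate_limit: shared_limit
```

Every Redpanda Connect instance that should share the limit would need the same `count` and `interval` and point at the same Redis instance.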
#### Common ```yml # Common config fields, showing default values label: "" redis: url: redis://:6379 # No default (required) count: 1000 interval: 1s key: "" # No default (required) ``` #### Advanced ```yml # All config fields, showing default values label: "" redis: url: redis://:6379 # No default (required) kind: simple master: "" tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] count: 1000 interval: 1s key: "" # No default (required) ``` ## [](#fields)Fields ### [](#url)`url` The URL of the target Redis server. Database is optional and is supplied as the URL path. **Type**: `string` ```yml # Examples url: redis://:6379 url: redis://localhost:6379 url: redis://foousername:foopassword@redisplace:6379 url: redis://:foopassword@redisplace:6379 url: redis://localhost:6379/1 url: redis://localhost:6379/1,redis://localhost:6380/1 ``` ### [](#kind)`kind` Specifies a simple, cluster-aware, or failover-aware redis client. **Type**: `string` **Default**: `"simple"` Options: `simple` , `cluster` , `failover` . ### [](#master)`master` Name of the redis master when `kind` is `failover` **Type**: `string` **Default**: `""` ```yml # Examples master: mymaster ``` ### [](#tls)`tls` Custom TLS settings can be used to override system defaults. **Troubleshooting** Some cloud hosted instances of Redis (such as Azure Cache) might need some hand holding in order to establish stable connections. Unfortunately, it is often the case that TLS issues will manifest as generic error messages such as "i/o timeout". If you’re using TLS and are seeing connectivity problems consider setting `enable_renegotiation` to `true`, and ensuring that the server supports at least TLS version 1.2. **Type**: `object` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server side certificate verification. 
**Type**: `bool` **Default**: `false` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yml # Examples root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yml # Examples root_cas_file: ./root_cas.pem ``` ### [](#tls-client_certs)`tls.client_certs` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `array` **Default**: `[]` ```yml # Examples client_certs: - cert: foo key: bar client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. 
> ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yml # Examples password: foo password: ${KEY_PASSWORD} ``` ### [](#count)`count` The maximum number of messages to allow for a given period of time. **Type**: `int` **Default**: `1000` ### [](#interval)`interval` The time window to limit requests by. **Type**: `string` **Default**: `"1s"` ### [](#key)`key` The key to use for the rate limit. 
**Type**: `string`

---

# Page 289: redpanda

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/redpanda/about.md

---

# redpanda

---
title: redpanda
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/components/redpanda/about
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/redpanda/about.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/redpanda/about.adoc
page-git-created-date: "2025-06-25"
page-git-modified-date: "2025-06-25"
---

The Redpanda Connect configuration service allows you to:

- Configure Redpanda cluster credentials in a single configuration block, which is referenced by multiple components in a data pipeline. For more information, see the [Pipeline example](#pipeline-example).
- Send logs and status updates to topics on a Redpanda cluster, in addition to the [default logger](../../logger/about/).

The `redpanda` namespace contains the configuration of this service.
#### Common ```yml # Common configuration fields, showing default values redpanda: seed_brokers: [] # No default (optional) pipeline_id: "" logs_topic: "" logs_level: info status_topic: "" ``` #### Advanced ```yml # All configuration fields, showing default values redpanda: seed_brokers: [] # No default (optional) client_id: benthos tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] sasl: [] # No default (optional) metadata_max_age: 5m request_timeout_overhead: 10s conn_idle_timeout: 20s pipeline_id: "" logs_topic: "" logs_level: info status_topic: "" partitioner: "" # No default (optional) idempotent_write: true compression: "" # No default (optional) timeout: 10s max_message_bytes: 1MB broker_write_max_bytes: 100MB allow_auto_topic_creation: true ``` ## [](#pipeline-example)Pipeline example This data pipeline reads data from `topic_A` and `topic_B` on a Redpanda cluster, and then writes the data to `topic_C` on the same cluster. The cluster details are configured within the `redpanda` configuration block, so you only need to configure them once. This is a useful feature when you have multiple inputs and outputs in the same data pipeline that need to connect to the same cluster. ```none input: redpanda_common: topics: [ topic_A, topic_B ] output: redpanda_common: topic: topic_C key: ${! @id } redpanda: seed_brokers: [ "127.0.0.1:9092" ] tls: enabled: true sasl: - mechanism: SCRAM-SHA-512 password: bar username: foo ``` ## [](#fields)Fields ### [](#seed_brokers)`seed_brokers` A list of broker addresses to connect to in order. Use commas to separate multiple addresses in a single list item. **Type**: `array` ```yml # Examples seed_brokers: - localhost:9092 seed_brokers: - foo:9092 - bar:9092 seed_brokers: - foo:9092,bar:9092 ``` ### [](#client_id)`client_id` An identifier for the client connection. 
**Type**: `string` **Default**: `benthos` ### [](#tls)`tls` Override system defaults with custom TLS settings. **Type**: `object` ### [](#tls-enabled)`tls.enabled` Whether custom TLS settings are enabled. **Type**: `bool` **Default**: `false` ### [](#tls-skip_cert_verify)`tls.skip_cert_verify` Whether to skip server-side certificate verification. **Type**: `bool` **Default**: `false` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent trusted root certificate, through possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yml # Example root_cas: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### [](#tls-root_cas_file)`tls.root_cas_file` Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent trusted root certificate, through possible intermediate signing certificates, to the host certificate. **Type**: `string` **Default**: `""` ```yml # Example root_cas_file: ./root_cas.pem ``` ### [](#tls-client_certs)`tls.client_certs` A list of client certificates to use. For each certificate, specify either the fields `cert` and `key` or `cert_file` and `key_file`. 
**Type**: `array` **Default**: `[]` ```yml # Examples client_certs: - cert: foo key: bar client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` The plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` The plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` The plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. > ⚠️ **WARNING** > > The `pbeWithMD5AndDES-CBC` algorithm does not authenticate ciphertext, and is vulnerable to padding oracle attacks which may allow an attacker to recover the plain text password. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yml # Examples password: foo password: ${KEY_PASSWORD} ``` ### [](#sasl)`sasl` Specify one or more methods or mechanisms of SASL authentication. They are tried in order. If the broker supports the first SASL mechanism, all connections use it. 
If the first mechanism fails, the client picks the first supported mechanism. If the broker does not support any client mechanisms, all connections fail. **Type**: `array` ```yml # Example sasl: - mechanism: SCRAM-SHA-512 password: bar username: foo ``` ### [](#sasl-mechanism)`sasl[].mechanism` The SASL mechanism to use. **Type**: `string` | Option | Summary | | --- | --- | | AWS_MSK_IAM | AWS IAM-based authentication as specified by the aws-msk-iam-auth Java library. | | OAUTHBEARER | OAuth Bearer-based authentication. | | PLAIN | Plain text authentication. | | SCRAM-SHA-256 | SCRAM-based authentication as specified in RFC5802. | | SCRAM-SHA-512 | SCRAM-based authentication as specified in RFC5802. | | none | Disable SASL authentication | ### [](#sasl-username)`sasl[].username` A username for `PLAIN` or `SCRAM-*` authentication. **Type**: `string` **Default**: `""` ### [](#sasl-password)`sasl[].password` A password for `PLAIN` or `SCRAM-*` authentication. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#sasl-token)`sasl[].token` The token to use for a single session’s `OAUTHBEARER` authentication. **Type**: `string` **Default**: `""` ### [](#sasl-extensions)`sasl[].extensions` Key/value pairs to add to `OAUTHBEARER` authentication requests. **Type**: `object` ### [](#sasl-aws)`sasl[].aws` AWS specific fields for when the `mechanism` is set to `AWS_MSK_IAM`. **Type**: `object` ### [](#sasl-aws-region)`sasl[].aws.region` The AWS region to target. **Type**: `string` **Default**: `""` ### [](#sasl-aws-endpoint)`sasl[].aws.endpoint` Specify a custom endpoint for the AWS API. **Type**: `string` **Default**: `""` ### [](#sasl-aws-credentials)`sasl[].aws.credentials` Manually configure the AWS credentials to use (optional). 
For more information, see the [Amazon Web Services guide](../../../guides/cloud/aws/). **Type**: `object` ### [](#sasl-aws-credentials-profile)`sasl[].aws.credentials.profile` The profile from `~/.aws/credentials` to use. **Type**: `string` **Default**: `""` ### [](#sasl-aws-credentials-id)`sasl[].aws.credentials.id` The ID of the AWS credentials to use. **Type**: `string` **Default**: `""` ### [](#sasl-aws-credentials-secret)`sasl[].aws.credentials.secret` The secret for the AWS credentials in use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#sasl-aws-credentials-token)`sasl[].aws.credentials.token` The token for the AWS credentials in use. This is a required value for short-term credentials. **Type**: `string` **Default**: `""` ### [](#sasl-aws-credentials-from_ec2_role)`sasl[].aws.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume an [IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` **Default**: `false` ### [](#sasl-aws-credentials-role)`sasl[].aws.credentials.role` The role ARN to assume. **Type**: `string` **Default**: `""` ### [](#sasl-aws-credentials-role_external_id)`sasl[].aws.credentials.role_external_id` An external ID to use when assuming a role. **Type**: `string` **Default**: `""` ### [](#metadata_max_age)`metadata_max_age` The maximum period of time after which metadata is refreshed. **Type**: `string` **Default**: `5m` ### [](#request_timeout_overhead)`request_timeout_overhead` Grants an additional buffer or overhead to requests that have timeout fields defined. 
This field is based on the behavior of Apache Kafka’s `request.timeout.ms` parameter, but with the option to extend the timeout deadline. **Type**: `string` **Default**: `10s` ### [](#conn_idle_timeout)`conn_idle_timeout` Define how long connections can remain idle before they are closed. **Type**: `string` ### [](#pipeline_id)`pipeline_id` The ID of a Redpanda Connect data pipeline (optional). When specified, the pipeline ID is written to all logs and status updates sent to the configured topics. **Type**: `string` **Default**: `""` ### [](#logs_topic)`logs_topic` The topic that logs are sent to. **Type**: `string` **Default**: `""` ```yml # Example logs_topic: __redpanda.connect.logs ``` ### [](#logs_level)`logs_level` The logging level of logs sent to Redpanda. **Type**: `string` **Default**: `info` **Options**: `debug`, `info`, `warn`, `error` ### [](#status_topic)`status_topic` The topic that status updates are sent to. For full details of the schema for status updates, see the [object specification](https://github.com/redpanda-data/connect/blob/main/internal/protoconnect/status.pb.go). **Type**: `string` **Default**: `""` ```yml # Example status_topic: __redpanda.connect.status ``` ### [](#partitioner)`partitioner` Override the default murmur2 hashing partitioner. **Type**: `string` | Option | Summary | | --- | --- | | least_backup | Chooses the least backed up partition. The partition with the fewest buffered records. Partitions are selected per batch. | | manual | Manually select a partition for each message. You must also specify a value for the partition field. | | murmur2_hash | Kafka’s default hash algorithm that uses a 32-bit murmur2 hash of the key to compute the partition for the record. | | round_robin | Does a round robin of messages through all available partitions. This algorithm has lower throughput and causes higher CPU load on brokers, but is useful if you want to ensure an even distribution of records to partitions. 
| ### [](#idempotent_write)`idempotent_write` Enable the idempotent write producer option. This requires the `IDEMPOTENT_WRITE` permission on `CLUSTER`. Disable this option if the `IDEMPOTENT_WRITE` permission is not available. **Type**: `bool` **Default**: `true` ### [](#compression)`compression` Set an explicit compression type (optional). The default preference is to use `snappy` when the broker supports it. Otherwise, use `none`. **Type**: `string` Options: `lz4` , `snappy` , `gzip` , `none` , `zstd` ### [](#timeout)`timeout` The maximum period of time allowed for sending log or status update messages before a request is abandoned and a retry attempted. **Type**: `string` **Default**: `10s` ### [](#max_message_bytes)`max_message_bytes` The maximum size of an individual message in bytes. Messages larger than this value are rejected. This field is equivalent to Kafka’s `max.message.bytes`. **Type**: `string` **Default**: `1MB` ```yml # Examples max_message_bytes: 100MB max_message_bytes: 50mib ``` ### [](#broker_write_max_bytes)`broker_write_max_bytes` The upper bound for the number of bytes written to a broker connection in a single write. This field corresponds to Kafka’s `socket.request.max.bytes`. 
**Type**: `string` **Default**: `"100MB"` ```yml # Examples broker_write_max_bytes: 128MB broker_write_max_bytes: 50mib ``` --- # Page 290: Scanners **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/about.md --- # Scanners --- title: Scanners latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/about.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- For inputs that consume a stream of bytes rather than discrete messages (a file, for example), it’s necessary to define a mechanism by which the stream of source bytes can be chopped into smaller logical messages, processed and outputted as a continuous process whilst the stream is being read, as this dramatically reduces the memory usage of Redpanda Connect as a whole and results in a more fluid flow of data. The way in which we define this chopping mechanism is through scanners, configured as a field on each input that requires one. For example, if we wished to consume files line-by-line, with each individual line being processed as a discrete message, we could use the [`lines` scanner](../lines/) with our `file` input: ## Common ```yaml input: file: paths: [ "./*.txt" ] scanner: lines: {} ``` ## Advanced ```yaml # Instead of newlines, use a custom delimiter: input: file: paths: [ "./*.txt" ] scanner: lines: custom_delimiter: "---END---" max_buffer_size: 100_000_000 # 100MB line buffer ``` A scanner is a plugin similar to any other core Redpanda Connect component (inputs, processors, outputs, etc.), which means it’s possible to define your own scanners that can be utilized by inputs that need them.
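Scanners that accept a child scanner, such as `decompress`, can be combined with other scanners. As a minimal sketch (the `./logs/*.txt.gz` path is a hypothetical placeholder, not something this page defines), the following config decompresses gzip files and then splits the decompressed stream into one message per line. It uses only the `algorithm` and `into` fields documented for the `decompress` scanner:

```yaml
input:
  file:
    # Hypothetical path; point this at your own compressed files.
    paths: [ "./logs/*.txt.gz" ]
    scanner:
      decompress:
        algorithm: gzip
        # The child scanner receives the decompressed byte stream.
        into:
          lines: {}
```

Because the outer scanner hands a stream to the inner one, the file is not expected to be fully decompressed in memory before lines are emitted.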
--- # Page 291: avro **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/avro.md --- # avro --- title: avro latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/avro page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/avro.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/avro.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Scanner ▼ [Scanner](/redpanda-cloud/develop/connect/components/scanners/avro/)[Processor](/redpanda-cloud/develop/connect/components/processors/avro/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/avro/ "View the Self-Managed version of this component") Consume a stream of Avro OCF datum. #### Common ```yml scanners: avro: ``` #### Advanced ```yml scanners: avro: raw_json: false ``` ## [](#avro-json-format)Avro JSON format This scanner creates documents formatted as [Avro JSON](https://avro.apache.org/docs/current/specification/) when decoding with Avro schemas. In this format, the value of a union is encoded in JSON as follows: - If the union’s type is `null`, it is encoded as a JSON `null`. - Otherwise, the union is encoded as a JSON object with one name/value pair. The `"name"` is the type’s name and the `"value"` is the recursively encoded value. For Avro’s named types (record, fixed or enum), the user-specified name is used. For other types, the type name is used. 
For example, the union schema `["null","string","Transaction"]`, where `Transaction` is a record name, would encode: - The `null` as a JSON `null` - The string `"a"` as `{"string": "a"}` - A `Transaction` instance as `{"Transaction": {…​}}`, where `{…​}` indicates the JSON encoding of a `Transaction` instance Alternatively, you can create documents in [standard/raw JSON format](https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull) by setting the field [`raw_json`](#raw_json) to `true`. ## [](#metadata)Metadata This scanner emits the following metadata for each message: - The `@avro_schema` field: The canonical Avro schema. - The `@avro_schema_fingerprint` field: The schema ID or fingerprint. ## [](#fields)Fields ### [](#raw_json)`raw_json` Whether to decode messages into normal JSON rather than [Avro JSON](https://avro.apache.org/docs/current/specification/_print/#json-encoding). When true, this unwraps union values (bare values instead of {"type": value} wrappers). **Type**: `bool` **Default**: `false` --- # Page 292: chunker **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/chunker.md --- # chunker --- title: chunker latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/chunker page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/chunker.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/chunker.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/chunker/ "View the Self-Managed version of this component") Split an input stream into chunks of a given number of bytes. 
```yml # Config fields, showing default values chunker: size: 0 # No default (required) ``` ## [](#fields)Fields ### [](#size)`size` The size of each chunk in bytes. **Type**: `int` --- # Page 293: csv **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/csv.md --- # csv --- title: csv latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/csv page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/csv.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/csv.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Scanner ▼ [Scanner](/redpanda-cloud/develop/connect/components/scanners/csv/)[Input](/redpanda-connect/components/inputs/csv/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/csv/ "View the Self-Managed version of this component") Consume comma-separated values row by row, including support for custom delimiters. ```yml # Config fields, showing default values csv: custom_delimiter: "" # No default (optional) parse_header_row: true lazy_quotes: false continue_on_error: false ``` ## [](#metadata)Metadata This scanner adds the following metadata to each message: - `csv_row` The index of each row, beginning at 0. ## [](#fields)Fields ### [](#continue_on_error)`continue_on_error` If a row fails to parse due to any error emit an empty message marked with the error and then continue consuming subsequent rows when possible. This can sometimes be useful in situations where input data contains individual rows which are malformed. 
However, when a row encounters a parsing error it is impossible to guarantee that following rows are valid, as this indicates that the input data is unreliable and could potentially emit misaligned rows. **Type**: `bool` **Default**: `false` ### [](#custom_delimiter)`custom_delimiter` Use a provided custom delimiter instead of the default comma. **Type**: `string` ### [](#lazy_quotes)`lazy_quotes` If set to `true`, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field. **Type**: `bool` **Default**: `false` ### [](#parse_header_row)`parse_header_row` Whether to reference the first row as a header row. If set to true the output structure for messages will be an object where field keys are determined by the header row. Otherwise, each message will consist of an array of values from the corresponding CSV row. **Type**: `bool` **Default**: `true` --- # Page 294: decompress **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/decompress.md --- # decompress --- title: decompress latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/decompress page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/decompress.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/decompress.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Scanner ▼ [Scanner](/redpanda-cloud/develop/connect/components/scanners/decompress/)[Processor](/redpanda-cloud/develop/connect/components/processors/decompress/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/decompress/ "View the Self-Managed version of this component") Decompress the stream of bytes according to an algorithm, 
before feeding it into a child scanner. ```yml # Config fields, showing default values decompress: algorithm: "" # No default (required) into: to_the_end: {} ``` ## [](#fields)Fields ### [](#algorithm)`algorithm` One of `gzip`, `pgzip`, `zlib`, `bzip2`, `flate`, `snappy`, `lz4`, `zstd`. **Type**: `string` ### [](#into)`into` The child scanner to feed the decompressed stream into. **Type**: `scanner` **Default**: ```yaml to_the_end: {} ``` --- # Page 295: json_array **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/json_array.md --- # json\_array --- title: json_array latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/json_array page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/json_array.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/json_array.adoc categories: "[]" description: Consumes a stream of one or more JSON elements within a top level array. page-git-created-date: "2025-09-26" page-git-modified-date: "2025-09-26" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/json_array/ "View the Self-Managed version of this component") Consumes a stream of one or more JSON elements within a top level array. This scanner is useful for: - Processing exports from systems that generate a JSON array as the top-level JSON structure (for example, logs, bulk exports, etc). - Efficiently breaking up large files with many objects into individual events/messages. 
Suppose you have a file `events.json`: `events.json` ```json [ {"event": "login", "user": "alice"}, {"event": "logout", "user": "bob"}, {"event": "purchase", "user": "carol", "amount": 42} ] ``` The configuration to process this file is: ```yaml input: file: paths: [ "./events.json" ] scanner: json_array: {} ``` Result: Each event in the array is processed as a separate message. ## [](#requirements)Requirements The `json_array` scanner expects the input to be a single JSON array, where each array element is a JSON object or value. ## [](#fields)Fields The `json_array` scanner has no required fields. You declare it as `{}` in your config. ```yaml json_array: {} ``` --- # Page 296: json_documents **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/json_documents.md --- # json\_documents --- title: json_documents latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/json_documents page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/json_documents.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/json_documents.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/json_documents/ "View the Self-Managed version of this component") Consumes a stream of one or more JSON documents. 
```yml # Config fields, showing default values json_documents: {} ``` --- # Page 297: lines **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/lines.md --- # lines --- title: lines latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/lines page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/lines.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/lines.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/lines/ "View the Self-Managed version of this component") Split an input stream into a message per line of data. ```yml # Config fields, showing default values lines: custom_delimiter: "" # No default (optional) max_buffer_size: 65536 omit_empty: false ``` ## [](#fields)Fields ### [](#custom_delimiter)`custom_delimiter` Use a provided custom delimiter for detecting the end of a line rather than a single line break. **Type**: `string` ### [](#max_buffer_size)`max_buffer_size` Set the maximum buffer size for storing line data, this limits the maximum size that a line can be without causing an error. **Type**: `int` **Default**: `65536` ### [](#omit_empty)`omit_empty` Omit empty lines. 
**Type**: `bool` **Default**: `false` --- # Page 298: re_match **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/re_match.md --- # re\_match --- title: re_match latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/re_match page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/re_match.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/re_match.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/re_match/ "View the Self-Managed version of this component") Split an input stream into segments matching against a regular expression. ```yml # Config fields, showing default values re_match: pattern: (?m)^\d\d:\d\d:\d\d # No default (required) max_buffer_size: 65536 ``` ## [](#fields)Fields ### [](#max_buffer_size)`max_buffer_size` Set the maximum buffer size for storing line data, this limits the maximum size that a message can be without causing an error. **Type**: `int` **Default**: `65536` ### [](#pattern)`pattern` The pattern to match against. 
**Type**: `string` ```yaml # Examples: pattern: (?m)^\d\d:\d\d:\d\d ``` --- # Page 299: skip_bom **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/skip_bom.md --- # skip\_bom --- title: skip_bom latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/skip_bom page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/skip_bom.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/skip_bom.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/skip_bom/ "View the Self-Managed version of this component") Skip one or more byte order marks for each opened child scanner. ```yml # Config fields, showing default values skip_bom: into: to_the_end: {} ``` ## [](#fields)Fields ### [](#into)`into` The child scanner to feed the resulting stream into. 
**Type**: `scanner` **Default**: ```yaml to_the_end: {} ``` --- # Page 300: switch **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/switch.md --- # switch --- title: switch latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/switch page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/switch.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/switch.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Scanner ▼ [Scanner](/redpanda-cloud/develop/connect/components/scanners/switch/)[Output](/redpanda-cloud/develop/connect/components/outputs/switch/)[Processor](/redpanda-cloud/develop/connect/components/processors/switch/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/switch/ "View the Self-Managed version of this component") Select a child scanner dynamically for source data based on factors such as the filename. ```yml # Config fields, showing default values switch: [] # No default (required) ``` This scanner outlines a list of potential child scanner candidates to be chosen, and for each source of data the first candidate to pass will be selected. A candidate without any conditions acts as a catch-all and will pass for every source, it is recommended to always have a catch-all scanner at the end of your list. If a given source of data does not pass a candidate an error is returned and the data is rejected. ## [](#fields)Fields ### [](#re_match_name)`re_match_name` A regular expression to test against the name of each source of data fed into the scanner (filename or equivalent). If this pattern matches the child scanner is selected. 
**Type**: `string` ### [](#scanner)`scanner` The scanner to activate if this candidate passes. **Type**: `scanner` ## [](#examples)Examples ### [](#switch-based-on-file-name)Switch based on file name In this example a file input chooses a scanner based on the extension of each file ```yaml input: file: paths: [ ./data/* ] scanner: switch: - re_match_name: '\.avro$' scanner: { avro: {} } - re_match_name: '\.csv$' scanner: { csv: {} } - re_match_name: '\.csv.gz$' scanner: decompress: algorithm: gzip into: csv: {} - re_match_name: '\.tar$' scanner: { tar: {} } - re_match_name: '\.tar.gz$' scanner: decompress: algorithm: gzip into: tar: {} - scanner: { to_the_end: {} } ``` --- # Page 301: tar **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/tar.md --- # tar --- title: tar latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/tar page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/tar.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/tar.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/tar/ "View the Self-Managed version of this component") Consume a tar archive file by file. 
```yml # Config fields, showing default values tar: {} ``` ## [](#metadata)Metadata This scanner adds the following metadata to each message: - `tar_name` --- # Page 302: to_the_end **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/scanners/to_the_end.md --- # to\_the\_end --- title: to_the_end latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/scanners/to_the_end page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/scanners/to_the_end.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/scanners/to_the_end.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/scanners/to_the_end/ "View the Self-Managed version of this component") Read the input stream all the way until the end and deliver it as a single message. ```yml # Config fields, showing default values to_the_end: {} ``` > ⚠️ **CAUTION** > > Some sources of data may not have a logical end, therefore caution should be made to exclusively use this scanner when the end of an input stream is clearly defined (and well within memory). 
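If the size of a source isn't known in advance, a scanner that bounds message size may be a safer fit than `to_the_end`. The following sketch assumes a hypothetical `./data/*.bin` path and an arbitrary chunk size of 1 MiB. It uses the `chunker` scanner so that each message holds at most `size` bytes rather than the whole stream:

```yaml
input:
  file:
    # Hypothetical path; replace with your own source files.
    paths: [ "./data/*.bin" ]
    scanner:
      chunker:
        # Arbitrary example value: 1 MiB per message.
        size: 1048576
```

The trade-off is that chunk boundaries are purely byte-based, so this suits opaque binary data better than structured formats such as JSON or CSV.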
--- # Page 303: Tracers **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/tracers/about.md --- # Tracers --- title: Tracers latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/tracers/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/tracers/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/tracers/about.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- When a tracer is configured all messages will be allocated a root span during ingestion that represents their journey through a Redpanda Connect pipeline. Many Redpanda Connect processors create spans, and so tracing is a great way to analyse the pathways of individual messages as they progress through a Redpanda Connect instance. Some inputs, such as `http_server` and `http_client`, are capable of extracting a root span from the source of the message (HTTP headers). This is a work in progress and should eventually expand so that all inputs have a way of doing so. Other inputs, such as `kafka` can be configured to extract a root span by using the `extract_tracing_map` field. A tracer config section looks like this: ```yaml tracer: jaeger: agent_address: localhost:6831 sampler_type: const sampler_param: 1 ``` > ⚠️ **CAUTION** > > Although the configuration spec of this component is stable the format of spans, tags and logs created by Redpanda Connect is subject to change as it is tuned for improvement. 
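As an illustrative sketch of the `extract_tracing_map` approach mentioned above, the following input config assumes that upstream producers wrote tracing propagation data into message metadata. The broker address, topic name, and consumer group are placeholders, and the mapping `root = @` (pass all metadata as the propagation object) is one possible choice rather than the required one. Check the `kafka` input reference for the exact behavior in your version:

```yaml
input:
  kafka:
    # Placeholder connection details; replace with your own.
    addresses: [ "localhost:9092" ]
    topics: [ "orders" ]
    consumer_group: "tracing-example"
    # Bloblang mapping that returns the object holding tracing
    # propagation fields. Here it is assumed to be the message metadata.
    extract_tracing_map: root = @
```

The extracted fields need to match the propagation format used by the configured tracer. Otherwise, no root span can be recovered from the message.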
--- # Page 304: gcp_cloudtrace **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/tracers/gcp_cloudtrace.md --- # gcp\_cloudtrace --- title: gcp_cloudtrace latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/tracers/gcp_cloudtrace page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/tracers/gcp_cloudtrace.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/tracers/gcp_cloudtrace.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/tracers/gcp_cloudtrace/ "View the Self-Managed version of this component") Send tracing events to [Google Cloud Trace](https://cloud.google.com/trace). #### Common ```yml tracers: gcp_cloudtrace: project: "" # No default (required) sampling_ratio: 1 flush_interval: "" # No default (optional) ``` #### Advanced ```yml tracers: gcp_cloudtrace: project: "" # No default (required) sampling_ratio: 1 tags: {} flush_interval: "" # No default (optional) ``` ## [](#fields)Fields ### [](#flush_interval)`flush_interval` The period of time between each flush of tracing spans. **Type**: `string` ### [](#project)`project` The Google Cloud project with the Cloud Trace API enabled. If this is omitted, the Google Cloud SDK attempts to auto-detect it from the environment. **Type**: `string` ### [](#sampling_ratio)`sampling_ratio` Sets the ratio of traces to sample. Tuning the sampling ratio is recommended for high-volume production workloads. **Type**: `float` **Default**: `1` ```yaml # Examples: sampling_ratio: 1 ``` ### [](#tags)`tags` A map of tags to add to tracing spans.
**Type**: `string` **Default**: `{}` --- # Page 305: none **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/tracers/none.md --- # none --- title: none latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/tracers/none page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/tracers/none.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/tracers/none.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- **Type:** Tracer ▼ [Tracer](/redpanda-cloud/develop/connect/components/tracers/none/)[Buffer](/redpanda-cloud/develop/connect/components/buffers/none/)[Metric](/redpanda-cloud/develop/connect/components/metrics/none/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/tracers/none/ "View the Self-Managed version of this component") Do not send tracing events anywhere. ```yml # Config fields, showing default values tracer: none: {} ``` --- # Page 306: redpanda **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/components/tracers/redpanda.md --- # redpanda --- title: redpanda latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/components/tracers/redpanda page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/components/tracers/redpanda.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/tracers/redpanda.adoc categories: "[]" description: Send tracing events to a Redpanda topic. 
page-git-created-date: "2025-12-03" page-git-modified-date: "2025-12-03" --- **Type:** Tracer ▼ [Tracer](/redpanda-cloud/develop/connect/components/tracers/redpanda/)[Cache](/redpanda-cloud/develop/connect/components/caches/redpanda/)[Input](/redpanda-cloud/develop/connect/components/inputs/redpanda/)[Output](/redpanda-cloud/develop/connect/components/outputs/redpanda/) **Available in:** Cloud, [Self-Managed](/redpanda-connect/components/tracers/redpanda/ "View the Self-Managed version of this component") Export distributed tracing data to a Redpanda topic, enabling you to monitor and debug your Redpanda Connect pipelines. Traces are exported in OpenTelemetry format as JSON, allowing integration with observability platforms like Jaeger, Grafana Tempo, or custom trace consumers. #### Common ```yml tracers: redpanda: seed_brokers: [] # No default (required) topic: otel-traces format: json schema_registry: url: "" # No default (optional) tls: skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] oauth2: enabled: false client_key: "" client_secret: "" token_url: "" scopes: [] endpoint_params: {} oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} service: redpanda-connect sampling: enabled: false ratio: "" # No default (optional) ``` #### Advanced ```yml tracers: redpanda: seed_brokers: [] # No default (required) client_id: redpanda-connect tls: enabled: false skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] sasl: [] # No default (optional) metadata_max_age: 1m request_timeout_overhead: 10s conn_idle_timeout: 20s tcp: connect_timeout: 0s keep_alive: idle: 15s interval: 15s count: 9 tcp_user_timeout: 0s partitioner: "" # No default (optional) idempotent_write: true compression: "" # No default 
(optional) allow_auto_topic_creation: true timeout: 10s max_message_bytes: 1MiB broker_write_max_bytes: 100MiB topic: otel-traces format: json schema_registry: url: "" # No default (optional) tls: skip_cert_verify: false enable_renegotiation: false root_cas: "" root_cas_file: "" client_certs: [] oauth2: enabled: false client_key: "" client_secret: "" token_url: "" scopes: [] endpoint_params: {} oauth: enabled: false consumer_key: "" consumer_secret: "" access_token: "" access_token_secret: "" basic_auth: enabled: false username: "" password: "" jwt: enabled: false private_key_file: "" signing_method: "" claims: {} headers: {} service: redpanda-connect tags: {} sampling: enabled: false ratio: "" # No default (optional) ``` This tracer automatically captures trace spans as messages flow through your pipeline, recording timing information, component metadata, and error details. Use this to: - **Track message flow** through complex pipelines with multiple processors. - **Identify performance bottlenecks** by analyzing span durations. - **Debug failures** by examining trace context and error details. - **Monitor pipeline health** across distributed Redpanda Connect instances. - **Correlate activity** across multiple services using trace IDs. The tracer writes to a dedicated Redpanda topic that can be consumed by trace analysis tools. Configure sampling to control trace volume in high-throughput environments. ## [](#fields)Fields ### [](#allow_auto_topic_creation)`allow_auto_topic_creation` Whether to automatically create the trace topic if it doesn’t exist. If false, the topic must be created manually before starting the tracer. **Type**: `bool` **Default**: `true` ### [](#broker_write_max_bytes)`broker_write_max_bytes` The maximum number of bytes this output can write to a broker connection in a single write. This field corresponds to Kafka’s `socket.request.max.bytes`. 
**Type**: `string`

**Default**: `100MiB`

```yaml
# Examples:
broker_write_max_bytes: 128MB

# ---

broker_write_max_bytes: 50mib
```

### [](#client_id)`client_id`

An identifier for the client connection. This appears in broker logs and metrics to help identify which Redpanda Connect instance is sending traces.

**Type**: `string`

**Default**: `redpanda-connect`

### [](#compression)`compression`

Compression codec to use for trace messages. Options include `gzip`, `snappy`, `lz4`, `zstd`, or `none`. Compression can reduce network bandwidth and storage costs.

**Type**: `string`

**Options**: `lz4`, `snappy`, `gzip`, `none`, `zstd`

### [](#conn_idle_timeout)`conn_idle_timeout`

The maximum duration that connections can remain idle before they are automatically closed. This field accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

**Default**: `20s`

### [](#format)`format`

The format for trace data. The default, `json`, exports OpenTelemetry spans as JSON messages. The other supported options are listed in the following table.

**Type**: `string`

**Default**: `json`

| Option | Summary |
| --- | --- |
| json | Emit in JSON Format |
| protobuf | Emit in Protobuf Format |
| schema-registry-json | Emit in JSON Format with Schema Registry encoding |
| schema-registry-protobuf | Emit in Protobuf Format with Schema Registry encoding |

### [](#idempotent_write)`idempotent_write`

Enable idempotent writes to prevent duplicate trace messages in case of retries. Recommended for production environments.

**Type**: `bool`

**Default**: `true`

### [](#max_message_bytes)`max_message_bytes`

The maximum size of individual trace messages. Traces exceeding this size will be truncated or dropped.

**Type**: `string`

**Default**: `1MiB`

```yaml
# Examples:
max_message_bytes: 100MB

# ---

max_message_bytes: 50mib
```

### [](#metadata_max_age)`metadata_max_age`

The maximum age of cached cluster metadata before it is refreshed.
Reducing this value can help detect cluster changes faster but increases metadata requests. **Type**: `string` **Default**: `1m` ### [](#partitioner)`partitioner` Override the default partitioner for trace messages. By default, traces are distributed across partitions for load balancing. **Type**: `string` | Option | Summary | | --- | --- | | least_backup | Chooses the least backed up partition (the partition with the fewest amount of buffered records). Partitions are selected per batch. | | manual | Manually select a partition for each message, requires the field partition to be specified. | | murmur2_hash | Kafka’s default hash algorithm that uses a 32-bit murmur2 hash of the key to compute which partition the record will be on. | | round_robin | Round-robin’s messages through all available partitions. This algorithm has lower throughput and causes higher CPU load on brokers, but can be useful if you want to ensure an even distribution of records to partitions. | ### [](#request_timeout_overhead)`request_timeout_overhead` Additional time to apply as overhead when calculating request deadlines. This buffer helps prevent premature timeouts. **Type**: `string` **Default**: `10s` ### [](#sampling)`sampling` Configure trace sampling to control the volume of trace data. Sampling is essential for high-throughput pipelines to prevent trace data from overwhelming your observability infrastructure. **Type**: `object` ### [](#sampling-enabled)`sampling.enabled` Whether to enable trace sampling. When disabled, all traces are exported. When enabled, traces are sampled according to the configured ratio. **Type**: `bool` **Default**: `false` ### [](#sampling-ratio)`sampling.ratio` The sampling ratio as a decimal between 0 and 1. For example, `0.1` samples 10% of traces, `0.01` samples 1%. Lower ratios reduce trace volume and overhead. For high-throughput production systems, start with 0.01-0.1 and adjust based on your needs. 
**Type**: `float` ```yaml # Examples: ratio: 0.05 # --- ratio: 0.85 # --- ratio: 0.5 ``` ### [](#sasl)`sasl[]` Specify one or more methods or mechanisms of SASL authentication, which are attempted in order. If the broker supports the first SASL mechanism, all connections use it. If the first mechanism fails, the client picks the first supported mechanism. If the broker does not support any client mechanisms, all connections fail. **Type**: `object` ```yaml # Examples: sasl: - mechanism: SCRAM-SHA-512 password: bar username: foo ``` ### [](#sasl-aws)`sasl[].aws` Contains AWS specific fields for when the `mechanism` is set to `AWS_MSK_IAM`. **Type**: `object` ### [](#sasl-aws-credentials)`sasl[].aws.credentials` Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](../../../guides/cloud/aws/). **Type**: `object` ### [](#sasl-aws-credentials-from_ec2_role)`sasl[].aws.credentials.from_ec2_role` Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). **Type**: `bool` ### [](#sasl-aws-credentials-id)`sasl[].aws.credentials.id` The ID of credentials to use. **Type**: `string` ### [](#sasl-aws-credentials-profile)`sasl[].aws.credentials.profile` A profile from `~/.aws/credentials` to use. **Type**: `string` ### [](#sasl-aws-credentials-role)`sasl[].aws.credentials.role` A role ARN to assume. **Type**: `string` ### [](#sasl-aws-credentials-role_external_id)`sasl[].aws.credentials.role_external_id` An external ID to provide when assuming a role. **Type**: `string` ### [](#sasl-aws-credentials-secret)`sasl[].aws.credentials.secret` The secret for the credentials being used. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` ### [](#sasl-aws-credentials-token)`sasl[].aws.credentials.token` The token for the credentials being used, required when using short term credentials. **Type**: `string` ### [](#sasl-aws-endpoint)`sasl[].aws.endpoint` Allows you to specify a custom endpoint for the AWS API. **Type**: `string` ### [](#sasl-aws-region)`sasl[].aws.region` The AWS region to target. **Type**: `string` ### [](#sasl-aws-tcp)`sasl[].aws.tcp` TCP socket configuration. **Type**: `object` ### [](#sasl-aws-tcp-connect_timeout)`sasl[].aws.tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-aws-tcp-keep_alive)`sasl[].aws.tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#sasl-aws-tcp-keep_alive-count)`sasl[].aws.tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#sasl-aws-tcp-keep_alive-idle)`sasl[].aws.tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-keep_alive-interval)`sasl[].aws.tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#sasl-aws-tcp-tcp_user_timeout)`sasl[].aws.tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#sasl-extensions)`sasl[].extensions` Key/value pairs to add to OAUTHBEARER authentication requests. 
**Type**: `object`

### [](#sasl-mechanism)`sasl[].mechanism`

The SASL mechanism to use.

**Type**: `string`

| Option | Summary |
| --- | --- |
| AWS_MSK_IAM | AWS IAM based authentication as specified by the 'aws-msk-iam-auth' java library. |
| OAUTHBEARER | OAuth Bearer based authentication. |
| PLAIN | Plain text authentication. |
| REDPANDA_CLOUD_SERVICE_ACCOUNT | Redpanda Cloud Service Account authentication when running in Redpanda Cloud. |
| SCRAM-SHA-256 | SCRAM based authentication as specified in RFC5802. |
| SCRAM-SHA-512 | SCRAM based authentication as specified in RFC5802. |
| none | Disable SASL authentication. |

### [](#sasl-password)`sasl[].password`

A password to provide for PLAIN or SCRAM-\* authentication.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

### [](#sasl-token)`sasl[].token`

The token to use for a single session’s OAUTHBEARER authentication.

**Type**: `string`

**Default**: `""`

### [](#sasl-username)`sasl[].username`

A username to provide for PLAIN or SCRAM-\* authentication.

**Type**: `string`

**Default**: `""`

### [](#schema_registry)`schema_registry`

Schema registry information to publish schemas for tracing data along with the data.

**Type**: `object`

### [](#schema_registry-basic_auth)`schema_registry.basic_auth`

Allows you to specify basic authentication.

**Type**: `object`

### [](#schema_registry-basic_auth-enabled)`schema_registry.basic_auth.enabled`

Whether to use basic authentication in requests.

**Type**: `bool`

**Default**: `false`

### [](#schema_registry-basic_auth-password)`schema_registry.basic_auth.password`

A password to authenticate with.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly.
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-basic_auth-username)`schema_registry.basic_auth.username` A username to authenticate as. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt)`schema_registry.jwt` Beta Allows you to specify JWT authentication. **Type**: `object` ### [](#schema_registry-jwt-claims)`schema_registry.jwt.claims` A value used to identify the claims that issued the JWT. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-enabled)`schema_registry.jwt.enabled` Whether to use JWT authentication in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-jwt-headers)`schema_registry.jwt.headers` Add optional key/value headers to the JWT. **Type**: `object` **Default**: `{}` ### [](#schema_registry-jwt-private_key_file)`schema_registry.jwt.private_key_file` A file with the PEM encoded via PKCS1 or PKCS8 as private key. **Type**: `string` **Default**: `""` ### [](#schema_registry-jwt-signing_method)`schema_registry.jwt.signing_method` A method used to sign the token such as RS256, RS384, RS512 or EdDSA. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth)`schema_registry.oauth` Allows you to specify open authentication via OAuth version 1. **Type**: `object` ### [](#schema_registry-oauth-access_token)`schema_registry.oauth.access_token` A value used to gain access to the protected resources on behalf of the user. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-access_token_secret)`schema_registry.oauth.access_token_secret` A secret provided in order to establish ownership of a given access token. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. 
For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_key)`schema_registry.oauth.consumer_key` A value used to identify the client to the service provider. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-consumer_secret)`schema_registry.oauth.consumer_secret` A secret used to establish ownership of the consumer key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth-enabled)`schema_registry.oauth.enabled` Whether to use OAuth version 1 in requests. **Type**: `bool` **Default**: `false` ### [](#schema_registry-oauth2)`schema_registry.oauth2` Allows you to specify open authentication via OAuth version 2 using the client credentials token flow. **Type**: `object` ### [](#schema_registry-oauth2-client_key)`schema_registry.oauth2.client_key` A value used to identify the client to the token provider. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth2-client_secret)`schema_registry.oauth2.client_secret` A secret used to establish ownership of the client key. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#schema_registry-oauth2-enabled)`schema_registry.oauth2.enabled` Whether to use OAuth version 2 in requests. 
**Type**: `bool` **Default**: `false` ### [](#schema_registry-oauth2-endpoint_params)`schema_registry.oauth2.endpoint_params` A list of optional endpoint parameters, values should be arrays of strings. **Type**: `object` **Default**: `{}` ```yaml # Examples: endpoint_params: audience: - https://example.com resource: - https://api.example.com ``` ### [](#schema_registry-oauth2-scopes)`schema_registry.oauth2.scopes[]` A list of optional requested permissions. **Type**: `array` **Default**: `[]` ### [](#schema_registry-oauth2-token_url)`schema_registry.oauth2.token_url` The URL of the token provider. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls)`schema_registry.tls` Custom TLS settings can be used to override system defaults. **Type**: `object` ### [](#schema_registry-tls-client_certs)`schema_registry.tls.client_certs[]` A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#schema_registry-tls-client_certs-cert)`schema_registry.tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-cert_file)`schema_registry.tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key)`schema_registry.tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. 
**Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-key_file)`schema_registry.tls.client_certs[].key_file` The path of a certificate key to use. **Type**: `string` **Default**: `""` ### [](#schema_registry-tls-client_certs-password)`schema_registry.tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#schema_registry-tls-enable_renegotiation)`schema_registry.tls.enable_renegotiation` Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#schema_registry-tls-root_cas)`schema_registry.tls.root_cas` An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... 
-----END CERTIFICATE-----
```

### [](#schema_registry-tls-root_cas_file)`schema_registry.tls.root_cas_file`

An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas_file: ./root_cas.pem
```

### [](#schema_registry-tls-skip_cert_verify)`schema_registry.tls.skip_cert_verify`

Whether to skip server side certificate verification.

**Type**: `bool`

**Default**: `false`

### [](#schema_registry-url)`schema_registry.url`

The base URL of the schema registry service.

**Type**: `string`

### [](#seed_brokers)`seed_brokers[]`

A list of broker addresses to connect to in order. Use commas to separate multiple addresses in a single list item.

**Type**: `array`

```yaml
# Examples:
seed_brokers:
  - "localhost:9092"

# ---

seed_brokers:
  - "foo:9092"
  - "bar:9092"

# ---

seed_brokers:
  - "foo:9092,bar:9092"
```

### [](#service)`service`

The service name to identify this Redpanda Connect instance in traces. This appears in trace visualizations and helps correlate traces across distributed systems. Use descriptive names like `order-processor` or `analytics-pipeline`.

**Type**: `string`

**Default**: `redpanda-connect`

### [](#tags)`tags`

Custom key-value tags to attach to all traces from this instance. Use tags to add metadata like environment (`production`, `staging`), region, version, or instance identifiers. Tags appear as resource attributes in OpenTelemetry traces.

**Type**: `object`

**Default**: `{}`

### [](#tcp)`tcp`

Configure TCP socket-level settings to optimize network performance and reliability.
These low-level controls are useful for: - **High-latency networks**: Increase `connect_timeout` to allow more time for connection establishment - **Long-lived connections**: Configure `keep_alive` settings to detect and recover from stale connections - **Unstable networks**: Tune keep-alive probes to balance between quick failure detection and avoiding false positives - **Linux systems with specific requirements**: Use `tcp_user_timeout` (Linux 2.6.37+) to control data acknowledgment timeouts Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements. **Type**: `object` ### [](#tcp-connect_timeout)`tcp.connect_timeout` Maximum amount of time a dial will wait for a connect to complete. Zero disables. **Type**: `string` **Default**: `0s` ### [](#tcp-keep_alive)`tcp.keep_alive` TCP keep-alive probe configuration. **Type**: `object` ### [](#tcp-keep_alive-count)`tcp.keep_alive.count` Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9. **Type**: `int` **Default**: `9` ### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle` Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes. **Type**: `string` **Default**: `15s` ### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval` Duration between keep-alive probes. Zero defaults to 15s. **Type**: `string` **Default**: `15s` ### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout` Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep\_alive.idle must be greater than this value per RFC 5482. Zero disables. **Type**: `string` **Default**: `0s` ### [](#timeout)`timeout` The maximum time to wait for trace messages to be acknowledged by the broker before considering the write failed. 
**Type**: `string` **Default**: `10s` ### [](#tls)`tls` Configure Transport Layer Security (TLS) settings to secure network connections. This includes options for standard TLS as well as mutual TLS (mTLS) authentication where both client and server authenticate each other using certificates. Key configuration options include `enabled` to enable TLS, `client_certs` for mTLS authentication, `root_cas`/`root_cas_file` for custom certificate authorities, and `skip_cert_verify` for development environments. **Type**: `object` ### [](#tls-client_certs)`tls.client_certs[]` A list of client certificates for mutual TLS (mTLS) authentication. Configure this field to enable mTLS, authenticating the client to the server with these certificates. You must set `tls.enabled: true` for the client certificates to take effect. **Certificate pairing rules**: For each certificate item, provide either: - Inline PEM data using both `cert` **and** `key` or - File paths using both `cert_file` **and** `key_file`. Mixing inline and file-based values within the same item is not supported. **Type**: `object` **Default**: `[]` ```yaml # Examples: client_certs: - cert: foo key: bar # --- client_certs: - cert_file: ./example.pem key_file: ./example.key ``` ### [](#tls-client_certs-cert)`tls.client_certs[].cert` A plain text certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file` The path of a certificate to use. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key)`tls.client_certs[].key` A plain text certificate key to use. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ### [](#tls-client_certs-key_file)`tls.client_certs[].key_file` The path of a certificate key to use. 
**Type**: `string` **Default**: `""` ### [](#tls-client_certs-password)`tls.client_certs[].password` A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: password: foo # --- password: ${KEY_PASSWORD} ``` ### [](#tls-enable_renegotiation)`tls.enable_renegotiation` Whether to allow the remote server to request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`. **Type**: `bool` **Default**: `false` ### [](#tls-enabled)`tls.enabled` Whether to use TLS for the connection to the Redpanda cluster. **Type**: `bool` **Default**: `false` ### [](#tls-root_cas)`tls.root_cas` Specify a root certificate authority to use (optional). This is a string that represents a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for inline certificate data or `root_cas_file` for file-based certificate loading. > ⚠️ **CAUTION** > > This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](../../../configuration/secret-management/) before adding it to your configuration. **Type**: `string` **Default**: `""` ```yaml # Examples: root_cas: |- -----BEGIN CERTIFICATE----- ... 
-----END CERTIFICATE-----
```

### [](#tls-root_cas_file)`tls.root_cas_file`

Specify the path to a root certificate authority file (optional). This is a file, often with a `.pem` extension, which contains a certificate chain from the parent-trusted root certificate, through possible intermediate signing certificates, to the host certificate. Use either this field for file-based certificate loading or `root_cas` for inline certificate data.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas_file: ./root_cas.pem
```

### [](#tls-skip_cert_verify)`tls.skip_cert_verify`

Whether to skip server-side certificate verification. Set to `true` only for testing environments as this reduces security by disabling certificate validation. When using self-signed certificates or in development, this may be necessary, but should never be used in production. Consider using `root_cas` or `root_cas_file` to specify trusted certificates instead of disabling verification entirely.

**Type**: `bool`

**Default**: `false`

### [](#topic)`topic`

The Redpanda topic where trace data is written. This topic should be dedicated to traces and configured with appropriate retention policies.

**Type**: `string`

**Default**: `otel-traces`

---

# Page 307: Configuration

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/about.md

---

# Configuration

--- title: Configuration latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/about.adoc description: Learn about different options for configuring Redpanda Connect.
page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" ---

Redpanda Connect pipelines are configured in a YAML file that consists of a number of root sections, arranged like so:

#### Common

```yaml
input:
  kafka:
    addresses: [ TODO ]
    topics: [ foo, bar ]
    consumer_group: foogroup

pipeline:
  processors:
    - mapping: |
        root.message = this
        root.meta.link_count = this.links.length()

output:
  aws_s3:
    bucket: TODO
    path: '${! meta("kafka_topic") }/${! json("message.id") }.json'
```

#### Full

```yaml
http:
  address: 0.0.0.0:4195
  debug_endpoints: false

input:
  kafka:
    addresses: [ TODO ]
    topics: [ foo, bar ]
    consumer_group: foogroup

buffer:
  none: {}

pipeline:
  processors:
    - mapping: |
        root.message = this
        root.meta.link_count = this.links.length()

output:
  aws_s3:
    bucket: TODO
    path: '${! meta("kafka_topic") }/${! json("message.id") }.json'

input_resources: []
cache_resources: []
processor_resources: []
rate_limit_resources: []
output_resources: []

logger:
  level: INFO
  static_fields:
    '@service': benthos

metrics:
  prometheus: {}

tracer:
  none: {}

shutdown_timeout: 20s
shutdown_delay: ""
```

Most sections represent a component type, which you can read about in more detail in [this document](../../components/about/).

These types are hierarchical. For example, an `input` can have a list of child `processor` types attached to it, which in turn can have their own `processor` children.

This is powerful but can potentially lead to large and cumbersome configuration files. This document outlines tooling provided by Redpanda Connect to help with writing and managing these more complex configuration files.

## [](#testing)Testing

For guidance on how to write and run unit tests for your configuration files read [this guide](../unit_testing/).

## [](#customizing-your-configuration)Customizing your configuration

Sometimes it’s useful to write a configuration where certain fields can be defined during deployment.
For this purpose Redpanda Connect supports [environment variable interpolation](../interpolation/), allowing you to set fields in your config with environment variables like so:

```yaml
input:
  kafka:
    addresses:
      - ${KAFKA_BROKER:localhost:9092}
    topics:
      - ${KAFKA_TOPIC:default-topic}
```

This is very useful for sharing configuration files across different deployment environments.

## [](#labels)Labels

Labels are unique, user-defined identifiers used throughout Redpanda Connect configurations. They serve two purposes:

- **Reference:** Allow different parts of your pipeline to refer to specific components or resources.
- **Readability:** Make your configuration more understandable for humans, especially in complex deployments.

You can assign labels to most pipeline components, including resources, inputs, outputs, processors, and entire pipelines. Using clear, descriptive labels improves both maintainability and clarity.

Labels are commonly applied to the following components:

### [](#resources)Resources

Labels identify [reusable resources](#reuse) such as processors, caches, and rate limiters, making them easy to reference elsewhere in your pipeline.

```yaml
processor_resources:
  - label: my-transformer # Processor resource label
    mapping: 'root = content().uppercase()'

cache_resources:
  - label: user-cache # Cache resource label
    memory:
      default_ttl: 300s

rate_limit_resources:
  - label: api-limiter # Rate limiter resource label
    local:
      count: 100
      interval: 1m
```

### [](#component-labeling-for-clarity)Component labeling for clarity

You can also use labels on inputs, outputs, processors, and other components to improve the human-readability of your configuration and make troubleshooting easier.
For example:

```yaml
input:
  label: ingest_api
  http_server: {}

pipeline:
  label: user_data_ingest
  processors:
    - label: sanitize_fields
      mapping: 'root = this.trim()'
    - resource: my-transformer
```

## [](#label-naming-requirements)Label naming requirements

Labels must meet the following criteria:

- **Length**: 3-128 characters
- **Allowed characters**: Alphanumeric, hyphens, and underscores (`A-Za-z0-9-_`)
- **Case sensitivity**: Labels are case-sensitive

Example valid labels:

- `my-processor`
- `data_transformer_01`
- `UserAnalytics-v2`

Example invalid labels:

- `ab`: too short (fewer than 3 characters)
- `my.processor`: invalid character (period)
- `my processor`: invalid character (space)

## [](#reuse)Reusing configuration snippets

Sometimes it’s necessary to use a rather large component multiple times. Instead of copy/pasting the configuration or using YAML anchors you can define your component as a resource.

In the following example we want to make an HTTP request with our payloads. Occasionally the payload might get rejected due to garbage within its contents, and so we catch these rejected requests, attempt to "cleanse" the contents and try to make the same HTTP request again. Since the HTTP request component is quite large (and likely to change over time) we make sure to avoid duplicating it by defining it as a resource `get_foo`:

```yaml
pipeline:
  processors:
    - resource: get_foo
    - catch:
      - mapping: |
          root = this
          root.content = this.content.strip_html()
      - resource: get_foo

processor_resources:
  - label: get_foo
    http:
      url: http://example.com/foo
      verb: POST
      headers:
        SomeThing: "set-to-this"
        SomeThingElse: "set-to-something-else"
```

## [](#shutting-down)Shutting down

Under normal operating conditions, the Redpanda Connect process will shut down when there are no more messages produced by inputs and the final message has been processed. The shutdown procedure can also be initiated by sending the process an interrupt (`SIGINT`) or termination (`SIGTERM`) signal.
There are two top-level configuration options that control the shutdown behavior: `shutdown_timeout` and `shutdown_delay`.

### [](#shutdown-delay)Shutdown delay

The `shutdown_delay` option can be used to delay the start of the shutdown procedure. This is useful for pipelines that need a short grace period to have their metrics and traces scraped. While the shutdown delay is in effect, the HTTP metrics endpoint continues to be available for scraping and any active tracers are free to flush remaining traces.

The shutdown delay can be interrupted by sending the Redpanda Connect process a second OS interrupt or termination signal.

### [](#shutdown-timeout)Shutdown timeout

The `shutdown_timeout` option sets a hard deadline for the Redpanda Connect process to gracefully terminate. If this duration is exceeded, the process is forcefully terminated and any messages that were in flight are dropped.

This option takes effect after the `shutdown_delay` duration has passed, if a delay is configured.

--- # Page 308: Message Batching **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/batching.md --- # Message Batching --- title: Message Batching latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/batching page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/batching.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/batching.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" ---

Redpanda Connect is able to join sources and sinks with sometimes conflicting batching behaviors without sacrificing its strong delivery guarantees.
It’s also able to perform powerful [processing functions](../windowed_processing/) across batches of messages such as grouping, archiving and reduction. Therefore, batching within Redpanda Connect is a mechanism that serves multiple purposes: 1. [Performance (throughput)](#performance) 2. [Grouped message processing](#grouped-message-processing) 3. [Compatibility (mixing multi and single part message protocols)](#compatibility) ## [](#performance)Performance For most users the only benefit of batching messages is improving throughput over your output protocol. For some protocols this can happen in the background and requires no configuration from you. However, if an output has a `batching` configuration block this means it benefits from batching and requires you to specify how you’d like your batches to be formed by configuring a [batching policy](#batch-policy): ```yaml output: kafka: addresses: [ todo:9092 ] topic: benthos_stream # Either send batches when they reach 10 messages or when 100ms has passed # since the last batch. batching: count: 10 period: 100ms ``` However, a small number of inputs such as [`kafka`](../../components/inputs/kafka/) must be consumed sequentially (in this case by partition) and therefore benefit from specifying your batch policy at the input level instead: ```yaml input: kafka: addresses: [ todo:9092 ] topics: [ benthos_input_stream ] batching: count: 10 period: 100ms output: kafka: addresses: [ todo:9092 ] topic: benthos_stream ``` Inputs that behave this way are documented as such and have a `batching` configuration block. 
Sometimes you may prefer to create your batches before processing in order to benefit from [batch wide processing](#grouped-message-processing). In that case, if your input doesn’t already support [a batch policy](#batch-policy), you can instead use a [`broker`](../../components/inputs/broker/), which also allows you to combine inputs with a single batch policy:

```yaml
input:
  broker:
    inputs:
      - resource: foo
      - resource: bar
    batching:
      count: 50
      period: 500ms
```

This works the same way with [output brokers](../../components/outputs/broker/).

## [](#grouped-message-processing)Grouped message processing

Some processors, such as [`while`](../../components/processors/while/), are executed once across a whole batch. To run them once per message instead, wrap them in the [`for_each` processor](../../components/processors/for_each/):

```yaml
pipeline:
  processors:
    - for_each:
      - while:
          at_least_once: true
          max_loops: 0
          check: errored()
          processors:
            - catch: [] # Wipe any previous error
            - resource: foo # Attempt this processor until success
```

There’s a vast number of processors that specialise in operations across batches, such as [grouping](../../components/processors/group_by/) and [archiving](../../components/processors/archive/). For example, the following processors group a batch of messages according to a metadata field and compress each group into a separate `.tar.gz` archive:

```yaml
pipeline:
  processors:
    - group_by_value:
        value: ${! meta("kafka_partition") }
    - archive:
        format: tar
    - compress:
        algorithm: gzip

output:
  aws_s3:
    bucket: TODO
    path: docs/${! meta("kafka_partition") }/${! count("files") }-${! timestamp_unix_nano() }.tar.gz
```

For more examples of batched (or windowed) processing check out [this document](../windowed_processing/).

## [](#compatibility)Compatibility

Redpanda Connect is able to read and write over protocols that support multiple part messages, and all payloads travelling through Redpanda Connect are represented as a multiple part message.
Therefore, all components within Redpanda Connect are able to work with multiple parts in a message as standard.

When messages reach an output that _doesn’t_ support multiple parts, the message is broken down into an individual message per part, and then one of two behaviors happens depending on the output. If the output supports batch sending, the collection of messages is sent as a single batch. Otherwise, Redpanda Connect falls back to sending the messages sequentially in multiple, individual requests.

This behavior means that not only can multiple part message protocols be easily matched with single part protocols, but also that the concepts of multiple part messages and message batches are interchangeable within Redpanda Connect.

### [](#shrinking-batches)Shrinking batches

A message batch (or multiple part message) can be broken down into smaller batches using the [`split`](../../components/processors/split/) processor:

```yaml
input:
  # Consume messages that arrive in three parts.
  resource: foo
  processors:
    # Drop the third part
    - select_parts:
        parts: [ 0, 1 ]
    # Then break our message parts into individual messages
    - split:
        size: 1
```

This is also useful when your input source creates batches that are too large for your output protocol:

```yaml
input:
  aws_s3:
    bucket: todo

pipeline:
  processors:
    - decompress:
        algorithm: gzip
    - unarchive:
        format: tar
    # Limit batch sizes to 5MB
    - split:
        byte_size: 5_000_000
```

## [](#batch-policy)Batch policy

When an input or output component has a config field `batching`, it supports a batch policy. This is a mechanism that allows you to configure exactly how your batching should work on messages before they are routed to the input or output it’s associated with. Batches are considered complete and are flushed downstream when any of the following conditions is met:

- The `byte_size` field is non-zero and the total size of the batch in bytes matches or exceeds it (disregarding metadata).
- The `count` field is non-zero and the total number of messages in the batch matches or exceeds it.
- A message added to the batch causes the [`check`](../../guides/bloblang/about/) query to return `true`.
- The `period` field is non-empty and the time since the last batch exceeds its value.

This allows you to combine conditions:

```yaml
output:
  kafka:
    addresses: [ todo:9092 ]
    topic: benthos_stream

    # Either send batches when they reach 10 messages or when 100ms has passed
    # since the last batch.
    batching:
      count: 10
      period: 100ms
```

> ⚠️ **CAUTION**
>
> A batch policy has the capability to _create_ batches, but not to break them down. If your configured pipeline is processing messages that are batched _before_ they reach the batch policy then they may circumvent the conditions you’ve specified here, resulting in sizes you aren’t expecting. If you are affected by this limitation then consider breaking the batches down with a [`split` processor](../../components/processors/split/) before they reach the batch policy.

### [](#post-batch-processing)Post-batch processing

A batch policy also has a field `processors` which allows you to define an optional list of [processors](../../components/processors/about/) to apply to each batch before it is flushed. This is a good place to aggregate or archive the batch into a compatible format for an output:

```yaml
output:
  http_client:
    url: http://localhost:4195/post
    batching:
      count: 10
      processors:
        - archive:
            format: lines
```

The above config batches up messages and then merges them into a line-delimited format before sending the result over HTTP. This is an easier format to parse than the default, which would have been [RFC 1341 multipart](https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html).

During shutdown any remaining messages waiting for a batch to complete will be flushed down the pipeline.
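The workaround described in the caution under Batch policy can be sketched as follows. This is a minimal illustration, not a recommended configuration: the input resource `foo` is a placeholder carried over from the earlier examples, and the `byte_size` threshold is an assumed value chosen only to show several conditions combined.

```yaml
input:
  resource: foo # Placeholder input that may emit large, pre-formed batches

pipeline:
  processors:
    # Break incoming batches into individual messages so that the
    # batch policy below is the only thing deciding batch sizes.
    - split:
        size: 1

output:
  kafka:
    addresses: [ todo:9092 ]
    topic: benthos_stream
    batching:
      count: 10
      byte_size: 1_000_000 # Illustrative limit, assumed for this sketch
      period: 100ms
```

Because the `split` processor runs before messages reach the output, the batch policy only ever sees single messages and can apply its `count`, `byte_size`, and `period` conditions as configured.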
--- # Page 309: Contextual Variables **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/contextual-variables.md --- # Contextual Variables --- title: Contextual Variables latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/contextual-variables page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/contextual-variables.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/contextual-variables.adoc description: Learn about the advantages of using contextual variables, and how to add them to your data pipelines. page-git-created-date: "2025-01-09" page-git-modified-date: "2025-08-08" ---

Learn about the advantages of using contextual variables, and how to add them to your data pipelines.

## [](#understanding-contextual-variables)Understanding contextual variables

Contextual variables provide an easy way to access information about the environment in which a data pipeline is running and the pipeline itself. You can add any of the following contextual variables to your pipeline configurations:

| Contextual variable name | Description |
| --- | --- |
| `${REDPANDA_BROKERS}` | The bootstrap server address of the cluster on which the data pipeline is running. |
| `${REDPANDA_ID}` | The ID of the cluster on which the data pipeline is running. |
| `${REDPANDA_REGION}` | The cloud region where the data pipeline is deployed. |
| `${REDPANDA_PIPELINE_ID}` | The ID of the data pipeline that is currently running. |
| `${REDPANDA_PIPELINE_NAME}` | The display name of the data pipeline that is currently running. |
| `${REDPANDA_SCHEMA_REGISTRY_URL}` | The URL of the Schema Registry associated with the cluster on which the data pipeline is running. |

Contextual variables are automatically set at runtime, which means that you can reuse them across multiple pipelines and development environments. For example, if you add the contextual variable `${REDPANDA_ID}` to a pipeline configuration, it’s always set to the ID of the cluster on which the data pipeline is running, whether the pipeline is in your development, user acceptance testing, or production environment. This increases the portability of pipeline configurations and reduces maintenance overhead.

You can also use contextual variables to improve data traceability. See the [Example pipeline configuration](#example-pipeline-configuration) for full details.

## [](#add-contextual-variable-to-a-data-pipeline)Add a contextual variable to a data pipeline

Add a contextual variable to any pipeline configuration using the notation `${CONTEXTUAL_VARIABLE_NAME}`, for example:

```yaml
output:
  kafka_franz:
    seed_brokers:
      - ${REDPANDA_BROKERS}
```

### [](#example-pipeline-configuration)Example pipeline configuration

For improved data traceability, the following pipeline configuration adds the data pipeline display name (`${REDPANDA_PIPELINE_NAME}`) and ID (`${REDPANDA_PIPELINE_ID}`) to all messages that are processed. The configuration also uses the `${REDPANDA_BROKERS}` contextual variable to automatically populate the bootstrap server address of the cluster on which the pipeline is run, which allows Redpanda Connect to write updated messages to the `data` topic defined in the `kafka_franz` output.
```yaml input: generate: mapping: | root.data = "test message" interval: 10s pipeline: processors: - bloblang: | root = this root.source = "${REDPANDA_PIPELINE_NAME}" root.source_id = "${REDPANDA_PIPELINE_ID}" output: kafka_franz: seed_brokers: - ${REDPANDA_BROKERS} topic: data tls: enabled: true sasl: - mechanism: SCRAM-SHA-256 username: cluster-username password: cluster-password ``` ## [](#suggested-reading)Suggested reading - Learn how to [add secrets to your pipeline](../secret-management/). - Try one of our [Redpanda Connect cookbooks](../../cookbooks/). - Choose [connectors for your use case](../../components/about/). --- # Page 310: Error Handling **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/error_handling.md --- # Error Handling --- title: Error Handling latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/error_handling page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/error_handling.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/error_handling.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- Redpanda Connect supports a range of [processors](../../components/processors/about/), such as `http` and `aws_lambda`, that may fail when retry attempts are exhausted. When a processor fails, the message data continues through the pipeline mostly unchanged, except for the addition of a metadata flag, which you can use for handling errors. This topic explains some common error-handling patterns, including dropping messages, recovering them with more processing, and routing them to a dead-letter queue. It also shows how to combine these approaches, where appropriate. 
## [](#abandon-on-failure)Abandon on failure You can use the [`try` processor](../../components/processors/try/) to define a list of processors that are executed in sequence. If a processor fails for a particular message, that message skips the remaining processors. For example: - If `processor_1` fails to process a message, that message skips `processor_2` and `processor_3`. - If a message is processed by `processor_1`, but `processor_2` fails, that message skips `processor_3`, and so on. ```yaml pipeline: processors: - try: - resource: processor_1 - resource: processor_2 # Skip if processor_1 fails - resource: processor_3 # Skip if processor_1 or processor_2 fails ``` ## [](#recover-failed-messages)Recover failed messages You can also route failed messages through defined processing steps using a [`catch` processor](../../components/processors/catch/). For example, if `processor_1` fails to process a message, it is rerouted to `processor_2`. ```yaml pipeline: processors: - resource: processor_1 # Processor that might fail - catch: - resource: processor_2 # Processes rerouted messages ``` After messages complete all processing steps defined in the `catch` block, failure flags are removed and they are treated like regular messages. To keep failure flags in messages, you can simulate a `catch` block using a [`switch` processor](../../components/processors/switch/): ```yaml pipeline: processors: - resource: processor_1 # Processor that might fail - switch: - check: errored() processors: - resource: processor_2 # Processes rerouted messages ``` ## [](#logging-errors)Logging errors When an error occurs, there may be useful information stored in the error flag. You can use [`error`](../../guides/bloblang/functions/#error) Bloblang function interpolations to write this information to logs. You can also add the following Bloblang functions to expose additional details about the processor that triggered the error. 
- [`error_source_label`](../../guides/bloblang/functions/#error_source_label) - [`error_source_name`](../../guides/bloblang/functions/#error_source_name) - [`error_source_path`](../../guides/bloblang/functions/#error_source_path) For example, this configuration catches processor failures and writes the following information to logs: - The label of the processor (`${!error_source_label()}`) that failed - The cause of the failure (`${!error()}`) ```yaml pipeline: processors: - try: - resource: processor_1 # Processor that might fail - resource: processor_2 # Processor that might fail - resource: processor_3 # Processor that might fail - catch: - log: message: "Processor ${!error_source_label()} failed due to: ${!error()}" ``` You could also add an error message to the message payload: ```yaml pipeline: processors: - resource: processor_1 # Processor that might fail - resource: processor_2 # Processor that might fail - resource: processor_3 # Processor that might fail - catch: - mapping: | root = this root.meta.error = error() ``` ## [](#attempt-until-success)Attempt until success To process a particular message until it is successful, try using a [`retry`](../../components/processors/retry/) processor: ```yaml pipeline: processors: - retry: backoff: initial_interval: 1s max_interval: 5s max_elapsed_time: 30s processors: # Retries this processor until the message is processed, or the maximum elapsed time is reached. - resource: processor_1 ``` ## [](#drop-failed-messages)Drop failed messages To filter out any failed messages from your pipeline, you can use a [`mapping` processor](../../components/processors/mapping/): ```yaml pipeline: processors: - mapping: root = if errored() { deleted() } ``` The mapping uses the error flag to identify any failed messages in a batch and drops the messages, which propagates acknowledgements (also known as "acks") upstream to the pipeline’s input. 
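The logging and dropping patterns above can be combined. The following is a minimal sketch, assuming a processor resource named `processor_1` as in the earlier examples. It reuses the `switch` pattern with `errored()` shown earlier to log the cause of each failure before deleting the message; the log level and message text are illustrative choices.

```yaml
pipeline:
  processors:
    - resource: processor_1 # Processor that might fail
    - switch:
        - check: errored()
          processors:
            # Record why the message failed before discarding it
            - log:
                level: WARN
                message: "Dropping message: ${!error()}"
            # Delete the failed message from the pipeline
            - mapping: root = deleted()
```

Messages that were processed successfully do not match the `check` and continue through the pipeline unchanged.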
## [](#reject-messages)Reject messages Some inputs, such as `nats`, `gcp_pubsub`, and `amqp_1`, support nacking (rejecting) messages. Rather than delivering unprocessed messages to your output, you can use the [`reject_errored` output](../../components/outputs/reject_errored/) to perform a nack (or rejection) on them: ```yaml output: reject_errored: resource: processor_1 # Only non-errored messages go here ``` ## [](#route-to-a-dead-letter-queue)Route to a dead-letter queue You can also route failed messages to a different output by nesting the [`reject_errored` output](../../components/outputs/reject_errored/) within a [`fallback` output](../../components/outputs/fallback/) ```yaml output: fallback: - reject_errored: resource: processor_1 # Only non-errored messages go here - resource: processor_2 # Only errored messages, or delivery failures to processor_1, go here ``` If you want to route data differently based on the type of error message, you can use a [`switch` output](../../components/outputs/switch/): ```yaml output: switch: cases: # Capture specifically cat-related errors - check: errored() && error().contains("meow") output: resource: processor_1 # Capture all other errors - check: errored() output: resource: processor_2 # Finally, route all successfully processed messages here - output: resource: processor_3 ``` Finally, you can attach additional metadata when routing messages to the dead-letter queue, such as the error message. This can be done by running a series of [processors](../../components/processors/about/) before sending the data to the final [output](../../components/outputs/about/). 
```yaml output: fallback: - reject_errored: resource: processor_1 # Only non-errored messages go here - processors: - mutation: | root.error = @fallback_error # Adds the error message before sending the message to the dead-letter queue output resource: processor_2 # Only errored messages, or delivery failures to processor_1, go here ``` --- # Page 311: Field Paths **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/field_paths.md --- # Field Paths --- title: Field Paths latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/field_paths page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/field_paths.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/field_paths.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- Many components within Redpanda Connect allow you to target certain fields using a JSON dot path. The syntax of a path within Redpanda Connect is similar to [JSON Pointers](https://tools.ietf.org/html/rfc6901), except with dot separators instead of slashes (and no leading dot.) When a path is used to set a value any path segment that does not yet exist in the structure is created as an object. For example, if we had the following JSON structure: ```json { "foo": { "bar": 21 } } ``` The query path `foo.bar` would return `21`. The characters `~` (%x7E) and `.` (%x2E) have special meaning in Redpanda Connect paths. Therefore `~` needs to be encoded as `~0` and `.` needs to be encoded as `~1` when these characters appear within a key. For example, if we had the following JSON structure: ```json { "foo.foo": { "bar~bo": { "": { "baz": 22 } } } } ``` The query path `foo~1foo.bar~0bo..baz` would return `22`. 
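To show a field path in use, the following sketch reads a value from the first JSON structure on this page with the Bloblang `json` function, which takes a dot path as its argument in other examples in this documentation. The output field name `value` is an assumption made for illustration.

```yaml
pipeline:
  processors:
    - mapping: |
        # For the input {"foo":{"bar":21}}, the path foo.bar resolves to 21
        root.value = json("foo.bar")
```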
## [](#arrays)Arrays When Redpanda Connect encounters an array while traversing a JSON structure it requires the next path segment to be either an integer of an existing index or, depending on whether the path is used to query or set the target value, the character `*` or `-` respectively. For example, if we had the following JSON structure: ```json { "foo": [ 0, 1, { "bar": 23 } ] } ``` The query path `foo.2.bar` would return `23`. ### [](#querying)Querying When a query reaches an array the character `*` indicates that the query should return the value of the remaining path from each array element (within an array.) ### [](#setting)Setting When an array is reached the character `-` indicates that a new element should be appended to the end of the existing elements, if this character is not the final segment of the path then an object is created. --- # Page 312: Interpolation **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/interpolation.md --- # Interpolation --- title: Interpolation latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/interpolation page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/interpolation.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/interpolation.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- > 📝 **NOTE** > > Environment variables are not currently supported in Redpanda Connect in Redpanda Cloud, but you can use [contextual variables](../contextual-variables/) to access information about the environment in which a data pipeline is running, and the pipeline itself. 
Redpanda Connect allows you to dynamically set config fields with environment variables anywhere within a config file using the syntax `${<variable-name>}` (or `${<variable-name>:<default-value>}` in order to specify a default value). This is useful for setting environment-specific fields such as addresses:

```yaml
input:
  kafka:
    addresses: [ "${BROKERS}" ]
    consumer_group: redpanda_connect_consumer
    topics: [ "haha_business" ]
```

```sh
BROKERS="foo:9092,bar:9092" rpk connect run ./config.yaml
```

If a literal string is required that matches this pattern (`${foo}`) you can escape it with double brackets. For example, the string `${{foo}}` is read as the literal `${foo}`.

## [](#undefined-variables)Undefined variables

A linting error is reported when an environment variable interpolation is found within a config, no default value is specified, and the environment variable is not defined. To avoid this, you can give the interpolation an explicit empty default value by adding the colon without a following value. For example, `${FOO:}` is equivalent to `${FOO}` but does not trigger a linting error if `FOO` is not defined.

## [](#yaml-tags)YAML tags

By default, Redpanda Connect interpolates environment variables as strings. You can use [YAML tags](https://yaml.org/spec/1.2.2/#24-tags) to interpret values as another scalar type, such as integers.

```yaml
output:
  redpanda:
    # ...
    batching:
      count: !!int ${BATCHING_COUNT:500}
      period: "${BATCHING_PERIOD:1s}"
```

Redpanda Connect supports the [core schema tags](https://yaml.org/spec/1.2.2/#103-core-schema) for scalar types:

- `null`
- `bool`
- `int`
- `float`
- `str` (default)

## [](#bloblang-queries)Bloblang queries

Some Redpanda Connect fields also support [Bloblang](../../guides/bloblang/about/) function interpolations, which are much more powerful expressions that allow you to query the contents of messages and perform arithmetic.
The syntax of a function interpolation is `${!<bloblang expression>}`, where the contents are a Bloblang query (the right-hand side of a Bloblang map) including a range of [functions](../../guides/bloblang/about/#functions). For example, with the following config:

```yaml
output:
  kafka:
    addresses: [ "TODO:6379" ]
    topic: 'dope-${! json("topic") }'
```

A message with the contents `{"topic":"foo","message":"hello world"}` would be routed to the Kafka topic `dope-foo`.

If a literal string is required that matches this pattern (`${!foo}`) then, similar to environment variables, you can escape it with double brackets. For example, the string `${{!foo}}` would be read as the literal `${!foo}`.

Bloblang supports arithmetic, boolean operators, coalesce and mapping expressions. For more in-depth details about the language [check out the docs](../../guides/bloblang/about/).

## [](#examples)Examples

### [](#reference-metadata)Reference metadata

A common use case for interpolated functions is dynamic routing at the output level using metadata:

```yaml
output:
  kafka:
    addresses: [ TODO ]
    topic: ${! meta("output_topic") }
    key: ${! meta("key") }
```

### [](#coalesce-and-mapping)Coalesce and mapping

Bloblang supports coalesce and mapping, which makes it easy to extract values from slightly varying data structures:

```yaml
pipeline:
  processors:
    - cache:
        resource: foocache
        operator: set
        key: '${! json().message.(foo | bar).id }'
        value: '${! content() }'
```

Here’s how a similar query, `${! json().foo.(a | b | c).baz }`, maps inputs to resulting values:

- `{"foo":{"a":{"baz":"from_a"},"c":{"baz":"from_c"}}}` -> `from_a`
- `{"foo":{"b":{"baz":"from_b"},"c":{"baz":"from_c"}}}` -> `from_b`
- `{"foo":{"b":null,"c":{"baz":"from_c"}}}` -> `from_c`

### [](#delayed-processing)Delayed processing

We have a stream of JSON documents each with a unix timestamp field `doc.received_at` which is set when our platform receives it. We wish to only process messages an hour _after_ they were received.
We can achieve this by running the `sleep` processor with an interpolation function that calculates the number of seconds to wait:

```yaml
pipeline:
  processors:
    - sleep:
        duration: '${! 3600 - ( timestamp_unix() - json("doc.received_at").number() ) }s'
```

If the calculated result is less than or equal to zero the processor does not sleep at all. If the value of `doc.received_at` is a string then the `.number()` method attempts to parse it into a number.

--- # Page 313: Metadata **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/metadata.md --- # Metadata --- title: Metadata latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/metadata page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/metadata.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/metadata.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" ---

In Redpanda Connect each message has raw contents and metadata, which is a map of key/value pairs representing an arbitrary amount of complementary data.

When an input protocol supports attributes or metadata, they are automatically added to your messages. Refer to the respective input documentation for a list of metadata keys. When an output supports attributes or metadata, any metadata key/value pairs in a message are sent (subject to service limits).

## [](#editing-metadata)Editing metadata

Redpanda Connect allows you to add and remove metadata using the [`mapping` processor](../../components/processors/mapping/).
For example, you can do something like this in your pipeline: ```yaml pipeline: processors: - mapping: | # Remove all existing metadata from messages meta = deleted() # Add a new metadata field `time` from the contents of a JSON # field `event.timestamp` meta time = event.timestamp ``` You can also use [Bloblang](../../guides/bloblang/about/) to delete individual metadata keys with: ```bloblang meta foo = deleted() ``` Or do more interesting things like remove all metadata keys with a certain prefix: ```bloblang meta = @.filter(kv -> !kv.key.has_prefix("kafka_")) ``` ## [](#using-metadata)Using metadata Metadata values can be referenced in any field that supports [interpolation functions](../interpolation/). For example, you can route messages to Kafka topics using interpolation of metadata keys: ```yaml output: kafka: addresses: [ TODO ] topic: ${! meta("target_topic") } ``` Redpanda Connect also allows you to conditionally process messages based on their metadata with the [`switch` processor](../../components/processors/switch/): ```yaml pipeline: processors: - switch: - check: '@doc_type == "nested"' processors: - sql_insert: driver: mysql dsn: foouser:foopassword@tcp(localhost:3306)/foodb table: footable columns: [ foo, bar, baz ] args_mapping: | root = [ this.document.foo, this.document.bar, @kafka_topic, ] # In: {"document":{"foo":"value1","bar":"value2"}} ``` ## [](#restricting-metadata)Restricting metadata Outputs that support metadata, headers or some other variant of enriched fields on messages will attempt to send all metadata key/value pairs by default. However, sometimes it’s useful to refer to metadata fields at the output level even though we do not wish to send them with our data. In this case it’s possible to restrict the metadata keys that are sent with the field `metadata.exclude_prefixes` within the respective output config. 
For example, if we were sending messages to kafka using a metadata key `target_topic` to determine the topic but we wished to prevent that metadata key from being sent as a header we could use the following configuration: ```yaml output: kafka: addresses: [ TODO ] topic: ${! meta("target_topic") } metadata: exclude_prefixes: - target_topic ``` And when the list of metadata keys that we do _not_ want to send is large it can be helpful to use a [Bloblang mapping](../../guides/bloblang/about/) in order to give all of these "private" keys a common prefix: ```yaml pipeline: processors: # Has an explicit list of public metadata keys, and everything else is given # an underscore prefix. - mapping: | let allowed_meta = [ "foo", "bar", "baz", ] meta = @.map_each_key(key -> if !$allowed_meta.contains(key) { "_" + key }) output: kafka: addresses: [ TODO ] topic: ${! meta("_target_topic") } metadata: exclude_prefixes: [ "_" ] ``` --- # Page 314: Monitor Data Pipelines on BYOC and Dedicated Clusters **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/monitor-connect.md --- # Monitor Data Pipelines on BYOC and Dedicated Clusters --- title: Monitor Data Pipelines on BYOC and Dedicated Clusters latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/monitor-connect page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/monitor-connect.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/monitor-connect.adoc description: Configure Prometheus monitoring of your data pipelines on BYOC clusters. 
page-git-created-date: "2024-09-09" page-git-modified-date: "2024-12-03" --- You can configure monitoring on BYOC and Dedicated clusters to understand the behavior, health, and performance of your data pipelines. Redpanda Connect automatically exports [detailed metrics for each component of your data pipeline](../../components/metrics/about/) to a Prometheus endpoint, along with metrics for all other cluster services. You don’t need to update the configuration of your pipeline. ## [](#configure-prometheus)Configure Prometheus To monitor a BYOC cluster in [Prometheus](https://prometheus.io/): 1. On the Redpanda Cloud **Overview** page for your cluster, under **How to connect**, click the **Prometheus** tab. 2. Click the copy icon next to **Prometheus YAML** to copy the contents to your clipboard. The YAML contains the Prometheus scrape target configuration, as well as authentication, for the cluster. ```yaml - job_name: redpandaCloud-sample static_configs: - targets: - console-..fmc.cloud.redpanda.com metrics_path: /api/cloud/prometheus/public_metrics basic_auth: username: prometheus password: "" scheme: https ``` 3. Save the YAML configuration to Prometheus replacing the following placeholders: - `.`: ID and identifier from the **HTTPS endpoint**. - ``: Copy and paste the onscreen Prometheus password. Metrics from Redpanda endpoints are scraped into Prometheus. The metrics for each data pipeline are labelled by pipeline ID. ## [](#use-redpanda-monitoring-examples)Use Redpanda monitoring examples For hands-on learning, Redpanda provides a repository with examples of monitoring Redpanda with Prometheus and Grafana: [redpanda-data/observability](https://github.com/redpanda-data/observability/tree/main/cloud). 
![Example Redpanda Connect Dashboard^](../../../../shared/_images/redpanda_connect_dashboard.png) It includes [an example Grafana dashboard for Redpanda Connect](https://github.com/redpanda-data/observability/blob/main/grafana-dashboards/Redpanda-Connect-Dashboard.json) and a [sandbox environment](https://github.com/redpanda-data/observability#sandbox-environment) in which you launch a Dockerized Redpanda cluster and create a custom workload to monitor with dashboards. --- # Page 315: Process Pipelines **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/processing_pipelines.md --- # Process Pipelines --- title: Process Pipelines latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/processing_pipelines page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/processing_pipelines.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/processing_pipelines.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-10-25" --- If you have processors that are heavy on CPU and aren’t specific to a certain input or output they are best suited for the pipeline section. It is advantageous to use the pipeline section as it allows you to set an explicit number of parallel threads of execution: ```yaml input: resource: foo pipeline: threads: 4 processors: - mapping: | root = this fans = fans.map_each(match { this.obsession > 0.5 => this _ => deleted() }) output: resource: bar ``` If the field `threads` is set to `-1` (the default) it will automatically match the number of logical CPUs available. By default almost all Redpanda Connect sources will utilize as many processing threads as have been configured, which makes horizontal scaling easy. 
--- # Page 316: Manage Pipeline Resources on Clusters **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/resource-management.md --- # Manage Pipeline Resources on Clusters --- title: Manage Pipeline Resources on Clusters latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/resource-management page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/resource-management.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/resource-management.adoc description: Learn how to set an initial resource limit for a standard data pipeline (excluding Ollama AI components) and how to manually scale the pipeline’s resources to improve performance. page-git-created-date: "2024-12-18" page-git-modified-date: "2026-02-18" --- Learn how to set an initial resource limit for a standard data pipeline (excluding Ollama AI components) and how to manually scale the pipeline’s resources to improve performance. ## [](#prerequisites)Prerequisites - A running Redpanda Cloud cluster. - An estimate of the throughput of your data pipeline. You can get some basic statistics by running your data pipeline locally using the [`benchmark` processor](../../../../../redpanda-connect/components/processors/benchmark/). ### [](#understanding-compute-units)Understanding compute units A compute unit allocates a specific amount of server resources (CPU and memory) to a data pipeline to handle message throughput. By default, each pipeline is allocated one compute unit, which includes 0.1 CPU (100 milliCPU or `100m`) and 400 MB (`400M`) of memory. For sizing purposes, one compute unit supports an estimated message throughput of 1 MB/s. 
However, actual performance depends on the complexity of a pipeline, including the components it contains and the processing it does.

You can allocate a maximum of 72 compute units per pipeline. You can add compute units in increments of one up to 15 compute units. Beyond this, scaling options increase to 33 and then to 72 compute units. This scaling strategy is based on the number of machine cores required to provision resources, which scale from two to four, and then to eight cores.

Server resources are charged at an [hourly rate in compute unit hours (compute/hour)](../../../../billing/billing/#redpanda-connect-pipeline-metrics).

| Number of compute units | CPU | Memory |
| --- | --- | --- |
| 1 | 0.1 CPU (100m) | 400 MB (400M) |
| 2 | 0.2 CPU (200m) | 800 MB (800M) |
| 3 | 0.3 CPU (300m) | 1.2 GB (1200M) |
| 4 | 0.4 CPU (400m) | 1.6 GB (1600M) |
| 5 | 0.5 CPU (500m) | 2.0 GB (2000M) |
| 6 | 0.6 CPU (600m) | 2.4 GB (2400M) |
| 7 | 0.7 CPU (700m) | 2.8 GB (2800M) |
| 8 | 0.8 CPU (800m) | 3.2 GB (3200M) |
| 9 | 0.9 CPU (900m) | 3.6 GB (3600M) |
| 10 | 1.0 CPU (1000m) | 4.0 GB (4000M) |
| 11 | 1.1 CPU (1100m) | 4.4 GB (4400M) |
| 12 | 1.2 CPU (1200m) | 4.8 GB (4800M) |
| 13 | 1.3 CPU (1300m) | 5.2 GB (5200M) |
| 14 | 1.4 CPU (1400m) | 5.6 GB (5600M) |
| 15 | 1.5 CPU (1500m) | 6.0 GB (6000M) |
| 33 | 3.3 CPU (3300m) | 13.2 GB (13200M) |
| 72 | 7.2 CPU (7200m) | 28.8 GB (28800M) |

> 📝 **NOTE**
>
> A GPU machine is automatically assigned to each pipeline that contains embedded Ollama AI components. By default, GPU-enabled pipelines are allocated eight compute units. For larger workloads, you can scale them up to a maximum of 30 compute units.

### [](#set-an-initial-resource-limit)Set an initial resource limit

When you create a data pipeline, you can allocate a fixed amount of server resources to it using compute units.

> 📝 **NOTE**
>
> If your pipeline reaches the CPU limit, it becomes throttled, which reduces the data processing rate.
If it reaches the memory limit, the pipeline restarts. To set an initial resource limit: 1. Log in to [Redpanda Cloud](https://cloud.redpanda.com). 2. On the **Clusters** page, select the cluster where you want to add a pipeline. 3. Go to the **Connect** page. 4. Select the **Redpanda Connect** tab. 5. Click **Create pipeline**. 6. Enter details for your pipeline, including a short name and description. 7. For **Compute units**, leave the default **1** compute unit to experiment with pipelines that create low message volumes. For higher throughputs, you can allocate a maximum of 72 compute units. 8. For **Configuration**, paste your pipeline configuration and click **Create** to run it. ### [](#scale-resources)Scale resources View the server resources allocated to a data pipeline, and manually scale those resources to improve performance or decrease resource consumption. To view resources already allocated to a data pipeline: #### Cloud UI 1. Log in to [Redpanda Cloud](https://cloud.redpanda.com). 2. Go to the cluster where the pipeline is set up. 3. On the **Connect** page, select your pipeline and look at the value for **Resources**. - CPU resources are displayed first, in milliCPU. For example, `1` compute unit is `100m` or 0.1 CPU. - Memory is displayed next in megabytes. For example, `1` compute unit is `400M` or 400 MB. #### Data Plane API 1. [Authenticate and get the base URL](/api/doc/cloud-dataplane/topic/topic-quickstart) for the Data Plane API. 2. Make a request to [`GET /v1/redpanda-connect/pipelines`](/api/doc/cloud-dataplane/operation/operation-redpandaconnectservice_listpipelines), which lists details of all pipelines on your cluster by ID. - Memory (`memory_shares`) is displayed in megabytes. For example, `1` compute unit is `400M` or 400 MB. - CPU resources (`cpu_shares`) are displayed in milliCPU. For example, `1` compute unit is `100m` or 0.1 CPU. To scale the resources for a pipeline: #### Cloud UI 1. 
Log in to [Redpanda Cloud](https://cloud.redpanda.com). 2. Go to the cluster where the pipeline is set up. 3. On the **Connect** page, select your pipeline and click **Edit**. 4. For **Compute units**, update the number of compute units. You can allocate a maximum of 72 compute units per pipeline. 5. Click **Update** to apply your changes. The specified resources are available immediately. #### Data Plane API You can only update CPU resources using the Data Plane API. For every 0.1 CPU that you allocate, Redpanda Cloud automatically reserves 400 MB of memory for the exclusive use of the pipeline. 1. [Authenticate and get the base URL](/api/doc/cloud-dataplane/topic/topic-quickstart) for the Data Plane API, if you haven’t already. 2. Make a request to [`GET /v1/redpanda-connect/pipelines/{id}`](/api/doc/cloud-dataplane/operation/operation-redpandaconnectservice_getpipeline), including the ID of the pipeline you want to update. You’ll use the returned values in the next step. 3. Now make a request to [`PUT /v1/redpanda-connect/pipelines/{id}`](/api/doc/cloud-dataplane/operation/operation-redpandaconnectservice_updatepipeline), to update the pipeline resources: - Reuse the values returned by your `GET` request to populate the request body. - Replace the `cpu_shares` value with the resources you want to allocate, and enter any valid value for `memory_shares`. This example allocates 0.2 CPU or 200 milliCPU to a data pipeline. For `cpu_shares`, `0.1` CPU is the minimum allocation. ```bash curl -X PUT "https:///v1/redpanda-connect/pipelines/xxx..." \ -H 'accept: application/json'\ -H 'authorization: Bearer xxx...' 
\ -H "content-type: application/json" \ -d '{ "config_yaml": "input:\n generate:\n interval: 1s\n mapping: |\n root.id = uuid_v4()\n root.user.name = fake(\"name\")\n root.user.email = fake(\"email\")\n root.content = fake(\"paragraph\")\n\npipeline:\n processors:\n - mutation: |\n root.title = \"PRIVATE AND CONFIDENTIAL\"\n\noutput:\n kafka_franz:\n seed_brokers:\n - seed-j888.byoc.prd.cloud.redpanda.com:9092\n sasl:\n mechanism: SCRAM-SHA-256\n password: password\n username: connect\n topic: processed-emails\n tls:\n enabled: true\n", "description": "Email processor", "display_name": "emailprocessor-pipeline", "resources": { "memory_shares": "800M", "cpu_shares": "200m" } }' ``` A successful response shows the updated resource allocations with the `cpu_shares` value returned in milliCPU. 4. Make a request to [`GET /v1/redpanda-connect/pipelines`](/api/doc/cloud-dataplane/operation/operation-redpandaconnectservice_listpipelines) to verify your pipeline resource updates. --- # Page 317: Manage Secrets **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/secret-management.md --- # Manage Secrets --- title: Manage Secrets latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/secret-management page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/secret-management.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/secret-management.adoc description: Learn how to manage secrets in Redpanda Connect using the Cloud UI or Data Plane API, and how to add them to your data pipelines. page-git-created-date: "2024-12-03" page-git-modified-date: "2026-02-18" --- Learn how to manage secrets in Redpanda Connect, and how to add them to your data pipelines without exposing them. 
Secrets are stored in the secret management solution of your cloud provider and are retrieved when you run a pipeline configuration that references them. ## [](#manage-secrets)Manage secrets You can manage secrets from the Cloud UI or the Data Plane API. ### [](#create-a-secret)Create a secret You can create a secret and reference it in multiple data pipelines on the same cluster. #### Cloud UI 1. Log in to [Redpanda Cloud](https://cloud.redpanda.com). 2. Go to the **Secrets Store** page. 3. Click **Create secret**. 4. For **ID**, enter a name for the secret. You cannot rename the secret once it is created. 5. For **Value**, enter the secret you need to add. 6. For **Scopes**, select Redpanda Connect. 7. Optionally, add labels to help organize your secrets. 8. Click **Create**. You can now [add the secret to your data pipeline](#add-a-secret-to-a-data-pipeline). #### Data Plane API You must use a Base64-encoded secret. 1. [Authenticate and get the base URL](/api/doc/cloud-dataplane/topic/topic-quickstart) for the Data Plane API. 2. Make a request to [`POST /v1/secrets`](/api/doc/cloud-dataplane/operation/operation-secretservice_createsecret). ```bash curl -X POST "https:///v1/secrets" \ -H 'accept: application/json'\ -H 'authorization: Bearer '\ -H 'content-type: application/json' \ -d '{"id":"","scopes":["SCOPE_REDPANDA_CONNECT"],"secret_data":""}' ``` You must include the following values: - ``: The base URL for the Data Plane API. - ``: The API key you generated during authentication. - ``: The ID or name of the secret you want to add. Use only the following characters: `^[A-Z][A-Z0-9_]*$`. - ``: The Base64-encoded secret. - This scope: `"SCOPE_REDPANDA_CONNECT"`. The response returns the name of the secret and the scope `"SCOPE_REDPANDA_CONNECT"`. You can now [add the secret to your data pipeline](#add-a-secret-to-a-data-pipeline). ### [](#update-a-secret)Update a secret You can only update the secret value, not its name. 
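The create request above and the update request that follows both expect `secret_data` to be Base64-encoded. As a minimal sketch of preparing such a request body in Python: the helper name `build_secret_body` and the sample values are hypothetical, the ID check reuses the character pattern given in the create step, and nothing here calls the API.

```python
import base64
import json
import re
from typing import Optional

# Character rule for secret IDs, as stated in the create step.
SECRET_ID_PATTERN = re.compile(r"^[A-Z][A-Z0-9_]*$")


def build_secret_body(plaintext: str, secret_id: Optional[str] = None) -> str:
    """Build a JSON body with Base64-encoded secret_data.

    Pass secret_id for a create request; omit it for an update request,
    where the ID is part of the URL path instead.
    """
    body = {
        "scopes": ["SCOPE_REDPANDA_CONNECT"],
        "secret_data": base64.b64encode(plaintext.encode("utf-8")).decode("ascii"),
    }
    if secret_id is not None:
        if not SECRET_ID_PATTERN.fullmatch(secret_id):
            raise ValueError(f"invalid secret ID: {secret_id!r}")
        body["id"] = secret_id
    return json.dumps(body)


# Hypothetical values, for illustration only.
create_body = build_secret_body("s3cr3t", secret_id="PASSWORD")
update_body = build_secret_body("n3w-s3cr3t")
print(create_body)
print(update_body)
```

The resulting string can be passed to the `-d` option of the `curl` commands in this section.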
> 📝 **NOTE**
>
> Changes to secret values do not take effect until a pipeline is restarted.

#### Cloud UI

1. Log in to [Redpanda Cloud](https://cloud.redpanda.com).
2. Go to the **Secrets Store** page.
3. Find the secret you want to update, and click the edit icon.
4. Enter the new secret value or labels, and click **Update**.
5. Stop and then start any pipelines that reference the secret so that they pick up the new value.

#### Data Plane API

You must use a Base64-encoded secret.

1. [Authenticate and get the base URL](/api/doc/cloud-dataplane/topic/topic-quickstart) for the Data Plane API.
2. Make a request to [`PUT /v1/secrets/{id}`](/api/doc/cloud-dataplane/operation/operation-secretservice_updatesecret).

   ```bash
   curl -X PUT "https:///v1/secrets/" \
     -H 'accept: application/json' \
     -H 'authorization: Bearer ' \
     -H 'content-type: application/json' \
     -d '{"scopes":["SCOPE_REDPANDA_CONNECT"],"secret_data":""}'
   ```

   You must include the following values:

   - ``: The base URL for the Data Plane API.
   - ``: The name of the secret you want to update.
   - ``: The API key you generated during authentication.
   - This scope: `"SCOPE_REDPANDA_CONNECT"`.
   - ``: Your new Base64-encoded secret.

   The response returns the name of the secret and the scope `"SCOPE_REDPANDA_CONNECT"`.

### [](#delete-a-secret)Delete a secret

Before you delete a secret, make sure that you remove references to it from your data pipelines.

> 📝 **NOTE**
>
> Changes do not affect pipelines that are already running.

#### Cloud UI

1. Log in to [Redpanda Cloud](https://cloud.redpanda.com).
2. Go to the **Secrets Store** page.
3. Find the secret you want to remove, and click the delete icon.
4. Confirm your deletion.

#### Data Plane API

1. [Authenticate and get the base URL](/api/doc/cloud-dataplane/topic/topic-quickstart) for the Data Plane API.
2. Make a request to [`DELETE /v1/secrets/{id}`](/api/doc/cloud-dataplane/operation/operation-secretservice_deletesecret).
   ```bash
   curl -X DELETE "https:///v1/secrets/" \
     -H 'accept: application/json' \
     -H 'authorization: Bearer '
   ```

   You must include the following values:

   - ``: The base URL for the Data Plane API.
   - ``: The name of the secret you want to delete.
   - ``: The API key you generated during authentication.

## [](#add-a-secret-to-a-data-pipeline)Add a secret to a data pipeline

### Cloud UI

1. Go to the **Connect** page, and create a pipeline (or open an existing pipeline to edit).
2. Click the **Secret** button to add a new or existing secret to the pipeline.

### Data Plane API

You can add a secret to any pipeline in your cluster using the notation `${secrets.SECRET_NAME}`. For example:

```yml
sasl:
  - mechanism: SCRAM-SHA-256
    username: "user"
    password: "${secrets.PASSWORD}"
```

---

# Page 318: Unit Testing

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/unit_testing.md

---

# Unit Testing

---
title: Unit Testing
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/configuration/unit_testing
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/configuration/unit_testing.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/unit_testing.adoc
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

The Redpanda Connect service offers a command `rpk connect test` for running unit tests on sections of a configuration file. This makes it easy to protect your config files from regressions over time.
## [](#writing-a-test)Writing a test Let’s imagine we have a configuration file `foo.yaml` containing some processors: ```yaml input: kafka: addresses: [ TODO ] topics: [ foo, bar ] consumer_group: foogroup pipeline: processors: - mapping: '"%vend".format(content().uppercase().string())' output: aws_s3: bucket: TODO path: '${! meta("kafka_topic") }/${! json("message.id") }.json' ``` One way to write our unit tests for this config is to accompany it with a file of the same name and extension but suffixed with `_benthos_test`, which in this case would be `foo_benthos_test.yaml`. ```yml tests: - name: example test target_processors: '/pipeline/processors' environment: {} input_batch: - content: 'example content' metadata: example_key: example metadata value output_batches: - - content_equals: EXAMPLE CONTENTend metadata_equals: example_key: example metadata value ``` Under `tests` we have a list of any number of unit tests to execute for the config file. Each test is run in complete isolation, including any resources defined by the config file. Tests should be allocated a unique `name` that identifies the feature being tested. The field `target_processors` is either the label of a processor to test, or a [JSON Pointer](https://tools.ietf.org/html/rfc6901) that identifies the position of a processor, or list of processors, within the file which should be executed by the test. For example a value of `foo` would target a processor with the label `foo`, and a value of `/input/processors` would target all processors within the input section of the config. The field `environment` allows you to define an object of key/value pairs that set environment variables to be evaluated during the parsing of the target config file. These are unique to each test, allowing you to test different environment variable interpolation combinations. The field `input_batch` lists one or more messages to be fed into the targeted processors as a batch. 
Each message of the batch may have its raw content defined as well as metadata key/value pairs. For the common case where the messages are in JSON format, you can use `json_content` instead of `content` to specify the message structurally rather than verbatim. The field `output_batches` lists any number of batches of messages which are expected to result from the target processors. Each batch lists any number of messages, each one defining [`conditions`](#output-conditions) to describe the expected contents of the message. If the number of batches defined does not match the resulting number of batches the test will fail. If the number of messages defined in each batch does not match the number in the resulting batches the test will fail. If any condition of a message fails then the test fails. ### [](#inline-tests)Inline tests Sometimes it’s more convenient to define your tests within the config being tested. This is fine, simply add the `tests` field to the end of the config being tested. ### [](#bloblang-tests)Bloblang tests Sometimes when working with large [Bloblang mappings](../../guides/bloblang/about/) it’s preferred to have the full mapping in a separate file to your Redpanda Connect configuration. In this case it’s possible to write unit tests that target and execute the mapping directly with the field `target_mapping`, which when specified is interpreted as either an absolute path or a path relative to the test definition file that points to a file containing only a Bloblang mapping. For example, if we were to have a file `cities.blobl` containing a mapping: ```bloblang root.Cities = this.locations. filter(loc -> loc.state == "WA"). map_each(loc -> loc.name). 
sort().join(", ") ``` We can accompany it with a test file `cities_test.yaml` containing a regular test definition:

```yml
tests:
  - name: test cities mapping
    target_mapping: './cities.blobl'
    environment: {}
    input_batch:
      - content: |
          {
            "locations": [
              {"name": "Seattle", "state": "WA"},
              {"name": "New York", "state": "NY"},
              {"name": "Bellevue", "state": "WA"},
              {"name": "Olympia", "state": "WA"}
            ]
          }
    output_batches:
      - - json_equals: {"Cities": "Bellevue, Olympia, Seattle"}
```

And execute this test the same way we execute other Redpanda Connect tests (`rpk connect test ./dir/cities_test.yaml`, `rpk connect test ./dir/...`, etc).

### [](#fragmented-tests)Fragmented tests

Sometimes the number of tests you need to define in order to cover a config file is so vast that it’s necessary to split them across multiple test definition files. This is possible but Redpanda Connect still requires a way to detect the configuration file being targeted by these fragmented test definition files. In order to do this we must prefix our `target_processors` field with the path of the target relative to the definition file.

The syntax of `target_processors` in this case is a full [JSON Pointer](https://tools.ietf.org/html/rfc6901) that should look something like `target.yaml#/pipeline/processors`. For example, if we saved our test definition above in an arbitrary location like `./tests/first.yaml` and wanted to target our original `foo.yaml` config file, we could do that with the following:

```yml
tests:
  - name: example test
    target_processors: '../foo.yaml#/pipeline/processors'
    environment: {}
    input_batch:
      - content: 'example content'
        metadata:
          example_key: example metadata value
    output_batches:
      - - content_equals: EXAMPLE CONTENTend
          metadata_equals:
            example_key: example metadata value
```

## [](#input-definitions)Input Definitions

### [](#content)`content`

Sets the raw content of the message.
### [](#json_content)`json_content` ```yml json_content: foo: foo value bar: [ element1, 10 ] ``` Sets the raw content of the message to a JSON document matching the structure of the value. ### [](#file_content)`file_content` ```yml file_content: ./foo/bar.txt ``` Sets the raw content of the message by reading a file. The path of the file should be relative to the path of the test file. ### [](#metadata)`metadata` A map of key/value pairs that sets the metadata values of the message. ## [](#output-conditions)Output Conditions ### [](#bloblang)`bloblang` ```yml bloblang: 'this.age > 10 && @foo.length() > 0' ``` Executes a [Bloblang expression](../../guides/bloblang/about/) on a message, if the result is anything other than a boolean equalling `true` the test fails. ### [](#content_equals)`content_equals` ```yml content_equals: example content ``` Checks the full raw contents of a message against a value. ### [](#content_matches)`content_matches` ```yml content_matches: "^foo [a-z]+ bar$" ``` Checks whether the full raw contents of a message matches a regular expression (re2). ### [](#metadata_equals)`metadata_equals` ```yml metadata_equals: example_key: example metadata value ``` Checks a map of metadata keys to values against the metadata stored in the message. If there is a value mismatch between a key of the condition versus the message metadata this condition will fail. ### [](#file_equals)`file_equals` ```yml file_equals: ./foo/bar.txt ``` Checks that the contents of a message matches the contents of a file. The path of the file should be relative to the path of the test file. ### [](#file_json_equals)`file_json_equals` ```yml file_json_equals: ./foo/bar.json ``` Checks that both the message and the file contents are valid JSON documents, and that they are structurally equivalent. Will ignore formatting and ordering differences. The path of the file should be relative to the path of the test file. 
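The "structurally equivalent" comparison described above can be pictured as comparing parsed values instead of raw text. The Python sketch below illustrates the idea and is not Redpanda Connect's implementation; the function name `structurally_equal` is hypothetical, and treating array element order as significant is an assumption of this sketch.

```python
import json


def structurally_equal(message: str, expected: str) -> bool:
    """Compare two JSON documents by parsed value rather than by raw text."""
    try:
        # Parsing discards whitespace, and dict comparison ignores key order.
        return json.loads(message) == json.loads(expected)
    except json.JSONDecodeError:
        # Either side is not a valid JSON document, so the check fails.
        return False


# Same document, different formatting and key order.
compact = '{"id": 1, "tags": ["x", "y"]}'
pretty = '{\n  "tags": ["x", "y"],\n  "id": 1\n}'
print(structurally_equal(compact, pretty))
```

A byte-for-byte check such as `content_equals` would treat these two inputs as different, which is why the JSON-aware conditions are usually the better fit for JSON payloads.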
### [](#json_equals)`json_equals`

```yml
json_equals: { "key": "value" }
```

Checks that both the message and the condition are valid JSON documents, and that they are structurally equivalent. Will ignore formatting and ordering differences.

You can also structure the condition content as YAML and it will be converted to the equivalent JSON document for testing:

```yml
json_equals:
  key: value
```

### [](#json_contains)`json_contains`

```yml
json_contains: { "key": "value" }
```

Checks that both the message and the condition are valid JSON documents, and that the message is a superset of the condition.

## [](#running-tests)Running tests

Executing tests for a specific config can be done by pointing the subcommand `test` at either the config to be tested or its test definition, e.g. `rpk connect test ./config.yaml` and `rpk connect test ./config_benthos_test.yaml` are equivalent.

The `test` subcommand also supports wildcard patterns e.g. `rpk connect test ./foo/*.yaml` will execute all tests within matching files. In order to walk a directory tree and execute all tests found you can use the shortcut `./...`, e.g. `rpk connect test ./...` will execute all tests found in the current directory, any child directories, and so on.

If you want to allow components to write logs at a provided level to stdout when running the tests, you can use `rpk connect test --log `. Please consult the [logger docs](../../components/logger/about/) for further details.

## [](#mocking-processors)Mocking processors

BETA: This feature is currently in a BETA phase, which means breaking changes could be made if a fundamental issue with the feature is found.

Sometimes you’ll want to write tests for a series of processors, where one or more of them are networked (or otherwise stateful). Rather than creating and managing mocked services you can define mock versions of those processors in the test definition.
For example, if we have a config with the following processors: ```yaml pipeline: processors: - mapping: 'root = "simon says: " + content()' - label: get_foobar_api http: url: http://example.com/foobar verb: GET - mapping: 'root = content().uppercase()' ``` Rather than create a fake service for the `http` processor to interact with we can define a mock in our test definition that replaces it with a [`mapping` processor](../../components/processors/mapping/). Mocks are configured as a map of labels that identify a processor to replace and the config to replace it with: ```yaml tests: - name: mocks the http proc target_processors: '/pipeline/processors' mocks: get_foobar_api: mapping: 'root = content().string() + " this is some mock content"' input_batch: - content: "hello world" output_batches: - - content_equals: "SIMON SAYS: HELLO WORLD THIS IS SOME MOCK CONTENT" ``` With the above test definition the `http` processor will be swapped out for `mapping: 'root = content().string() + " this is some mock content"'`. For the purposes of mocking it is recommended that you use a [`mapping` processor](../../components/processors/mapping/) that simply mutates the message in a way that you would expect the mocked processor to. > 📝 **NOTE** > > It’s not currently possible to mock components that are imported as separate resource files (using `--resource`/`-r`). It is recommended that you mock these by maintaining separate definitions for test purposes (`-r "./test/*.yaml"`). ### [](#more-granular-mocking)More granular mocking It is also possible to target specific fields within the test config by [JSON pointers](https://tools.ietf.org/html/rfc6901) as an alternative to labels. 
The following test definition would create the same mock as the previous:

```yaml
tests:
  - name: mocks the http proc
    target_processors: '/pipeline/processors'
    mocks:
      /pipeline/processors/1:
        mapping: 'root = content().string() + " this is some mock content"'
    input_batch:
      - content: "hello world"
    output_batches:
      - - content_equals: "SIMON SAYS: HELLO WORLD THIS IS SOME MOCK CONTENT"
```

## [](#fields)Fields

The schema of a test definition file is as follows:

### [](#tests)`tests`

A list of one or more unit tests to execute.

**Type**: `array`

### [](#tests-name)`tests[].name`

The name of the test. This should be unique and give a rough indication of what behavior is being tested.

**Type**: `string`

### [](#tests-environment)`tests[].environment`

An optional map of environment variables to set for the duration of the test.

**Type**: `object`

### [](#tests-target_processors)`tests[].target_processors`

A [JSON Pointer](https://tools.ietf.org/html/rfc6901) that identifies the specific processors which should be executed by the test. The target can either be a single processor or an array of processors. Alternatively a resource label can be used to identify a processor.

It is also possible to target processors in a separate file by prefixing the target with a path relative to the test file followed by a `#` symbol.

**Type**: `string`

**Default**: `"/pipeline/processors"`

```yml
# Examples

target_processors: foo_processor

target_processors: /pipeline/processors/0

target_processors: target.yaml#/pipeline/processors
```

### [](#tests-target_mapping)`tests[].target_mapping`

A file path relative to the test definition path of a Bloblang file to execute as an alternative to testing processors with the `target_processors` field. This allows you to define unit tests for Bloblang mappings directly.

**Type**: `string`

**Default**: `""`

### [](#tests-mocks)`tests[].mocks`

An optional map of processors to mock.
Keys should contain either a label or a JSON pointer of a processor that should be mocked. Values should contain a processor definition, which will replace the mocked processor. Most of the time you’ll want to use a [`mapping` processor](../../components/processors/mapping/) here, and use it to create a result that emulates the target processor.

**Type**: `object`

```yml
# Examples

mocks:
  get_foobar_api:
    mapping: root = content().string() + " this is some mock content"

mocks:
  /pipeline/processors/1:
    mapping: root = content().string() + " this is some mock content"
```

### [](#tests-input_batch)`tests[].input_batch`

Define a batch of messages to feed into your test. Specify either an `input_batch` or a series of `input_batches`.

**Type**: `array`

### [](#tests-input_batch-content)`tests[].input_batch[].content`

The raw content of the input message.

**Type**: `string`

### [](#tests-input_batch-json_content)`tests[].input_batch[].json_content`

Sets the raw content of the message to a JSON document matching the structure of the value.

**Type**: `object`

```yml
# Examples

json_content:
  bar:
    - element1
    - 10
  foo: foo value
```

### [](#tests-input_batch-file_content)`tests[].input_batch[].file_content`

Sets the raw content of the message by reading a file. The path of the file should be relative to the path of the test file.

**Type**: `string`

```yml
# Examples

file_content: ./foo/bar.txt
```

### [](#tests-input_batch-metadata)`tests[].input_batch[].metadata`

A map of metadata key/values to add to the input message.

**Type**: `object`

### [](#tests-input_batches)`tests[].input_batches`

Define a series of batches of messages to feed into your test. Specify either an `input_batch` or a series of `input_batches`.

**Type**: `two-dimensional array`

### [](#tests-input_batches-content)`tests[].input_batches[][].content`

The raw content of the input message.
**Type**: `string` ### [](#tests-input_batches-json_content)`tests[].input_batches[][].json_content` Sets the raw content of the message to a JSON document matching the structure of the value. **Type**: `object` ```yml # Examples json_content: bar: - element1 - 10 foo: foo value ``` ### [](#tests-input_batches-file_content)`tests[].input_batches[][].file_content` Sets the raw content of the message by reading a file. The path of the file should be relative to the path of the test file. **Type**: `string` ```yml # Examples file_content: ./foo/bar.txt ``` ### [](#tests-input_batches-metadata)`tests[].input_batches[][].metadata` A map of metadata key/values to add to the input message. **Type**: `object` ### [](#tests-output_batches)`tests[].output_batches` List of output batches. **Type**: `two-dimensional array` ### [](#tests-output_batches-bloblang)`tests[].output_batches[][].bloblang` Executes a Bloblang mapping on the output message, if the result is anything other than a boolean equalling `true` the test fails. **Type**: `string` ```yml # Examples bloblang: this.age > 10 && @foo.length() > 0 ``` ### [](#tests-output_batches-content_equals)`tests[].output_batches[][].content_equals` Checks the full raw contents of a message against a value. **Type**: `string` ### [](#tests-output_batches-content_matches)`tests[].output_batches[][].content_matches` Checks whether the full raw contents of a message matches a regular expression (re2). **Type**: `string` ```yml # Examples content_matches: ^foo [a-z]+ bar$ ``` ### [](#tests-output_batches-metadata_equals)`tests[].output_batches[][].metadata_equals` Checks a map of metadata keys to values against the metadata stored in the message. If there is a value mismatch between a key of the condition versus the message metadata this condition will fail. 
**Type**: `object` ```yml # Examples metadata_equals: example_key: example metadata value ``` ### [](#tests-output_batches-file_equals)`tests[].output_batches[][].file_equals` Checks that the contents of a message matches the contents of a file. The path of the file should be relative to the path of the test file. **Type**: `string` ```yml # Examples file_equals: ./foo/bar.txt ``` ### [](#tests-output_batches-file_json_equals)`tests[].output_batches[][].file_json_equals` Checks that both the message and the file contents are valid JSON documents, and that they are structurally equivalent. Will ignore formatting and ordering differences. The path of the file should be relative to the path of the test file. **Type**: `string` ```yml # Examples file_json_equals: ./foo/bar.json ``` ### [](#tests-output_batches-json_equals)`tests[].output_batches[][].json_equals` Checks that both the message and the condition are valid JSON documents, and that they are structurally equivalent. Will ignore formatting and ordering differences. **Type**: `object` ```yml # Examples json_equals: key: value ``` ### [](#tests-output_batches-json_contains)`tests[].output_batches[][].json_contains` Checks that both the message and the condition are valid JSON documents, and that the message is a superset of the condition. **Type**: `object` ```yml # Examples json_contains: key: value ``` ### [](#tests-output_batches-file_json_contains)`tests[].output_batches[][].file_json_contains` Checks that both the message and the file contents are valid JSON documents, and that the message is a superset of the condition. Will ignore formatting and ordering differences. The path of the file should be relative to the path of the test file. 
**Type**: `string` ```yml # Examples file_json_contains: ./foo/bar.json ``` --- # Page 319: Windowed Processing **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/configuration/windowed_processing.md --- # Windowed Processing --- title: Windowed Processing latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/configuration/windowed_processing page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/configuration/windowed_processing.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/configuration/windowed_processing.adoc description: Learn how to process periodic windows of messages with Redpanda Connect. page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- A window is a batch of messages made with respect to time, with which we are able to perform processing that can analyze or aggregate the messages of the window. This is useful in stream processing as the dataset is never "complete", and therefore in order to perform analysis against a collection of messages we must do so by creating a continuous feed of windows (collections), where our analysis is made against each window. 
For example, given a stream of messages relating to cars passing through various traffic lights: ```json { "traffic_light": "cbf2eafc-806e-4067-9211-97be7e42cee3", "created_at": "2021-08-07T09:49:35Z", "registration_plate": "AB1C DEF", "passengers": 3 } ``` Windowing allows us to produce a stream of messages representing the total traffic for each light every hour: ```json { "traffic_light": "cbf2eafc-806e-4067-9211-97be7e42cee3", "created_at": "2021-08-07T10:00:00Z", "unique_cars": 15, "passengers": 43 } ``` ## [](#creating-windows)Creating windows The first step in processing windows is producing the windows themselves. This can be done by configuring a window-producing buffer after your input: ### System A `system_window` buffer creates windows by following the system clock of the running machine. Windows are created and emitted at predictable times, but this also means that windows for historic data are not emitted, which prevents backfills of traffic data: ```yaml input: kafka: addresses: [ TODO ] topics: [ traffic_data ] consumer_group: traffic_consumer checkpoint_limit: 1000 buffer: system_window: timestamp_mapping: root = this.created_at size: 1h allowed_lateness: 3m ``` For more information about this buffer, refer to the `system_window` buffer docs. ## [](#grouping)Grouping With a window buffer chosen, our stream of messages will be emitted periodically as batches of all messages that fit within each window. Since we want to analyze the window separately for each traffic light, we need to expand this single batch out into one batch for each traffic light identifier within the window. For that purpose we have two processor options: [`group_by`](../../components/processors/group_by/) and [`group_by_value`](../../components/processors/group_by_value/). In our case we want to group by the value of the field `traffic_light` of each message, which we can do with the following: ```yaml pipeline: processors: - group_by_value: value: ${! 
json("traffic_light") } ``` ## [](#aggregating)Aggregating Once our window has been grouped, the next step is to calculate the aggregated passenger and unique car counts. For this purpose the Redpanda Connect [mapping language Bloblang](../../guides/bloblang/about/) comes in handy, as the method [`from_all`](../../guides/bloblang/methods/#from_all) executes the target function against the entire batch and returns an array of the values, allowing us to mutate the result with chained methods such as [`sum`](../../guides/bloblang/methods/#sum): ```yaml pipeline: processors: - group_by_value: value: ${! json("traffic_light") } - mapping: | let is_first_message = batch_index() == 0 root.traffic_light = this.traffic_light root.created_at = @window_end_timestamp root.unique_cars = if $is_first_message { json("registration_plate").from_all().unique().length() } root.passengers = if $is_first_message { json("passengers").from_all().sum() } # Only keep the first batch message containing the aggregated results. root = if ! $is_first_message { deleted() } ``` [Bloblang](../../guides/bloblang/about/) is very powerful, and by using [`from`](../../guides/bloblang/methods/#from) and [`from_all`](../../guides/bloblang/methods/#from_all) it’s possible to perform a wide range of batch-wide processing. If you fancy a challenge, try updating the above mapping to only count passengers from the first journey of each registration plate in the window (hint: the [`fold` method](../../guides/bloblang/methods/#fold) might come in handy).
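
To make the grouping and aggregation steps concrete, the following Python sketch mirrors what the pipeline does to the messages of a single window: it groups them by `traffic_light`, then counts distinct registration plates and sums passengers for each group. It illustrates the logic only and is not how Redpanda Connect is implemented. The function name `aggregate_window` and the sample messages are hypothetical, and the output field names follow the example output message shown earlier on this page.

```python
from collections import defaultdict


def aggregate_window(messages, window_end):
    """Aggregate one window of traffic messages per traffic light.

    `messages` is a list of dicts shaped like the example input message;
    `window_end` stands in for the window's end timestamp.
    """
    # Grouping step: one batch per traffic light (the role of group_by_value).
    groups = defaultdict(list)
    for msg in messages:
        groups[msg["traffic_light"]].append(msg)

    # Aggregation step: one result per group (the role of the mapping).
    results = []
    for light, batch in groups.items():
        results.append({
            "traffic_light": light,
            "created_at": window_end,
            # Distinct plates seen in this group.
            "unique_cars": len({m["registration_plate"] for m in batch}),
            # Total passengers across every message in this group.
            "passengers": sum(m["passengers"] for m in batch),
        })
    return results


# Hypothetical window contents: the same car passes light-a twice.
window = [
    {"traffic_light": "light-a", "registration_plate": "AB1C DEF", "passengers": 3},
    {"traffic_light": "light-a", "registration_plate": "AB1C DEF", "passengers": 1},
    {"traffic_light": "light-b", "registration_plate": "XY2Z GHI", "passengers": 2},
]

for row in aggregate_window(window, "2021-08-07T10:00:00Z"):
    print(row)
```

Because the car at `light-a` appears twice, it contributes once to `unique_cars` but both of its journeys count toward `passengers`, which is the behavior the challenge above asks you to change.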
--- # Page 320: Redpanda Connect Quickstart **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/connect-quickstart.md --- # Redpanda Connect Quickstart --- title: Redpanda Connect Quickstart latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/connect-quickstart page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/connect-quickstart.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/connect-quickstart.adoc description: Learn how to quickly start building data pipelines with Redpanda Connect. page-topic-type: tutorial personas: streaming_developer learning-objective-1: Build a producer pipeline that generates and publishes data to a topic learning-objective-2: Build a consumer pipeline that reads, transforms, and logs data from a topic page-git-created-date: "2024-09-09" page-git-modified-date: "2026-04-08" --- In this quickstart, you build data pipelines to generate, transform, and handle streaming data end-to-end. You create two pipelines: one that generates dad jokes and writes them to a topic in your cluster, and another that reads those jokes and gives each one a random "cringe rating". After completing this quickstart, you will be able to: - Build a producer pipeline that generates and publishes data to a topic - Build a consumer pipeline that reads, transforms, and logs data from a topic ## [](#prerequisites)Prerequisites You must have a Redpanda Cloud account with a Serverless, Dedicated, or standard BYOC cluster. If you don’t already have an account, [sign up for a free trial](https://redpanda.com/try-redpanda/cloud-trial). > 📝 **NOTE** > > Clusters can create up to 100 pipelines. For additional pipelines, contact [Redpanda support](https://support.redpanda.com/hc/en-us/requests/new). 
## [](#quickstart-pipelines)Quickstart pipelines This quickstart creates the following pipelines: - The first pipeline produces dad jokes and writes them to a topic in your cluster. - The second pipeline consumes those dad jokes and gives each one a random "cringe rating" from 1-10. The **producer pipeline** uses the following Redpanda Connect components: | Component type | Component | Purpose | | --- | --- | --- | | Input | generate | Creates jokes | | Output | redpanda | Writes messages to your topic | | Processor | log | Logs generated messages | | Processor | catch | Catches errors | The **consumer pipeline** uses the following Redpanda Connect components: | Component type | Component | Purpose | | --- | --- | --- | | Input | redpanda | Reads messages from your topic | | Output | drop | Drops the processed messages | | Processor | bloblang | Processes ratings | | Processor | log | Logs processed messages | | Processor | catch | Catches errors | > 💡 **TIP** > > The pipeline editor provides an IDE-like experience for creating pipelines. After a component has been added, you can click the leaf icon in the left sidebar to open its documentation. ![Redpanda Connect user interface](../../../shared/_images/connect_ui.png) ## [](#build-a-producer-pipeline)Build a producer pipeline Every pipeline requires an input and an output in a configuration file. You can select components in the left sidebar and customize the YAML in the editor. To create the producer pipeline: 1. Go to the **Connect** page for your cluster and click **Create a pipeline**. 2. Enter this name for the pipeline: `joke-generator-producer`. 3. In the left sidebar, click **Add input +** and search for and select the `generate` input connector. The YAML for this connector appears in the editor. 4. Click **Add output +** and search for and select the `redpanda` output connector. The YAML for this connector also appears in the editor. 5. The `redpanda` connector requires a Redpanda topic and user: 1. 
In the `redpanda` output connector, click **Topic +** to create a new topic. Toggle to **New** and enter `dad-jokes` for the topic name. Click **Add**. 2. In the `redpanda` output connector, click **User +** to create a new user. Toggle to **New** and enter `connect` for the username. Click **Add**. 6. Replace the generated YAML in the editor with the following. This configuration includes the `log` and `catch` processors and the `mapping` for joke generation. Bloblang is Redpanda Connect’s scripting language used to add logic. ```yaml input: generate: interval: 5s count: 0 mapping: | let jokes = [ "Why don't scientists trust atoms? Because they make up everything!", "I'm reading a book about anti-gravity. It's impossible to put down!", "Why did the scarecrow win an award? He was outstanding in his field!", "What do you call a fake noodle? An impasta!", "Why don't eggs tell jokes? They'd crack each other up!", "I used to play piano by ear, but now I use my hands.", "What do you call a bear with no teeth? A gummy bear!", "Why did the bicycle fall over? It was two tired!", "What do you call a fish wearing a crown? A king fish!", "Why don't skeletons fight each other? They don't have the guts!", "What do you call cheese that isn't yours? Nacho cheese!", "Why can't you hear a pterodactyl using the bathroom? Because the 'p' is silent!", "What did the ocean say to the beach? Nothing, it just waved!", "Why did the math book look sad? It had too many problems!", "What do you call a sleeping bull? A bulldozer!", "How do you organize a space party? You planet!", "What's orange and sounds like a parrot? A carrot!", "Why did the coffee file a police report? It got mugged!", "What do you call a can opener that doesn't work? A can't opener!", "Why don't oysters donate to charity? Because they're shellfish!" 
] let joke_index = random_int() % $jokes.length() root.joke = $jokes.index($joke_index) root.id = uuid_v4() root.timestamp = now() root.source = "dad-joke-generator" root.joke_length = root.joke.length() pipeline: processors: - log: level: INFO message: "📝 Generating joke: ${! json(\"joke\") }" - catch: - log: level: ERROR message: "❌ Error generating joke: ${! error() }" output: redpanda: seed_brokers: # Optional - ${REDPANDA_BROKERS} tls: enabled: true # Optional (default: false) client_certs: [] sasl: - mechanism: SCRAM-SHA-256 username: ${secrets.KAFKA_USER_CONNECT} password: ${secrets.KAFKA_PASSWORD_CONNECT} topic: dad-jokes # Optional ``` > 📝 **NOTE** > > - Notice the `${REDPANDA_BROKERS}` [contextual variable](../configuration/contextual-variables/) in the configuration. This references your cluster’s bootstrap server address, so you can use it in any pipeline without hardcoding connection details. Use the slash command menu in the YAML editor or use the command palette to insert the Redpanda broker’s contextual variable. > > - Notice `${secrets.KAFKA_USER_CONNECT}` and `${secrets.KAFKA_PASSWORD_CONNECT}`. These reference secrets that you can create using the slash command menu in the YAML editor or on the **Security** page. > > - The Brave browser does not fully support code snippets. 7. Click **Save**. Your pipeline details display, and after a few seconds, the pipeline starts running. The pipeline generates jokes and writes the jokes to your Redpanda topic. ### [](#review-the-pipeline-logs)Review the pipeline logs The page loads new log messages as they come in. When Live mode is disabled, you can filter logs, for example, by level, message content, or path. The log shows activity from the past five hours. Click through the log messages to see the startup sequence. 
For example, you’ll see when the output becomes active: ```json { "instance_id": "d73c39bp7l8c73d7lll0", "label": "", "level": "INFO", "message": "Output type redpanda is now active", "path": "root.output", "pipeline_id": "d73a55ptub9s73agpthg", "time": "2026-03-27T17:43:02.36416142Z" } ``` ### [](#view-the-processed-messages)View the processed messages 1. Go to the **Topics** page for your cluster and select the `dad-jokes` topic. 2. Click any message to see the structure. For example: ```json { "id": "d242c355-4cee-4382-817a-190c7a115a19", "joke": "I used to play piano by ear, but now I use my hands.", "joke_length": 52, "source": "dad-joke-generator", "timestamp": "2026-03-27T15:30:38.963227997Z" } ``` ## [](#build-a-consumer-pipeline)Build a consumer pipeline This next pipeline rates the jokes that you generated in the first pipeline. To create the consumer pipeline: 1. Go back to the **Connect** page for your cluster, and click **Create a pipeline**. 2. Enter this name for the pipeline: `joke-generator-consumer`. 3. In the left sidebar, click **Add input +**, and search for and select the `redpanda` input connector. 4. The `redpanda` connector requires a Redpanda topic and user: 1. In the `redpanda` input connector, click **Topic +** and select the existing topic `dad-jokes`. Click **Add**. 2. In the `redpanda` input connector, click **User +** and select the existing user `connect`. For consumer group, enter `dad-joke-raters`. This allows the user `connect` to be granted READ and DESCRIBE permissions for the `dad-joke-raters` consumer group. Click **Add**. 5. Click **Add output +**, and search for and select the `drop` output connector. (For testing purposes, this output drops messages instead of forwarding them. In a real scenario you would replace the `drop` connector with your real destination.) 6. Replace the generated YAML in the editor with the following configuration, which includes the `bloblang`, `log`, and `catch` processors. 
> 📝 **NOTE** > > This example explicitly includes several optional configuration fields for the `redpanda` input. They’re shown here for demonstration purposes, so you can see a range of available settings. ```yaml input: redpanda: seed_brokers: # Optional - ${REDPANDA_BROKERS} client_id: benthos # Optional (default: "benthos") tls: enabled: true # Optional (default: false) client_certs: [] sasl: - mechanism: SCRAM-SHA-256 username: ${secrets.KAFKA_USER_CONNECT} password: ${secrets.KAFKA_PASSWORD_CONNECT} metadata_max_age: 5m # Optional (default: "5m") request_timeout_overhead: 10s # Optional (default: "10s") conn_idle_timeout: 20s # Optional (default: "20s") topics: # Required (mutually exclusive with regexp_topics) - dad-jokes regexp_topics: false # Optional (default: false). Mutually exclusive with topics. rebalance_timeout: 45s # Optional (default: "45s") session_timeout: 1m # Optional (default: "1m") heartbeat_interval: 3s # Optional (default: "3s") start_from_oldest: true # Optional (default: true) start_offset: earliest # Optional (default: "earliest") fetch_max_bytes: 50MiB # Optional (default: "50MiB") fetch_max_wait: 5s # Optional (default: "5s") fetch_min_bytes: 1B # Optional (default: "1B") fetch_max_partition_bytes: 1MiB # Optional (default: "1MiB") transaction_isolation_level: read_uncommitted # Optional (default: "read_uncommitted") consumer_group: dad-joke-raters # Optional commit_period: 5s # Optional (default: "5s") partition_buffer_bytes: 1MB # Optional (default: "1MB") topic_lag_refresh_period: 5s # Optional (default: "5s") max_yield_batch_bytes: 32KB # Optional (default: "32KB") auto_replay_nacks: true # Optional (default: true) pipeline: processors: - bloblang: | root = this let rating = random_int(min: 1, max: 11) root.cringe_rating = $rating root.cringe_level = if $rating <= 3 { "Mild - Almost acceptable" } else if $rating <= 6 { "Medium - Classic dad joke territory" } else if $rating <= 8 { "High - Eye-roll inducing" } else { "EXTREME - 
Peak dad joke achievement" } root.processed_at = now() root.rating_emoji = match { $rating <= 3 => "😐", $rating <= 6 => "😬", $rating <= 8 => "🤦", _ => "💀" } let age_seconds = (timestamp_unix() - this.timestamp.ts_parse("2006-01-02T15:04:05Z07:00").ts_unix()) root.age_seconds = $age_seconds - log: level: INFO message: | 🎭 JOKE RATED! ${! json("rating_emoji") } Joke: "${! json("joke") }" Cringe Rating: ${! json("cringe_rating") }/10 - ${! json("cringe_level") } Age: ${! json("age_seconds") } seconds old Processed at: ${! json("processed_at") } - catch: - log: level: ERROR message: "❌ Failed to process joke: ${! error() }" output: drop: {} ``` 7. Click **Save** to start your pipeline. 8. Your pipeline details display, and after a few seconds, the pipeline starts running. Check the logs to see a rated joke. For example: ```json { "custom_source": "true", "instance_id": "d454dkn4u2is73ava480", "label": "", "level": "INFO", "message": "🎭 JOKE RATED! 💀\nJoke: \"I used to play piano by ear, but now I use my hands.\"\nCringe Rating: 9/10 - EXTREME - Peak dad joke achievement\nAge: 659 seconds old\nProcessed at: 2026-03-27T17:54:13.340229297Z\n", "path": "root.pipeline.processors.1", "pipeline_id": "d454djahlips73dmcll0", "time": "2026-03-27T17:54:13.341137527Z" } ``` ## [](#clean-up)Clean up When you’ve finished experimenting with your data pipeline, you can delete the pipelines and the topic you created for this quickstart. 1. On the **Connect** page, click the **…​** icon next to the `joke-generator-producer` pipeline and select **Delete**. Repeat for the `joke-generator-consumer` pipeline. 2. Confirm your deletion to remove the pipelines and associated logs. 3. On the **Topics** page, delete the `dad-jokes` topic. ## [](#next-steps)Next steps - Try one of the [Redpanda Connect cookbooks](../cookbooks/). - Choose [connectors for your use case](../components/about/). - [Add secrets to your pipeline](../configuration/secret-management/). 
- [Monitor a data pipeline on a BYOC or Dedicated cluster](../configuration/monitor-connect/). - [Manually scale resources for a pipeline](../configuration/resource-management/). - [Configure, test, and run a data pipeline locally](../../../../redpanda-connect/get-started/quickstarts/rpk/). --- # Page 321: Cookbooks **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/cookbooks.md --- # Cookbooks --- title: Cookbooks latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/cookbooks/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/cookbooks/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/cookbooks/index.adoc page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- - [DynamoDB CDC Patterns](dynamodb_cdc/) Learn how to capture, filter, transform, and route DynamoDB change data capture (CDC) events with Redpanda Connect. - [Enrichment Workflows](enrichments/) How to configure Redpanda Connect to process a workflow of enrichment services. - [Filtering and Sampling](filtering/) Configure Redpanda Connect to conditionally drop messages. - [Ingest data into Snowflake](snowflake_ingestion/) Configure Redpanda Connect to ingest data from a Redpanda topic into Snowflake using Snowpipe Streaming. - [Joining Streams](joining_streams/) How to hydrate documents by joining multiple streams. - [Redpanda Migrator](redpanda_migrator/) Move your workloads from any Kafka system to Redpanda Cloud using a single command. - [Retrieval-Augmented Generation (RAG)](rag/) How to configure Redpanda Connect to create a RAG pipeline, using PostgreSQL and PGVector. - [Work with Jira Issues](jira/) Learn how to query, filter, and create Jira issues using Redpanda Connect pipelines. 
--- # Page 322: DynamoDB CDC Patterns **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/cookbooks/dynamodb_cdc.md --- # DynamoDB CDC Patterns --- title: DynamoDB CDC Patterns latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/cookbooks/dynamodb_cdc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/cookbooks/dynamodb_cdc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/cookbooks/dynamodb_cdc.adoc description: Learn how to capture, filter, transform, and route DynamoDB change data capture (CDC) events with Redpanda Connect. page-topic-type: cookbook personas: streaming_developer, data_engineer learning-objective-1: Find reusable patterns for capturing DynamoDB CDC events learning-objective-2: Look up integration patterns for routing CDC data to Redpanda and S3 learning-objective-3: Identify patterns for filtering and transforming change events page-git-created-date: "2026-03-04" page-git-modified-date: "2026-03-04" --- The DynamoDB CDC input enables capturing item-level changes from DynamoDB tables with streams enabled. This cookbook provides reusable patterns for filtering, transforming, and routing DynamoDB CDC events to Redpanda, S3, and other destinations. Use this cookbook to: - Find reusable patterns for capturing DynamoDB CDC events - Look up integration patterns for routing CDC data to Redpanda and S3 - Identify patterns for filtering and transforming change events ## [](#prerequisites)Prerequisites Before using these patterns, ensure you have the following configured: ### [](#redpanda-cli)Redpanda CLI Install the Redpanda CLI (`rpk`) to run Redpanda Connect. See [Install or Update rpk](../../../../manage/rpk/rpk-install/) for installation instructions. 
### [](#dynamodb-streams)DynamoDB Streams The source DynamoDB table must have [DynamoDB Streams](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html) enabled with an appropriate view type: - `KEYS_ONLY`: Only the key attributes of the modified item - `NEW_IMAGE`: The entire item as it appears after the modification - `OLD_IMAGE`: The entire item as it appeared before the modification - `NEW_AND_OLD_IMAGES`: Both the new and old item images (recommended for detecting changes) To enable streams on an existing table, use the AWS CLI: ```bash aws dynamodb update-table \ --table-name my-table \ --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES ``` ### [](#environment-variables)Environment variables The examples in this cookbook use environment variables for the table, region, broker, and bucket settings. This keeps environment-specific values separate from your pipeline configuration files. ```bash export DYNAMODB_TABLE=my-table # (1) export AWS_REGION=us-east-1 # (2) export REDPANDA_BROKERS=localhost:9092 # (3) export S3_BUCKET=my-cdc-bucket # (4) ``` | Callout | Description | | --- | --- | | 1 | The name of the DynamoDB table with streams enabled. | | 2 | The AWS region where your DynamoDB table is located. | | 3 | The Redpanda broker addresses (for Redpanda output examples). | | 4 | The S3 bucket name (for S3 output examples). | Redpanda Connect loads AWS credentials from the standard [credential chain](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) (environment variables, `~/.aws/credentials`, or IAM roles).
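
Before running any of the patterns, it can help to confirm that the variables above are actually set, because an unset variable would leave the pipeline with an empty table, region, broker, or bucket value. The following Python sketch is one way to do that check. The helper name `missing_vars` is hypothetical and not part of Redpanda Connect or `rpk`. Not every pattern needs all four variables, so trim the `required` list to match the example you run.

```python
import os

# Variables used by the examples in this cookbook.
REQUIRED_VARS = ("DYNAMODB_TABLE", "AWS_REGION", "REDPANDA_BROKERS", "S3_BUCKET")


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All cookbook environment variables are set.")
```

This check covers only the configuration variables. AWS credentials are still resolved separately through the credential chain described above.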
## [](#capture-cdc-events)Capture CDC events The simplest pattern captures all change events from a DynamoDB table and outputs them with metadata: ```yaml input: aws_dynamodb_cdc: tables: ["${DYNAMODB_TABLE}"] region: ${AWS_REGION} checkpoint_table: redpanda_dynamodb_checkpoints start_from: trim_horizon pipeline: processors: # Extract the change event details - mapping: | root.event_type = this.eventName root.table = this.tableName root.event_id = this.eventID root.keys = this.dynamodb.keys root.new_image = this.dynamodb.newImage root.old_image = this.dynamodb.oldImage root.sequence_number = this.dynamodb.sequenceNumber root.timestamp = now() output: stdout: codec: lines ``` For details on the CDC event message structure and available fields for Bloblang mappings, see the [message structure](../../components/inputs/aws_dynamodb_cdc/#_message_structure) section in the connector reference. ## [](#filter-cdc-events)Filter CDC events You can filter events to process only specific change types: ```yaml input: aws_dynamodb_cdc: tables: ["${DYNAMODB_TABLE}"] region: ${AWS_REGION} start_from: latest pipeline: processors: # Filter to only process INSERT and MODIFY events (ignore REMOVE) - mapping: | root = if this.eventName == "REMOVE" { deleted() } else { this } # Transform to a simplified format - mapping: | root.event_type = this.eventName root.keys = this.dynamodb.keys root.new_data = this.dynamodb.newImage root.old_data = this.dynamodb.oldImage output: stdout: codec: lines ``` This example: - Filters out `REMOVE` events using `deleted()` - Transforms the event to a simplified format ## [](#route-to-redpanda)Route to Redpanda Stream DynamoDB changes to Redpanda for real-time processing: ```yaml input: aws_dynamodb_cdc: tables: ["${DYNAMODB_TABLE}"] region: ${AWS_REGION} checkpoint_table: redpanda_dynamodb_checkpoints batch_size: 100 poll_interval: 500ms pipeline: processors: # Transform to a Kafka-friendly format with a composite key - mapping: | let keys = 
this.dynamodb.keys meta kafka_key = [$keys.pk, $keys.sk].filter(v -> v != null).join("#") root.event_type = this.eventName root.table = this.tableName root.timestamp = now() root.keys = this.dynamodb.keys root.new_image = this.dynamodb.newImage root.old_image = this.dynamodb.oldImage output: redpanda: seed_brokers: - ${REDPANDA_BROKERS} topic: dynamodb-cdc-events key: ${! @kafka_key } partitioner: murmur2_hash compression: snappy batching: count: 100 period: 1s ``` This example: - Creates a composite message key from the DynamoDB primary key - Transforms the DynamoDB format to plain JSON - Batches messages for efficient delivery ## [](#route-to-s3)Route to S3 Archive CDC events to S3 for long-term storage and analytics: ```yaml input: aws_dynamodb_cdc: tables: ["${DYNAMODB_TABLE}"] region: ${AWS_REGION} checkpoint_table: redpanda_dynamodb_checkpoints start_from: trim_horizon pipeline: processors: # Add partitioning metadata for S3 organization - mapping: | let event_time = now() meta s3_path = "year=%s/month=%s/day=%s/hour=%s".format( $event_time.ts_format("2006"), $event_time.ts_format("01"), $event_time.ts_format("02"), $event_time.ts_format("15") ) root.event_type = this.eventName root.table = this.tableName root.sequence_number = this.dynamodb.sequenceNumber root.event_time = $event_time root.keys = this.dynamodb.keys root.new_image = this.dynamodb.newImage root.old_image = this.dynamodb.oldImage output: aws_s3: bucket: ${S3_BUCKET} path: dynamodb-cdc/${DYNAMODB_TABLE}/${! @s3_path }/${! 
uuid_v4() }.json region: ${AWS_REGION} batching: count: 1000 period: 1m processors: - archive: format: lines ``` This example: - Organizes files by time-based partitions (year/month/day/hour) - Batches events and archives them as newline-delimited JSON - Uses UUID file names to prevent collisions ## [](#route-by-event-type)Route by event type Route different event types to different destinations: ```yaml input: aws_dynamodb_cdc: tables: ["${DYNAMODB_TABLE}"] region: ${AWS_REGION} pipeline: processors: # Transform to a common format - mapping: | root.event_type = this.eventName root.table = this.tableName root.timestamp = now() root.keys = this.dynamodb.keys root.data = if this.dynamodb.exists("newImage") { this.dynamodb.newImage } else { this.dynamodb.oldImage } output: switch: cases: # Route INSERT events to a topic for new records - check: this.event_type == "INSERT" output: redpanda: seed_brokers: - ${REDPANDA_BROKERS} topic: dynamodb-inserts # Route MODIFY events to a topic for updates - check: this.event_type == "MODIFY" output: redpanda: seed_brokers: - ${REDPANDA_BROKERS} topic: dynamodb-updates # Route REMOVE events to a topic for deletes - check: this.event_type == "REMOVE" output: redpanda: seed_brokers: - ${REDPANDA_BROKERS} topic: dynamodb-deletes # Fallback for any unexpected event types - output: drop: {} ``` This pattern: - Separates processing pipelines for inserts, updates, and deletes - Applies different retention policies per event type - Enables specialized downstream consumers ## [](#detect-changed-fields)Detect changed fields Compare old and new images to identify which fields changed: ```yaml input: aws_dynamodb_cdc: tables: ["${DYNAMODB_TABLE}"] region: ${AWS_REGION} pipeline: processors: # Only process MODIFY events - mapping: | root = if this.eventName != "MODIFY" { deleted() } else { this } # Compare old and new images to find changed fields - mapping: | let old_data = this.dynamodb.oldImage let new_data = this.dynamodb.newImage 
root.table = this.tableName root.keys = this.dynamodb.keys root.timestamp = now() # Find fields that changed by comparing key-value pairs root.changes = $new_data.key_values().filter(kv -> !$old_data.exists(kv.key) || $old_data.get(kv.key) != kv.value).map_each(kv -> {"field": kv.key, "old_value": if $old_data.exists(kv.key) { $old_data.get(kv.key) } else { null }, "new_value": kv.value}) # Find fields that were removed root.removed_fields = $old_data.keys().filter(k -> !$new_data.exists(k)) output: stdout: codec: lines ``` This pattern: - Filters to only MODIFY events - Compares old and new images to find differences - Outputs a list of changed fields with their old and new values > 📝 **NOTE** > > This pattern requires the `NEW_AND_OLD_IMAGES` stream view type. The `.key_values()` method converts an object to an array of key-value pairs that can be filtered and mapped. ## [](#checkpointing)Checkpointing The DynamoDB CDC input automatically manages checkpoints in a separate DynamoDB table: ```yaml input: aws_dynamodb_cdc: table: my-table checkpoint_table: my-app-checkpoints (1) checkpoint_limit: 500 (2) start_from: trim_horizon (3) ``` | 1 | Custom checkpoint table name (default: redpanda_dynamodb_checkpoints). | | --- | --- | | 2 | Checkpoint after every 500 messages (lower = better recovery, higher = fewer writes). | | 3 | Start from the oldest available record when no checkpoint exists. | If a checkpoint table doesn’t exist, it’s created automatically with the required schema. ## [](#performance-tuning)Performance tuning Optimize throughput and latency with these settings: ```yaml input: aws_dynamodb_cdc: table: my-table batch_size: 1000 (1) poll_interval: 100ms (2) max_tracked_shards: 10000 (3) throttle_backoff: 50ms (4) ``` | 1 | Maximum records per shard per request (1-1000). | | --- | --- | | 2 | Time between polls when no records are available. | | 3 | Maximum shards to track (for very large tables). 
| | 4 | Backpressure delay when too many messages are in-flight. | ### [](#throughput-considerations)Throughput considerations - DynamoDB Streams allows 5 `GetRecords` calls per second per shard - Higher `batch_size` improves throughput but increases memory usage - Shorter `poll_interval` reduces latency but increases API calls ## [](#troubleshoot)Troubleshoot ### [](#no-events-received)No events received If you’re not receiving events: 1. Verify streams are enabled on the table: ```bash aws dynamodb describe-table --table-name my-table \ --query 'Table.StreamSpecification' ``` 2. Check that changes are being made to the table 3. Verify `start_from` is set to `trim_horizon` to capture existing stream data ### [](#duplicate-events)Duplicate events Each stream record appears exactly once in DynamoDB Streams. However, if your pipeline fails before checkpointing, records may be re-read on restart, resulting in at-least-once processing semantics. To handle potential duplicates: - Use idempotent processing in downstream systems - Deduplicate using the `dynamodb_sequence_number` metadata - Lower `checkpoint_limit` to reduce the window of possible duplicates ### [](#stream-retention)Stream retention DynamoDB Streams retains data for 24 hours. 
If your pipeline is offline longer than that: - Consider using [Kinesis Data Streams for DynamoDB](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/kds.html) with the [`aws_kinesis` input](../../components/inputs/aws_kinesis/) instead (up to 1 year retention) - Implement a full-table scan fallback for disaster recovery ## [](#suggested-reading)Suggested reading - [DynamoDB CDC Input Reference](../../components/inputs/aws_dynamodb_cdc/) - [AWS Configuration Guide](../../guides/cloud/aws/) - [Kinesis Input](../../components/inputs/aws_kinesis/) (for Kinesis Data Streams for DynamoDB) - [DynamoDB Streams Documentation](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html) --- # Page 323: Enrichment Workflows **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/cookbooks/enrichments.md --- # Enrichment Workflows --- title: Enrichment Workflows latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/cookbooks/enrichments page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/cookbooks/enrichments.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/cookbooks/enrichments.adoc description: How to configure Redpanda Connect to process a workflow of enrichment services. page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- This cookbook demonstrates how to enrich a stream of JSON documents with HTTP services. This method also works with [AWS Lambda functions](../../components/processors/aws_lambda/). We will start off by configuring a single enrichment, then we will move onto a workflow of enrichments with a network of dependencies using the [`workflow` processor](../../components/processors/workflow/). 
Each enrichment will be performed in parallel across a [pre-batched](../../configuration/batching/) stream of documents. Workflow enrichments that do not depend on each other will also be performed in parallel, making this orchestration method very efficient.

The imaginary problem we are going to solve is applying a set of NLP-based enrichments to a feed of articles in order to detect fake news. We will be consuming and writing to Kafka, but the example works with any [input](../../components/inputs/about/) and [output](../../components/outputs/about/) combination.

Articles are received over the topic `articles` and look like this:

```json
{
  "type": "article",
  "article": {
    "id": "123foo",
    "title": "Dogs Stop Barking",
    "content": "The world was shocked this morning to find that all dogs have stopped barking."
  }
}
```

## [](#meet-the-enrichments)Meet the enrichments

### [](#claims-detector)Claims detector

To start us off we will configure a single enrichment, which is an imaginary 'claims detector' service. This is an HTTP service that wraps a trained machine learning model to extract claims that are made within a body of text.

The service expects a `POST` request with a JSON payload of the form:

```json
{
  "text": "The world was shocked this morning to find that all dogs have stopped barking."
}
```

And returns a JSON payload of the form:

```json
{
  "claims": [
    {
      "entity": "world",
      "claim": "shocked"
    },
    {
      "entity": "dogs",
      "claim": "NOT barking"
    }
  ]
}
```

Since each request only applies to a single document, we will make this enrichment scale by deploying multiple HTTP services and hitting those instances in parallel across our document batches.

In order to send a mapped request and map the response back into the original document we will use the [`branch` processor](../../components/processors/branch/), with a child `http` processor.
```yaml
input:
  kafka:
    addresses: [ TODO ]
    topics: [ articles ]
    consumer_group: benthos_articles_group
    batching:
      count: 20 # Tune this to set the size of our document batches.
      period: 1s

pipeline:
  processors:
    - branch:
        request_map: 'root.text = this.article.content'
        processors:
          - http:
              url: http://localhost:4197/claims
              verb: POST
        result_map: 'root.tmp.claims = this.claims'

output:
  kafka:
    addresses: [ TODO ]
    topic: comments_hydrated
```

With this pipeline our documents will come out looking something like this:

```json
{
  "type": "article",
  "article": {
    "id": "123foo",
    "title": "Dogs Stop Barking",
    "content": "The world was shocked this morning to find that all dogs have stopped barking."
  },
  "tmp": {
    "claims": [
      {
        "entity": "world",
        "claim": "shocked"
      },
      {
        "entity": "dogs",
        "claim": "NOT barking"
      }
    ]
  }
}
```

### [](#hyperbole-detector)Hyperbole detector

Next up is a 'hyperbole detector' that takes a `POST` request containing the article contents and returns a hyperbole score between 0 and 1. This time the format is array-based and therefore supports scoring multiple documents in a single request, making better use of the host machine’s GPU.

A request should take the following form:

```json
[
  {
    "text": "The world was shocked this morning to find that all dogs have stopped barking."
  }
]
```

And the response looks like this:

```json
[
  {
    "hyperbole_rank": 0.73
  }
]
```

In order to create a single request from a batch of documents, and subsequently map the result back into our batch, we will use the [`archive`](../../components/processors/archive/) and [`unarchive`](../../components/processors/unarchive/) processors in our [`branch`](../../components/processors/branch/) flow, like this:

```yaml
pipeline:
  processors:
    - branch:
        request_map: 'root.text = this.article.content'
        processors:
          - archive:
              format: json_array
          - http:
              url: http://localhost:4198/hyperbole
              verb: POST
          - unarchive:
              format: json_array
        result_map: 'root.tmp.hyperbole_rank = this.hyperbole_rank'
```

The purpose of the `json_array` format `archive` processor is to take a batch of JSON documents and place them into a single document as an array. We then send a single request for each batch. After the request is made we do the opposite with the `unarchive` processor in order to convert the response back into a batch of the original size.

### [](#fake-news-detector)Fake news detector

Finally, we are going to use a 'fake news detector' that takes the article contents as well as the output of the previous two enrichments and calculates a fake news rank between 0 and 1.

This service behaves similarly to the claims detector service and takes a document of the form:

```json
{
  "text": "The world was shocked this morning to find that all dogs have stopped barking.",
  "hyperbole_rank": 0.73,
  "claims": [
    {
      "entity": "world",
      "claim": "shocked"
    },
    {
      "entity": "dogs",
      "claim": "NOT barking"
    }
  ]
}
```

And returns an object of the form:

```json
{
  "fake_news_rank": 0.893
}
```

We then wish to map the field `fake_news_rank` from that result into the original document at the path `article.fake_news_score`.
Our [`branch`](../../components/processors/branch/) block for this enrichment would look like this:

```yaml
pipeline:
  processors:
    - branch:
        request_map: |
          root.text = this.article.content
          root.claims = this.tmp.claims
          root.hyperbole_rank = this.tmp.hyperbole_rank
        processors:
          - http:
              url: http://localhost:4199/fakenews
              verb: POST
        result_map: 'root.article.fake_news_score = this.fake_news_rank'
```

Note that in our `request_map` we are targeting fields that are populated from the previous two enrichments.

If we were to execute all three enrichments in sequence, using the example responses shown above, we would end up with a document looking like this:

```json
{
  "type": "article",
  "article": {
    "id": "123foo",
    "title": "Dogs Stop Barking",
    "content": "The world was shocked this morning to find that all dogs have stopped barking.",
    "fake_news_score": 0.893
  },
  "tmp": {
    "hyperbole_rank": 0.73,
    "claims": [
      {
        "entity": "world",
        "claim": "shocked"
      },
      {
        "entity": "dogs",
        "claim": "NOT barking"
      }
    ]
  }
}
```

Great! However, as a streaming pipeline this setup isn’t ideal, because our first two enrichments are independent and could be executed in parallel in order to reduce processing latency.

## [](#combining-into-a-workflow)Combining into a workflow

If we configure our enrichments within a [`workflow` processor](../../components/processors/workflow/) we can use Redpanda Connect to automatically detect our dependency graph, giving us two key benefits:

1. Enrichments at the same level of a dependency graph (claims and hyperbole) will be executed in parallel.
2. When introducing more enrichments to our pipeline the added complexity of resolving the dependency graph is handled automatically by Redpanda Connect.
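
To make the dependency graph concrete, the tiers that would be inferred for the three enrichments can also be written out by hand. This is only a sketch: it assumes the `order` field of the `workflow` processor, which takes explicit tiers of branch names, and it omits the branch bodies. The branch names `claims`, `hyperbole`, and `fake_news` match the ones used in the final configuration.

```yaml
pipeline:
  processors:
    - workflow:
        # Assumed explicit equivalent of the automatically resolved graph:
        # branches within a tier run in parallel, tiers run one after another.
        order: [ [ claims, hyperbole ], [ fake_news ] ]
        branches:
          claims: {}    # Omitted
          hyperbole: {} # Omitted
          fake_news: {} # Omitted
```

Leaving `order` unset keeps the automatic behavior described in this section, which is usually preferable because new branches are then placed into the graph without further configuration.
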

Placing our branches within a [`workflow` processor](../../components/processors/workflow/) makes our final pipeline configuration look like this:

```yaml
input:
  kafka:
    addresses: [ TODO ]
    topics: [ articles ]
    consumer_group: benthos_articles_group
    batching:
      count: 20 # Tune this to set the size of our document batches.
      period: 1s

pipeline:
  processors:
    - workflow:
        meta_path: '' # Don't bother storing branch metadata.
        branches:
          claims:
            request_map: 'root.text = this.article.content'
            processors:
              - http:
                  url: http://localhost:4197/claims
                  verb: POST
            result_map: 'root.tmp.claims = this.claims'

          hyperbole:
            request_map: 'root.text = this.article.content'
            processors:
              - archive:
                  format: json_array
              - http:
                  url: http://localhost:4198/hyperbole
                  verb: POST
              - unarchive:
                  format: json_array
            result_map: 'root.tmp.hyperbole_rank = this.hyperbole_rank'

          fake_news:
            request_map: |
              root.text = this.article.content
              root.claims = this.tmp.claims
              root.hyperbole_rank = this.tmp.hyperbole_rank
            processors:
              - http:
                  url: http://localhost:4199/fakenews
                  verb: POST
            result_map: 'root.article.fake_news_score = this.fake_news_rank'

    - catch:
        - log:
            fields_mapping: 'root.content = content().string()'
            message: "Enrichments failed due to: ${!error()}"

    - mapping: |
        root = this
        root.tmp = deleted()

output:
  kafka:
    addresses: [ TODO ]
    topic: comments_hydrated
```

Since the contents of `tmp` won’t be required downstream we remove it after our enrichments using a [`mapping` processor](../../components/processors/mapping/).

A [`catch`](../../components/processors/catch/) processor was added after the workflow to catch documents that failed enrichment. You can replace the log event with a wide range of recovery actions, such as sending the document to a dead-letter or retry queue, or dropping the message entirely. You can read more about error handling [in this article](../../configuration/error_handling/).
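
As one possible recovery action, the `catch` block can tag failed documents so that they are routed to a separate topic instead of only being logged. The following is a sketch under assumptions rather than a recommended setup: the topic name `articles_dead_letter`, the field `enrichment_error`, and the metadata key `output_topic` are hypothetical, and the workflow body is omitted.

```yaml
pipeline:
  processors:
    - workflow: {} # Omitted, same branches as above

    - catch:
        # Runs only for documents that failed enrichment.
        - mapping: |
            root = this
            root.enrichment_error = error() # Hypothetical field name
            meta output_topic = "articles_dead_letter" # Hypothetical topic name

    - mapping: |
        root = this
        root.tmp = deleted()

output:
  kafka:
    addresses: [ TODO ]
    # Fall back to the normal topic when no failure was recorded.
    topic: '${! meta("output_topic").or("comments_hydrated") }'
```

Routing through a metadata key keeps a single output while letting the pipeline decide the destination per document, the same approach the Joining Streams cookbook uses for its retry topic.
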

---

# Page 324: Filtering and Sampling

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/cookbooks/filtering.md

---

# Filtering and Sampling

---
title: Filtering and Sampling
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/cookbooks/filtering
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/cookbooks/filtering.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/cookbooks/filtering.adoc
description: Configure Redpanda Connect to conditionally drop messages.
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

Filtering events in Redpanda Connect is both easy and flexible. This cookbook demonstrates a few different types of filtering you can do. All of these examples make use of the [`mapping` processor](../../components/processors/mapping/) but shouldn’t require any prior knowledge.

## [](#the-basic-filter)The basic filter

Dropping events with [Bloblang](../../guides/bloblang/about/) is done by mapping the function `deleted()` to the `root` of the mapped document. To remove all events indiscriminately you can simply do:

```yaml
pipeline:
  processors:
    - mapping: root = deleted()
```

But that’s most likely not what you want.
We can instead only delete an event under certain conditions with a [`match`](../../guides/bloblang/about/#pattern-matching) or [`if`](../../guides/bloblang/about/#conditional-mapping) expression:

```yaml
pipeline:
  processors:
    - mapping: |
        root = if @topic.or("") == "foo" ||
          this.doc.type == "bar" ||
          this.doc.urls.contains("https://www.benthos.dev/").catch(false) {
          deleted()
        }
```

The above config removes any event for which at least one of the following is true:

- The metadata field `topic` is equal to `foo`
- The event field `doc.type` (a string) is equal to `bar`
- The event field `doc.urls` (an array) contains the string `https://www.benthos.dev/`

Events that do not match any of these conditions will remain unchanged.

## [](#sample-events)Sample events

Another type of filter we might want is a sampling filter. We can do that with a random number generator:

```yaml
pipeline:
  processors:
    - mapping: |
        # Drop 50% of documents randomly
        root = if random_int() % 2 == 0 { deleted() }
```

We can also do this in a deterministic way by hashing events and filtering by that hash value:

```yaml
pipeline:
  processors:
    - mapping: |
        # Drop ~10% of documents deterministically (same docs filtered each run)
        root = if content().hash("xxhash64").slice(-8).number() % 10 == 0 {
          deleted()
        }
```

---

# Page 325: Work with Jira Issues

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/cookbooks/jira.md

---

# Work with Jira Issues

---
title: Work with Jira Issues
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/cookbooks/jira
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/cookbooks/jira.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/cookbooks/jira.adoc
description: Learn how to query, filter, and create Jira issues using Redpanda Connect pipelines.
page-topic-type: cookbook
personas: streaming_developer, data_engineer
learning-objective-1: Query Jira issues using JQL patterns with the Jira processor
learning-objective-2: Combine generate input with Jira processor for scheduled queries
learning-objective-3: Create Jira issues using the HTTP processor and REST API
page-git-created-date: "2026-02-18"
page-git-modified-date: "2026-02-18"
---

The Jira processor queries Jira issues using JQL (Jira Query Language) and returns structured data. Because it is a processor, you can use it in pipelines for input-style flows (pair it with `generate`) or output-style flows (pair it with `drop`).

Use this cookbook to:

- Query Jira issues on a schedule or on-demand
- Filter issues using JQL patterns
- Create Jira issues using the HTTP processor

## [](#prerequisites)Prerequisites

The examples in this cookbook use the Secrets Store for Jira credentials. This keeps sensitive credentials secure and separate from your pipeline configuration.

1. [Generate a Jira API token](https://id.atlassian.com/manage-profile/security/api-tokens).
2. Add your Jira credentials to the [Secrets Store](../../configuration/secret-management/):
   - `JIRA_BASE_URL`: Your Jira instance URL (for example, `https://your-domain.atlassian.net`)
   - `JIRA_USERNAME`: Your Jira account email address
   - `JIRA_API_TOKEN`: The API token generated from your Atlassian account
   - `JIRA_AUTH_TOKEN` (optional, for creating issues): Base64-encoded `username:api_token` string

## [](#use-jira-as-an-input)Use Jira as an input

To use Jira as an input, combine the `generate` input with the Jira processor. This pattern triggers Jira queries at regular intervals or on-demand.

> 💡 **TIP**
>
> Replace `MYPROJECT` in the examples with your actual Jira project key.

### [](#query-jira-periodically)Query Jira periodically This example queries Jira every 30 seconds for recent issues: ```yaml input: generate: interval: 30s mapping: | root.jql = "project = MYPROJECT AND updated >= -1h ORDER BY updated DESC" root.maxResults = 50 root.fields = ["key", "summary", "status", "assignee", "priority"] pipeline: processors: - jira: base_url: "${secrets.JIRA_BASE_URL}" username: "${secrets.JIRA_USERNAME}" api_token: "${secrets.JIRA_API_TOKEN}" output: stdout: {} ``` ### [](#one-time-query)One-time query For a single query, use `count` instead of `interval`: ```yaml input: generate: count: 1 mapping: | root.jql = "project = MYPROJECT AND status = Open" root.maxResults = 100 pipeline: processors: - jira: base_url: "${secrets.JIRA_BASE_URL}" username: "${secrets.JIRA_USERNAME}" api_token: "${secrets.JIRA_API_TOKEN}" output: stdout: {} ``` ## [](#input-message-format)Input message format The Jira processor expects input messages containing valid Jira queries in JSON format: ```json { "jql": "project = MYPROJECT AND status = Open", "maxResults": 50, "fields": ["key", "summary", "status", "assignee"] } ``` ### [](#required-fields)Required fields - `jql`: The JQL (Jira Query Language) query string ### [](#optional-fields)Optional fields - `maxResults`: Maximum number of results to return (default: 50) - `fields`: Array of field names to include in the response ## [](#jql-query-patterns)JQL query patterns Here are common JQL patterns for filtering issues: ### [](#recent-issues-by-project)Recent issues by project ```jql project = AND created >= -7d ORDER BY created DESC ``` ### [](#issues-assigned-to-current-user)Issues assigned to current user ```jql assignee = currentUser() AND status != Done ``` ### [](#issues-by-status)Issues by status ```jql project = AND status IN (Open, 'In Progress', 'To Do') ``` ### [](#issues-by-priority)Issues by priority ```jql project = AND priority = High ORDER BY created DESC ``` ## [](#output-message-format)Output 
message format The Jira processor returns individual issue messages, rather than a response object with an `issues` array. Each message output by the Jira processor represents a single issue: ```json { "id": "12345", "key": "DOC-123", "fields": { "summary": "Example issue", "status": { "name": "In Progress" }, "assignee": { "displayName": "John Doe" } } } ``` The Jira processor automatically handles pagination internally. The processor: 1. Makes the initial request with `startAt=0`. 2. Checks if more results are available. 3. Automatically fetches subsequent pages until all results are retrieved. 4. Outputs each issue as an individual message. You don’t need to handle pagination manually. ## [](#create-and-update-jira-issues)Create and update Jira issues The Jira processor is read-only and only supports querying. To create or update Jira issues, use the [`http` processor](../../components/processors/http/) with the Jira REST API. ### [](#create-a-jira-issue)Create a Jira issue ```yaml input: generate: count: 1 mapping: | root.fields = { "project": {"key": "MYPROJECT"}, "summary": "Issue created from Redpanda Connect", "description": { "type": "doc", "version": 1, "content": [{"type": "paragraph", "content": [{"type": "text", "text": "Created via API"}]}] }, "issuetype": {"name": "Task"} } pipeline: processors: - http: url: "${secrets.JIRA_BASE_URL}/rest/api/3/issue" verb: POST headers: Content-Type: application/json Authorization: "Basic ${secrets.JIRA_AUTH_TOKEN}" output: stdout: {} ``` ## [](#see-also)See also - [Jira processor reference](../../components/processors/jira/) - [Jira REST API documentation](https://developer.atlassian.com/cloud/jira/platform/rest/v3/intro/) - [JQL query guide](https://www.atlassian.com/software/jira/guides/jql) --- # Page 326: Joining Streams **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/cookbooks/joining_streams.md --- # Joining Streams --- title: Joining Streams latest-operator-version: v26.1.2 
latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/cookbooks/joining_streams page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/cookbooks/joining_streams.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/cookbooks/joining_streams.adoc description: How to hydrate documents by joining multiple streams. page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- This cookbook demonstrates how to merge JSON events from parallel streams using content based rules and a [cache](../../components/caches/about/) of your choice. The imaginary problem we are going to solve is hydrating a feed of article comments with information from their parent articles. We will be consuming and writing to Kafka, but the example works with any [input](../../components/inputs/about/) and [output](../../components/outputs/about/) combination. Articles are received over the topic `articles` and look like this: ```json { "type": "article", "article": { "id": "123foo", "title": "Good article", "content": "this is a totally good article" }, "user": { "id": "user1" } } ``` Comments can either be posted on an article or a parent comment, are received over the topic `comments`, and look like this: ```json { "type": "comment", "comment": { "id": "456bar", "parent_id": "123foo", "content": "this article is bad" }, "user": { "id": "user2" } } ``` Our goal is to end up with a single stream of comments, where information about the root article of the comment is attached to the event. 
The above comment should exit our pipeline looking like this:

```json
{
  "type": "comment",
  "comment": {
    "id": "456bar",
    "parent_id": "123foo",
    "content": "this article is bad"
  },
  "article": {
    "title": "Good article",
    "content": "this is a totally good article"
  },
  "user": {
    "id": "user2"
  }
}
```

In order to achieve this we will need to cache articles as they pass through our pipelines and then retrieve them for each comment passing through. Since the parent of a comment might be another comment we will also need to cache and retrieve comments in the same way.

## [](#caching-articles)Caching articles

Our first pipeline is very simple: we consume articles, reduce them to only the fields we wish to cache, and then cache them. If we receive the same article multiple times we’re going to assume it’s okay to overwrite the old article in the cache.

This example targets Redis, but you can choose any of the supported [cache targets](../../components/caches/about/). The TTL of cached articles is set to one week.

```yaml
input:
  kafka:
    addresses: [ TODO ]
    topics: [ articles ]
    consumer_group: benthos_articles_group

pipeline:
  processors:
    # Reduce document into only fields we wish to cache.
    - mapping: 'root.article = this.article'

    # Store reduced articles into our cache.
    - cache:
        operator: set
        resource: hydration_cache
        key: '${!json("article.id")}'
        value: '${!content()}'

# Drop all articles after they are cached.
output:
  drop: {}

cache_resources:
  - label: hydration_cache
    redis:
      url: TODO
      default_ttl: 168h
```

## [](#hydrating-comments)Hydrating comments

Our second pipeline consumes comments, caches them in case a subsequent comment references them, obtains each comment’s parent (article or comment), and attaches the root article to the event before sending it to our output topic `comments_hydrated`.
In this config we make use of the [`branch`](../../components/processors/branch/) processor as it allows us to reduce documents into smaller maps for caching and gives us greater control over how results are mapped back into the document.

```yaml
input:
  kafka:
    addresses: [ TODO ]
    topics: [ comments ]
    consumer_group: benthos_comments_group

pipeline:
  processors:
    # Perform both hydration and caching within a for_each block as this ensures
    # that a given message of a batch is cached before the next message is
    # hydrated, ensuring that when a message of the batch has a parent within
    # the same batch hydration can still work.
    - for_each:
        # Attempt to obtain parent event from cache (if the ID exists).
        - branch:
            request_map: 'root = this.comment.parent_id | deleted()'
            processors:
              - cache:
                  operator: get
                  resource: hydration_cache
                  key: '${!content()}'
            # And if successful copy it into the field `article`.
            result_map: 'root.article = this.article'

        # Reduce comment into only fields we wish to cache.
        - branch:
            request_map: |
              root.comment.id = this.comment.id
              root.article = this.article
            processors:
              # Store reduced comment into our cache.
              - cache:
                  operator: set
                  resource: hydration_cache
                  key: '${!json("comment.id")}'
                  value: '${!content()}'
            # No `result_map` since we don't need to map into the original message.

# Send resulting documents to our hydrated topic.
output:
  kafka:
    addresses: [ TODO ]
    topic: comments_hydrated

cache_resources:
  - label: hydration_cache
    redis:
      url: TODO
      default_ttl: 168h
```

This pipeline satisfies our basic needs but errors aren’t handled at all, meaning intermittent cache connectivity problems that span beyond our cache retries will result in failed documents entering our `comments_hydrated` topic. This is also the case if a comment arrives in our pipeline before its parent.

There are [many patterns for error handling](../../configuration/error_handling/) to choose from in Redpanda Connect.
In this example we’re going to introduce a delayed retry queue, as it enables us to reprocess failed documents after a grace period, in isolation from our main pipeline.

## [](#adding-a-retry-queue)Adding a retry queue

Our retry queue is going to be another topic called `comments_retry`. Since most errors are time-related (for example, a comment arriving before its parent), we will delay retry attempts: after a failed request the current timestamp is stored as a metadata field, and the retry input sleeps until the grace period (3600 seconds in this config) has elapsed.

We will use an input [`broker`](../../components/inputs/broker/) so that we can consume both the `comments` and `comments_retry` topics in the same pipeline.

Our config (omitting the caching sections for brevity) now looks like this:

```yaml
input:
  broker:
    inputs:
      - kafka:
          addresses: [ TODO ]
          topics: [ comments ]
          consumer_group: benthos_comments_group

      - kafka:
          addresses: [ TODO ]
          topics: [ comments_retry ]
          consumer_group: benthos_comments_group

        processors:
          - for_each:
              # Calculate time until next retry attempt and sleep for that duration.
              # This sleep blocks the topic 'comments_retry' but NOT 'comments',
              # because both topics are consumed independently and these processors
              # only apply to the 'comments_retry' input.
              - sleep:
                  duration: '${! 3600 - ( timestamp_unix() - meta("last_attempted").number() ) }s'

pipeline:
  processors:
    - try:
        - for_each:
            # Attempt to obtain parent event from cache.
            - branch: {} # Omitted

            # Reduce document into only fields we wish to cache.
            - branch: {} # Omitted

        # If we've reached this point then both processors succeeded.
        - mapping: 'meta output_topic = "comments_hydrated"'

    - catch:
        # If we reach here then a processing stage failed.
        - mapping: |
            meta output_topic = "comments_retry"
            meta last_attempted = timestamp_unix()

# Send resulting documents either to our hydrated topic or the retry topic.
output:
  kafka:
    addresses: [ TODO ]
    topic: '${!meta("output_topic")}'

cache_resources:
  - label: hydration_cache
    redis:
      url: TODO
      default_ttl: 168h
```

You can find a full example [in the project repo](https://github.com/redpanda-data/connect/blob/master/config/examples/joining_streams.yaml). With this config we can deploy as many instances of Redpanda Connect as we need, because the partitions will be balanced across the consumers.

---

# Page 327: Retrieval-Augmented Generation (RAG)

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/cookbooks/rag.md

---

# Retrieval-Augmented Generation (RAG)

---
title: Retrieval-Augmented Generation (RAG)
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/cookbooks/rag
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/cookbooks/rag.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/cookbooks/rag.adoc
description: How to configure Redpanda Connect to create a RAG pipeline, using PostgreSQL and PGVector.
page-git-created-date: "2024-09-12"
page-git-modified-date: "2024-09-19"
---

This cookbook shows you how to create a vector embeddings indexing pipeline for Retrieval-Augmented Generation (RAG), using PostgreSQL and [PGVector](https://github.com/pgvector/pgvector).

Follow the cookbook to:

- Take textual data from a Redpanda topic and compute vector embeddings for it using [Ollama](https://ollama.ai)
- Write the pipeline output into a PostgreSQL table with a [PGVector](https://github.com/pgvector/pgvector) index on the embeddings column.

## [](#compute-the-embeddings)Compute the embeddings

Start by creating a Redpanda topic, which you can use as an input for an indexing data pipeline.
```bash rpk topic create articles echo '{ "type": "article", "article": { "id": "123foo", "title": "Dogs Stop Barking", "content": "The world was shocked this morning to find that all dogs have stopped barking." } }' | rpk topic produce articles -f '%v' ``` Your indexing pipeline can read from the Redpanda topic, using the [`kafka`](../../components/inputs/kafka/) input: ```yaml input: kafka: addresses: [ "TODO" ] topics: [ articles ] consumer_group: rp_connect_articles_group tls: enabled: true sasl: mechanism: SCRAM-SHA-256 user: "TODO" password: "TODO" ``` Use [Nomic Embed](https://ollama.com/library/nomic-embed-text) to compute embeddings. Since each request only applies to a single document, you can scale this by making requests in parallel across document batches. To send a mapped request and map the response back into the original document, use the [`branch` processor](../../components/processors/branch/) with a child [`ollama_embeddings`](../../components/processors/ollama_embeddings/) processor. ```yaml pipeline: threads: -1 processors: - branch: request_map: 'root = "search_document: %s\n%s".format(this.article.title, this.article.content)' processors: - ollama_embeddings: model: nomic-embed-text result_map: 'root.article.embeddings = this' ``` With this pipeline, your processed documents should look something like this: ```yaml { "type": "article", "article": { "id": "123foo", "title": "Dogs Stop Barking", "content": "The world was shocked this morning to find that all dogs have stopped barking.", "embeddings": [0.754, 0.19283, 0.231, 0.834], # This vector will actually have 768 dimensions } } ``` Now, try sending this transformed data to PostgreSQL using the [`sql_insert`](../../components/outputs/sql_insert/) output. You can take advantage of the `init_statement` functionality to set up `pgvector` and a table to write the data to. 
```yaml output: sql_insert: driver: postgres dsn: "TODO" init_statement: | CREATE EXTENSION IF NOT EXISTS vector; CREATE TABLE IF NOT EXISTS searchable_text ( id varchar(128) PRIMARY KEY, title text NOT NULL, body text NOT NULL, embeddings vector(768) NOT NULL ); CREATE INDEX IF NOT EXISTS text_hnsw_index ON searchable_text USING hnsw (embeddings vector_l2_ops); table: searchable_text columns: ["id", "title", "body", "embeddings"] args_mapping: "[this.article.id, this.article.title, this.article.content, this.article.embeddings.vector()]" ``` After deploying this pipeline using the Redpanda Console, you can verify data is being written into PostgreSQL using `psql` to execute `SELECT count(*) FROM searchable_text;`. --- # Page 328: Redpanda Migrator **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/cookbooks/redpanda_migrator.md --- # Redpanda Migrator --- title: Redpanda Migrator latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/cookbooks/redpanda_migrator page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/cookbooks/redpanda_migrator.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/cookbooks/redpanda_migrator.adoc description: Move your workloads from any Kafka system to Redpanda Cloud using a single command. page-git-created-date: "2024-10-02" page-git-modified-date: "2024-10-02" --- With Redpanda Migrator, you can move your workloads from any Apache Kafka system to Redpanda using a single command. It lets you migrate Kafka messages, schemas, and ACLs quickly and efficiently. 
Redpanda Connect’s Redpanda Migrator uses the unified migrator components (available in Redpanda Connect 4.67.5+):

- [`redpanda_migrator` input](../../components/inputs/redpanda_migrator/) connects to the source Kafka cluster and Schema Registry.
- [`redpanda_migrator` output](../../components/outputs/redpanda_migrator/) handles all migration logic including topic creation, schema synchronization, and consumer group offset translation.

> 📝 **NOTE**
>
> If you’re currently using the legacy `redpanda_migrator_bundle` components, see [Migrate to the Unified Redpanda Migrator](../../guides/migrate-unified-redpanda-migrator/) for migration instructions.

## [](#create-a-kafka-cluster-and-a-redpanda-cloud-cluster)Create a Kafka cluster and a Redpanda Cloud cluster

First, provision two clusters: a Kafka cluster called `source` and a Redpanda Cloud cluster called `destination`. This cookbook uses the following sample connection details:

- Source
  - broker: `source.cloud.kafka.com:9092`
  - schema registry: `https://schema-registry-source.cloud.kafka.com:30081`
  - username: `kafka`
  - password: `testpass`
- Destination
  - broker: `destination.cloud.redpanda.com:9092`
  - schema registry: `https://schema-registry-destination.cloud.redpanda.com:30081`
  - username: `redpanda`
  - password: `testpass`

Then create two topics in the `source` Kafka cluster, `foo` and `bar`, and an ACL for each topic:

```bash
cat > ./config.properties <
```

> 📝 **NOTE**
>
> The Brave browser does not fully support code snippets.

Next, create a Redpanda Connect pipeline that generates sample messages and writes them to the `foo` and `bar` topics in the `source` cluster.

1. Go to the **Connect** page on your cluster and click **Create pipeline**.
2. In **Pipeline name**, enter a name and add a short description.
3. For **Compute units**, leave the default value of **1**.
4. For **Configuration**, paste the following configuration.

`generate_data.yaml`

```yaml
http:
  enabled: false

input:
  sequence:
    inputs:
      - generate:
          mapping: |
            let msg = counter()
            root.data = $msg

            meta kafka_topic = match $msg % 2 {
              0 => "foo"
              1 => "bar"
            }
          interval: 1s
          count: 0
          batch_size: 1

        processors:
          - schema_registry_encode:
              url: "https://schema-registry-source.cloud.kafka.com:30081"
              subject: ${! metadata("kafka_topic") }
              avro_raw_json: true
              basic_auth:
                enabled: true
                username: kafka
                password: testpass

output:
  kafka_franz:
    seed_brokers: [ "source.cloud.kafka.com:9092" ]
    topic: ${! @kafka_topic }
    partitioner: manual
    partition: ${! random_int(min:0, max:1) }
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-256
        username: kafka
        password: testpass
```

5. Click **Create**. Your pipeline details are displayed and the pipeline state changes from **Starting** to **Running**, which may take a few minutes. If you don’t see this state change, refresh your page.

Next, add a Redpanda Connect consumer that reads messages from the `source` cluster topics, and leave it running. This consumer uses the `foobar` consumer group, which is reused in a later step when consuming from the `destination` cluster.

1. Go to the **Connect** page on your cluster and click **Create pipeline**.
2. In **Pipeline name**, enter a name and add a short description.
3. For **Compute units**, leave the default value of **1**.
4. For **Configuration**, paste the following configuration.

`read_data_source.yaml`

```yaml
http:
  enabled: false

input:
  kafka_franz:
    seed_brokers: [ "source.cloud.kafka.com:9092" ]
    topics:
      - '^[^_]' # Skip topics which start with `_`
    regexp_topics: true
    consumer_group: foobar
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-256
        username: kafka
        password: testpass

  processors:
    - schema_registry_decode:
        url: "https://schema-registry-source.cloud.kafka.com:30081"
        avro_raw_json: true
        basic_auth:
          enabled: true
          username: kafka
          password: testpass

output:
  stdout: {}
  processors:
    - mapping: |
        root = this.merge({"count": counter(), "topic": @kafka_topic, "partition": @kafka_partition})
```

5. Click **Create**. Your pipeline details are displayed and the pipeline state changes from **Starting** to **Running**, which may take a few minutes.
If you don’t see this state change, refresh your page. At this point, the `source` cluster has some data in both `foo` and `bar` topics, and the consumer prints the messages it reads from these topics to `stdout`. ## [](#configure-and-start-redpanda-migrator)Configure and start Redpanda Migrator The unified Redpanda Migrator does the following: - The `redpanda_migrator` input connects to the source Kafka cluster and Schema Registry to consume messages and schema information. - The `redpanda_migrator` output handles all migration logic: - Schema migration: reads schemas from the source Schema Registry and synchronizes them to the destination. - Topic creation: automatically creates destination topics that don’t exist with proper configurations. - ACL migration: migrates access control lists according to the migration rules. - Message streaming: processes and routes messages from source to destination topics. - Consumer group offset translation: maps source consumer group offsets to equivalent destination positions. - If new topics are created in the source cluster while the migrator is running, they are migrated when messages are written to them. ACL migration for topics adheres to the following principles: - `ALLOW WRITE` ACLs for topics are not migrated - `ALLOW ALL` ACLs for topics are downgraded to `ALLOW READ` - Group ACLs are not migrated > 📝 **NOTE** > > Changing topic configurations, such as partition count, isn’t currently supported. Now, use the following unified Redpanda Migrator configuration. See the [`redpanda_migrator` input](../../components/inputs/redpanda_migrator/) and [`redpanda_migrator` output](../../components/outputs/redpanda_migrator/) docs for details. 1. Go to the **Connect** page on your cluster and click **Create pipeline**. 2. In **Pipeline name**, enter a name and add a short description. 3. For **Compute units**, leave the default value of **1**. 4. For **Configuration**, paste the following configuration. 
`redpanda_migrator.yaml`

```yaml
input:
  label: "migration_pipeline"
  redpanda_migrator:
    # Source Kafka settings
    seed_brokers: [ "source.cloud.kafka.com:9092" ]
    topics:
      - '^[^_]' # Skip internal topics which start with `_`
    regexp_topics: true
    consumer_group: migrator
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-256
        username: kafka
        password: testpass

    # Source Schema Registry settings
    schema_registry:
      url: "https://schema-registry-source.cloud.kafka.com:30081"
      basic_auth:
        enabled: true
        username: kafka
        password: testpass

output:
  label: "migration_pipeline"
  redpanda_migrator:
    # Destination Redpanda settings
    seed_brokers: [ "destination.cloud.redpanda.com:9092" ]
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-256
        username: redpanda
        password: testpass

    # Destination Schema Registry and migration settings
    schema_registry:
      url: https://schema-registry-destination.cloud.redpanda.com:30081
      include_deleted: true
      translate_ids: true
      basic_auth:
        enabled: true
        username: redpanda
        password: testpass

    # Consumer group migration settings
    consumer_groups:
      enabled: true
      interval: 30s

    serverless: false
```

> 💡 **TIP**
>
> Label names must be between 3 and 128 characters and can only contain alphanumeric characters, hyphens, and underscores (`A-Za-z0-9-_`).

## [](#check-the-status-of-migrated-topics)Check the status of migrated topics

Use the Redpanda [`rpk` CLI tool](../../../../../current/get-started/rpk/) to check which topics and ACLs have been migrated to the `destination` cluster. If you don’t already have it, you can quickly [install `rpk`](../../../../../current/get-started/rpk-install/).

> 📝 **NOTE**
>
> Users currently must be migrated manually, although this step is not required for the demo in this cookbook. Roles are specific to Redpanda and currently also require manual migration if the `source` cluster is based on Redpanda.
```bash
rpk -X brokers=destination.cloud.redpanda.com:9092 -X tls.enabled=true -X sasl.mechanism=SCRAM-SHA-256 -X user=redpanda -X pass=testpass topic list
NAME      PARTITIONS  REPLICAS
_schemas  1           1
bar       2           1
foo       2           1

rpk -X brokers=destination.cloud.redpanda.com:9092 -X tls.enabled=true -X sasl.mechanism=SCRAM-SHA-256 -X user=redpanda -X pass=testpass security acl list
PRINCIPAL      HOST  RESOURCE-TYPE  RESOURCE-NAME  RESOURCE-PATTERN-TYPE  OPERATION  PERMISSION  ERROR
User:redpanda  *     TOPIC          bar            LITERAL                READ       DENY
User:redpanda  *     TOPIC          foo            LITERAL                READ       ALLOW
```

## [](#check-metrics-to-monitor-progress)Check metrics to monitor progress

Redpanda Connect provides a comprehensive suite of metrics in various formats, such as Prometheus, which you can use to monitor its performance in your observability stack. Besides the [standard Redpanda Connect metrics](../../components/metrics/about/#metric-names), the `redpanda_migrator` input also emits an `input_redpanda_migrator_lag` metric for monitoring the migration progress of each topic and partition.

To monitor the migration progress, use the Redpanda Cloud OpenMetrics endpoint, which exposes all Redpanda and connector metrics for your cluster. You can integrate this endpoint with Prometheus, Datadog, or other observability platforms.

For step-by-step instructions on configuring monitoring and connecting your observability tool, see [Monitor Redpanda Cloud](../../../../manage/monitor-cloud/).

After ingesting the metrics, search for the `input_redpanda_migrator_lag` metric in your monitoring tool and filter by `topic` and `partition` as needed to track migration lag for each topic and partition.

## [](#read-from-the-migrated-topics)Read from the migrated topics

Stop the `read_data_source.yaml` consumer you started earlier, and then start a similar consumer for the `destination` cluster. Before starting the consumer on the `destination` cluster, give Redpanda Migrator some time to replicate the translated offsets.

1. 
On the **Connect** page, stop the `read_data_source` pipeline you created earlier.
2. Go to the **Connect** page on your cluster and click **Create pipeline**.
3. In **Pipeline name**, enter a name and add a short description.
4. For **Compute units**, leave the default value of **1**.
5. For **Configuration**, paste the following configuration.

`read_data_destination.yaml`

```yaml
http:
  enabled: false

input:
  kafka_franz:
    seed_brokers: [ "destination.cloud.redpanda.com:9092" ]
    topics:
      - '^[^_]' # Skip topics which start with `_`
    regexp_topics: true
    consumer_group: foobar
    # TLS is enabled to match the other `destination` connections in this cookbook
    tls:
      enabled: true
    sasl:
      - mechanism: SCRAM-SHA-256
        username: redpanda
        password: testpass

  processors:
    - schema_registry_decode:
        url: "https://schema-registry-destination.cloud.redpanda.com:30081"
        avro_raw_json: true
        basic_auth:
          enabled: true
          username: redpanda
          password: testpass

output:
  stdout: {}
  processors:
    - mapping: |
        root = this.merge({"count": counter(), "topic": @kafka_topic, "partition": @kafka_partition})
```

6. Click **Create**. Your pipeline details are displayed and the pipeline state changes from **Starting** to **Running**, which may take a few minutes. If you don’t see this state change, refresh your page.

This consumer uses the same `foobar` consumer group as the `source` cluster consumer, so it resumes reading messages from where the `source` consumer left off.

Redpanda Migrator performs offset remapping when migrating consumer group offsets to the `destination` cluster. While more sophisticated approaches are possible, Redpanda chose a simple timestamp-based approach. For each migrated offset, the `destination` cluster is queried to find the latest offset before the timestamp of the received offset. Redpanda Migrator then writes this offset as the `destination` consumer group offset for the corresponding topic and partition pair.
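As a rough illustration of the timestamp-based lookup described above, the following Python sketch models one destination partition as an in-memory list of `(offset, timestamp)` pairs. It is not the Redpanda Migrator implementation: the function name `translate_offset`, the list standing in for a query against the `destination` cluster, and the reading of "before" as strictly earlier are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple


def translate_offset(
    dest_records: Sequence[Tuple[int, int]], source_ts_ms: int
) -> Optional[int]:
    """Return the latest destination offset whose timestamp is strictly
    earlier than source_ts_ms, or None if there is no such record.

    dest_records is a hypothetical stand-in for one destination partition:
    (offset, timestamp_ms) pairs sorted by timestamp.
    """
    timestamps = [ts for _, ts in dest_records]
    # bisect_left returns the index of the first timestamp >= source_ts_ms,
    # so the entry just before it is the last one strictly earlier.
    idx = bisect_left(timestamps, source_ts_ms) - 1
    if idx < 0:
        return None
    return dest_records[idx][0]


# Four destination records, 500 ms apart.
records = [(0, 1000), (1, 1500), (2, 2000), (3, 2500)]

# A source offset committed for a message timestamped 2000 ms maps to
# destination offset 1, the last record written before that timestamp.
print(translate_offset(records, 2000))  # prints 1
```

When no destination record is earlier than the timestamp, the sketch returns `None` and leaves the choice of starting position to the caller.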
Although the timestamp-based approach doesn’t guarantee exactly-once delivery, it minimizes the likelihood of message duplication and avoids the need for complex and error-prone offset remapping logic. --- # Page 329: Ingest data into Snowflake **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/cookbooks/snowflake_ingestion.md --- # Ingest data into Snowflake --- title: Ingest data into Snowflake latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/cookbooks/snowflake_ingestion page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/cookbooks/snowflake_ingestion.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/cookbooks/snowflake_ingestion.adoc description: Configure Redpanda Connect to ingest data from a Redpanda topic into Snowflake using Snowpipe Streaming. page-git-created-date: "2025-01-28" page-git-modified-date: "2025-01-28" --- Configure a Redpanda Connect pipeline to generate and write data into a Redpanda Serverless topic, and then ingest that data into [Snowflake](https://www.snowflake.com/en/) using [Snowpipe Streaming](https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview). ## [](#prerequisites)Prerequisites - A [Redpanda Cloud account](https://cloud.redpanda.com/sign-up) - [`rpk` installed](https://docs.redpanda.com/current/get-started/rpk-install/) and [signed into your Cloud account](https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-login/) - A [Snowflake account](https://trial.snowflake.com/) - `openssl` command-line tool ## [](#set-up-your-redpanda-cluster)Set up your Redpanda cluster In [Redpanda Cloud](https://cloud.redpanda.com/), create a new Serverless Standard cluster. 
When the cluster is ready, run `rpk cloud cluster select` to select the cluster and set it to be your current [rpk profile](https://docs.redpanda.com/current/get-started/config-rpk-profile/). Next, create a `demo_topic` to use as the data source for ingesting data into Snowflake: ```bash rpk topic create demo_topic ``` Create a user with minimal [ACLs](https://docs.redpanda.com/current/manage/security/authorization/acl/) to run the ingestion pipeline into Snowflake: ```bash rpk security user create ingestion_user --password Testing1234 ``` Now that the user exists, give them read permissions to `demo_topic`, as well as full control over any consumer group with the prefix `redpanda_connect`: ```bash rpk security acl create --allow-principal ingestion_user --operation read --topic demo_topic rpk security acl create --allow-principal ingestion_user --resource-pattern-type prefixed --operation all --group redpanda_connect ``` ## [](#set-up-your-snowflake-account)Set up your Snowflake account Log in to your Snowflake account with a user who has the ACCOUNTADMIN role. Then, run the following SQL commands in a worksheet. They set up another user with minimal permissions to write data into a specified database and schema, ready for streaming data to Snowflake. 
```sql -- Set default values for multiple variables SET PWD = 'Test1234567'; SET USER = 'STREAMING_USER'; SET DB = 'STREAMING_DB'; SET ROLE = 'REDPANDA_CONNECT'; SET WH = 'STREAMING_WH'; USE ROLE ACCOUNTADMIN; -- Create users CREATE USER IF NOT EXISTS IDENTIFIER($USER) PASSWORD=$PWD COMMENT='STREAMING USER FOR REDPANDA CONNECT'; -- Create roles CREATE OR REPLACE ROLE IDENTIFIER($ROLE); -- Create the destination database and virtual warehouse CREATE DATABASE IF NOT EXISTS IDENTIFIER($DB); USE IDENTIFIER($DB); CREATE OR REPLACE WAREHOUSE IDENTIFIER($WH) WITH WAREHOUSE_SIZE = 'SMALL'; -- Grant privileges GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE IDENTIFIER($ROLE); GRANT ROLE IDENTIFIER($ROLE) TO USER IDENTIFIER($USER); GRANT OWNERSHIP ON DATABASE IDENTIFIER($DB) TO ROLE IDENTIFIER($ROLE); GRANT USAGE ON WAREHOUSE IDENTIFIER($WH) TO ROLE IDENTIFIER($ROLE); -- Set defaults ALTER USER IDENTIFIER($USER) SET DEFAULT_ROLE=$ROLE; ALTER USER IDENTIFIER($USER) SET DEFAULT_WAREHOUSE=$WH; -- Run the following commands to find your account identifier. Copy it down for later use. -- It will be something like `organization_name-account_name` -- e.g. ykmxgak-wyb52636 WITH HOSTLIST AS (SELECT * FROM TABLE(FLATTEN(INPUT => PARSE_JSON(SYSTEM$allowlist())))) SELECT REPLACE(VALUE:host,'.snowflakecomputing.com','') AS ACCOUNT_IDENTIFIER FROM HOSTLIST WHERE VALUE:type = 'SNOWFLAKE_DEPLOYMENT_REGIONLESS'; ``` ### [](#create-an-rsa-key-pair)Create an RSA key pair Create an [RSA key pair](https://docs.snowflake.com/en/user-guide/key-pair-auth) using `openssl` to authenticate Redpanda Connect to Snowflake. When you’re prompted to give an encryption password, record it for later. ```bash openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -passout pass:Testing123 -out rsa_key.p8 ``` Create a public key. You’re prompted to enter your encryption password. 
```bash openssl rsa -in rsa_key.p8 -pubout -passout pass:Testing123 -out rsa_key.pub ``` To register the public key in Snowflake, remove the public key delimiters and output only the base64-encoded portion of the PEM file. Run the following bash command to print it: ```bash cat rsa_key.pub | sed -e '1d' -e '$d' | tr -d '\n' ``` In the Snowflake worksheet, add the output of the bash command you just ran to the following SQL command and execute it: ```sql use role accountadmin; alter user streaming_user set rsa_public_key='< PubKeyWithoutDelimiters >'; ``` ### [](#create-a-schema-using-streaming_user)Create a schema using `streaming_user` Log out of Snowflake and sign back in as the default user (`streaming_user`) with the associated password (default: `Test1234567`). You created these credentials in [Set up your Snowflake account](#set-up-your-snowflake-account). Run the following SQL commands in a worksheet to create a schema (e.g. `STREAMING_SCHEMA`) in the default database (e.g. `STREAMING_DB`): ```sql SET DB = 'STREAMING_DB'; SET SCHEMA = 'STREAMING_SCHEMA'; USE IDENTIFIER($DB); CREATE OR REPLACE SCHEMA IDENTIFIER($SCHEMA); ``` ## [](#create-a-pipeline-from-your-redpanda-cluster-to-snowflake)Create a pipeline from your Redpanda cluster to Snowflake You can now create the pipeline. First create [secrets](../../configuration/secret-management/) for the passwords and keys you created during setup. On your Serverless cluster, go to the **Connect** page, select the **Secrets** tab and then create three secrets: - `REDPANDA_PASS` with the value `Testing1234` - `SNOWFLAKE_KEY` with the output value of `awk '{printf "%s\\n", $0}' rsa_key.p8` - `SNOWFLAKE_KEY_PASS` with the value `Testing123` Select the **Pipelines** tab and create a pipeline called **RedpandaToSnowflake**. 
Use the following YAML configuration:

```yaml
input:
  # Reads data from our `demo_topic`
  kafka_franz:
    seed_brokers: ["${REDPANDA_BROKERS}"]
    topics: ["demo_topic"]
    consumer_group: "redpanda_connect_to_snowflake"
    tls: {enabled: true}
    checkpoint_limit: 4096
    sasl:
      - mechanism: SCRAM-SHA-256
        username: ingestion_user
        password: ${secrets.REDPANDA_PASS}
    # Define the batching policy. This cookbook creates small batches,
    # but in a production environment use the largest file size you can.
    batching:
      count: 100 # Collect 100 messages before flushing
      period: 10s # or after 10 seconds, whichever comes first
output:
  snowflake_streaming:
    # Replace this placeholder with your account identifier
    account: "< OrgName-AccountName >"
    user: STREAMING_USER
    role: REDPANDA_CONNECT
    database: STREAMING_DB
    schema: STREAMING_SCHEMA
    table: STREAMING_DATA
    # Inject your private key and password.
    # The secret holds the key contents rather than a file path, so it is
    # passed through `private_key`.
    private_key: "${secrets.SNOWFLAKE_KEY}"
    private_key_pass: "${secrets.SNOWFLAKE_KEY_PASS}"
    schema_evolution:
      enabled: true
    max_in_flight: 1
```

You can now produce some data using `rpk` to test that everything works:

```bash
echo '{"animal":"redpanda","attributes":"cute","age":6}' | rpk topic produce demo_topic -f '%v\n'
echo '{"animal":"polar bear","attributes":"cool","age":13}' | rpk topic produce demo_topic -f '%v\n'
echo '{"animal":"unicorn","attributes":"rare","age":999}' | rpk topic produce demo_topic -f '%v\n'
```

The data produced into the `demo_topic` is consumed and streamed into Snowflake in seconds. Go back to the Snowflake worksheet and run the following query to see data arrive in Snowflake with the schema from the JSON data you produced.
```sql
SELECT * FROM STREAMING_DB.STREAMING_SCHEMA.STREAMING_DATA LIMIT 50;
```

See also:

- The [`kafka_franz` input](../../components/inputs/kafka_franz/)
- The [`snowflake_streaming`](../../components/outputs/snowflake_streaming/) output

---

# Page 330: Guides

**URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides.md

---

# Guides

---
title: Guides
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: connect/guides/index
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/guides/index.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/index.adoc
page-git-created-date: "2024-09-09"
page-git-modified-date: "2024-09-09"
---

- [Bloblang](bloblang/about/) Learn what Bloblang is and how to use the native mapping language.
- Cloud Credentials
- [Amazon Web Services](cloud/aws/) Find out about AWS components in Redpanda Connect.
- [Google Cloud Platform](cloud/gcp/) Find out about GCP components in Redpanda Connect.
- [Ingest Real-Time Sensor Telemetry with the HTTP Gateway](cloud/gateway/) Learn how to stream sensor telemetry data into Redpanda Cloud using the gateway input in Redpanda Connect.
- [Synchronous Responses](sync_responses/) Understand synchronous response handling in Redpanda Connect, ensuring reliable and efficient data processing.
- [Migrate to the Unified Redpanda Migrator](migrate-unified-redpanda-migrator/) Learn how to migrate from legacy migrator components to the unified `redpanda_migrator` input/output pair in Redpanda Connect 4.67.5+.
--- # Page 331: Bloblang **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides/bloblang/about.md --- # Bloblang --- title: Bloblang latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/guides/bloblang/about page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/guides/bloblang/about.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/bloblang/about.adoc description: Learn what Bloblang is and how to use the native mapping language. page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- Bloblang, or blobl for short, is a language designed for mapping data of a wide variety of forms. It’s a safe, fast, and powerful way to perform document mapping within Redpanda Connect. It also has a [Go API for writing your own functions and methods](https://pkg.go.dev/github.com/redpanda-data/connect/v4/public/bloblang) as plugins. Bloblang is available as a [processor](../../../components/processors/mapping/) and it’s also possible to use blobl queries in [function interpolations](../../../configuration/interpolation/#bloblang-queries). This document outlines the core features of the Bloblang language, but if you’re totally new to Bloblang then it’s worth following [the walkthrough first](../walkthrough/). ## [](#learn-bloblang)Learn Bloblang [learnbloblang.com](https://www.learnbloblang.com) is an interactive resource for learning Bloblang with hands-on exercises. ## [](#assignment)Assignment A Bloblang mapping expresses how to create a new document by extracting data from an existing input document. 
Assignments consist of dot-separated path segments on the left-hand side, describing a field to be created within the new document, and a query on the right-hand side, describing what the content of the new field should be.

The keyword `root` on the left-hand side refers to the root of the new document. The keyword `this` on the right-hand side refers to the current context of the query, which is the read-only input document when querying from the root of a mapping:

```bloblang
root.id = this.thing.id
root.type = "yo"

# Both `root` and `this` are optional, and will be inferred in their absence.
content = thing.doc.message

# In: {"thing":{"id":"wat1","doc":{"title":"wut","message":"hello world"}}}
```

Since the document being created starts off empty, it is sometimes useful to begin a mapping by copying the entire contents of the input document, which can be expressed by assigning `this` to `root`.

```bloblang
root = this
root.foo = "added value"

# In: {"id":"wat1","message":"hello world"}
```

If the new document `root` is never assigned to or otherwise mutated, then the original document remains unchanged.
### [](#special-characters-in-paths)Special characters in paths

Quotes can be used to describe sections of a field path that contain whitespace, dots or other special characters:

```bloblang
# Use quotes around a path segment in order to include whitespace or dots within
# the path
root."foo.bar".baz = this."buz bev".fub

# In: {"buz bev":{"fub":"hello world"}}
```

### [](#non-structured-data)Non-structured data

Bloblang is able to map data that is unstructured, whether it’s a log line or a binary blob, by referencing it with the [`content` function](../functions/#content), which returns the raw bytes of the input document:

```bloblang
# Parse a base64 encoded JSON document
root = content().decode("base64").parse_json()

# In: eyJmb28iOiJiYXIifQ==
```

Your newly mapped document can also be unstructured. To do this, assign a value type to the `root` of your document:

```bloblang
root = this.foo

# In: {"foo":"hello world"}
```

The resulting message payload is the raw value you’ve assigned.

### [](#deleting)Deleting

It’s possible to selectively delete fields from an object by assigning the function `deleted()` to the field path:

```bloblang
root = this
root.bar = deleted()

# In: {"id":"wat1","message":"hello world","bar":"remove me"}
```

### [](#variables)Variables

Another type of assignment is a `let` statement, which creates a variable that can be referenced elsewhere within a mapping. Variables are discarded at the end of the mapping and are mostly useful for query reuse. Variables are referenced within queries with `$`:

```bloblang
# Set a temporary variable
let foo = "yo"

root.new_doc.type = $foo
```

### [](#metadata)Metadata

Redpanda Connect messages contain metadata that is separate from the main payload. In Bloblang, you can modify the metadata of the resulting message with the `meta` assignment keyword.
Metadata values of the resulting message are referenced within queries with the `@` operator or the [`metadata()` function](../functions/#metadata):

```bloblang
# Reference a metadata value
root.new_doc.bar = @kafka_topic # Or `@.kafka_topic` or `metadata("kafka_topic")`

# Delete all metadata
meta = deleted()

# Set metadata values
meta bar = "hello world"
meta baz = {
  "something": "structured"
}

# Get an object of key/values for all metadata
root.meta_obj = @ # Or `metadata()`
```

## [](#coalesce)Coalesce

The pipe operator (`|`) used within brackets allows you to coalesce multiple candidates for a path segment. The first field that exists and has a non-null value will be selected:

```bloblang
root.new_doc.type = this.thing.(article | comment | this).type

# In: {"thing":{"article":{"type":"foo"}}}
# In: {"thing":{"comment":{"type":"bar"}}}
# In: {"thing":{"type":"baz"}}
```

Opening brackets on a field begins a query where the context of `this` changes to the value of the path it is opened upon. Therefore, in the above example, `this` within the brackets refers to the contents of `this.thing`.

## [](#literals)Literals

Bloblang supports number, boolean, string, null, array and object literals:

```bloblang
root = [
  7, false, "string", null, {
    "first": 11,
    "second": {"foo":"bar"},
    "third": """multiple
lines on this
string"""
  }
]

# In: {}
```

The values within literal arrays and objects can be dynamic query expressions, as can the keys of object literals.
## [](#comments)Comments

As you might have already spotted, comments start with a hash (`#`) and end with a line break:

```bloblang
root = this.some.value # And now this is a comment
```

## [](#boolean-logic-and-arithmetic)Boolean logic and arithmetic

Bloblang supports a range of boolean operators `!`, `>`, `>=`, `==`, `<`, `<=`, `&&`, `||` and mathematical operators `+`, `-`, `*`, `/`, `%`:

```bloblang
root.is_big = this.number > 100
root.multiplied = this.number * 7

# In: {"number":50}
# In: {"number":150}
```

For more information about these operators and how they work, see [the arithmetic page](../arithmetic/).

## [](#conditional-mapping)Conditional mapping

Use `if` as either a statement or an expression in order to perform maps conditionally:

```bloblang
root = this
root.sorted_foo = if this.foo.type() == "array" { this.foo.sort() }

if this.foo.type() == "string" {
  root.upper_foo = this.foo.uppercase()
  root.lower_foo = this.foo.lowercase()
}

# In: {"foo":"FooBar"}
# In: {"foo":["foo","bar"]}
```

You can add as many `else if` queries as you like, followed by an optional final fallback `else`:

```bloblang
root.sound = if this.type == "cat" {
  this.cat.meow
} else if this.type == "dog" {
  this.dog.woof.uppercase()
} else {
  "sweet sweet silence"
}

# In: {"type":"cat","cat":{"meow":"meeeeooooow!"}}
# In: {"type":"dog","dog":{"woof":"guurrrr woof woof!"}}
# In: {"type":"caterpillar","caterpillar":{"name":"oleg"}}
```

## [](#pattern-matching)Pattern matching

A `match` expression allows you to perform conditional mappings on a value. Each case should be either a boolean expression, a literal value to compare against the target value, or an underscore (`_`), which captures values that have not matched a prior case:

```bloblang
root.new_doc = match this.doc {
  this.type == "article" => this.article
  this.type == "comment" => this.comment
  _ => this
}

# In: {"doc":{"type":"article","article":{"id":"foo","content":"qux"}}}
# In: 
{"doc":{"type":"comment","comment":{"id":"bar","content":"quz"}}} # In: {"doc":{"type":"neither","content":"some other stuff unchanged"}} ``` Within a match block the context of `this` changes to the pattern matched expression, therefore `this` within the match expression above refers to `this.doc`. Match cases can specify a literal value for simple comparison: ```bloblang root = this root.type = match this.type { "doc" => "document", "art" => "article", _ => this } # In: {"type":"doc","foo":"bar"} ``` The match expression can also be left unset which means the context remains unchanged, and the catch-all case can also be omitted: ```bloblang root.new_doc = match { this.doc.type == "article" => this.doc.article this.doc.type == "comment" => this.doc.comment } # In: {"doc":{"type":"neither","content":"some other stuff unchanged"}} ``` If no case matches then the mapping is skipped entirely, hence we would end up with the original document in this case. ## [](#functions)Functions Functions can be placed anywhere and allow you to extract information from your environment, generate values, or access data from the underlying message being mapped: ```bloblang root.doc.id = uuid_v4() root.doc.received_at = now() root.doc.host = hostname() ``` Functions support both named and nameless style arguments: ```bloblang root.values_one = range(start: 0, stop: this.max, step: 2) root.values_two = range(0, this.max, 2) # In: {"max":10} ``` You can find a full list of functions and their parameters in [the functions page](../functions/). 
## [](#methods)Methods

Methods are similar to functions but act upon a target value. They provide most of the power in Bloblang, as they allow you to augment query values and can be added to any expression (including other methods):

```bloblang
root.doc.id = this.thing.id.string().catch(uuid_v4())
root.doc.reduced_nums = this.thing.nums.map_each(num -> if num < 10 {
  deleted()
} else {
  num - 10
})
root.has_good_taste = ["pikachu","mewtwo","magmar"].contains(this.user.fav_pokemon)

# In: {"thing":{"id":123,"nums":[5,12,8,15,20]},"user":{"fav_pokemon":"pikachu"}}
```

Methods also support both named and nameless style arguments:

```bloblang
root.foo_one = this.(bar | baz).trim().replace_all(old: "dog", new: "cat")
root.foo_two = this.(bar | baz).trim().replace_all("dog", "cat")

# In: {"bar":" I love my dog "}
```

You can find a full list of methods and their parameters in [the methods page](../methods/).

## [](#maps)Maps

Defining named maps allows you to reuse common mappings on values with the [`apply` method](../methods/#apply):

```bloblang
map things {
  root.first = this.thing_one
  root.second = this.thing_two
}

root.foo = this.value_one.apply("things")
root.bar = this.value_two.apply("things")

# In: {"value_one":{"thing_one":"hey","thing_two":"yo"},"value_two":{"thing_one":"sup","thing_two":"waddup"}}
```

Within a map the keyword `root` refers to a newly created document that will replace the target of the map, and `this` refers to the original value of the target. The argument of `apply` is a string, which allows you to dynamically resolve the mapping to apply.
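Because the argument of `apply` is a string, the map name can be taken from the message itself. The following is a minimal sketch rather than an example from the reference: the map names `cat` and `dog` and the input fields `pet.type` and `pet.name` are hypothetical, and it assumes every message carries a `type` that matches a declared map.

```bloblang
# Hypothetical maps, one per value that `pet.type` is assumed to take.
map cat {
  root.name = this.name
  root.sound = "meow"
}

map dog {
  root.name = this.name
  root.sound = "woof"
}

# Resolve the map to apply from a field of the message being mapped.
root.pet = this.pet.apply(this.pet.type)

# In: {"pet":{"type":"cat","name":"tom"}}
# In: {"pet":{"type":"dog","name":"rex"}}
```

If `pet.type` could name a map that is not declared, it is probably worth chaining a `catch` onto the `apply` call to supply a fallback value instead of letting the mapping fail.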
## [](#import-maps)Import maps It’s possible to import maps defined in a file with an `import` statement: ```bloblang import "./common_maps.blobl" root.foo = this.value_one.apply("things") root.bar = this.value_two.apply("things") # In: {"value_one":{"thing_one":"hey","thing_two":"yo"},"value_two":{"thing_one":"sup","thing_two":"waddup"}} ``` Imports from a Bloblang mapping within a Redpanda Connect config are relative to the process running the config. Imports from an imported file are relative to the file that is importing it. ## [](#filtering)Filtering By assigning the root of a mapped document to the `deleted()` function you can delete a message entirely: ```bloblang # Filter all messages that have fewer than 10 URLs. root = if this.doc.urls.length() < 10 { deleted() } # In: {"doc":{"urls":["a","b","c"]}} # In: {"doc":{"urls":["a","b","c","d","e","f","g","h","i","j"]}} ``` ## [](#error-handling)Error handling Functions and methods can fail under certain circumstances, such as when they receive types they aren’t able to act upon. These failures, when not caught, will cause the entire mapping to fail. However, the [method `catch`](../methods/#catch) can be used in order to return a value when a failure occurs instead: ```bloblang # Map an empty array to `foo` if the field `bar` is not a string. root.foo = this.bar.split(",").catch([]) # In: {"bar":"a,b,c"} # In: {"bar":123} ``` Since `catch` is a method it can also be attached to bracketed map expressions: ```bloblang # Map `false` if any of the operations in this boolean query fail. 
root.thing = (
  this.foo > this.bar &&
  this.baz.contains("wut")
).catch(false)

# In: {"foo":10,"bar":5,"baz":"wut wut"}
# In: {"foo":"not a number","bar":5,"baz":"wut wut"}
```

One of the more powerful features of Bloblang is that a single `catch` method at the end of a chain of methods can recover errors from any method in the chain:

```bloblang
# Catch errors caused by:
# - foo not existing
# - foo not being a string
# - an element from split foo not being a valid JSON string
root.things = this.foo.split(",").map_each( ele -> ele.parse_json() ).catch([])

# Specifically catch a JSON parse error
root.things = this.foo.split(",").map_each( ele -> ele.parse_json().catch({}) )

# In: {"foo":"{\"a\":1},{\"b\":2}"}
# In: {"foo":"not valid json"}
```

However, the `catch` method only acts on errors. Sometimes it’s also useful to set a fallback value when a query returns `null`, in which case the [method `or`](../methods/#or) can be used the same way:

```bloblang
# Map "default" if either the element index 5 does not exist, or the underlying
# element is `null`.
root.foo = this.bar.index(5).or("default")

# In: {"bar":["a","b","c"]}
# In: {"bar":["a","b","c","d","e","f","g"]}
```

## [](#unit-testing)Unit testing

It’s possible to execute unit tests for your Bloblang mappings using the standard Redpanda Connect unit test capabilities outlined [in this document](../../../configuration/unit_testing/).

## [](#troubleshooting)Troubleshooting

1. I’m seeing `unable to reference message as structured (with 'this')` when I try to run mappings with `rpk connect blobl`. That particular error message means the mapping is failing to parse what’s being fed in as a JSON document. Make sure that the data you are feeding in is valid JSON, and also that the documents _do not_ contain line breaks, as `rpk connect blobl` will parse each line individually. Why? That’s a good question.
Bloblang supports non-JSON formats too, so it can’t delimit documents with a streaming JSON parser like tools such as `jq`, so instead it uses line breaks to determine the boundaries of each message. --- # Page 332: Bloblang Arithmetic **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides/bloblang/arithmetic.md --- # Bloblang Arithmetic --- title: Bloblang Arithmetic latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/guides/bloblang/arithmetic page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/guides/bloblang/arithmetic.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/bloblang/arithmetic.adoc description: How arithmetic works within Bloblang page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- Bloblang supports a range of comparison operators `!`, `>`, `>=`, `==`, `<`, `<=`, `&&`, `||` and mathematical operators `+`, `-`, `*`, `/`, `%`. How these operators behave is dependent on the type of the values they’re used with, and therefore it’s worth fully understanding these behaviors if you intend to use them heavily in your mappings. ## [](#mathematical)Mathematical All mathematical operators (`+`, `-`, `*`, `/`, `%`) are valid against number values, and addition (`+`) is also supported when both the left and right hand side arguments are strings. If a mathematical operator is used with an argument that is non-numeric (with the aforementioned string exception) then a [recoverable mapping error will be thrown](../about/#error-handling). ### [](#number-degradation)Number degradation In Bloblang any number resulting from a method, function or arithmetic is either a 64-bit signed integer or a 64-bit floating point value. 
Numbers from input documents can be any combination of size and be signed or unsigned. When a mathematical operation is performed with two or more integer values Bloblang will create an integer result, with the exception of division. However, if any number within a mathematical operation is a floating point then the result will be a floating point value. In order to explicitly coerce numbers into integer types you can use the [`.ceil()`, `.floor()`, or `.round()` methods](../methods/#number-manipulation). ## [](#comparison)Comparison The not (`!`) operator reverses the boolean value of the expression immediately following it, and is valid to place before any query that yields a boolean value. If the following expression yields a non-boolean value then a [recoverable mapping error will be thrown](../about/#error-handling). If you wish to reverse the boolean result of a complex query then simply place the query within brackets (`!(this.foo > this.bar)`). ### [](#equality)Equality The equality operators (`==` and `!=`) are valid to use against any value type. In order for arguments to be considered equal they must match in both their basic type (`string`, `number`, `null`, `bool`, etc) as well as their value. If you wish to compare mismatched value types then use [coercion methods](../methods/#type-coercion). Number arguments are considered equal if their value is the same when represented the same way, which means their underlying representations (integer, float, etc) do not need to match in order for them to be considered equal. ### [](#numerical)Numerical Numerical comparisons (`>`, `>=`, `<`, `<=`) are valid to use against number values only. If a non-number value is used as an argument then a [recoverable mapping error will be thrown](../about/#error-handling). ### [](#boolean)Boolean Boolean comparison operators (`||`, `&&`) are valid to use against boolean values only (`true` or `false`). 
If a non-boolean value is used as an argument then a [recoverable mapping error will be thrown](../about/#error-handling). --- # Page 333: Bloblang Functions **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides/bloblang/functions.md --- # Bloblang Functions --- title: Bloblang Functions latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/guides/bloblang/functions page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/guides/bloblang/functions.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/bloblang/functions.adoc description: A list of Bloblang functions page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- Functions can be placed anywhere and allow you to extract information from your environment, generate values, or access data from the underlying message being mapped: ```bloblang root.doc.id = uuid_v4() root.doc.received_at = now() root.doc.host = hostname() ``` Functions support both named and nameless style arguments: ```bloblang root.values_one = range(start: 0, stop: this.max, step: 2) root.values_two = range(0, this.max, 2) # In: {"max":10} ``` ## [](#batch_index)batch\_index Returns the zero-based index of the current message within its batch. Use this to conditionally process messages based on their position, or to create sequential identifiers within a batch. ### [](#examples)Examples ```bloblang root = if batch_index() > 0 { deleted() } ``` Create a unique identifier combining batch position with timestamp: ```bloblang root.id = "%v-%v".format(timestamp_unix(), batch_index()) ``` ## [](#batch_size)batch\_size Returns the total number of messages in the current batch. Use this to determine batch boundaries or compute relative positions. 
### [](#examples-2)Examples

```bloblang
root.total = batch_size()
```

Check if processing the last message in a batch:

```bloblang
root.is_last = batch_index() == batch_size() - 1
```

## [](#bytes)bytes

Creates a zero-initialized byte array of specified length. Use this to allocate fixed-size byte buffers for binary data manipulation or to generate padding.

### [](#parameters)Parameters

| Name | Type | Description |
| --- | --- | --- |
| length | integer | The size of the resulting byte array. |

### [](#examples-3)Examples

```bloblang
root.data = bytes(5)
```

Create a buffer for binary operations:

```bloblang
root.header = bytes(16)
root.payload = content()
```

## [](#content)content

Returns the raw message payload as bytes, regardless of the current mapping context. Use this to access the original message when working within nested contexts, or to store the entire message as a field.

### [](#examples-4)Examples

```bloblang
root.doc = content().string()

# In: {"foo":"bar"}
# Out: {"doc":"{\"foo\":\"bar\"}"}
```

Preserve original message while adding metadata:

```bloblang
root.original = content().string()
root.processed_by = "ai"

# In: {"foo":"bar"}
# Out: {"original":"{\"foo\":\"bar\"}","processed_by":"ai"}
```

## [](#count)count

> ⚠️ **WARNING**
>
> This function is deprecated and will be removed in a future version.

The `count` function is a counter starting at 1 which increments each time it is called. It takes an argument which is an identifier for the counter, allowing you to specify multiple unique counters in your configuration.

### [](#parameters-2)Parameters

| Name | Type | Description |
| --- | --- | --- |
| name | string | An identifier for the counter.
| ### [](#examples-5)Examples ```bloblang root = this root.id = count("bloblang_function_example") # In: {"message":"foo"} # Out: {"id":1,"message":"foo"} # In: {"message":"bar"} # Out: {"id":2,"message":"bar"} ``` ## [](#counter)counter Generates an incrementing sequence of integers starting from a minimum value (default 1). Each counter instance maintains its own independent state across message processing. When the maximum value is reached, the counter automatically resets to the minimum. ### [](#parameters-3)Parameters | Name | Type | Description | | --- | --- | --- | | min | query expression | The starting value of the counter. This is the first value yielded. Evaluated once when the mapping is initialized. | | max | query expression | The maximum value before the counter resets to min. Evaluated once when the mapping is initialized. | | set (optional) | query expression | An optional query that controls counter behavior: when it resolves to a non-negative integer, the counter is set to that value; when it resolves to null, the counter is read without incrementing; when it resolves to a deletion, the counter resets to min; otherwise the counter increments normally. 
| ### [](#examples-6)Examples Generate sequential IDs for each message: ```bloblang root.id = counter() # In: {} # Out: {"id":1} # In: {} # Out: {"id":2} ``` Use a custom range for the counter: ```bloblang root.batch_num = counter(min: 100, max: 200) # In: {} # Out: {"batch_num":100} # In: {} # Out: {"batch_num":101} ``` Increment a counter multiple times within a single mapping using a named map: ```bloblang map increment { root = counter() } root.first_id = null.apply("increment") root.second_id = null.apply("increment") # In: {} # Out: {"first_id":1,"second_id":2} # In: {} # Out: {"first_id":3,"second_id":4} ``` Conditionally reset a counter based on input data: ```bloblang root.streak = counter(set: if this.status != "success" { 0 }) # In: {"status":"success"} # Out: {"streak":1} # In: {"status":"success"} # Out: {"streak":2} # In: {"status":"failure"} # Out: {"streak":0} # In: {"status":"success"} # Out: {"streak":1} ``` Peek at the current counter value without incrementing by using null in the set parameter: ```bloblang root.count = counter(set: if this.peek { null }) # In: {"peek":false} # Out: {"count":1} # In: {"peek":false} # Out: {"count":2} # In: {"peek":true} # Out: {"count":2} # In: {"peek":false} # Out: {"count":3} ``` ## [](#deleted)deleted Returns a deletion marker that removes the target field or message. When applied to root, the entire message is dropped while still being acknowledged as successfully processed. Use this to filter data or conditionally remove fields. ### [](#examples-7)Examples ```bloblang root = this root.bar = deleted() # In: {"bar":"bar_value","baz":"baz_value","foo":"foo value"} # Out: {"baz":"baz_value","foo":"foo value"} ``` Filter array elements by returning deleted for unwanted items: ```bloblang root.new_nums = this.nums.map_each(num -> if num < 10 { deleted() } else { num - 10 }) # In: {"nums":[3,11,4,17]} # Out: {"new_nums":[1,7]} ``` ## [](#env)env Reads an environment variable and returns its value as a string. 
Returns `null` if the variable is not set. By default, values are cached for performance. ### [](#parameters-4)Parameters | Name | Type | Description | | --- | --- | --- | | name | string | The name of the environment variable to read. | | no_cache | bool | Disable caching to read the latest value on each invocation. | ### [](#examples-8)Examples ```bloblang root.api_key = env("API_KEY") ``` ```bloblang root.database_url = env("DB_URL").or("localhost:5432") ``` Use `no_cache` to read updated environment variables during runtime, useful for dynamic configuration changes: ```bloblang root.config = env(name: "DYNAMIC_CONFIG", no_cache: true) ``` ## [](#error)error Returns the error message string if the message has failed processing, otherwise `null`. Use this in error handling pipelines to log or route failed messages based on their error details. ### [](#examples-9)Examples ```bloblang root.doc.error = error() ``` Route messages to different outputs based on error presence: ```bloblang root = this root.error_msg = error() root.has_error = error() != null ``` ## [](#error_source_label)error\_source\_label Returns the user-defined label of the component that caused the error, empty string if no label is set, or `null` if the message has no error. Use this for more human-readable error tracking when components have custom labels. ### [](#examples-10)Examples ```bloblang root.doc.error_source_label = error_source_label() ``` Route errors based on component labels: ```bloblang root.error_category = error_source_label().or("unknown") ``` ## [](#error_source_name)error\_source\_name Returns the component name that caused the error, or `null` if the message has no error or the error has no associated component. Use this to identify which processor or component in your pipeline caused a failure. 
### [](#examples-11)Examples ```bloblang root.doc.error_source_name = error_source_name() ``` Create detailed error logs with component information: ```bloblang root.error_details = if errored() { { "message": error(), "component": error_source_name(), "timestamp": now() } } ``` ## [](#error_source_path)error\_source\_path Returns the dot-separated path to the component that caused the error, or `null` if the message has no error. Use this to identify the exact location of a failed component in nested pipeline configurations. ### [](#examples-12)Examples ```bloblang root.doc.error_source_path = error_source_path() ``` Build comprehensive error context for debugging: ```bloblang root.error_info = { "path": error_source_path(), "component": error_source_name(), "message": error() } ``` ## [](#errored)errored Returns true if the message has failed processing, false otherwise. Use this for conditional logic in error handling workflows or to route failed messages to dead letter queues. ### [](#examples-13)Examples ```bloblang root.doc.status = if errored() { 400 } else { 200 } ``` Send only failed messages to a separate stream: ```bloblang root = if errored() { this } else { deleted() } ``` ## [](#fake)fake Generates realistic fake data for testing and development purposes. Supports a wide variety of data types including personal information, network addresses, dates/times, financial data, and UUIDs. Useful for creating mock data, populating test databases, or anonymizing sensitive information. 
Supported functions: `latitude`, `longitude`, `unix_time`, `date`, `time_string`, `month_name`, `year_string`, `day_of_week`, `day_of_month`, `timestamp`, `century`, `timezone`, `time_period`, `email`, `mac_address`, `domain_name`, `url`, `username`, `ipv4`, `ipv6`, `password`, `jwt`, `word`, `sentence`, `paragraph`, `cc_type`, `cc_number`, `currency`, `amount_with_currency`, `title_male`, `title_female`, `first_name`, `first_name_male`, `first_name_female`, `last_name`, `name`, `gender`, `chinese_first_name`, `chinese_last_name`, `chinese_name`, `phone_number`, `toll_free_phone_number`, `e164_phone_number`, `uuid_hyphenated`, `uuid_digit`.

### [](#parameters-5)Parameters

| Name | Type | Description |
| --- | --- | --- |
| function | string | The name of the faker function to use. See description for full list of supported functions. |

### [](#examples-14)Examples

Generate fake user profile data for testing:

```bloblang
root.user = {
  "id": fake("uuid_hyphenated"),
  "name": fake("name"),
  "email": fake("email"),
  "created_at": fake("timestamp")
}
```

Create realistic test data for network monitoring:

```bloblang
root.event = {
  "source_ip": fake("ipv4"),
  "mac_address": fake("mac_address"),
  "url": fake("url")
}
```

## [](#file)file

Reads a file and returns its contents as bytes. Paths are resolved from the process working directory. For paths relative to the mapping file, use `file_rel`. By default, files are cached after first read.

### [](#parameters-6)Parameters

| Name | Type | Description |
| --- | --- | --- |
| path | string | The absolute or relative path to the file. |
| no_cache | bool | Disable caching to read the latest file contents on each invocation. |

### [](#examples-15)Examples

```bloblang
root.config = file("/etc/config.json").parse_json()
```

```bloblang
root.template = file("./templates/email.html").string()
```

Use `no_cache` to read updated file contents during runtime, useful for hot-reloading configuration:

```bloblang
root.rules = file(path: "/etc/rules.yaml", no_cache: true).parse_yaml()
```

## [](#file_rel)file\_rel

Reads a file and returns its contents as bytes. Paths are resolved relative to the mapping file’s directory, making it portable across different environments. By default, files are cached after first read.

### [](#parameters-7)Parameters

| Name | Type | Description |
| --- | --- | --- |
| path | string | The path to the file, relative to the mapping file’s directory. |
| no_cache | bool | Disable caching to read the latest file contents on each invocation. |

### [](#examples-16)Examples

```bloblang
root.schema = file_rel("./schemas/user.json").parse_json()
```

```bloblang
root.lookup = file_rel("../data/lookup.csv").parse_csv()
```

Use `no_cache` to read updated file contents during runtime, useful for reloading data files without restarting:

```bloblang
root.translations = file_rel(path: "./i18n/en.yaml", no_cache: true).parse_yaml()
```

## [](#hostname)hostname

Returns the hostname of the machine running Redpanda Connect. Useful for identifying which instance processed a message in distributed deployments.

### [](#examples-17)Examples

```bloblang
root.processed_by = hostname()
```

## [](#json)json

Returns a field from the original JSON message by dot path, always accessing the root document regardless of mapping context. Use this to reference the source message when working in nested contexts or to extract specific fields.

### [](#parameters-8)Parameters

| Name | Type | Description |
| --- | --- | --- |
| path | string | An optional dot path identifying a field to obtain. |

### [](#examples-18)Examples

```bloblang
root.mapped = json("foo.bar")

# In: {"foo":{"bar":"hello world"}}
# Out: {"mapped":"hello world"}
```

Access the original message from within nested mapping contexts:

```bloblang
root.doc = json()

# In: {"foo":{"bar":"hello world"}}
# Out: {"doc":{"foo":{"bar":"hello world"}}}
```

## [](#ksuid)ksuid

Generates a K-Sortable Unique Identifier with built-in timestamp ordering. Use this for distributed unique IDs that sort chronologically and remain collision-resistant without coordination between generators.

### [](#examples-19)Examples

```bloblang
root.id = ksuid()
```

Create sortable event IDs for logging:

```bloblang
root.event = {
  "id": ksuid(),
  "type": this.event_type,
  "data": this.payload
}
```

## [](#meta)meta

> ⚠️ **WARNING**
>
> This function is deprecated and will be removed in a future version.

Returns the value of a metadata key from the input message as a string, or `null` if the key does not exist. Since values are extracted from the read-only input message they do NOT reflect changes made from within the map. In order to query metadata mutations made within a mapping use the [`root_meta` function](#root_meta). This function supports extracting metadata from other messages of a batch with the `from` method.

### [](#parameters-9)Parameters

| Name | Type | Description |
| --- | --- | --- |
| key | string | An optional key of a metadata value to obtain. |

### [](#examples-20)Examples

```bloblang
root.topic = meta("kafka_topic")
```

The key parameter is optional and if omitted the entire metadata contents are returned as an object:

```bloblang
root.all_metadata = meta()
```

## [](#metadata)metadata

Returns metadata from the input message by key, or `null` if the key doesn’t exist. This reads the original metadata; to access modified metadata during mapping, use the `@` operator instead. Use this to extract message properties like topics, headers, or timestamps.
### [](#parameters-10)Parameters | Name | Type | Description | | --- | --- | --- | | key | string | An optional key of a metadata value to obtain. | ### [](#examples-21)Examples ```bloblang root.topic = metadata("kafka_topic") ``` Retrieve all metadata as an object by omitting the key parameter: ```bloblang root.all_metadata = metadata() ``` Copy specific metadata fields to the message body: ```bloblang root.source = { "topic": metadata("kafka_topic"), "partition": metadata("kafka_partition"), "timestamp": metadata("kafka_timestamp_unix") } ``` ## [](#nanoid)nanoid Generates a URL-safe unique identifier using Nano ID. Use this for compact, URL-friendly IDs with good collision resistance. Customize the length (default 21) or provide a custom alphabet for specific character requirements. ### [](#parameters-11)Parameters | Name | Type | Description | | --- | --- | --- | | length (optional) | integer | An optional length. | | alphabet (optional) | string | An optional custom alphabet to use for generating IDs. When specified the field length must also be present. | ### [](#examples-22)Examples ```bloblang root.id = nanoid() ``` Generate a longer ID for additional uniqueness: ```bloblang root.id = nanoid(54) ``` Use a custom alphabet for domain-specific IDs: ```bloblang root.id = nanoid(54, "abcde") ``` ## [](#nothing)nothing ## [](#now)now Returns the current timestamp as an RFC 3339 formatted string with nanosecond precision. Use this to add processing timestamps to messages or measure time between events. Chain with `ts_format` to customize the format or timezone. ### [](#examples-23)Examples ```bloblang root.received_at = now() ``` Format the timestamp in a custom format and timezone: ```bloblang root.received_at = now().ts_format("Mon Jan 2 15:04:05 -0700 MST 2006", "UTC") ``` ## [](#pi)pi Returns the value of the mathematical constant Pi. 
### [](#examples-24)Examples ```bloblang root.radians = this.degrees * (pi() / 180) # In: {"degrees":45} # Out: {"radians":0.7853981633974483} ``` ```bloblang root.degrees = this.radians * (180 / pi()) # In: {"radians":0.78540} # Out: {"degrees":45.00010522957486} ``` ## [](#random_int)random\_int Generates a pseudo-random non-negative 64-bit integer. Use this for creating random IDs, sampling data, or generating test values. Provide a seed for reproducible randomness, or use a dynamic seed like `timestamp_unix_nano()` for unique values per mapping instance. Optional `min` and `max` parameters constrain the output range (both inclusive). For dynamic ranges based on message data, use the modulo operator instead: `random_int() % dynamic_max + dynamic_min`. ### [](#parameters-12)Parameters | Name | Type | Description | | --- | --- | --- | | seed | query expression | A seed to use, if a query is provided it will only be resolved once during the lifetime of the mapping. | | min | integer | The minimum value the random generated number will have. The default value is 0. | | max | integer | The maximum value the random generated number will have. The default value is 9223372036854775806 (math.MaxInt64 - 1). | ### [](#examples-25)Examples ```bloblang root.first = random_int() root.second = random_int(1) root.third = random_int(max:20) root.fourth = random_int(min:10, max:20) root.fifth = random_int(timestamp_unix_nano(), 5, 20) root.sixth = random_int(seed:timestamp_unix_nano(), max:20) ``` Use a dynamic seed for unique random values per mapping instance: ```bloblang root.random_id = random_int(timestamp_unix_nano()) root.sample_percent = random_int(seed: timestamp_unix_nano(), min: 0, max: 100) ``` ## [](#range)range Creates an array of integers from start (inclusive) to stop (exclusive) with an optional step. Use this to generate sequences for iteration, indexing, or creating numbered lists. 
### [](#parameters-13)Parameters

| Name | Type | Description |
| --- | --- | --- |
| start | integer | The start value. |
| stop | integer | The stop value. |
| step | integer | The step value. |

### [](#examples-26)Examples

```bloblang
root.a = range(0, 10)
root.b = range(start: 0, stop: this.max, step: 2) # Using named params
root.c = range(0, -this.max, -2)

# In: {"max":10}
# Out: {"a":[0,1,2,3,4,5,6,7,8,9],"b":[0,2,4,6,8],"c":[0,-2,-4,-6,-8]}
```

Generate a sequence for batch processing:

```bloblang
root.pages = range(0, this.total_items, 100).map_each(offset -> {
  "offset": offset,
  "limit": 100
})

# In: {"total_items":250}
# Out: {"pages":[{"limit":100,"offset":0},{"limit":100,"offset":100},{"limit":100,"offset":200}]}
```

## [](#root_meta)root\_meta

> ⚠️ **WARNING**
>
> This function is deprecated and will be removed in a future version.

Returns the value of a metadata key from the new message being created as a string, or `null` if the key does not exist. Changes made to metadata during a mapping will be reflected by this function.

### [](#parameters-14)Parameters

| Name | Type | Description |
| --- | --- | --- |
| key | string | An optional key of a metadata value to obtain. |

### [](#examples-27)Examples

```bloblang
root.topic = root_meta("kafka_topic")
```

The key parameter is optional and if omitted the entire metadata contents are returned as an object:

```bloblang
root.all_metadata = root_meta()
```

## [](#snowflake_id)snowflake\_id

Generates a unique, time-ordered Snowflake ID. Snowflake IDs are 64-bit integers that encode timestamp, node ID, and sequence information, making them ideal for distributed systems where sortable unique identifiers are needed. Returns a string representation of the ID.

### [](#parameters-15)Parameters

| Name | Type | Description |
| --- | --- | --- |
| node_id | integer | Optional node identifier (0-1023) to distinguish IDs generated by different machines in a distributed system. Defaults to 1.
| ### [](#examples-28)Examples Generate a unique Snowflake ID for each message: ```bloblang root.id = snowflake_id() root.payload = this ``` Generate Snowflake IDs with different node IDs for multi-datacenter deployments: ```bloblang root.id = snowflake_id(42) root.data = this ``` ## [](#throw)throw Immediately fails the mapping with a custom error message. Use this to halt processing when data validation fails or required fields are missing, causing the message to be routed to error handlers. ### [](#parameters-16)Parameters | Name | Type | Description | | --- | --- | --- | | why | string | A string explanation for why an error was thrown, this will be added to the resulting error message. | ### [](#examples-29)Examples ```bloblang root.doc.type = match { this.exists("header.id") => "foo" this.exists("body.data") => "bar" _ => throw("unknown type") } root.doc.contents = (this.body.content | this.thing.body) # In: {"header":{"id":"first"},"thing":{"body":"hello world"}} # Out: {"doc":{"contents":"hello world","type":"foo"}} # In: {"nothing":"matches"} # Out: Error("failed assignment (line 1): unknown type") ``` Validate required fields before processing: ```bloblang root = if this.exists("user_id") { this } else { throw("missing required field: user_id") } # In: {"user_id":123,"name":"alice"} # Out: {"name":"alice","user_id":123} # In: {"name":"bob"} # Out: Error("failed assignment (line 1): missing required field: user_id") ``` ## [](#timestamp_unix)timestamp\_unix Returns the current Unix timestamp in seconds since epoch. Use this for numeric timestamps compatible with most systems, or as a seed for random number generation. ### [](#examples-30)Examples ```bloblang root.received_at = timestamp_unix() ``` Create a sortable ID combining timestamp with a counter: ```bloblang root.id = "%v-%v".format(timestamp_unix(), batch_index()) ``` ## [](#timestamp_unix_micro)timestamp\_unix\_micro Returns the current Unix timestamp in microseconds since epoch. 
Use this for high-precision timing measurements or when microsecond resolution is required. ### [](#examples-31)Examples ```bloblang root.received_at = timestamp_unix_micro() ``` Measure elapsed time between events: ```bloblang root.processing_duration_us = timestamp_unix_micro() - this.start_time_us ``` ## [](#timestamp_unix_milli)timestamp\_unix\_milli Returns the current Unix timestamp in milliseconds since epoch. Use this for millisecond-precision timestamps common in web APIs and JavaScript systems. ### [](#examples-32)Examples ```bloblang root.received_at = timestamp_unix_milli() ``` Add processing time metadata: ```bloblang meta processing_time_ms = timestamp_unix_milli() ``` ## [](#timestamp_unix_nano)timestamp\_unix\_nano Returns the current Unix timestamp in nanoseconds since epoch. Use this for the highest precision timing or as a unique seed value that changes on every invocation. ### [](#examples-33)Examples ```bloblang root.received_at = timestamp_unix_nano() ``` Generate unique random values on each mapping: ```bloblang root.random_value = random_int(timestamp_unix_nano()) ``` ## [](#tracing_id)tracing\_id Returns the OpenTelemetry trace ID for the message, or an empty string if no tracing span exists. Use this to correlate logs and events with distributed traces. ### [](#examples-34)Examples ```bloblang meta trace_id = tracing_id() ``` Add trace ID to structured logs: ```bloblang root.log_entry = this root.log_entry.trace_id = tracing_id() ``` ## [](#tracing_span)tracing\_span Returns the OpenTelemetry tracing span attached to the message as a text map object, or `null` if no span exists. Use this to propagate trace context to downstream systems via headers or metadata. 
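Assuming the text map carries a W3C Trace Context `traceparent` entry, that value is four dash-separated lowercase hex fields: version, trace ID, parent span ID, and flags. The following Python sketch shows how a downstream consumer might unpack a propagated header. It is an illustration only: `parse_traceparent` is a hypothetical helper, not part of Redpanda Connect, and it assumes the version `00` layout.

```python
import re

# Hypothetical helper (not a Redpanda Connect API): splits a W3C Trace Context
# `traceparent` header into its four fields. Assumes the version-00 layout.
_TRACEPARENT = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> dict:
    """Return the header's fields plus a derived boolean `sampled` flag."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"not a valid traceparent header: {header!r}")
    fields = match.groupdict()
    # The specification treats all-zero trace and parent IDs as invalid.
    if set(fields["trace_id"]) == {"0"} or set(fields["parent_id"]) == {"0"}:
        raise ValueError("trace_id and parent_id must not be all zeros")
    # Bit 0 of the flags byte is the "sampled" flag.
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields
```

A consumer could call `parse_traceparent` on the header or metadata value it receives and use the `trace_id` field to correlate its own logs with the upstream trace.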
### [](#examples-35)Examples ```bloblang root.headers.traceparent = tracing_span().traceparent # In: {"some_stuff":"just can't be explained by science"} # Out: {"headers":{"traceparent":"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}} ``` Forward all tracing fields to output metadata: ```bloblang meta = tracing_span() ``` ## [](#ulid)ulid Generates a Universally Unique Lexicographically Sortable Identifier (ULID). ULIDs are 128-bit identifiers that are sortable by creation time, URL-safe, and case-insensitive. They consist of a 48-bit timestamp (millisecond precision) and 80 bits of randomness, making them ideal for distributed systems that need time-ordered unique IDs without coordination. ### [](#parameters-17)Parameters | Name | Type | Description | | --- | --- | --- | | encoding | string | Encoding format for the ULID. "crockford" produces 26-character Base32 strings (recommended). "hex" produces 32-character hexadecimal strings. | | random_source | string | Randomness source: "secure_random" uses cryptographically secure random (recommended for production), "fast_random" uses faster but non-secure random (only for non-sensitive testing). | ### [](#examples-36)Examples Generate time-sortable IDs for distributed message ordering: ```bloblang root.message_id = ulid() root.timestamp = now() root.data = this ``` Generate hex-encoded ULIDs for systems that prefer hexadecimal format: ```bloblang root.id = ulid("hex") ``` ## [](#uuid_v4)uuid\_v4 Generates a random RFC-4122 version 4 UUID. Use this for creating unique identifiers that don’t reveal timing information or require ordering. Each invocation produces a new globally unique ID. ### [](#examples-37)Examples ```bloblang root.id = uuid_v4() ``` Add unique request IDs for tracing: ```bloblang root = this root.request_id = uuid_v4() ``` ## [](#uuid_v7)uuid\_v7 Generates a time-ordered UUID version 7 with millisecond timestamp precision. 
Use this for sortable unique identifiers that maintain chronological ordering, ideal for database keys or event IDs. Optionally specify a custom timestamp. ### [](#parameters-18)Parameters | Name | Type | Description | | --- | --- | --- | | time (optional) | timestamp | An optional timestamp to use for the time ordered portion of the UUID. | ### [](#examples-38)Examples ```bloblang root.id = uuid_v7() ``` Generate a UUID with a specific timestamp for backdating events: ```bloblang root.id = uuid_v7(now().ts_sub_iso8601("PT1M")) ``` ## [](#var)var ### [](#parameters-19)Parameters | Name | Type | Description | | --- | --- | --- | | name | string | The name of the target variable. | ## [](#with_schema_registry_header)with\_schema\_registry\_header Prepends a Confluent Schema Registry wire format header to message bytes. The header is 5 bytes: a magic byte (0x00) followed by a 4-byte big-endian schema ID. This format is required when producing messages to Kafka topics that use Confluent Schema Registry for schema validation and evolution. ### [](#parameters-20)Parameters | Name | Type | Description | | --- | --- | --- | | schema_id | unknown | The schema ID from your Schema Registry (0 to 4294967295). This ID references the schema version used to encode the message. | | message | unknown | The serialized message bytes (e.g., Avro, Protobuf, or JSON Schema encoded data) to prepend the header to. 
| ### [](#examples-39)Examples Add Schema Registry header to Avro-encoded message: ```bloblang root = with_schema_registry_header(123, content()) ``` Use schema ID from metadata to add header dynamically: ```bloblang root = with_schema_registry_header(meta("schema_id").number(), content()) ``` --- # Page 334: Bloblang Methods **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides/bloblang/methods.md --- # Bloblang Methods --- title: Bloblang Methods latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/guides/bloblang/methods page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/guides/bloblang/methods.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/bloblang/methods.adoc description: A list of Bloblang methods page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- Methods provide most of the power in Bloblang as they allow you to augment values and can be added to any expression (including other methods): ```bloblang root.doc.id = this.thing.id.string().catch(uuid_v4()) root.doc.reduced_nums = this.thing.nums.map_each(num -> if num < 10 { deleted() } else { num - 10 }) root.has_good_taste = ["pikachu","mewtwo","magmar"].contains(this.user.fav_pokemon) # In: {"thing":{"id":123,"nums":[5,12,18,7,25]},"user":{"fav_pokemon":"pikachu"}} ``` Methods support both named and nameless style arguments: ```bloblang root.foo_one = this.(bar | baz).trim().replace_all(old: "dog", new: "cat") root.foo_two = this.(bar | baz).trim().replace_all("dog", "cat") # In: {"bar":" I love my dog "} ``` ## [](#general)General ### [](#apply)apply Apply a declared mapping to a target value. #### [](#parameters)Parameters | Name | Type | Description | | --- | --- | --- | | mapping | string | The mapping to apply. 
| #### [](#examples)Examples ```bloblang map thing { root.inner = this.first } root.foo = this.doc.apply("thing") # In: {"doc":{"first":"hello world"}} # Out: {"foo":{"inner":"hello world"}} ``` ```bloblang map create_foo { root.name = "a foo" root.purpose = "to be a foo" } root = this root.foo = null.apply("create_foo") # In: {"id":"1234"} # Out: {"foo":{"name":"a foo","purpose":"to be a foo"},"id":"1234"} ``` ### [](#catch)catch If the target query fails (for example, due to incorrect types or failed parsing), the argument is returned instead. #### [](#parameters-2)Parameters | Name | Type | Description | | --- | --- | --- | | fallback | query expression | A value to yield, or query to execute, if the target query fails. | #### [](#examples-2)Examples ```bloblang root.doc.id = this.thing.id.string().catch(uuid_v4()) ``` The fallback argument can be a mapping, allowing you to capture the error string and yield structured data back: ```bloblang root.url = this.url.parse_url().catch(err -> {"error":err,"input":this.url}) # In: {"url":"invalid %&# url"} # Out: {"url":{"error":"field `this.url`: parse \"invalid %&\": invalid URL escape \"%&\"","input":"invalid %&# url"}} ``` When the input document is not structured, attempting to reference structured fields with `this` results in an error. Therefore, a convenient way to delete non-structured data is with a catch: ```bloblang root = this.catch(deleted()) # In: {"doc":{"foo":"bar"}} # Out: {"doc":{"foo":"bar"}} # In: not structured data # Out: ``` ### [](#from)from Modifies a target query such that certain functions are executed from the perspective of another message in the batch. This allows you to mutate events based on the contents of other messages. Functions that support this behavior are `content`, `json`, and `meta`. #### [](#parameters-3)Parameters | Name | Type | Description | | --- | --- | --- | | index | integer | The message index to use as a perspective.
| #### [](#examples-3)Examples For example, the following map extracts the contents of the JSON field `foo` specifically from message index `1` of a batch, effectively overriding the field `foo` for all messages of a batch to that of message 1: ```bloblang root = this root.foo = json("foo").from(1) ``` ### [](#from_all)from\_all Modifies a target query such that certain functions are executed from the perspective of each message in the batch, and returns the set of results as an array. Functions that support this behavior are `content`, `json` and `meta`. #### [](#examples-4)Examples ```bloblang root = this root.foo_summed = json("foo").from_all().sum() ``` ### [](#map)map Executes a query on the target value, allowing you to transform or extract data from the current context. #### [](#parameters-4)Parameters | Name | Type | Description | | --- | --- | --- | | query | query expression | A query to execute on the target. | ### [](#not)not Returns the logical NOT (negation) of a boolean value. Converts true to false and false to true. ### [](#or)or If the result of the target query fails or resolves to `null`, returns the argument instead. This is an explicit method alternative to the coalesce pipe operator `|`. #### [](#parameters-5)Parameters | Name | Type | Description | | --- | --- | --- | | fallback | query expression | A value to yield, or query to execute, if the target query fails or resolves to null. | #### [](#examples-5)Examples ```bloblang root.doc.id = this.thing.id.or(uuid_v4()) ``` ## [](#encoding-and-encryption)Encoding and encryption ### [](#compress)compress Compresses a string or byte array using the specified compression algorithm. Returns compressed data as bytes. Useful for reducing payload size before transmission or storage. #### [](#parameters-6)Parameters | Name | Type | Description | | --- | --- | --- | | algorithm | string | The compression algorithm: flate, gzip, pgzip (parallel gzip), lz4, snappy, zlib, or zstd. 
| | level | integer | Compression level (default: -1 for default compression). Higher values increase compression ratio but use more CPU. The valid range and its effect vary by algorithm. | #### [](#examples-6)Examples Compress and encode for safe transmission: ```bloblang root.compressed = content().bytes().compress("gzip").encode("base64") # In: {"message":"hello world I love space"} # Out: {"compressed":"H4sIAAAJbogA/wAmANn/eyJtZXNzYWdlIjoiaGVsbG8gd29ybGQgSSBsb3ZlIHNwYWNlIn0DAHEvdwomAAAA"} ``` Compare compressed sizes across algorithms: ```bloblang root.original_size = content().length() root.gzip_size = content().compress("gzip").length() root.lz4_size = content().compress("lz4").length() # In: The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. # Out: {"gzip_size":114,"lz4_size":85,"original_size":89} ``` ### [](#decode)decode Decodes an encoded string according to a chosen scheme. #### [](#parameters-7)Parameters | Name | Type | Description | | --- | --- | --- | | scheme | string | The decoding scheme to use. | #### [](#examples-7)Examples ```bloblang root.decoded = this.value.decode("hex").string() # In: {"value":"68656c6c6f20776f726c64"} # Out: {"decoded":"hello world"} ``` ```bloblang root = this.encoded.decode("ascii85") # In: {"encoded":"FD,B0+DGm>FDl80Ci\"A>F`)8BEckl6F`M&(+Cno&@/"} # Out: this is totally unstructured data ``` ### [](#decompress)decompress Decompresses a byte array using the specified decompression algorithm. Returns decompressed data as bytes. Use with data that was previously compressed using the corresponding algorithm. #### [](#parameters-8)Parameters | Name | Type | Description | | --- | --- | --- | | algorithm | string | The decompression algorithm: gzip, pgzip (parallel gzip), zlib, bzip2, flate, snappy, lz4, or zstd.
| #### [](#examples-8)Examples Decompress base64-encoded compressed data: ```bloblang root = this.compressed.decode("base64").decompress("gzip") # In: {"compressed":"H4sIAN12MWkAA8tIzcnJVyjPL8pJUfBUyMkvS1UoLkhMTgUAQpDxbxgAAAA="} # Out: hello world I love space ``` Convert decompressed bytes to string for JSON output: ```bloblang root.message = this.compressed.decode("base64").decompress("gzip").string() # In: {"compressed":"H4sIAN12MWkAA8tIzcnJVyjPL8pJUfBUyMkvS1UoLkhMTgUAQpDxbxgAAAA="} # Out: {"message":"hello world I love space"} ``` ### [](#decrypt_aes)decrypt\_aes Decrypts an AES-encrypted string or byte array. #### [](#parameters-9)Parameters | Name | Type | Description | | --- | --- | --- | | scheme | string | The scheme to use for decryption, one of ctr, gcm, ofb, cbc. | | key | string | A key to decrypt with. | | iv | string | An initialization vector / nonce. | #### [](#examples-9)Examples ```bloblang let key = "2b7e151628aed2a6abf7158809cf4f3c".decode("hex") let vector = "f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff".decode("hex") root.decrypted = this.value.decode("hex").decrypt_aes("ctr", $key, $vector).string() # In: {"value":"84e9b31ff7400bdf80be7254"} # Out: {"decrypted":"hello world!"} ``` ### [](#encode)encode Encodes a string or byte array according to a chosen scheme. #### [](#parameters-10)Parameters | Name | Type | Description | | --- | --- | --- | | scheme | string | The encoding scheme to use. | #### [](#examples-10)Examples ```bloblang root.encoded = this.value.encode("hex") # In: {"value":"hello world"} # Out: {"encoded":"68656c6c6f20776f726c64"} ``` ```bloblang root.encoded = content().encode("ascii85") # In: this is totally unstructured data # Out: {"encoded":"FD,B0+DGm>FDl80Ci\"A>F`)8BEckl6F`M&(+Cno&@/"} ``` ### [](#encrypt_aes)encrypt\_aes Encrypts a string or byte array using AES encryption. 
#### [](#parameters-11)Parameters | Name | Type | Description | | --- | --- | --- | | scheme | string | The scheme to use for encryption, one of ctr, gcm, ofb, or cbc. | | key | string | A key to encrypt with. | | iv | string | An initialization vector / nonce. | #### [](#examples-11)Examples ```bloblang let key = "2b7e151628aed2a6abf7158809cf4f3c".decode("hex") let vector = "f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff".decode("hex") root.encrypted = this.value.encrypt_aes("ctr", $key, $vector).encode("hex") # In: {"value":"hello world!"} # Out: {"encrypted":"84e9b31ff7400bdf80be7254"} ``` ### [](#hash)hash Hashes a string or byte array using a specified algorithm. #### [](#parameters-12)Parameters | Name | Type | Description | | --- | --- | --- | | algorithm | string | The hashing algorithm to use. | | key (optional) | string | An optional key to use. | | polynomial | string | An optional polynomial to use when the crc32 algorithm is selected; otherwise ignored. Options are IEEE (default), Castagnoli, and Koopman. | #### [](#examples-12)Examples ```bloblang root.h1 = this.value.hash("sha1").encode("hex") root.h2 = this.value.hash("hmac_sha1","static-key").encode("hex") # In: {"value":"hello world"} # Out: {"h1":"2aae6c35c94fcfb415dbe95f408b9ce91ee846ed","h2":"d87e5f068fa08fe90bb95bc7c8344cb809179d76"} ``` The `crc32` algorithm supports options for the polynomial: ```bloblang root.h1 = this.value.hash(algorithm: "crc32", polynomial: "Castagnoli").encode("hex") root.h2 = this.value.hash(algorithm: "crc32", polynomial: "Koopman").encode("hex") # In: {"value":"hello world"} # Out: {"h1":"c99465aa","h2":"df373d3c"} ``` ### [](#uuid_v5)uuid\_v5 Generates a version 5 UUID from a namespace and name. #### [](#parameters-13)Parameters | Name | Type | Description | | --- | --- | --- | | ns (optional) | string | An optional namespace name or UUID. It supports the dns, url, oid and x500 predefined namespaces and any valid RFC-9562 UUID. If empty, the nil UUID will be used.
| #### [](#examples-13)Examples ```bloblang root.id = "example".uuid_v5() ``` ```bloblang root.id = "example".uuid_v5("x500") ``` ```bloblang root.id = "example".uuid_v5("77f836b7-9f61-46c0-851e-9b6ca3535e69") ``` ## [](#geoip)GeoIP ### [](#geoip_anonymous_ip)geoip\_anonymous\_ip Looks up an IP address against a [MaxMind database file](https://www.maxmind.com/en/home) and, if found, returns an object describing the anonymous IP associated with it. #### [](#parameters-14)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A path to an mmdb (maxmind) file. | ### [](#geoip_asn)geoip\_asn Looks up an IP address against a [MaxMind database file](https://www.maxmind.com/en/home) and, if found, returns an object describing the ASN associated with it. #### [](#parameters-15)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A path to an mmdb (maxmind) file. | ### [](#geoip_city)geoip\_city Looks up an IP address against a [MaxMind database file](https://www.maxmind.com/en/home) and, if found, returns an object describing the city associated with it. #### [](#parameters-16)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A path to an mmdb (maxmind) file. | ### [](#geoip_connection_type)geoip\_connection\_type Looks up an IP address against a [MaxMind database file](https://www.maxmind.com/en/home) and, if found, returns an object describing the connection type associated with it. #### [](#parameters-17)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A path to an mmdb (maxmind) file. | ### [](#geoip_country)geoip\_country Looks up an IP address against a [MaxMind database file](https://www.maxmind.com/en/home) and, if found, returns an object describing the country associated with it. #### [](#parameters-18)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A path to an mmdb (maxmind) file. 
| ### [](#geoip_domain)geoip\_domain Looks up an IP address against a [MaxMind database file](https://www.maxmind.com/en/home) and, if found, returns an object describing the domain associated with it. #### [](#parameters-19)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A path to an mmdb (maxmind) file. | ### [](#geoip_enterprise)geoip\_enterprise Looks up an IP address against a [MaxMind database file](https://www.maxmind.com/en/home) and, if found, returns an object describing the enterprise associated with it. #### [](#parameters-20)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A path to an mmdb (maxmind) file. | ### [](#geoip_isp)geoip\_isp Looks up an IP address against a [MaxMind database file](https://www.maxmind.com/en/home) and, if found, returns an object describing the ISP associated with it. #### [](#parameters-21)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A path to an mmdb (maxmind) file. | ## [](#json-web-tokens)JSON web tokens ### [](#parse_jwt_es256)parse\_jwt\_es256 Parses a claims object from a JWT string encoded with ES256. This method does not validate JWT claims. #### [](#parameters-22)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The ES256 secret that was used for signing the token. 
| #### [](#examples-14)Examples ```bloblang root.claims = this.signed.parse_jwt_es256("""-----BEGIN PUBLIC KEY----- MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEGtLqIBePHmIhQcf0JLgc+F/4W/oI dp0Gta53G35VerNDgUUXmp78J2kfh4qLdh0XtmOMI587tCaqjvDAXfs//w== -----END PUBLIC KEY-----""") # In: {"signed":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.GIRajP9JJbpTlqSCdNEz4qpQkRvzX4Q51YnTwVyxLDM9tKjR_a8ggHWn9CWj7KG0x8J56OWtmUxn112SRTZVhQ"} # Out: {"claims":{"iat":1516239022,"mood":"Disdainful","sub":"1234567890"}} ``` ### [](#parse_jwt_es384)parse\_jwt\_es384 Parses a claims object from a JWT string encoded with ES384. This method does not validate JWT claims. #### [](#parameters-23)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The ES384 secret that was used for signing the token. | #### [](#examples-15)Examples ```bloblang root.claims = this.signed.parse_jwt_es384("""-----BEGIN PUBLIC KEY----- MHYwEAYHKoZIzj0CAQYFK4EEACIDYgAERoz74/B6SwmLhs8X7CWhnrWyRrB13AuU 8OYeqy0qHRu9JWNw8NIavqpTmu6XPT4xcFanYjq8FbeuM11eq06C52mNmS4LLwzA 2imlFEgn85bvJoC3bnkuq4mQjwt9VxdH -----END PUBLIC KEY-----""") # In: {"signed":"eyJhbGciOiJFUzM4NCIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.H2HBSlrvQBaov2tdreGonbBexxtQB-xzaPL4-tNQZ6TVh7VH8VBcSwcWHYa1lBAHmdsKOFcB2Wk0SB7QWeGT3ptSgr-_EhDMaZ8bA5spgdpq5DsKfaKHrd7DbbQlmxNq"} # Out: {"claims":{"iat":1516239022,"mood":"Disdainful","sub":"1234567890"}} ``` ### [](#parse_jwt_es512)parse\_jwt\_es512 Parses a claims object from a JWT string encoded with ES512. This method does not validate JWT claims. #### [](#parameters-24)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The ES512 secret that was used for signing the token. 
| #### [](#examples-16)Examples ```bloblang root.claims = this.signed.parse_jwt_es512("""-----BEGIN PUBLIC KEY----- MIGbMBAGByqGSM49AgEGBSuBBAAjA4GGAAQAkHLdts9P56fFkyhpYQ31M/Stwt3w vpaxhlfudxnXgTO1IP4RQRgryRxZ19EUzhvWDcG3GQIckoNMY5PelsnCGnIBT2Xh 9NQkjWF5K6xS4upFsbGSAwQ+GIyyk5IPJ2LHgOyMSCVh5gRZXV3CZLzXujx/umC9 UeYyTt05zRRWuD+p5bY= -----END PUBLIC KEY-----""") # In: {"signed":"eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.ACrpLuU7TKpAnncDCpN9m85nkL55MJ45NFOBl6-nEXmNT1eIxWjiP4pwWVbFH9et_BgN14119jbL_KqEJInPYc9nAXC6dDLq0aBU-dalvNl4-O5YWpP43-Y-TBGAsWnbMTrchILJ4-AEiICe73Ck5yWPleKg9c3LtkEFWfGs7BoPRguZ"} # Out: {"claims":{"iat":1516239022,"mood":"Disdainful","sub":"1234567890"}} ``` ### [](#parse_jwt_hs256)parse\_jwt\_hs256 Parses a claims object from a JWT string encoded with HS256. This method does not validate JWT claims. #### [](#parameters-25)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The HS256 secret that was used for signing the token. | #### [](#examples-17)Examples ```bloblang root.claims = this.signed.parse_jwt_hs256("""dont-tell-anyone""") # In: {"signed":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.YwXOM8v3gHVWcQRRRQc_zDlhmLnM62fwhFYGpiA0J1A"} # Out: {"claims":{"iat":1516239022,"mood":"Disdainful","sub":"1234567890"}} ``` ### [](#parse_jwt_hs384)parse\_jwt\_hs384 Parses a claims object from a JWT string encoded with HS384. This method does not validate JWT claims. #### [](#parameters-26)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The HS384 secret that was used for signing the token. 
| #### [](#examples-18)Examples ```bloblang root.claims = this.signed.parse_jwt_hs384("""dont-tell-anyone""") # In: {"signed":"eyJhbGciOiJIUzM4NCIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.2Y8rf_ijwN4t8hOGGViON_GrirLkCQVbCOuax6EoZ3nluX0tCGezcJxbctlIfsQ2"} # Out: {"claims":{"iat":1516239022,"mood":"Disdainful","sub":"1234567890"}} ``` ### [](#parse_jwt_hs512)parse\_jwt\_hs512 Parses a claims object from a JWT string encoded with HS512. This method does not validate JWT claims. #### [](#parameters-27)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The HS512 secret that was used for signing the token. | #### [](#examples-19)Examples ```bloblang root.claims = this.signed.parse_jwt_hs512("""dont-tell-anyone""") # In: {"signed":"eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.utRb0urG6LGGyranZJVo5Dk0Fns1QNcSUYPN0TObQ-YzsGGB8jrxHwM5NAJccjJZzKectEUqmmKCaETZvuX4Fg"} # Out: {"claims":{"iat":1516239022,"mood":"Disdainful","sub":"1234567890"}} ``` ### [](#parse_jwt_rs256)parse\_jwt\_rs256 Parses a claims object from a JWT string encoded with RS256. This method does not validate JWT claims. #### [](#parameters-28)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The RS256 secret that was used for signing the token. 
| #### [](#examples-20)Examples ```bloblang root.claims = this.signed.parse_jwt_rs256("""-----BEGIN PUBLIC KEY----- MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAs/ibN8r68pLMR6gRzg4S 8v8l6Q7yi8qURjkEbcNeM1rkokC7xh0I4JVTwxYSVv/JIW8qJdyspl5NIfuAVi32 WfKvSAs+NIs+DMsNPYw3yuQals4AX8hith1YDvYpr8SD44jxhz/DR9lYKZFGhXGB +7NqQ7vpTWp3BceLYocazWJgusZt7CgecIq57ycM5hjM93BvlrUJ8nQ1a46wfL/8 Cy4P0et70hzZrsjjN41KFhKY0iUwlyU41yEiDHvHDDsTMBxAZosWjSREGfJL6Mfp XOInTHs/Gg6DZMkbxjQu6L06EdJ+Q/NwglJdAXM7Zo9rNELqRig6DdvG5JesdMsO +QIDAQAB -----END PUBLIC KEY-----""") # In: {"signed":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.b0lH3jEupZZ4zoaly4Y_GCvu94HH6UKdKY96zfGNsIkPZpQLHIkZ7jMWlLlNOAd8qXlsBGP_i8H2qCKI4zlWJBGyPZgxXDzNRPVrTDfFpn4t4nBcA1WK2-ntXP3ehQxsaHcQU8Z_nsogId7Pme5iJRnoHWEnWtbwz5DLSXL3ZZNnRdrHM9MdI7QSDz9mojKDCaMpGN9sG7Xl-tGdBp1XzXuUOzG8S03mtZ1IgVR1uiBL2N6oohHIAunk8DIAmNWI-zgycTgzUGU7mvPkKH43qO8Ua1-13tCUBKKa8VxcotZ67Mxm1QAvBGoDnTKwWMwghLzs6d6WViXQg6eWlJcpBA"} # Out: {"claims":{"iat":1516239022,"mood":"Disdainful","sub":"1234567890"}} ``` ### [](#parse_jwt_rs384)parse\_jwt\_rs384 Parses a claims object from a JWT string encoded with RS384. This method does not validate JWT claims. #### [](#parameters-29)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The RS384 secret that was used for signing the token. 
| #### [](#examples-21)Examples ```bloblang root.claims = this.signed.parse_jwt_rs384("""-----BEGIN PUBLIC KEY----- MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAs/ibN8r68pLMR6gRzg4S 8v8l6Q7yi8qURjkEbcNeM1rkokC7xh0I4JVTwxYSVv/JIW8qJdyspl5NIfuAVi32 WfKvSAs+NIs+DMsNPYw3yuQals4AX8hith1YDvYpr8SD44jxhz/DR9lYKZFGhXGB +7NqQ7vpTWp3BceLYocazWJgusZt7CgecIq57ycM5hjM93BvlrUJ8nQ1a46wfL/8 Cy4P0et70hzZrsjjN41KFhKY0iUwlyU41yEiDHvHDDsTMBxAZosWjSREGfJL6Mfp XOInTHs/Gg6DZMkbxjQu6L06EdJ+Q/NwglJdAXM7Zo9rNELqRig6DdvG5JesdMsO +QIDAQAB -----END PUBLIC KEY-----""") # In: {"signed":"eyJhbGciOiJSUzM4NCIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.orcXYBcjVE5DU7mvq4KKWFfNdXR4nEY_xupzWoETRpYmQZIozlZnM_nHxEk2dySvpXlAzVm7kgOPK2RFtGlOVaNRIa3x-pMMr-bhZTno4L8Hl4sYxOks3bWtjK7wql4uqUbqThSJB12psAXw2-S-I_FMngOPGIn4jDT9b802ottJSvTpXcy0-eKTjrV2PSkRRu-EYJh0CJZW55MNhqlt6kCGhAXfbhNazN3ASX-dmpd_JixyBKphrngr_zRA-FCn_Xf3QQDA-5INopb4Yp5QiJ7UxVqQEKI80X_JvJqz9WE1qiAw8pq5-xTen1t7zTP-HT1NbbD3kltcNa3G8acmNg"} # Out: {"claims":{"iat":1516239022,"mood":"Disdainful","sub":"1234567890"}} ``` ### [](#parse_jwt_rs512)parse\_jwt\_rs512 Parses a claims object from a JWT string encoded with RS512. This method does not validate JWT claims. #### [](#parameters-30)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The RS512 secret that was used for signing the token. 
| #### [](#examples-22)Examples ```bloblang root.claims = this.signed.parse_jwt_rs512("""-----BEGIN PUBLIC KEY----- MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAs/ibN8r68pLMR6gRzg4S 8v8l6Q7yi8qURjkEbcNeM1rkokC7xh0I4JVTwxYSVv/JIW8qJdyspl5NIfuAVi32 WfKvSAs+NIs+DMsNPYw3yuQals4AX8hith1YDvYpr8SD44jxhz/DR9lYKZFGhXGB +7NqQ7vpTWp3BceLYocazWJgusZt7CgecIq57ycM5hjM93BvlrUJ8nQ1a46wfL/8 Cy4P0et70hzZrsjjN41KFhKY0iUwlyU41yEiDHvHDDsTMBxAZosWjSREGfJL6Mfp XOInTHs/Gg6DZMkbxjQu6L06EdJ+Q/NwglJdAXM7Zo9rNELqRig6DdvG5JesdMsO +QIDAQAB -----END PUBLIC KEY-----""") # In: {"signed":"eyJhbGciOiJSUzUxMiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.rsMp_X5HMrUqKnZJIxo27aAoscovRA6SSQYR9rq7pifIj0YHXxMyNyOBDGnvVALHKTi25VUGHpfNUW0VVMmae0A4t_ObNU6hVZHguWvetKZZq4FZpW1lgWHCMqgPGwT5_uOqwYCH6r8tJuZT3pqXeL0CY4putb1AN2w6CVp620nh3l8d3XWb4jaifycd_4CEVCqHuWDmohfug4VhmoVKlIXZkYoAQowgHlozATDssBSWdYtv107Wd2AzEoiXPu6e3pflsuXULlyqQnS4ELEKPYThFLafh1NqvZDPddqozcPZ-iODBW-xf3A4DYDdivnMYLrh73AZOGHexxu8ay6nDA"} # Out: {"claims":{"iat":1516239022,"mood":"Disdainful","sub":"1234567890"}} ``` ### [](#sign_jwt_es256)sign\_jwt\_es256 Hash and sign an object representing JSON Web Token (JWT) claims using ES256. #### [](#parameters-31)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The secret to use for signing the token. | | headers (optional) | unknown | Optional object of JWT header fields to include in the token. Keys "alg", "typ", "jku", "jwk", "x5u", "x5c", "x5t","x5t#S256" and "crit" will be ignored if provided. | #### [](#examples-23)Examples ```bloblang root.signed = this.claims.sign_jwt_es256("""-----BEGIN EC PRIVATE KEY----- ... signature data ... 
-----END EC PRIVATE KEY-----""") # In: {"claims":{"sub":"user123"}} # Out: {"signed":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.-8LrOdkEiv_44ADWW08lpbq41ZmHCel58NMORPq1q4Dyw0zFhqDVLrRoSvCvuyyvgXAFb9IHfR-9MlJ_2ShA9A"} ``` ```bloblang root.signed = this.claims.sign_jwt_es256(signing_secret: """-----BEGIN EC PRIVATE KEY----- ... signature data ... -----END EC PRIVATE KEY-----""", headers: {"kid": "my-key", "x": "y"}) # In: {"claims":{"sub":"user123"}} # Out: {"signed":""} ``` ### [](#sign_jwt_es384)sign\_jwt\_es384 Hash and sign an object representing JSON Web Token (JWT) claims using ES384. #### [](#parameters-32)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The secret to use for signing the token. | | headers (optional) | unknown | Optional object of JWT header fields to include in the token. Keys "alg", "typ", "jku", "jwk", "x5u", "x5c", "x5t","x5t#S256" and "crit" will be ignored if provided. | #### [](#examples-24)Examples ```bloblang root.signed = this.claims.sign_jwt_es384("""-----BEGIN EC PRIVATE KEY----- ... signature data ... -----END EC PRIVATE KEY-----""") # In: {"claims":{"sub":"user123"}} # Out: {"signed":"eyJhbGciOiJFUzM4NCIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIn0.8FmTKH08dl7dyxrNu0rmvhegiIBCy-O9cddGco2e9lpZtgv5mS5qHgPkgBC5eRw1d7SRJsHwHZeehzdqT5Ba7aZJIhz9ds0sn37YQ60L7jT0j2gxCzccrt4kECHnUnLw"} ``` ```bloblang root.signed = this.claims.sign_jwt_es384(signing_secret: """-----BEGIN EC PRIVATE KEY----- ... signature data ... -----END EC PRIVATE KEY-----""", headers: {"kid": "my-key", "x": "y"}) # In: {"claims":{"sub":"user123"}} # Out: {"signed":""} ``` ### [](#sign_jwt_es512)sign\_jwt\_es512 Hash and sign an object representing JSON Web Token (JWT) claims using ES512. #### [](#parameters-33)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The secret to use for signing the token. 
| | headers (optional) | unknown | Optional object of JWT header fields to include in the token. Keys "alg", "typ", "jku", "jwk", "x5u", "x5c", "x5t","x5t#S256" and "crit" will be ignored if provided. | #### [](#examples-25)Examples ```bloblang root.signed = this.claims.sign_jwt_es512("""-----BEGIN EC PRIVATE KEY----- ... signature data ... -----END EC PRIVATE KEY-----""") # In: {"claims":{"sub":"user123"}} # Out: {"signed":"eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIn0.AQbEWymoRZxDJEJtKSFFG2k2VbDCTYSuBwAZyMqexCspr3If8aERTVGif8HXG3S7TzMBCCzxkcKr3eIU441l3DlpAMNfQbkcOlBqMvNBn-CX481WyKf3K5rFHQ-6wRonz05aIsWAxCDvAozI_9J0OWllxdQ2MBAuTPbPJ38OqXsYkCQs"} ``` ```bloblang root.signed = this.claims.sign_jwt_es512(signing_secret: """-----BEGIN EC PRIVATE KEY----- ... signature data ... -----END EC PRIVATE KEY-----""", headers: {"kid": "my-key", "x": "y"}) # In: {"claims":{"sub":"user123"}} # Out: {"signed":""} ``` ### [](#sign_jwt_hs256)sign\_jwt\_hs256 Hash and sign an object representing JSON Web Token (JWT) claims using HS256. #### [](#parameters-34)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The secret to use for signing the token. | | headers (optional) | unknown | Optional object of JWT header fields to include in the token. Keys "alg", "typ", "jku", "jwk", "x5u", "x5c", "x5t","x5t#S256" and "crit" will be ignored if provided. | #### [](#examples-26)Examples ```bloblang root.signed = this.claims.sign_jwt_hs256("""dont-tell-anyone""") # In: {"claims":{"sub":"user123"}} # Out: {"signed":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIn0.hUl-nngPMY_3h9vveWJUPsCcO5PeL6k9hWLnMYeFbFQ"} ``` ```bloblang root.signed = this.claims.sign_jwt_hs256(signing_secret: """dont-tell-anyone""", headers: {"kid": "my-key", "x": "y"}) # In: {"claims":{"sub":"user123"}} # Out: {"signed":""} ``` ### [](#sign_jwt_hs384)sign\_jwt\_hs384 Hash and sign an object representing JSON Web Token (JWT) claims using HS384. 
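An HS384 token in compact form is the base64url-encoded header and claims, joined by a dot, followed by an HMAC-SHA384 signature computed over that string. The Python sketch below illustrates that construction using only the standard library. It is not Redpanda Connect's implementation: `sign_hs384` and `verify_hs384` are hypothetical names, and the fixed `{"alg":"HS384","typ":"JWT"}` header and compact JSON serialization are assumptions made for the illustration.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as compact JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _mac(signing_input: bytes, secret: str) -> str:
    """HMAC-SHA384 over the signing input, base64url-encoded."""
    digest = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha384).digest()
    return b64url(digest)


def sign_hs384(claims: dict, secret: str) -> str:
    # Hypothetical helper for illustration; the header fields are assumed.
    header = {"alg": "HS384", "typ": "JWT"}
    segments = [
        b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        b64url(json.dumps(claims, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(segments).encode("ascii")
    return ".".join(segments + [_mac(signing_input, secret)])


def verify_hs384(token: str, secret: str) -> dict:
    """Check the signature and return the claims; claims are not validated."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    expected = _mac(f"{header_b64}.{payload_b64}".encode("ascii"), secret)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, signature_b64):
        raise ValueError("signature mismatch")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Because HMAC is a symmetric scheme, the same secret both signs and verifies, which is why the HS384 signing and parsing methods take the same `signing_secret` value.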
#### [](#parameters-35)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The secret to use for signing the token. | | headers (optional) | unknown | Optional object of JWT header fields to include in the token. Keys "alg", "typ", "jku", "jwk", "x5u", "x5c", "x5t","x5t#S256" and "crit" will be ignored if provided. | #### [](#examples-27)Examples ```bloblang root.signed = this.claims.sign_jwt_hs384("""dont-tell-anyone""") # In: {"claims":{"sub":"user123"}} # Out: {"signed":"eyJhbGciOiJIUzM4NCIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIn0.zGYLr83aToon1efUNq-hw7XgT20lPvZb8sYei8x6S6mpHwb433SJdXJXx0Oio8AZ"} ``` ```bloblang root.signed = this.claims.sign_jwt_hs384(signing_secret: """dont-tell-anyone""", headers: {"kid": "my-key", "x": "y"}) # In: {"claims":{"sub":"user123"}} # Out: {"signed":""} ``` ### [](#sign_jwt_hs512)sign\_jwt\_hs512 Hash and sign an object representing JSON Web Token (JWT) claims using HS512. #### [](#parameters-36)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The secret to use for signing the token. | | headers (optional) | unknown | Optional object of JWT header fields to include in the token. Keys "alg", "typ", "jku", "jwk", "x5u", "x5c", "x5t","x5t#S256" and "crit" will be ignored if provided. | #### [](#examples-28)Examples ```bloblang root.signed = this.claims.sign_jwt_hs512("""dont-tell-anyone""") # In: {"claims":{"sub":"user123"}} # Out: {"signed":"eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIn0.zBNR9o_6EDwXXKkpKLNJhG26j8Dc-mV-YahBwmEdCrmiWt5les8I9rgmNlWIowpq6Yxs4kLNAdFhqoRz3NXT3w"} ``` ```bloblang root.signed = this.claims.sign_jwt_hs512(signing_secret: """dont-tell-anyone""", headers: {"kid": "my-key", "x": "y"}) # In: {"claims":{"sub":"user123"}} # Out: {"signed":""} ``` ### [](#sign_jwt_rs256)sign\_jwt\_rs256 Hash and sign an object representing JSON Web Token (JWT) claims using RS256. 
#### [](#parameters-37)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The secret to use for signing the token. | | headers (optional) | unknown | Optional object of JWT header fields to include in the token. Keys "alg", "typ", "jku", "jwk", "x5u", "x5c", "x5t","x5t#S256" and "crit" will be ignored if provided. | #### [](#examples-29)Examples ```bloblang root.signed = this.claims.sign_jwt_rs256("""-----BEGIN RSA PRIVATE KEY----- ... signature data ... -----END RSA PRIVATE KEY-----""") # In: {"claims":{"sub":"user123"}} # Out: {"signed":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.b0lH3jEupZZ4zoaly4Y_GCvu94HH6UKdKY96zfGNsIkPZpQLHIkZ7jMWlLlNOAd8qXlsBGP_i8H2qCKI4zlWJBGyPZgxXDzNRPVrTDfFpn4t4nBcA1WK2-ntXP3ehQxsaHcQU8Z_nsogId7Pme5iJRnoHWEnWtbwz5DLSXL3ZZNnRdrHM9MdI7QSDz9mojKDCaMpGN9sG7Xl-tGdBp1XzXuUOzG8S03mtZ1IgVR1uiBL2N6oohHIAunk8DIAmNWI-zgycTgzUGU7mvPkKH43qO8Ua1-13tCUBKKa8VxcotZ67Mxm1QAvBGoDnTKwWMwghLzs6d6WViXQg6eWlJcpBA"} ``` ```bloblang root.signed = this.claims.sign_jwt_rs256(signing_secret: """-----BEGIN RSA PRIVATE KEY----- ... signature data ... -----END RSA PRIVATE KEY-----""", headers: {"kid": "my-key", "x": "y"}) # In: {"claims":{"sub":"user123"}} # Out: {"signed":""} ``` ### [](#sign_jwt_rs384)sign\_jwt\_rs384 Hash and sign an object representing JSON Web Token (JWT) claims using RS384. #### [](#parameters-38)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The secret to use for signing the token. | | headers (optional) | unknown | Optional object of JWT header fields to include in the token. Keys "alg", "typ", "jku", "jwk", "x5u", "x5c", "x5t","x5t#S256" and "crit" will be ignored if provided. | #### [](#examples-30)Examples ```bloblang root.signed = this.claims.sign_jwt_rs384("""-----BEGIN RSA PRIVATE KEY----- ... signature data ... 
-----END RSA PRIVATE KEY-----""") # In: {"claims":{"sub":"user123"}} # Out: {"signed":"eyJhbGciOiJSUzM4NCIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.orcXYBcjVE5DU7mvq4KKWFfNdXR4nEY_xupzWoETRpYmQZIozlZnM_nHxEk2dySvpXlAzVm7kgOPK2RFtGlOVaNRIa3x-pMMr-bhZTno4L8Hl4sYxOks3bWtjK7wql4uqUbqThSJB12psAXw2-S-I_FMngOPGIn4jDT9b802ottJSvTpXcy0-eKTjrV2PSkRRu-EYJh0CJZW55MNhqlt6kCGhAXfbhNazN3ASX-dmpd_JixyBKphrngr_zRA-FCn_Xf3QQDA-5INopb4Yp5QiJ7UxVqQEKI80X_JvJqz9WE1qiAw8pq5-xTen1t7zTP-HT1NbbD3kltcNa3G8acmNg"} ``` ```bloblang root.signed = this.claims.sign_jwt_rs384(signing_secret: """-----BEGIN RSA PRIVATE KEY----- ... signature data ... -----END RSA PRIVATE KEY-----""", headers: {"kid": "my-key", "x": "y"}) # In: {"claims":{"sub":"user123"}} # Out: {"signed":""} ``` ### [](#sign_jwt_rs512)sign\_jwt\_rs512 Hash and sign an object representing JSON Web Token (JWT) claims using RS512. #### [](#parameters-39)Parameters | Name | Type | Description | | --- | --- | --- | | signing_secret | string | The secret to use for signing the token. | | headers (optional) | unknown | Optional object of JWT header fields to include in the token. Keys "alg", "typ", "jku", "jwk", "x5u", "x5c", "x5t","x5t#S256" and "crit" will be ignored if provided. | #### [](#examples-31)Examples ```bloblang root.signed = this.claims.sign_jwt_rs512("""-----BEGIN RSA PRIVATE KEY----- ... signature data ... 
-----END RSA PRIVATE KEY-----""") # In: {"claims":{"sub":"user123"}} # Out: {"signed":"eyJhbGciOiJSUzUxMiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsIm1vb2QiOiJEaXNkYWluZnVsIiwic3ViIjoiMTIzNDU2Nzg5MCJ9.rsMp_X5HMrUqKnZJIxo27aAoscovRA6SSQYR9rq7pifIj0YHXxMyNyOBDGnvVALHKTi25VUGHpfNUW0VVMmae0A4t_ObNU6hVZHguWvetKZZq4FZpW1lgWHCMqgPGwT5_uOqwYCH6r8tJuZT3pqXeL0CY4putb1AN2w6CVp620nh3l8d3XWb4jaifycd_4CEVCqHuWDmohfug4VhmoVKlIXZkYoAQowgHlozATDssBSWdYtv107Wd2AzEoiXPu6e3pflsuXULlyqQnS4ELEKPYThFLafh1NqvZDPddqozcPZ-iODBW-xf3A4DYDdivnMYLrh73AZOGHexxu8ay6nDA"} ``` ```bloblang root.signed = this.claims.sign_jwt_rs512(signing_secret: """-----BEGIN RSA PRIVATE KEY----- ... signature data ... -----END RSA PRIVATE KEY-----""", headers: {"kid": "my-key", "x": "y"}) # In: {"claims":{"sub":"user123"}} # Out: {"signed":""} ``` ## [](#number-manipulation)Number manipulation ### [](#abs)abs Returns the absolute value of an int64 or float64 number. As a special case, when an integer is provided that is the minimum value it is converted to the maximum value. #### [](#examples-32)Examples ```bloblang root.outs = this.ins.map_each(ele -> ele.abs()) # In: {"ins":[9,-18,1.23,-4.56]} # Out: {"outs":[9,18,1.23,4.56]} ``` ### [](#bitwise_and)bitwise\_and Performs a bitwise AND operation between the integer and the specified value. #### [](#parameters-40)Parameters | Name | Type | Description | | --- | --- | --- | | value | integer | The value to AND with | #### [](#examples-33)Examples ```bloblang root.new_value = this.value.bitwise_and(6) # In: {"value":12} # Out: {"new_value":4} ``` ```bloblang root.masked = this.flags.bitwise_and(15) # In: {"flags":127} # Out: {"masked":15} ``` ### [](#bitwise_or)bitwise\_or Performs a bitwise OR operation between the integer and the specified value. 
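The bitwise methods are easier to follow when the operands are written in binary. The Python sketch below reuses the operands 12 and 6 from the examples in this section and prints each result next to its bit pattern. It is only an illustration of the arithmetic; `bits` is a hypothetical helper, not part of Bloblang.

```python
def bits(value: int, width: int = 4) -> str:
    """Render a non-negative integer as a zero-padded binary string."""
    return format(value, f"0{width}b")


a, b = 12, 6  # same operands as the bitwise examples in this section

results = {
    "and": a & b,  # bit set only where both operands have it set
    "or": a | b,   # bit set where either operand has it set
    "xor": a ^ b,  # bit set where exactly one operand has it set
}

print(f"a   = {a:>2}  {bits(a)}")
print(f"b   = {b:>2}  {bits(b)}")
for name, value in results.items():
    print(f"{name:<3} = {value:>2}  {bits(value)}")
```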
#### [](#parameters-41)Parameters | Name | Type | Description | | --- | --- | --- | | value | integer | The value to OR with | #### [](#examples-34)Examples ```bloblang root.new_value = this.value.bitwise_or(6) # In: {"value":12} # Out: {"new_value":14} ``` ```bloblang root.combined = this.flags.bitwise_or(8) # In: {"flags":4} # Out: {"combined":12} ``` ### [](#bitwise_xor)bitwise\_xor Performs a bitwise XOR (exclusive OR) operation between the integer and the specified value. #### [](#parameters-42)Parameters | Name | Type | Description | | --- | --- | --- | | value | integer | The value to XOR with | #### [](#examples-35)Examples ```bloblang root.new_value = this.value.bitwise_xor(6) # In: {"value":12} # Out: {"new_value":10} ``` ```bloblang root.toggled = this.flags.bitwise_xor(5) # In: {"flags":3} # Out: {"toggled":6} ``` ### [](#ceil)ceil Rounds a number up to the nearest integer. Returns an integer if the result fits in 64-bit, otherwise returns a float. #### [](#examples-36)Examples ```bloblang root.new_value = this.value.ceil() # In: {"value":5.3} # Out: {"new_value":6} # In: {"value":-5.9} # Out: {"new_value":-5} ``` ```bloblang root.result = this.price.ceil() # In: {"price":19.99} # Out: {"result":20} ``` ### [](#cos)cos Calculates the cosine of a given angle specified in radians. #### [](#examples-37)Examples ```bloblang root.new_value = (this.value * (pi() / 180)).cos() # In: {"value":45} # Out: {"new_value":0.7071067811865476} # In: {"value":0} # Out: {"new_value":1} # In: {"value":180} # Out: {"new_value":-1} ``` ### [](#float32)float32 Converts a numerical type into a 32-bit floating point number, this is for advanced use cases where a specific data type is needed for a given component (such as the ClickHouse SQL driver). If the value is a string then an attempt will be made to parse it as a 32-bit floating point number. 
Please refer to the [`strconv.ParseFloat` documentation](https://pkg.go.dev/strconv#ParseFloat) for details regarding the supported formats. #### [](#examples-38)Examples ```bloblang root.out = this.in.float32() # In: {"in":"6.674282313423543523453425345e-11"} # Out: {"out":6.674283e-11} ``` ### [](#float64)float64 Converts a numerical type into a 64-bit floating point number, this is for advanced use cases where a specific data type is needed for a given component (such as the ClickHouse SQL driver). If the value is a string then an attempt will be made to parse it as a 64-bit floating point number. Please refer to the [`strconv.ParseFloat` documentation](https://pkg.go.dev/strconv#ParseFloat) for details regarding the supported formats. #### [](#examples-39)Examples ```bloblang root.out = this.in.float64() # In: {"in":"6.674282313423543523453425345e-11"} # Out: {"out":6.674282313423544e-11} ``` ### [](#floor)floor Rounds a number down to the nearest integer. Returns an integer if the result fits in 64-bit, otherwise returns a float. #### [](#examples-40)Examples ```bloblang root.new_value = this.value.floor() # In: {"value":5.7} # Out: {"new_value":5} # In: {"value":-3.2} # Out: {"new_value":-4} ``` ```bloblang root.whole_seconds = this.duration_seconds.floor() # In: {"duration_seconds":12.345} # Out: {"whole_seconds":12} ``` ### [](#int16)int16 Converts a numerical type into a 16-bit signed integer, this is for advanced use cases where a specific data type is needed for a given component (such as the ClickHouse SQL driver). If the value is a string then an attempt will be made to parse it as a 16-bit signed integer. If the target value exceeds the capacity of an integer or contains decimal values then this method will throw an error. In order to convert a floating point number containing decimals first use [`.round()`](#round) on the value. 
Please refer to the [`strconv.ParseInt` documentation](https://pkg.go.dev/strconv#ParseInt) for details regarding the supported formats. #### [](#examples-41)Examples ```bloblang root.a = this.a.int16() root.b = this.b.round().int16() root.c = this.c.int16() root.d = this.d.int16().catch(0) # In: {"a":12,"b":12.34,"c":"12","d":-12} # Out: {"a":12,"b":12,"c":12,"d":-12} ``` ```bloblang root = this.int16() # In: "0xDE" # Out: 222 ``` ### [](#int32)int32 Converts a numerical type into a 32-bit signed integer, this is for advanced use cases where a specific data type is needed for a given component (such as the ClickHouse SQL driver). If the value is a string then an attempt will be made to parse it as a 32-bit signed integer. If the target value exceeds the capacity of an integer or contains decimal values then this method will throw an error. In order to convert a floating point number containing decimals first use [`.round()`](#round) on the value. Please refer to the [`strconv.ParseInt` documentation](https://pkg.go.dev/strconv#ParseInt) for details regarding the supported formats. #### [](#examples-42)Examples ```bloblang root.a = this.a.int32() root.b = this.b.round().int32() root.c = this.c.int32() root.d = this.d.int32().catch(0) # In: {"a":12,"b":12.34,"c":"12","d":-12} # Out: {"a":12,"b":12,"c":12,"d":-12} ``` ```bloblang root = this.int32() # In: "0xDEAD" # Out: 57005 ``` ### [](#int64)int64 Converts a numerical type into a 64-bit signed integer, this is for advanced use cases where a specific data type is needed for a given component (such as the ClickHouse SQL driver). If the value is a string then an attempt will be made to parse it as a 64-bit signed integer. If the target value exceeds the capacity of an integer or contains decimal values then this method will throw an error. In order to convert a floating point number containing decimals first use [`.round()`](#round) on the value. 
Please refer to the [`strconv.ParseInt` documentation](https://pkg.go.dev/strconv#ParseInt) for details regarding the supported formats.

#### [](#examples-43)Examples

```bloblang
root.a = this.a.int64()
root.b = this.b.round().int64()
root.c = this.c.int64()
root.d = this.d.int64().catch(0)

# In: {"a":12,"b":12.34,"c":"12","d":-12}
# Out: {"a":12,"b":12,"c":12,"d":-12}
```

```bloblang
root = this.int64()

# In: "0xDEADBEEF"
# Out: 3735928559
```

### [](#int8)int8

Converts a numerical type into an 8-bit signed integer. This is for advanced use cases where a specific data type is needed for a given component (such as the ClickHouse SQL driver). If the value is a string, an attempt is made to parse it as an 8-bit signed integer. If the target value exceeds the capacity of an integer or contains decimal values, this method throws an error. To convert a floating point number containing decimals, first use [`.round()`](#round) on the value. Please refer to the [`strconv.ParseInt` documentation](https://pkg.go.dev/strconv#ParseInt) for details regarding the supported formats.

#### [](#examples-44)Examples

```bloblang
root.a = this.a.int8()
root.b = this.b.round().int8()
root.c = this.c.int8()
root.d = this.d.int8().catch(0)

# In: {"a":12,"b":12.34,"c":"12","d":-12}
# Out: {"a":12,"b":12,"c":12,"d":-12}
```

```bloblang
root = this.int8()

# In: "0xD"
# Out: 13
```

### [](#log)log

Calculates the natural logarithm (base e) of a number.

#### [](#examples-45)Examples

```bloblang
root.new_value = this.value.log().round()

# In: {"value":1}
# Out: {"new_value":0}

# In: {"value":2.7183}
# Out: {"new_value":1}
```

```bloblang
root.ln_result = this.number.log()

# In: {"number":10}
# Out: {"ln_result":2.302585092994046}
```

### [](#log10)log10

Calculates the base-10 logarithm of a number.
#### [](#examples-46)Examples ```bloblang root.new_value = this.value.log10() # In: {"value":100} # Out: {"new_value":2} # In: {"value":1000} # Out: {"new_value":3} ``` ```bloblang root.log_value = this.magnitude.log10() # In: {"magnitude":10000} # Out: {"log_value":4} ``` ### [](#max)max Returns the largest number from an array. All elements must be numbers and the array cannot be empty. #### [](#examples-47)Examples ```bloblang root.biggest = this.values.max() # In: {"values":[0,3,2.5,7,5]} # Out: {"biggest":7} ``` ```bloblang root.highest_temp = this.temperatures.max() # In: {"temperatures":[20.5,22.1,19.8,23.4]} # Out: {"highest_temp":23.4} ``` ### [](#min)min Returns the smallest number from an array. All elements must be numbers and the array cannot be empty. #### [](#examples-48)Examples ```bloblang root.smallest = this.values.min() # In: {"values":[0,3,-2.5,7,5]} # Out: {"smallest":-2.5} ``` ```bloblang root.lowest_temp = this.temperatures.min() # In: {"temperatures":[20.5,22.1,19.8,23.4]} # Out: {"lowest_temp":19.8} ``` ### [](#pow)pow Returns the number raised to the specified exponent. #### [](#parameters-43)Parameters | Name | Type | Description | | --- | --- | --- | | exponent | float | The exponent you want to raise to the power of. | #### [](#examples-49)Examples ```bloblang root.new_value = this.value * 10.pow(-2) # In: {"value":2} # Out: {"new_value":0.02} ``` ```bloblang root.new_value = this.value.pow(-2) # In: {"value":2} # Out: {"new_value":0.25} ``` ### [](#round)round Rounds a number to the nearest integer. Values at .5 round away from zero. Returns an integer if the result fits in 64-bit, otherwise returns a float. 
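The tie-breaking rule matters when porting logic between languages. As a point of comparison, Python's built-in `round` sends ties to the nearest even integer, so reproducing the "away from zero" behavior described here takes a small helper. The sketch below is illustrative only: `round_half_away_from_zero` is a hypothetical name, and it does not model the fallback to a float for results outside the 64-bit range.

```python
import math


def round_half_away_from_zero(x: float) -> int:
    """Round to the nearest integer; exact .5 ties move away from zero."""
    # Work on the magnitude, then restore the sign of the input.
    magnitude = math.floor(abs(x) + 0.5)
    return int(math.copysign(magnitude, x))


for value in (5.3, 5.9, 87.5, 2.5, -2.5):
    print(value, round_half_away_from_zero(value), round(value))
```

For 2.5 and -2.5 the helper gives 3 and -3, whereas Python's built-in `round` gives 2 and -2. Keep this difference in mind when checking mapping output against values computed elsewhere.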
#### [](#examples-50)Examples ```bloblang root.new_value = this.value.round() # In: {"value":5.3} # Out: {"new_value":5} # In: {"value":5.9} # Out: {"new_value":6} ``` ```bloblang root.rounded = this.score.round() # In: {"score":87.5} # Out: {"rounded":88} ``` ### [](#sin)sin Calculates the sine of a given angle specified in radians. #### [](#examples-51)Examples ```bloblang root.new_value = (this.value * (pi() / 180)).sin() # In: {"value":45} # Out: {"new_value":0.7071067811865475} # In: {"value":0} # Out: {"new_value":0} # In: {"value":90} # Out: {"new_value":1} ``` ### [](#tan)tan Calculates the tangent of a given angle specified in radians. #### [](#examples-52)Examples ```bloblang root.new_value = "%f".format((this.value * (pi() / 180)).tan()) # In: {"value":0} # Out: {"new_value":"0.000000"} # In: {"value":45} # Out: {"new_value":"1.000000"} # In: {"value":180} # Out: {"new_value":"-0.000000"} ``` ### [](#uint16)uint16 Converts a numerical type into a 16-bit unsigned integer, this is for advanced use cases where a specific data type is needed for a given component (such as the ClickHouse SQL driver). If the value is a string then an attempt will be made to parse it as a 16-bit unsigned integer. If the target value exceeds the capacity of an integer or contains decimal values then this method will throw an error. In order to convert a floating point number containing decimals first use [`.round()`](#round) on the value. Please refer to the [`strconv.ParseInt` documentation](https://pkg.go.dev/strconv#ParseInt) for details regarding the supported formats. 
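The conversion rules above can be summarized as three checks: parse strings with a base prefix, reject values with a fractional part, and reject values outside the target range. The Python sketch below approximates those rules for a 16-bit unsigned integer. It is an assumption-laden illustration, not Bloblang's implementation: `to_uint16` is a hypothetical helper, and Python's `int(text, 0)` differs from Go's `strconv.ParseInt` in some details, such as the handling of leading zeros.

```python
UINT16_MAX = 0xFFFF  # 65535, the largest value a 16-bit unsigned integer can hold


def to_uint16(value) -> int:
    """Convert a number or numeric string to a uint16 value, or raise ValueError."""
    if isinstance(value, str):
        # Base 0 lets the prefix choose the base, so "0xDE" parses as hexadecimal.
        number = int(value, 0)
    elif isinstance(value, float):
        if not value.is_integer():
            raise ValueError(f"{value} contains decimals; round it first")
        number = int(value)
    else:
        number = int(value)
    if not 0 <= number <= UINT16_MAX:
        raise ValueError(f"{number} is out of range for uint16")
    return number


a = to_uint16(12)
b = to_uint16(round(12.34))  # round first, as the description recommends
c = to_uint16("0xDE")
try:
    d = to_uint16(-12)
except ValueError:
    d = 0  # plays the same role as .catch(0) in a mapping
```

The last case shows why a fallback is useful for unsigned types: a negative input cannot be represented, so the conversion fails and the fallback value is used instead.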
#### [](#examples-53)Examples ```bloblang root.a = this.a.uint16() root.b = this.b.round().uint16() root.c = this.c.uint16() root.d = this.d.uint16().catch(0) # In: {"a":12,"b":12.34,"c":"12","d":-12} # Out: {"a":12,"b":12,"c":12,"d":0} ``` ```bloblang root = this.uint16() # In: "0xDE" # Out: 222 ``` ### [](#uint32)uint32 Converts a numerical type into a 32-bit unsigned integer, this is for advanced use cases where a specific data type is needed for a given component (such as the ClickHouse SQL driver). If the value is a string then an attempt will be made to parse it as a 32-bit unsigned integer. If the target value exceeds the capacity of an integer or contains decimal values then this method will throw an error. In order to convert a floating point number containing decimals first use [`.round()`](#round) on the value. Please refer to the [`strconv.ParseInt` documentation](https://pkg.go.dev/strconv#ParseInt) for details regarding the supported formats. #### [](#examples-54)Examples ```bloblang root.a = this.a.uint32() root.b = this.b.round().uint32() root.c = this.c.uint32() root.d = this.d.uint32().catch(0) # In: {"a":12,"b":12.34,"c":"12","d":-12} # Out: {"a":12,"b":12,"c":12,"d":0} ``` ```bloblang root = this.uint32() # In: "0xDEAD" # Out: 57005 ``` ### [](#uint64)uint64 Converts a numerical type into a 64-bit unsigned integer, this is for advanced use cases where a specific data type is needed for a given component (such as the ClickHouse SQL driver). If the value is a string then an attempt will be made to parse it as a 64-bit unsigned integer. If the target value exceeds the capacity of an integer or contains decimal values then this method will throw an error. In order to convert a floating point number containing decimals first use [`.round()`](#round) on the value. Please refer to the [`strconv.ParseInt` documentation](https://pkg.go.dev/strconv#ParseInt) for details regarding the supported formats. 
#### [](#examples-55)Examples

```bloblang
root.a = this.a.uint64()
root.b = this.b.round().uint64()
root.c = this.c.uint64()
root.d = this.d.uint64().catch(0)

# In: {"a":12,"b":12.34,"c":"12","d":-12}
# Out: {"a":12,"b":12,"c":12,"d":0}
```

```bloblang
root = this.uint64()

# In: "0xDEADBEEF"
# Out: 3735928559
```

### [](#uint8)uint8

Converts a numerical type into an 8-bit unsigned integer. This is for advanced use cases where a specific data type is needed for a given component (such as the ClickHouse SQL driver). If the value is a string, an attempt is made to parse it as an 8-bit unsigned integer. If the target value exceeds the capacity of an integer or contains decimal values, this method throws an error. To convert a floating point number containing decimals, first use [`.round()`](#round) on the value. Please refer to the [`strconv.ParseInt` documentation](https://pkg.go.dev/strconv#ParseInt) for details regarding the supported formats.

#### [](#examples-56)Examples

```bloblang
root.a = this.a.uint8()
root.b = this.b.round().uint8()
root.c = this.c.uint8()
root.d = this.d.uint8().catch(0)

# In: {"a":12,"b":12.34,"c":"12","d":-12}
# Out: {"a":12,"b":12,"c":12,"d":0}
```

```bloblang
root = this.uint8()

# In: "0xD"
# Out: 13
```

## [](#object-array-manipulation)Object & array manipulation

### [](#all)all

Tests whether all elements in an array satisfy a condition. Returns true only if the query evaluates to true for every element. Returns false for empty arrays.

#### [](#parameters-44)Parameters

| Name | Type | Description |
| --- | --- | --- |
| test | query expression | A test query to apply to each element.
| #### [](#examples-57)Examples ```bloblang root.all_over_21 = this.patrons.all(patron -> patron.age >= 21) # In: {"patrons":[{"id":"1","age":18},{"id":"2","age":23}]} # Out: {"all_over_21":false} # In: {"patrons":[{"id":"1","age":45},{"id":"2","age":23}]} # Out: {"all_over_21":true} ``` ```bloblang root.all_positive = this.values.all(v -> v > 0) # In: {"values":[1,2,3,4,5]} # Out: {"all_positive":true} # In: {"values":[1,-2,3,4,5]} # Out: {"all_positive":false} ``` ### [](#any)any Tests whether at least one element in an array satisfies a condition. Returns true if the query evaluates to true for any element. Returns false for empty arrays. #### [](#parameters-45)Parameters | Name | Type | Description | | --- | --- | --- | | test | query expression | A test query to apply to each element. | #### [](#examples-58)Examples ```bloblang root.any_over_21 = this.patrons.any(patron -> patron.age >= 21) # In: {"patrons":[{"id":"1","age":18},{"id":"2","age":23}]} # Out: {"any_over_21":true} # In: {"patrons":[{"id":"1","age":10},{"id":"2","age":12}]} # Out: {"any_over_21":false} ``` ```bloblang root.has_errors = this.results.any(r -> r.status == "error") # In: {"results":[{"status":"ok"},{"status":"error"},{"status":"ok"}]} # Out: {"has_errors":true} # In: {"results":[{"status":"ok"},{"status":"ok"}]} # Out: {"has_errors":false} ``` ### [](#append)append Adds one or more elements to the end of an array and returns the new array. The original array is not modified. #### [](#examples-59)Examples ```bloblang root.foo = this.foo.append("and", "this") # In: {"foo":["bar","baz"]} # Out: {"foo":["bar","baz","and","this"]} ``` ```bloblang root.combined = this.items.append(this.new_item) # In: {"items":["apple","banana"],"new_item":"orange"} # Out: {"combined":["apple","banana","orange"]} ``` ### [](#assign)assign Merges two objects or arrays with override behavior. For objects, source values replace destination values on key conflicts. Arrays are concatenated. 
To preserve both values on conflict, use the merge method instead. #### [](#parameters-46)Parameters | Name | Type | Description | | --- | --- | --- | | with | unknown | A value to merge the target value with. | #### [](#examples-60)Examples ```bloblang root = this.foo.assign(this.bar) # In: {"foo":{"first_name":"fooer","likes":"bars"},"bar":{"second_name":"barer","likes":"foos"}} # Out: {"first_name":"fooer","likes":"foos","second_name":"barer"} ``` Override defaults with user settings: ```bloblang root.config = this.defaults.assign(this.user_settings) # In: {"defaults":{"timeout":30,"retries":3},"user_settings":{"timeout":60}} # Out: {"config":{"retries":3,"timeout":60}} ``` ### [](#collapse)collapse Flattens a nested structure into a flat object with dot-notation keys. #### [](#parameters-47)Parameters | Name | Type | Description | | --- | --- | --- | | include_empty | bool | Whether to include empty objects and arrays in the resulting object. | #### [](#examples-61)Examples ```bloblang root.result = this.collapse() # In: {"foo":[{"bar":"1"},{"bar":{}},{"bar":"2"},{"bar":[]}]} # Out: {"result":{"foo.0.bar":"1","foo.2.bar":"2"}} ``` Set include\_empty to true to preserve empty objects and arrays in the output: ```bloblang root.result = this.collapse(include_empty: true) # In: {"foo":[{"bar":"1"},{"bar":{}},{"bar":"2"},{"bar":[]}]} # Out: {"result":{"foo.0.bar":"1","foo.1.bar":{},"foo.2.bar":"2","foo.3.bar":[]}} ``` ### [](#concat)concat Concatenates an array value with one or more argument arrays. #### [](#examples-62)Examples ```bloblang root.foo = this.foo.concat(this.bar, this.baz) # In: {"foo":["a","b"],"bar":["c"],"baz":["d","e","f"]} # Out: {"foo":["a","b","c","d","e","f"]} ``` ### [](#contains)contains Tests if an array or object contains a value. #### [](#parameters-48)Parameters | Name | Type | Description | | --- | --- | --- | | value | unknown | A value to test against elements of the target. 
| #### [](#examples-63)Examples ```bloblang root.has_foo = this.thing.contains("foo") # In: {"thing":["this","foo","that"]} # Out: {"has_foo":true} # In: {"thing":["this","bar","that"]} # Out: {"has_foo":false} ``` ```bloblang root.has_bar = this.thing.contains(20) # In: {"thing":[10.3,20.0,"huh",3]} # Out: {"has_bar":true} # In: {"thing":[2,3,40,67]} # Out: {"has_bar":false} ``` ```bloblang root.has_foo = this.thing.contains("foo") # In: {"thing":"this foo that"} # Out: {"has_foo":true} # In: {"thing":"this bar that"} # Out: {"has_foo":false} ``` ### [](#diff)diff Compares the current value with another value and returns a detailed changelog describing all differences. The changelog contains operations (create, update, delete) with their paths and values, enabling you to track changes between data versions, implement audit logs, or synchronize data between systems. #### [](#parameters-49)Parameters | Name | Type | Description | | --- | --- | --- | | other | unknown | The value to compare against the current value. Can be any structured data (object or array). | #### [](#examples-64)Examples Compare two objects to track field changes: ```bloblang root.changes = this.before.diff(this.after) # In: {"before":{"name":"Alice","age":30},"after":{"name":"Alice","age":31,"city":"NYC"}} # Out: {"changes":[{"From":30,"Path":["age"],"To":31,"Type":"update"},{"From":null,"Path":["city"],"To":"NYC","Type":"create"}]} ``` Detect deletions in configuration changes: ```bloblang root.changelog = this.old_config.diff(this.new_config) # In: {"old_config":{"debug":true,"timeout":30},"new_config":{"timeout":60}} # Out: {"changelog":[{"From":true,"Path":["debug"],"To":null,"Type":"delete"},{"From":30,"Path":["timeout"],"To":60,"Type":"update"}]} ``` ### [](#enumerated)enumerated Transforms an array into an array of objects with index and value fields, making it easy to access both the position and content of each element. 
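For readers who know Python, the shape of the result is close to what `enumerate` yields, except that each pair becomes an object with named `index` and `value` fields. The sketch below is an analogy only, with a hypothetical `enumerated` function standing in for the method. It also shows the common follow-up of selecting elements by position.

```python
def enumerated(items: list) -> list:
    """Pair each element with its zero-based position, as index/value objects."""
    return [{"index": i, "value": v} for i, v in enumerate(items)]


pairs = enumerated(["a", "b", "c", "d"])

# Keep only the values whose position is below 2.
first_two = [pair["value"] for pair in pairs if pair["index"] < 2]
print(first_two)
```

Because the index travels with the value, later steps can filter or transform by position without tracking a separate counter.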
#### [](#examples-65)Examples ```bloblang root.foo = this.foo.enumerated() # In: {"foo":["bar","baz"]} # Out: {"foo":[{"index":0,"value":"bar"},{"index":1,"value":"baz"}]} ``` Useful for filtering by index position: ```bloblang root.first_two = this.items.enumerated().filter(item -> item.index < 2).map_each(item -> item.value) # In: {"items":["a","b","c","d"]} # Out: {"first_two":["a","b"]} ``` ### [](#exists)exists Checks whether a field exists at the specified dot path within an object. Returns true if the field is present (even if null), false otherwise. #### [](#parameters-50)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A dot path to a field. | #### [](#examples-66)Examples ```bloblang root.result = this.foo.exists("bar.baz") # In: {"foo":{"bar":{"baz":"yep, I exist"}}} # Out: {"result":true} # In: {"foo":{"bar":{}}} # Out: {"result":false} # In: {"foo":{}} # Out: {"result":false} ``` Also returns true for null values if the field exists: ```bloblang root.has_field = this.data.exists("optional_field") # In: {"data":{"optional_field":null}} # Out: {"has_field":true} # In: {"data":{}} # Out: {"has_field":false} ``` ### [](#explode)explode Expands a nested field into multiple documents. #### [](#parameters-51)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A dot path to a field to explode. 
| #### [](#examples-67)Examples ##### [](#on-arrays)On arrays When exploding an array, each element becomes a separate document with the array element replacing the original field: ```bloblang root = this.explode("value") # In: {"id":1,"value":["foo","bar","baz"]} # Out: [{"id":1,"value":"foo"},{"id":1,"value":"bar"},{"id":1,"value":"baz"}] ``` ##### [](#on-objects)On objects When exploding an object, the output keys match the nested object’s keys, with values being the full document where the target field is replaced by each nested value: ```bloblang root = this.explode("value") # In: {"id":1,"value":{"foo":2,"bar":[3,4],"baz":{"bev":5}}} # Out: {"bar":{"id":1,"value":[3,4]},"baz":{"id":1,"value":{"bev":5}},"foo":{"id":1,"value":2}} ``` ### [](#filter)filter Filters array or object elements based on a condition. #### [](#parameters-52)Parameters | Name | Type | Description | | --- | --- | --- | | test | query expression | A query to apply to each element, if this query resolves to any value other than a boolean true the element will be removed from the result. | #### [](#examples-68)Examples ```bloblang root.new_nums = this.nums.filter(num -> num > 10) # In: {"nums":[3,11,4,17]} # Out: {"new_nums":[11,17]} ``` ##### [](#on-objects-2)On objects When filtering objects, the query receives a context with `key` and `value` fields for each entry: ```bloblang root.new_dict = this.dict.filter(item -> item.value.contains("foo")) # In: {"dict":{"first":"hello foo","second":"world","third":"this foo is great"}} # Out: {"new_dict":{"first":"hello foo","third":"this foo is great"}} ``` ### [](#find)find Searches an array for a matching value and returns the index of the first occurrence. Returns -1 if no match is found. Numeric types are compared by value regardless of representation. #### [](#parameters-53)Parameters | Name | Type | Description | | --- | --- | --- | | value | unknown | A value to find. 
| #### [](#examples-69)Examples ```bloblang root.index = this.find("bar") # In: ["foo", "bar", "baz"] # Out: {"index":1} ``` ```bloblang root.index = this.things.find(this.goal) # In: {"goal":"bar","things":["foo", "bar", "baz"]} # Out: {"index":1} ``` ### [](#find_all)find\_all Searches an array for all occurrences of a value and returns an array of matching indexes. Returns an empty array if no matches are found. Numeric types are compared by value regardless of representation. #### [](#parameters-54)Parameters | Name | Type | Description | | --- | --- | --- | | value | unknown | A value to find. | #### [](#examples-70)Examples ```bloblang root.index = this.find_all("bar") # In: ["foo", "bar", "baz", "bar"] # Out: {"index":[1,3]} ``` ```bloblang root.indexes = this.things.find_all(this.goal) # In: {"goal":"bar","things":["foo", "bar", "baz", "bar", "buz"]} # Out: {"indexes":[1,3]} ``` ### [](#find_all_by)find\_all\_by Searches an array for all elements that satisfy a condition and returns an array of their indexes. Returns an empty array if no elements match. #### [](#parameters-55)Parameters | Name | Type | Description | | --- | --- | --- | | query | query expression | A query to execute for each element. | #### [](#examples-71)Examples ```bloblang root.index = this.find_all_by(v -> v != "bar") # In: ["foo", "bar", "baz"] # Out: {"index":[0,2]} ``` Find all indexes matching criteria: ```bloblang root.error_indexes = this.logs.find_all_by(log -> log.level == "error") # In: {"logs":[{"level":"info"},{"level":"error"},{"level":"warn"},{"level":"error"}]} # Out: {"error_indexes":[1,3]} ``` ### [](#find_by)find\_by Searches an array for the first element that satisfies a condition and returns its index. Returns -1 if no element matches the query. #### [](#parameters-56)Parameters | Name | Type | Description | | --- | --- | --- | | query | query expression | A query to execute for each element. 
| #### [](#examples-72)Examples ```bloblang root.index = this.find_by(v -> v != "bar") # In: ["foo", "bar", "baz"] # Out: {"index":0} ``` Find first object matching criteria: ```bloblang root.first_adult = this.users.find_by(u -> u.age >= 18) # In: {"users":[{"name":"Alice","age":15},{"name":"Bob","age":22},{"name":"Carol","age":19}]} # Out: {"first_adult":1} ``` ### [](#flatten)flatten Flattens an array by one level, expanding nested arrays into the parent array. Only the first level of nesting is removed. #### [](#examples-73)Examples ```bloblang root.result = this.flatten() # In: ["foo",["bar","baz"],"buz"] # Out: {"result":["foo","bar","baz","buz"]} ``` Deeper nesting requires multiple flatten calls: ```bloblang root.result = this.data.flatten() # In: {"data":["a",["b",["c","d"]],"e"]} # Out: {"result":["a","b",["c","d"],"e"]} ``` ### [](#fold)fold Reduces an array to a single value by iteratively applying a function. Also known as reduce or aggregate. The query receives an accumulator (tally) and current element (value) for each iteration. #### [](#parameters-57)Parameters | Name | Type | Description | | --- | --- | --- | | initial | unknown | The initial value to start the fold with. For example, an empty object {}, a zero count 0, or an empty string "". | | query | query expression | A query to apply for each element. The query is provided an object with two fields; tally containing the current tally, and value containing the value of the current element. The query should result in a new tally to be passed to the next element query. 
| #### [](#examples-74)Examples Sum numbers in an array: ```bloblang root.sum = this.foo.fold(0, item -> item.tally + item.value) # In: {"foo":[3,8,11]} # Out: {"sum":22} ``` Concatenate strings: ```bloblang root.result = this.foo.fold("", item -> "%v%v".format(item.tally, item.value)) # In: {"foo":["hello ", "world"]} # Out: {"result":"hello world"} ``` Merge an array of objects into a single object: ```bloblang root.smoothie = this.fruits.fold({}, item -> item.tally.merge(item.value)) # In: {"fruits":[{"apple":5},{"banana":3},{"orange":8}]} # Out: {"smoothie":{"apple":5,"banana":3,"orange":8}} ``` ### [](#get)get Extract a field value, identified via a [dot path](../../../configuration/field_paths/), from an object. #### [](#parameters-58)Parameters | Name | Type | Description | | --- | --- | --- | | path | string | A dot path identifying a field to obtain. | #### [](#examples-75)Examples ```bloblang root.result = this.foo.get(this.target) # In: {"foo":{"bar":"from bar","baz":"from baz"},"target":"bar"} # Out: {"result":"from bar"} # In: {"foo":{"bar":"from bar","baz":"from baz"},"target":"baz"} # Out: {"result":"from baz"} ``` ### [](#index)index Extract an element from an array by an index. The index can be negative, and if so the element will be selected from the end counting backwards starting from -1. E.g. an index of -1 returns the last element, an index of -2 returns the element before the last, and so on. #### [](#parameters-59)Parameters | Name | Type | Description | | --- | --- | --- | | index | integer | The index to obtain from an array. 
| #### [](#examples-76)Examples ```bloblang root.last_name = this.names.index(-1) # In: {"names":["rachel","stevens"]} # Out: {"last_name":"stevens"} ``` It is also possible to use this method on byte arrays, in which case the selected element will be returned as an integer: ```bloblang root.last_byte = this.name.bytes().index(-1) # In: {"name":"foobar bazson"} # Out: {"last_byte":110} ``` ### [](#join)join Joins an array of strings with an optional delimiter. #### [](#parameters-60)Parameters | Name | Type | Description | | --- | --- | --- | | delimiter (optional) | string | An optional delimiter to add between each string. | #### [](#examples-77)Examples ```bloblang root.joined_words = this.words.join() root.joined_numbers = this.numbers.map_each(this.string()).join(",") # In: {"words":["hello","world"],"numbers":[3,8,11]} # Out: {"joined_numbers":"3,8,11","joined_words":"helloworld"} ``` ### [](#json_path)json\_path Executes the given JSONPath expression on an object or array and returns the result. The JSONPath expression syntax can be found at [https://goessner.net/articles/JsonPath/](https://goessner.net/articles/JsonPath/). For more complex logic, you can use Gval expressions ([https://github.com/PaesslerAG/gval](https://github.com/PaesslerAG/gval)). #### [](#parameters-61)Parameters | Name | Type | Description | | --- | --- | --- | | expression | string | The JSONPath expression to execute. 
| #### [](#examples-78)Examples ```bloblang root.all_names = this.json_path("$..name") # In: {"name":"alice","foo":{"name":"bob"}} # Out: {"all_names":["alice","bob"]} # In: {"thing":["this","bar",{"name":"alice"}]} # Out: {"all_names":["alice"]} ``` ```bloblang root.text_objects = this.json_path("$.body[?(@.type=='text')]") # In: {"body":[{"type":"image","id":"foo"},{"type":"text","id":"bar"}]} # Out: {"text_objects":[{"id":"bar","type":"text"}]} ``` ### [](#json_schema)json\_schema Validates a value against a [JSON schema](https://json-schema.org/) and returns the value if it matches, or throws an error if it does not. #### [](#parameters-62)Parameters | Name | Type | Description | | --- | --- | --- | | schema | string | The schema to check values against. | #### [](#examples-79)Examples ```bloblang root = this.json_schema("""{ "type":"object", "properties":{ "foo":{ "type":"string" } } }""") # In: {"foo":"bar"} # Out: {"foo":"bar"} # In: {"foo":5} # Out: Error("failed assignment (line 1): field `this`: foo invalid type. expected: string, given: integer") ``` To load a schema from a file, use the `file` function: ```bloblang root = this.json_schema(file(env("BENTHOS_TEST_BLOBLANG_SCHEMA_FILE"))) ``` ### [](#key_values)key\_values Converts an object into an array of key-value pair objects. Each element has a 'key' field and a 'value' field. Order is not guaranteed unless sorted. #### [](#examples-80)Examples ```bloblang root.foo_key_values = this.foo.key_values().sort_by(pair -> pair.key) # In: {"foo":{"bar":1,"baz":2}} # Out: {"foo_key_values":[{"key":"bar","value":1},{"key":"baz","value":2}]} ``` Filter object entries by value: ```bloblang root.large_items = this.items.key_values().filter(pair -> pair.value > 15).map_each(pair -> pair.key) # In: {"items":{"a":5,"b":15,"c":20,"d":3}} # Out: {"large_items":["c"]} ``` ### [](#keys)keys Extracts all keys from an object and returns them as a sorted array.
#### [](#examples-81)Examples ```bloblang root.foo_keys = this.foo.keys() # In: {"foo":{"bar":1,"baz":2}} # Out: {"foo_keys":["bar","baz"]} ``` Check if specific keys exist: ```bloblang root.has_id = this.data.keys().contains("id") # In: {"data":{"id":123,"name":"test"}} # Out: {"has_id":true} ``` ### [](#length)length Returns the length of an array, object, or string. #### [](#examples-82)Examples ```bloblang root.foo_len = this.foo.length() # In: {"foo":"hello world"} # Out: {"foo_len":11} ``` ```bloblang root.foo_len = this.foo.length() # In: {"foo":["first","second"]} # Out: {"foo_len":2} # In: {"foo":{"first":"bar","second":"baz"}} # Out: {"foo_len":2} ``` ### [](#map_each)map\_each Applies a mapping to each element of an array or object. #### [](#parameters-63)Parameters | Name | Type | Description | | --- | --- | --- | | query | query expression | A query that will be used to map each element. | #### [](#examples-83)Examples ##### [](#on-arrays-2)On arrays Transforms each array element using a query. Return deleted() to remove an element, or the new value to replace it: ```bloblang root.new_nums = this.nums.map_each(num -> if num < 10 { deleted() } else { num - 10 }) # In: {"nums":[3,11,4,17]} # Out: {"new_nums":[1,7]} ``` ##### [](#on-objects-3)On objects Transforms each object value using a query. The query receives an object with 'key' and 'value' fields for each entry: ```bloblang root.new_dict = this.dict.map_each(item -> item.value.uppercase()) # In: {"dict":{"foo":"hello","bar":"world"}} # Out: {"new_dict":{"bar":"WORLD","foo":"HELLO"}} ``` ### [](#map_each_key)map\_each\_key Transforms object keys using a mapping query. #### [](#parameters-64)Parameters | Name | Type | Description | | --- | --- | --- | | query | query expression | A query that will be used to map each key. 
| #### [](#examples-84)Examples ```bloblang root.new_dict = this.dict.map_each_key(key -> key.uppercase()) # In: {"dict":{"keya":"hello","keyb":"world"}} # Out: {"new_dict":{"KEYA":"hello","KEYB":"world"}} ``` Conditionally transform keys: ```bloblang root = this.map_each_key(key -> if key.contains("kafka") { "_" + key }) # In: {"amqp_key":"foo","kafka_key":"bar","kafka_topic":"baz"} # Out: {"_kafka_key":"bar","_kafka_topic":"baz","amqp_key":"foo"} ``` ### [](#merge)merge Combines two objects or arrays. When merging objects, conflicting keys create arrays containing both values. Arrays are concatenated. For key override behavior instead, use the assign method. #### [](#parameters-65)Parameters | Name | Type | Description | | --- | --- | --- | | with | unknown | A value to merge the target value with. | #### [](#examples-85)Examples ```bloblang root = this.foo.merge(this.bar) # In: {"foo":{"first_name":"fooer","likes":"bars"},"bar":{"second_name":"barer","likes":"foos"}} # Out: {"first_name":"fooer","likes":["bars","foos"],"second_name":"barer"} ``` Merge arrays: ```bloblang root.combined = this.list1.merge(this.list2) # In: {"list1":["a","b"],"list2":["c","d"]} # Out: {"combined":["a","b","c","d"]} ``` ### [](#patch)patch Applies a changelog (created by the diff method) to the current value, transforming it according to the specified operations. This enables you to synchronize data, replay changes, or implement event sourcing patterns by applying recorded changes to reconstruct state. #### [](#parameters-66)Parameters | Name | Type | Description | | --- | --- | --- | | changelog | unknown | The changelog array to apply. Should be in the format returned by the diff method, containing Type, Path, From, and To fields for each change. 
| #### [](#examples-86)Examples Apply recorded changes to update an object: ```bloblang root.updated = this.current.patch(this.changelog) # In: {"current":{"name":"Alice","age":30},"changelog":[{"Type":"update","Path":["age"],"From":30,"To":31},{"Type":"create","Path":["city"],"From":null,"To":"NYC"}]} # Out: {"updated":{"age":31,"city":"NYC","name":"Alice"}} ``` Restore previous state by applying inverse changes: ```bloblang root.restored = this.modified.patch(this.reverse_changelog) # In: {"modified":{"timeout":60},"reverse_changelog":[{"Type":"create","Path":["debug"],"From":null,"To":true},{"Type":"update","Path":["timeout"],"From":60,"To":30}]} # Out: {"restored":{"debug":true,"timeout":30}} ``` ### [](#slice)slice Extracts a portion of an array or string. #### [](#parameters-67)Parameters | Name | Type | Description | | --- | --- | --- | | low | integer | The low bound, which is the first element of the selection, or if negative selects from the end. | | high (optional) | integer | An optional high bound. | #### [](#examples-87)Examples ```bloblang root.beginning = this.value.slice(0, 2) root.end = this.value.slice(4) # In: {"value":"foo bar"} # Out: {"beginning":"fo","end":"bar"} ``` A negative low index can be used, indicating an offset from the end of the sequence. If the low index is greater than the length of the sequence then an empty result is returned: ```bloblang root.last_chunk = this.value.slice(-4) root.the_rest = this.value.slice(0, -4) # In: {"value":"foo bar"} # Out: {"last_chunk":" bar","the_rest":"foo"} ``` ```bloblang root.beginning = this.value.slice(0, 2) root.end = this.value.slice(4) # In: {"value":["foo","bar","baz","buz","bev"]} # Out: {"beginning":["foo","bar"],"end":["bev"]} ``` A negative low index can be used, indicating an offset from the end of the sequence. 
If the low index is greater than the length of the sequence then an empty result is returned: ```bloblang root.last_chunk = this.value.slice(-2) root.the_rest = this.value.slice(0, -2) # In: {"value":["foo","bar","baz","buz","bev"]} # Out: {"last_chunk":["buz","bev"],"the_rest":["foo","bar","baz"]} ``` ### [](#sort)sort Sorts array elements in ascending order. #### [](#parameters-68)Parameters | Name | Type | Description | | --- | --- | --- | | compare (optional) | query expression | An optional query that should explicitly compare elements left and right and provide a boolean result. | #### [](#examples-88)Examples ```bloblang root.sorted = this.foo.sort() # In: {"foo":["bbb","ccc","aaa"]} # Out: {"sorted":["aaa","bbb","ccc"]} ``` Custom comparison for complex objects - return true if left < right: ```bloblang root.sorted = this.foo.sort(item -> item.left.v < item.right.v) # In: {"foo":[{"id":"foo","v":"bbb"},{"id":"bar","v":"ccc"},{"id":"baz","v":"aaa"}]} # Out: {"sorted":[{"id":"baz","v":"aaa"},{"id":"foo","v":"bbb"},{"id":"bar","v":"ccc"}]} ``` ### [](#sort_by)sort\_by Sorts array elements by a specified field or expression. #### [](#parameters-69)Parameters | Name | Type | Description | | --- | --- | --- | | query | query expression | A query to apply to each element that yields a value used for sorting. 
| #### [](#examples-89)Examples ```bloblang root.sorted = this.foo.sort_by(ele -> ele.id) # In: {"foo":[{"id":"bbb","message":"bar"},{"id":"aaa","message":"foo"},{"id":"ccc","message":"baz"}]} # Out: {"sorted":[{"id":"aaa","message":"foo"},{"id":"bbb","message":"bar"},{"id":"ccc","message":"baz"}]} ``` Sort by numeric field: ```bloblang root.sorted = this.items.sort_by(item -> item.priority) # In: {"items":[{"name":"low","priority":3},{"name":"high","priority":1},{"name":"med","priority":2}]} # Out: {"sorted":[{"name":"high","priority":1},{"name":"med","priority":2},{"name":"low","priority":3}]} ``` ### [](#squash)squash Squashes an array of objects into a single object, where key collisions result in the values being merged (following similar rules as the `.merge()` method). #### [](#examples-90)Examples ```bloblang root.locations = this.locations.map_each(loc -> {loc.state: [loc.name]}).squash() # In: {"locations":[{"name":"Seattle","state":"WA"},{"name":"New York","state":"NY"},{"name":"Bellevue","state":"WA"},{"name":"Olympia","state":"WA"}]} # Out: {"locations":{"NY":["New York"],"WA":["Seattle","Bellevue","Olympia"]}} ``` ### [](#sum)sum Returns the sum of numeric values in an array. #### [](#examples-91)Examples ```bloblang root.sum = this.foo.sum() # In: {"foo":[3,8,4]} # Out: {"sum":15} ``` Works with decimals: ```bloblang root.total = this.prices.sum() # In: {"prices":[10.5,20.25,5.00]} # Out: {"total":35.75} ``` ### [](#unique)unique Returns an array with duplicate elements removed. #### [](#parameters-70)Parameters | Name | Type | Description | | --- | --- | --- | | emit (optional) | query expression | An optional query that can be used in order to yield a value for each element to determine uniqueness. 
| #### [](#examples-92)Examples ```bloblang root.uniques = this.foo.unique() # In: {"foo":["a","b","a","c"]} # Out: {"uniques":["a","b","c"]} ``` Use a query to determine uniqueness by a field: ```bloblang root.unique_users = this.users.unique(u -> u.id) # In: {"users":[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"},{"id":1,"name":"Alice Duplicate"}]} # Out: {"unique_users":[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]} ``` ### [](#values)values Returns an array of all values from an object. #### [](#examples-93)Examples ```bloblang root.foo_vals = this.foo.values().sort() # In: {"foo":{"bar":1,"baz":2}} # Out: {"foo_vals":[1,2]} ``` Find max value in object: ```bloblang root.max = this.scores.values().sort().index(-1) # In: {"scores":{"player1":85,"player2":92,"player3":78}} # Out: {"max":92} ``` ### [](#with)with Returns an object where all but one or more [field path](../../../configuration/field_paths/) arguments are removed. Each path specifies a specific field to be retained from the input object, allowing for nested fields. If a key within a nested path does not exist then it is ignored. #### [](#examples-94)Examples ```bloblang root = this.with("inner.a","inner.c","d") # In: {"inner":{"a":"first","b":"second","c":"third"},"d":"fourth","e":"fifth"} # Out: {"d":"fourth","inner":{"a":"first","c":"third"}} ``` ### [](#without)without Returns an object with specified keys removed. #### [](#examples-95)Examples ```bloblang root = this.without("inner.a","inner.c","d") # In: {"inner":{"a":"first","b":"second","c":"third"},"d":"fourth","e":"fifth"} # Out: {"e":"fifth","inner":{"b":"second"}} ``` Remove sensitive fields: ```bloblang root = this.without("password","ssn","creditCard") # In: {"username":"alice","password":"secret","email":"alice@example.com","ssn":"123-45-6789"} # Out: {"email":"alice@example.com","username":"alice"} ``` ### [](#zip)zip Zip an array value with one or more argument arrays. Each array must match in length. 
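As a rough sketch of how `zip` might be combined with `map_each` (documented above) to turn parallel arrays into an array of objects. The field names `names` and `ages` are hypothetical and chosen only for illustration:

```bloblang
# Pair each name with the age at the same position, then reshape each pair into an object.
root.people = this.names.zip(this.ages).map_each(pair -> {"name": pair.index(0), "age": pair.index(1)})

# In:  {"names":["alice","bob"],"ages":[30,25]}
# Out: {"people":[{"age":30,"name":"alice"},{"age":25,"name":"bob"}]}
```

Because each zipped element is itself an array, `index` is used here to pick out the individual members of a pair.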
#### [](#examples-96)Examples ```bloblang root.foo = this.foo.zip(this.bar, this.baz) # In: {"foo":["a","b","c"],"bar":[1,2,3],"baz":[4,5,6]} # Out: {"foo":[["a",1,4],["b",2,5],["c",3,6]]} ``` ## [](#parsing)Parsing ### [](#bloblang)bloblang Executes an argument Bloblang mapping on the target. This method can be used in order to execute dynamic mappings. Imports and functions that interact with the environment, such as `file` and `env`, or that access message information directly, such as `content` or `json`, are not enabled for dynamic Bloblang mappings. #### [](#parameters-71)Parameters | Name | Type | Description | | --- | --- | --- | | mapping | string | The mapping to execute. | #### [](#examples-97)Examples ```bloblang root.body = this.body.bloblang(this.mapping) # In: {"body":{"foo":"hello world"},"mapping":"root.foo = this.foo.uppercase()"} # Out: {"body":{"foo":"HELLO WORLD"}} # In: {"body":{"foo":"hello world 2"},"mapping":"root.foo = this.foo.capitalize()"} # Out: {"body":{"foo":"Hello World 2"}} ``` ### [](#format_json)format\_json Formats a value as a JSON string. #### [](#parameters-72)Parameters | Name | Type | Description | | --- | --- | --- | | indent | string | Indentation string. Each element in a JSON object or array will begin on a new, indented line followed by one or more copies of indent according to the indentation nesting. | | no_indent | bool | Disable indentation. | | escape_html | bool | Escape problematic HTML characters. 
| #### [](#examples-98)Examples ```bloblang root = this.doc.format_json() # In: {"doc":{"foo":"bar"}} # Out: { "foo": "bar" } ``` Pass a string to the `indent` parameter in order to customise the indentation: ```bloblang root = this.format_json(" ") # In: {"doc":{"foo":"bar"}} # Out: { "doc": { "foo": "bar" } } ``` Use the `.string()` method in order to coerce the result into a string: ```bloblang root.doc = this.doc.format_json().string() # In: {"doc":{"foo":"bar"}} # Out: {"doc":"{\n \"foo\": \"bar\"\n}"} ``` Set the `no_indent` parameter to true to disable indentation. The result is equivalent to calling `bytes()`: ```bloblang root = this.doc.format_json(no_indent: true) # In: {"doc":{"foo":"bar"}} # Out: {"foo":"bar"} ``` Escapes problematic HTML characters: ```bloblang root = this.doc.format_json() # In: {"doc":{"email":"foo&bar@benthos.dev","name":"foo>bar"}} # Out: { "email": "foo\u0026bar@benthos.dev", "name": "foo\u003ebar" } ``` Set the `escape_html` parameter to false to disable escaping of problematic HTML characters: ```bloblang root = this.doc.format_json(escape_html: false) # In: {"doc":{"email":"foo&bar@benthos.dev","name":"foo>bar"}} # Out: { "email": "foo&bar@benthos.dev", "name": "foo>bar" } ``` ### [](#format_msgpack)format\_msgpack Serializes structured data into MessagePack binary format. MessagePack is a compact binary serialization that is faster and more space-efficient than JSON, making it ideal for network transmission and storage of structured data. Returns a byte array that can be further encoded as needed. 
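Because the result is raw bytes, one way to sanity-check a serialized payload is to decode it again with `parse_msgpack` (documented later in this section). A minimal sketch, assuming a hypothetical `data` field on the input document:

```bloblang
# Serialize to MessagePack bytes, then parse straight back into a structured value.
root.roundtrip = this.data.format_msgpack().parse_msgpack()

# In:  {"data":{"foo":"bar"}}
# Out: {"roundtrip":{"foo":"bar"}}
```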
#### [](#examples-99)Examples Serialize object to MessagePack and encode as hex for transmission: ```bloblang root = this.format_msgpack().encode("hex") # In: {"foo":"bar"} # Out: 81a3666f6fa3626172 ``` Serialize data to MessagePack and base64 encode for embedding in JSON: ```bloblang root.msgpack_payload = this.data.format_msgpack().encode("base64") # In: {"data":{"foo":"bar"}} # Out: {"msgpack_payload":"gaNmb2+jYmFy"} ``` ### [](#format_xml)format\_xml Serializes an object into an XML document. Converts structured data to XML format with support for attributes (prefixed with hyphen), custom indentation, and configurable root element. Returns XML as a byte array. #### [](#parameters-73)Parameters | Name | Type | Description | | --- | --- | --- | | indent | string | String to use for each level of indentation (default is 4 spaces). Each nested XML element will be indented by this string. | | no_indent | bool | Disable indentation and newlines to produce compact XML on a single line. | | root_tag (optional) | string | Custom name for the root XML element. By default, the root element name is derived from the first key in the object. | #### [](#examples-100)Examples Serialize object to pretty-printed XML with default indentation: ```bloblang root = this.format_xml() # In: {"foo":{"bar":{"baz":"foo bar baz"}}} # Out: <foo> <bar> <baz>foo bar baz</baz> </bar> </foo> ``` Create compact XML without indentation for smaller message size: ```bloblang root = this.format_xml(no_indent: true) # In: {"foo":{"bar":{"baz":"foo bar baz"}}} # Out: <foo><bar><baz>foo bar baz</baz></bar></foo> ``` ### [](#format_yaml)format\_yaml Formats a value as a YAML string. #### [](#examples-101)Examples ```bloblang root = this.doc.format_yaml() # In: {"doc":{"foo":"bar"}} # Out: foo: bar ``` Use the `.string()` method in order to coerce the result into a string: ```bloblang root.doc = this.doc.format_yaml().string() # In: {"doc":{"foo":"bar"}} # Out: {"doc":"foo: bar\n"} ``` ### [](#infer_schema)infer\_schema Attempt to infer the schema of a given value.
The resulting schema can then be used as an input to schema conversion and enforcement methods. ### [](#parse_csv)parse\_csv Parses CSV data into an array. #### [](#parameters-74)Parameters | Name | Type | Description | | --- | --- | --- | | parse_header_row | bool | Whether to reference the first row as a header row. If set to true the output structure for messages will be an object where field keys are determined by the header row. Otherwise, the output will be an array of row arrays. | | delimiter | string | The delimiter to use for splitting values in each record. It must be a single character. | | lazy_quotes | bool | If set to true, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field. | #### [](#examples-102)Examples Parses CSV data with a header row: ```bloblang root.orders = this.orders.parse_csv() # In: {"orders":"foo,bar\nfoo 1,bar 1\nfoo 2,bar 2"} # Out: {"orders":[{"bar":"bar 1","foo":"foo 1"},{"bar":"bar 2","foo":"foo 2"}]} ``` Parses CSV data without a header row: ```bloblang root.orders = this.orders.parse_csv(false) # In: {"orders":"foo 1,bar 1\nfoo 2,bar 2"} # Out: {"orders":[["foo 1","bar 1"],["foo 2","bar 2"]]} ``` Parses CSV data delimited by dots: ```bloblang root.orders = this.orders.parse_csv(delimiter:".") # In: {"orders":"foo.bar\nfoo 1.bar 1\nfoo 2.bar 2"} # Out: {"orders":[{"bar":"bar 1","foo":"foo 1"},{"bar":"bar 2","foo":"foo 2"}]} ``` Parses CSV data containing a quote in an unquoted field: ```bloblang root.orders = this.orders.parse_csv(lazy_quotes:true) # In: {"orders":"foo,bar\nfoo 1,bar 1\nfoo\" \"2,bar\" \"2"} # Out: {"orders":[{"bar":"bar 1","foo":"foo 1"},{"bar":"bar\" \"2","foo":"foo\" \"2"}]} ``` ### [](#parse_form_url_encoded)parse\_form\_url\_encoded Attempts to parse a url-encoded query string (from an x-www-form-urlencoded request body) and returns a structured result. 
#### [](#examples-103)Examples ```bloblang root.values = this.body.parse_form_url_encoded() # In: {"body":"noise=meow&animal=cat&fur=orange&fur=fluffy"} # Out: {"values":{"animal":"cat","fur":["orange","fluffy"],"noise":"meow"}} ``` ### [](#parse_json)parse\_json Parses a JSON string into a structured value. #### [](#parameters-75)Parameters | Name | Type | Description | | --- | --- | --- | | use_number (optional) | bool | An optional flag that, when set, parses numbers as json.Number instead of the default float64. | #### [](#examples-104)Examples ```bloblang root.doc = this.doc.parse_json() # In: {"doc":"{\"foo\":\"bar\"}"} # Out: {"doc":{"foo":"bar"}} ``` ```bloblang root.doc = this.doc.parse_json(use_number: true) # In: {"doc":"{\"foo\":\"11380878173205700000000000000000000000000000000\"}"} # Out: {"doc":{"foo":"11380878173205700000000000000000000000000000000"}} ``` ### [](#parse_logfmt)parse\_logfmt Parses logfmt formatted data into an object. #### [](#examples-105)Examples ```bloblang root = this.msg.parse_logfmt() # In: {"msg":"level=info msg=\"hello world\" dur=1.5s"} # Out: {"dur":"1.5s","level":"info","msg":"hello world"} ``` ### [](#parse_msgpack)parse\_msgpack Parses MessagePack binary data into a structured object. MessagePack is an efficient binary serialization format that is more compact than JSON while maintaining similar data structures. Commonly used for high-performance APIs and data interchange between microservices. #### [](#examples-106)Examples Parse MessagePack data from hex-encoded content: ```bloblang root = content().decode("hex").parse_msgpack() # In: 81a3666f6fa3626172 # Out: {"foo":"bar"} ``` Parse MessagePack from base64-encoded field: ```bloblang root.decoded = this.msgpack_data.decode("base64").parse_msgpack() # In: {"msgpack_data":"gaNmb2+jYmFy"} # Out: {"decoded":{"foo":"bar"}} ``` ### [](#parse_parquet)parse\_parquet Parses Apache Parquet binary data into an array of objects.
Parquet is a columnar storage format optimized for analytics, commonly used with big data systems like Apache Spark, Hive, and cloud data warehouses. Each row in the Parquet file becomes an object in the output array. #### [](#parameters-76)Parameters | Name | Type | Description | | --- | --- | --- | | byte_array_as_string | bool | Deprecated: This parameter is no longer used. | #### [](#examples-107)Examples Parse Parquet file data into structured objects: ```bloblang root.records = content().parse_parquet() ``` Process Parquet data from a field and extract specific columns: ```bloblang root.users = this.parquet_data.parse_parquet().map_each(row -> {"name": row.name, "email": row.email}) ``` ### [](#parse_url)parse\_url Attempts to parse a URL from a string value, returning a structured result that describes the various facets of the URL. The fields returned within the structured result roughly follow [https://pkg.go.dev/net/url#URL](https://pkg.go.dev/net/url#URL), and may be expanded in future in order to present more information. #### [](#examples-108)Examples ```bloblang root.foo_url = this.foo_url.parse_url() # In: {"foo_url":"https://docs.redpanda.com/redpanda-connect/guides/bloblang/about/"} # Out: {"foo_url":{"fragment":"","host":"docs.redpanda.com","opaque":"","path":"/redpanda-connect/guides/bloblang/about/","raw_fragment":"","raw_path":"","raw_query":"","scheme":"https"}} ``` ```bloblang root.username = this.url.parse_url().user.name | "unknown" # In: {"url":"amqp://foo:bar@127.0.0.1:5672/"} # Out: {"username":"foo"} # In: {"url":"redis://localhost:6379"} # Out: {"username":"unknown"} ``` ### [](#parse_xml)parse\_xml Parses an XML document into a structured object. 
Converts XML elements to JSON-like objects following these rules: - Element attributes are prefixed with a hyphen (e.g., `-id` for an `id` attribute) - Elements with both attributes and text content store the text in a `#text` field - Repeated elements become arrays - XML comments, directives, and processing instructions are ignored - Optionally cast numeric and boolean strings to their proper types. #### [](#parameters-77)Parameters | Name | Type | Description | | --- | --- | --- | | cast (optional) | bool | Whether to automatically cast numeric and boolean string values to their proper types. When false, all values remain as strings. | #### [](#examples-109)Examples Parse XML document into object structure: ```bloblang root.doc = this.doc.parse_xml() # In: {"doc":"<root><title>This is a title</title><content>This is some content</content></root>"} # Out: {"doc":{"root":{"content":"This is some content","title":"This is a title"}}} ``` Parse XML with type casting enabled to convert strings to numbers and booleans: ```bloblang root.doc = this.doc.parse_xml(cast: true) # In: {"doc":"<root><title>This is a title</title><number id=99>123</number><bool>True</bool></root>"} # Out: {"doc":{"root":{"bool":true,"number":{"#text":123,"-id":99},"title":"This is a title"}}} ``` ### [](#parse_yaml)parse\_yaml Parses a YAML string into a structured value. #### [](#examples-110)Examples ```bloblang root.doc = this.doc.parse_yaml() # In: {"doc":"foo: bar"} # Out: {"doc":{"foo":"bar"}} ``` ## [](#regular-expressions)Regular expressions ### [](#re_find_all)re\_find\_all Finds all matches of a regular expression in a string. #### [](#parameters-78)Parameters | Name | Type | Description | | --- | --- | --- | | pattern | string | The pattern to match against. 
| #### [](#examples-111)Examples ```bloblang root.matches = this.value.re_find_all("a.") # In: {"value":"paranormal"} # Out: {"matches":["ar","an","al"]} ``` ```bloblang root.numbers = this.text.re_find_all("[0-9]+") # In: {"text":"I have 2 apples and 15 oranges"} # Out: {"numbers":["2","15"]} ``` ### [](#re_find_all_object)re\_find\_all\_object Finds all regex matches as objects with named groups. #### [](#parameters-79)Parameters | Name | Type | Description | | --- | --- | --- | | pattern | string | The pattern to match against. | #### [](#examples-112)Examples ```bloblang root.matches = this.value.re_find_all_object("a(?P<foo>x*)b") # In: {"value":"-axxb-ab-"} # Out: {"matches":[{"0":"axxb","foo":"xx"},{"0":"ab","foo":""}]} ``` ```bloblang root.matches = this.value.re_find_all_object("(?m)(?P<key>\\w+):\\s+(?P<value>\\w+)$") # In: {"value":"option1: value1\noption2: value2\noption3: value3"} # Out: {"matches":[{"0":"option1: value1","key":"option1","value":"value1"},{"0":"option2: value2","key":"option2","value":"value2"},{"0":"option3: value3","key":"option3","value":"value3"}]} ``` ### [](#re_find_all_submatch)re\_find\_all\_submatch Finds all regex matches with capture groups. #### [](#parameters-80)Parameters | Name | Type | Description | | --- | --- | --- | | pattern | string | The pattern to match against. | #### [](#examples-113)Examples ```bloblang root.matches = this.value.re_find_all_submatch("a(x*)b") # In: {"value":"-axxb-ab-"} # Out: {"matches":[["axxb","xx"],["ab",""]]} ``` ```bloblang root.emails = this.text.re_find_all_submatch("(\\w+)@(\\w+\\.\\w+)") # In: {"text":"Contact: alice@example.com or bob@test.org"} # Out: {"emails":[["alice@example.com","alice","example.com"],["bob@test.org","bob","test.org"]]} ``` ### [](#re_find_object)re\_find\_object Finds the first regex match as an object with named groups. #### [](#parameters-81)Parameters | Name | Type | Description | | --- | --- | --- | | pattern | string | The pattern to match against. 
| #### [](#examples-114)Examples ```bloblang root.matches = this.value.re_find_object("a(?P<foo>x*)b") # In: {"value":"-axxb-ab-"} # Out: {"matches":{"0":"axxb","foo":"xx"}} ``` ```bloblang root.matches = this.value.re_find_object("(?P<key>\\w+):\\s+(?P<value>\\w+)") # In: {"value":"option1: value1"} # Out: {"matches":{"0":"option1: value1","key":"option1","value":"value1"}} ``` ### [](#re_match)re\_match Tests if a string matches a regular expression. #### [](#parameters-82)Parameters | Name | Type | Description | | --- | --- | --- | | pattern | string | The pattern to match against. | #### [](#examples-115)Examples ```bloblang root.matches = this.value.re_match("[0-9]") # In: {"value":"there are 10 puppies"} # Out: {"matches":true} # In: {"value":"there are ten puppies"} # Out: {"matches":false} ``` ### [](#re_replace)re\_replace Replaces all regex matches with a replacement string that can reference capture groups using `$1`, `$2`, etc. Use for pattern-based transformations or data reformatting. #### [](#parameters-83)Parameters | Name | Type | Description | | --- | --- | --- | | pattern | string | The pattern to match against. | | value | string | The value to replace with. | ### [](#re_replace_all)re\_replace\_all Replaces all regex matches with a replacement string. #### [](#parameters-84)Parameters | Name | Type | Description | | --- | --- | --- | | pattern | string | The pattern to match against. | | value | string | The value to replace with. | #### [](#examples-116)Examples ```bloblang root.new_value = this.value.re_replace_all("ADD ([0-9]+)","+($1)") # In: {"value":"foo ADD 70"} # Out: {"new_value":"foo +(70)"} ``` ```bloblang root.masked = this.email.re_replace_all("(\\w{2})\\w+@", "$1***@") # In: {"email":"alice@example.com"} # Out: {"masked":"al***@example.com"} ``` ## [](#sql)SQL ### [](#vector)vector Converts an array of numbers into a vector type suitable for insertion into SQL databases with vector/embedding support.
This is commonly used with PostgreSQL’s pgvector extension for storing and querying machine learning embeddings, enabling similarity search and vector operations in your database. #### [](#examples-117)Examples Convert embeddings array to vector for pgvector storage: ```bloblang root.embedding = this.embeddings.vector() root.text = this.text ``` Process ML model output into database-ready vector format: ```bloblang root.doc_id = this.id root.vector_embedding = this.model_output.map_each(num -> num.number()).vector() ``` ## [](#string-manipulation)String manipulation ### [](#capitalize)capitalize Converts a string to title case with Unicode letter mapping. #### [](#examples-118)Examples ```bloblang root.title = this.title.capitalize() # In: {"title":"the foo bar"} # Out: {"title":"The Foo Bar"} ``` ```bloblang root.name = this.name.capitalize() # In: {"name":"alice smith"} # Out: {"name":"Alice Smith"} ``` ### [](#compare_argon2)compare\_argon2 Checks whether a string matches a hashed secret using Argon2. #### [](#parameters-85)Parameters | Name | Type | Description | | --- | --- | --- | | hashed_secret | string | The hashed secret to compare with the input. This must be a fully-qualified string which encodes the Argon2 options used to generate the hash. | #### [](#examples-119)Examples ```bloblang root.match = this.secret.compare_argon2("$argon2id$v=19$m=4096,t=3,p=1$c2FsdHktbWNzYWx0ZmFjZQ$RMUMwgtS32/mbszd+ke4o4Ej1jFpYiUqY6MHWa69X7Y") # In: {"secret":"there-are-many-blobs-in-the-sea"} # Out: {"match":true} ``` ```bloblang root.match = this.secret.compare_argon2("$argon2id$v=19$m=4096,t=3,p=1$c2FsdHktbWNzYWx0ZmFjZQ$RMUMwgtS32/mbszd+ke4o4Ej1jFpYiUqY6MHWa69X7Y") # In: {"secret":"will-i-ever-find-love"} # Out: {"match":false} ``` ### [](#compare_bcrypt)compare\_bcrypt Checks whether a string matches a hashed secret using bcrypt. 
#### [](#parameters-86)Parameters | Name | Type | Description | | --- | --- | --- | | hashed_secret | string | The hashed secret value to compare with the input. | #### [](#examples-120)Examples ```bloblang root.match = this.secret.compare_bcrypt("$2y$10$Dtnt5NNzVtMCOZONT705tOcS8It6krJX8bEjnDJnwxiFKsz1C.3Ay") # In: {"secret":"there-are-many-blobs-in-the-sea"} # Out: {"match":true} ``` ```bloblang root.match = this.secret.compare_bcrypt("$2y$10$Dtnt5NNzVtMCOZONT705tOcS8It6krJX8bEjnDJnwxiFKsz1C.3Ay") # In: {"secret":"will-i-ever-find-love"} # Out: {"match":false} ``` ### [](#contains-2)contains Tests if an array, object, or string contains a value. #### [](#parameters-87)Parameters | Name | Type | Description | | --- | --- | --- | | value | unknown | A value to test against elements of the target. | #### [](#examples-121)Examples ```bloblang root.has_foo = this.thing.contains("foo") # In: {"thing":["this","foo","that"]} # Out: {"has_foo":true} # In: {"thing":["this","bar","that"]} # Out: {"has_foo":false} ``` ```bloblang root.has_bar = this.thing.contains(20) # In: {"thing":[10.3,20.0,"huh",3]} # Out: {"has_bar":true} # In: {"thing":[2,3,40,67]} # Out: {"has_bar":false} ``` ```bloblang root.has_foo = this.thing.contains("foo") # In: {"thing":"this foo that"} # Out: {"has_foo":true} # In: {"thing":"this bar that"} # Out: {"has_foo":false} ``` ### [](#escape_html)escape\_html Escapes HTML special characters. #### [](#examples-122)Examples ```bloblang root.escaped = this.value.escape_html() # In: {"value":"foo & bar"} # Out: {"escaped":"foo &amp; bar"} ``` ```bloblang root.safe_html = this.user_input.escape_html() # In: {"user_input":"<script>alert('xss')</script>"} # Out: {"safe_html":"&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;"} ``` ### [](#escape_url_path)escape\_url\_path Escapes a string for use in URL paths. 
#### [](#examples-123)Examples ```bloblang root.escaped = this.value.escape_url_path() # In: {"value":"foo & bar"} # Out: {"escaped":"foo%20&%20bar"} ``` ```bloblang root.url = "https://example.com/docs/" + this.path.escape_url_path() # In: {"path":"my document.pdf"} # Out: {"url":"https://example.com/docs/my%20document.pdf"} ``` ### [](#escape_url_query)escape\_url\_query Escapes a string for use in URL query parameters. #### [](#examples-124)Examples ```bloblang root.escaped = this.value.escape_url_query() # In: {"value":"foo & bar"} # Out: {"escaped":"foo+%26+bar"} ``` ```bloblang root.url = "https://example.com?search=" + this.query.escape_url_query() # In: {"query":"hello world!"} # Out: {"url":"https://example.com?search=hello+world%21"} ``` ### [](#filepath_join)filepath\_join Joins filepath components into a single path. #### [](#examples-125)Examples ```bloblang root.path = this.path_elements.filepath_join() # In: {"path_elements":["/foo/","bar.txt"]} # Out: {"path":"/foo/bar.txt"} ``` ### [](#filepath_split)filepath\_split Splits a filepath into directory and filename components. #### [](#examples-126)Examples ```bloblang root.path_sep = this.path.filepath_split() # In: {"path":"/foo/bar.txt"} # Out: {"path_sep":["/foo/","bar.txt"]} # In: {"path":"baz.txt"} # Out: {"path_sep":["","baz.txt"]} ``` ### [](#format)format Formats a value using a specified format string. #### [](#examples-127)Examples ```bloblang root.foo = "%s(%v): %v".format(this.name, this.age, this.fingers) # In: {"name":"lance","age":37,"fingers":13} # Out: {"foo":"lance(37): 13"} ``` ```bloblang root.message = "User %s has %v points".format(this.username, this.score) # In: {"username":"alice","score":100} # Out: {"message":"User alice has 100 points"} ``` ### [](#has_prefix)has\_prefix Tests if a string starts with a specified prefix. #### [](#parameters-88)Parameters | Name | Type | Description | | --- | --- | --- | | value | string | The string to test. 
| #### [](#examples-128)Examples ```bloblang root.t1 = this.v1.has_prefix("foo") root.t2 = this.v2.has_prefix("foo") # In: {"v1":"foobar","v2":"barfoo"} # Out: {"t1":true,"t2":false} ``` ### [](#has_suffix)has\_suffix Tests if a string ends with a specified suffix. #### [](#parameters-89)Parameters | Name | Type | Description | | --- | --- | --- | | value | string | The string to test. | #### [](#examples-129)Examples ```bloblang root.t1 = this.v1.has_suffix("foo") root.t2 = this.v2.has_suffix("foo") # In: {"v1":"foobar","v2":"barfoo"} # Out: {"t1":false,"t2":true} ``` ### [](#index_of)index\_of Returns the index of the first occurrence of a substring. #### [](#parameters-90)Parameters | Name | Type | Description | | --- | --- | --- | | value | string | A string to search for. | #### [](#examples-130)Examples ```bloblang root.index = this.thing.index_of("bar") # In: {"thing":"foobar"} # Out: {"index":3} ``` ```bloblang root.index = content().index_of("meow") # In: the cat meowed, the dog woofed # Out: {"index":8} ``` ### [](#length-2)length Returns the length of an array, object, or string. #### [](#examples-131)Examples ```bloblang root.foo_len = this.foo.length() # In: {"foo":"hello world"} # Out: {"foo_len":11} ``` ```bloblang root.foo_len = this.foo.length() # In: {"foo":["first","second"]} # Out: {"foo_len":2} # In: {"foo":{"first":"bar","second":"baz"}} # Out: {"foo_len":2} ``` ### [](#lowercase)lowercase Converts all letters in a string to lowercase. #### [](#examples-132)Examples ```bloblang root.foo = this.foo.lowercase() # In: {"foo":"HELLO WORLD"} # Out: {"foo":"hello world"} ``` ```bloblang root.email = this.user_email.lowercase() # In: {"user_email":"User@Example.COM"} # Out: {"email":"user@example.com"} ``` ### [](#quote)quote Wraps a string in double quotes and escapes special characters. 
#### [](#examples-133)Examples ```bloblang root.quoted = this.thing.quote() # In: {"thing":"foo\nbar"} # Out: {"quoted":"\"foo\\nbar\""} ``` ```bloblang root.literal = this.text.quote() # In: {"text":"hello\tworld"} # Out: {"literal":"\"hello\\tworld\""} ``` ### [](#repeat)repeat Creates a string by repeating the input a specified number of times. #### [](#parameters-91)Parameters | Name | Type | Description | | --- | --- | --- | | count | integer | The number of times to repeat the string. | #### [](#examples-134)Examples ```bloblang root.repeated = this.name.repeat(3) root.not_repeated = this.name.repeat(0) # In: {"name":"bob"} # Out: {"not_repeated":"","repeated":"bobbobbob"} ``` ```bloblang root.separator = "-".repeat(10) # In: {} # Out: {"separator":"----------"} ``` ### [](#replace)replace Replaces all occurrences of a substring with another string. Use for text transformation, cleaning data, or normalizing strings. #### [](#parameters-92)Parameters | Name | Type | Description | | --- | --- | --- | | old | string | A string to match against. | | new | string | A string to replace with. | ### [](#replace_all)replace\_all Replaces all occurrences of a substring with another. #### [](#parameters-93)Parameters | Name | Type | Description | | --- | --- | --- | | old | string | A string to match against. | | new | string | A string to replace with. | #### [](#examples-135)Examples ```bloblang root.new_value = this.value.replace_all("foo","dog") # In: {"value":"The foo ate my homework"} # Out: {"new_value":"The dog ate my homework"} ``` ```bloblang root.clean = this.text.replace_all("  ", " ") # In: {"text":"hello  world  foo"} # Out: {"clean":"hello world foo"} ``` ### [](#replace_all_many)replace\_all\_many Performs multiple find-and-replace operations in sequence. #### [](#parameters-94)Parameters | Name | Type | Description | | --- | --- | --- | | values | array | An array of values, each even value will be replaced with the following odd value. 
| #### [](#examples-136)Examples ```bloblang root.new_value = this.value.replace_all_many([ "<b>", "&lt;b&gt;", "</b>", "&lt;/b&gt;", "<i>", "&lt;i&gt;", "</i>", "&lt;/i&gt;", ]) # In: {"value":"<i>Hello</i> <b>World</b>"} # Out: {"new_value":"&lt;i&gt;Hello&lt;/i&gt; &lt;b&gt;World&lt;/b&gt;"} ``` ### [](#replace_many)replace\_many Performs multiple find-and-replace operations in sequence using an array of `[old, new]` pairs. More efficient than chaining multiple `replace_all` calls. Use for bulk text transformations. #### [](#parameters-95)Parameters | Name | Type | Description | | --- | --- | --- | | values | array | An array of values, each even value will be replaced with the following odd value. | ### [](#reverse)reverse Reverses the order of characters in a string. #### [](#examples-137)Examples ```bloblang root.reversed = this.thing.reverse() # In: {"thing":"backwards"} # Out: {"reversed":"sdrawkcab"} ``` ```bloblang root = content().reverse() # In: {"thing":"backwards"} # Out: }"sdrawkcab":"gniht"{ ``` ### [](#slice-2)slice Extracts a portion of an array or string. #### [](#parameters-96)Parameters | Name | Type | Description | | --- | --- | --- | | low | integer | The low bound, which is the first element of the selection, or if negative selects from the end. | | high (optional) | integer | An optional high bound. | #### [](#examples-138)Examples ```bloblang root.beginning = this.value.slice(0, 2) root.end = this.value.slice(4) # In: {"value":"foo bar"} # Out: {"beginning":"fo","end":"bar"} ``` A negative low index can be used, indicating an offset from the end of the sequence. 
If the low index is greater than the length of the sequence then an empty result is returned: ```bloblang root.last_chunk = this.value.slice(-4) root.the_rest = this.value.slice(0, -4) # In: {"value":"foo bar"} # Out: {"last_chunk":" bar","the_rest":"foo"} ``` ```bloblang root.beginning = this.value.slice(0, 2) root.end = this.value.slice(4) # In: {"value":["foo","bar","baz","buz","bev"]} # Out: {"beginning":["foo","bar"],"end":["bev"]} ``` A negative low index can be used, indicating an offset from the end of the sequence. If the low index is greater than the length of the sequence then an empty result is returned: ```bloblang root.last_chunk = this.value.slice(-2) root.the_rest = this.value.slice(0, -2) # In: {"value":["foo","bar","baz","buz","bev"]} # Out: {"last_chunk":["buz","bev"],"the_rest":["foo","bar","baz"]} ``` ### [](#slug)slug Converts a string into a URL-friendly slug by replacing spaces with hyphens, removing special characters, and converting to lowercase. Supports multiple languages for proper transliteration of non-ASCII characters. #### [](#parameters-97)Parameters | Name | Type | Description | | --- | --- | --- | | lang (optional) | string | | #### [](#examples-139)Examples Create a URL-friendly slug from a string with special characters: ```bloblang root.slug = this.title.slug() # In: {"title":"Hello World! Welcome to Redpanda Connect"} # Out: {"slug":"hello-world-welcome-to-redpanda-connect"} ``` Create a slug preserving French language rules: ```bloblang root.slug = this.title.slug("fr") # In: {"title":"Café & Restaurant"} # Out: {"slug":"cafe-et-restaurant"} ``` ### [](#split)split Splits a string into an array of substrings. #### [](#parameters-98)Parameters | Name | Type | Description | | --- | --- | --- | | delimiter | string | The delimiter to split with. 
| | empty_as_null | bool | Whether to treat empty substrings as null values. | #### [](#examples-140)Examples ```bloblang root.new_value = this.value.split(",") # In: {"value":"foo,bar,baz"} # Out: {"new_value":["foo","bar","baz"]} ``` ```bloblang root.new_value = this.value.split(",", true) # In: {"value":"foo,,qux"} # Out: {"new_value":["foo",null,"qux"]} ``` ```bloblang root.words = this.sentence.split(" ") # In: {"sentence":"hello world from bloblang"} # Out: {"words":["hello","world","from","bloblang"]} ``` ### [](#strip_html)strip\_html Removes HTML tags from a string, returning only the text content. Useful for extracting plain text from HTML documents, sanitizing user input, or preparing content for text analysis. Optionally preserves specific HTML elements while stripping all others. #### [](#parameters-99)Parameters | Name | Type | Description | | --- | --- | --- | | preserve (optional) | unknown | Optional array of HTML element names to preserve (e.g., ["strong", "em", "a"]). All other HTML tags will be removed. | #### [](#examples-141)Examples Extract plain text from HTML content: ```bloblang root.plain_text = this.html_content.strip_html() # In: {"html_content":"<p>Welcome to Redpanda Connect!</p>"} # Out: {"plain_text":"Welcome to Redpanda Connect!"} ``` Preserve specific HTML elements while removing others: ```bloblang root.sanitized = this.html.strip_html(["strong", "em"]) # In: {"html":"<p>Some <strong>bold</strong> and <em>italic</em> text with a </p>"} # Out: {"sanitized":"Some <strong>bold</strong> and <em>italic</em> text with a "} ``` ### [](#trim)trim Removes leading and trailing characters from a string. #### [](#parameters-100)Parameters | Name | Type | Description | | --- | --- | --- | | cutset (optional) | string | An optional string of characters to trim from the target value. | #### [](#examples-142)Examples ```bloblang root.title = this.title.trim("!?") root.description = this.description.trim() # In: {"description":" something happened and its amazing! ","title":"!!!watch out!?"} # Out: {"description":"something happened and its amazing!","title":"watch out"} ``` ### [](#trim_prefix)trim\_prefix Removes a specified prefix from the beginning of a string. #### [](#parameters-101)Parameters | Name | Type | Description | | --- | --- | --- | | prefix | string | The leading prefix substring to trim from the string. | #### [](#examples-143)Examples ```bloblang root.name = this.name.trim_prefix("foobar_") root.description = this.description.trim_prefix("foobar_") # In: {"description":"unchanged","name":"foobar_blobton"} # Out: {"description":"unchanged","name":"blobton"} ``` ### [](#trim_suffix)trim\_suffix Removes a specified suffix from the end of a string. #### [](#parameters-102)Parameters | Name | Type | Description | | --- | --- | --- | | suffix | string | The trailing suffix substring to trim from the string. | #### [](#examples-144)Examples ```bloblang root.name = this.name.trim_suffix("_foobar") root.description = this.description.trim_suffix("_foobar") # In: {"description":"unchanged","name":"blobton_foobar"} # Out: {"description":"unchanged","name":"blobton"} ``` ### [](#unescape_html)unescape\_html Converts HTML entities back to their original characters. #### [](#examples-145)Examples ```bloblang root.unescaped = this.value.unescape_html() # In: {"value":"foo &amp; bar"} # Out: {"unescaped":"foo & bar"} ``` ```bloblang root.text = this.html.unescape_html() # In: {"html":"&lt;p&gt;Hello &amp; goodbye&lt;/p&gt;"} # Out: {"text":"<p>Hello & goodbye</p>"} ``` ### [](#unescape_url_path)unescape\_url\_path Unescapes URL path encoding. #### [](#examples-146)Examples ```bloblang root.unescaped = this.value.unescape_url_path() # In: {"value":"foo%20&%20bar"} # Out: {"unescaped":"foo & bar"} ``` ```bloblang root.filename = this.path.unescape_url_path() # In: {"path":"my%20document.pdf"} # Out: {"filename":"my document.pdf"} ``` ### [](#unescape_url_query)unescape\_url\_query Unescapes URL query parameter encoding. #### [](#examples-147)Examples ```bloblang root.unescaped = this.value.unescape_url_query() # In: {"value":"foo+%26+bar"} # Out: {"unescaped":"foo & bar"} ``` ```bloblang root.search = this.param.unescape_url_query() # In: {"param":"hello+world%21"} # Out: {"search":"hello world!"} ``` ### [](#unicode_segments)unicode\_segments Splits text into segments based on Unicode text segmentation rules. Returns an array of strings representing individual graphemes (visual characters), words (including punctuation and whitespace), or sentences. Handles complex Unicode correctly, including emoji with skin tone modifiers and zero-width joiners. #### [](#parameters-103)Parameters | Name | Type | Description | | --- | --- | --- | | segmentation_type | string | Type of segmentation: "grapheme", "word", or "sentence" | #### [](#examples-148)Examples Split text into sentences (preserves trailing spaces): ```bloblang root.sentences = this.text.unicode_segments("sentence") # In: {"text":"Hello world. How are you?"} # Out: {"sentences":["Hello world. ","How are you?"]} ``` Split text into grapheme clusters (handles complex emoji correctly): ```bloblang root.graphemes = this.emoji.unicode_segments("grapheme") # In: {"emoji":"👨‍👩‍👧‍👦❤️"} # Out: {"graphemes":["👨‍👩‍👧‍👦","❤️"]} ``` ### [](#unquote)unquote Removes surrounding quotes and interprets escape sequences. 
#### [](#examples-149)Examples ```bloblang root.unquoted = this.thing.unquote() # In: {"thing":"\"foo\\nbar\""} # Out: {"unquoted":"foo\nbar"} ``` ```bloblang root.text = this.literal.unquote() # In: {"literal":"\"hello\\tworld\""} # Out: {"text":"hello\tworld"} ``` ### [](#uppercase)uppercase Converts all letters in a string to uppercase. #### [](#examples-150)Examples ```bloblang root.foo = this.foo.uppercase() # In: {"foo":"hello world"} # Out: {"foo":"HELLO WORLD"} ``` ```bloblang root.code = this.product_code.uppercase() # In: {"product_code":"abc-123"} # Out: {"code":"ABC-123"} ``` ## [](#timestamp-manipulation)Timestamp manipulation ### [](#parse_duration)parse\_duration Parses a Go-style duration string into nanoseconds. A duration string is a signed sequence of decimal numbers with unit suffixes like "300ms", "-1.5h", or "2h45m". Valid units: "ns", "us" (or "µs"), "ms", "s", "m", "h". #### [](#examples-151)Examples Parse microseconds to nanoseconds: ```bloblang root.delay_for_ns = this.delay_for.parse_duration() # In: {"delay_for":"50us"} # Out: {"delay_for_ns":50000} ``` Parse hours to seconds: ```bloblang root.delay_for_s = this.delay_for.parse_duration() / 1000000000 # In: {"delay_for":"2h"} # Out: {"delay_for_s":7200} ``` ### [](#parse_duration_iso8601)parse\_duration\_iso8601 Parses an ISO 8601 duration string into nanoseconds. Format: "P\[n\]Y\[n\]M\[n\]DT\[n\]H\[n\]M\[n\]S" or "P\[n\]W". Example: "P3Y6M4DT12H30M5S" means 3 years, 6 months, 4 days, 12 hours, 30 minutes, 5 seconds. Supports fractional seconds with full precision (not just one decimal place). 
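Because the result is always in nanoseconds, a common follow-up step is to scale it to the unit a downstream system expects. The sketch below assumes a hypothetical input field named `timeout` that carries a fractional-second ISO 8601 duration; the field names are illustrative only.

```bloblang
# Hypothetical field: `timeout` holds an ISO 8601 duration such as "PT1.5S".
# parse_duration_iso8601 returns nanoseconds, so divide by 1,000,000 for milliseconds.
root.timeout_ms = this.timeout.parse_duration_iso8601() / 1000000
# In: {"timeout":"PT1.5S"}
```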
#### [](#examples-152)Examples Parse complex ISO 8601 duration to nanoseconds: ```bloblang root.delay_for_ns = this.delay_for.parse_duration_iso8601() # In: {"delay_for":"P3Y6M4DT12H30M5S"} # Out: {"delay_for_ns":110839937000000000} ``` Parse hours to seconds: ```bloblang root.delay_for_s = this.delay_for.parse_duration_iso8601() / 1000000000 # In: {"delay_for":"PT2H"} # Out: {"delay_for_s":7200} ``` ### [](#ts_add_iso8601)ts\_add\_iso8601 Adds an ISO 8601 duration to a timestamp with calendar-aware precision for years, months, and days. Useful when you need to add durations that account for variable month lengths or leap years. #### [](#parameters-104)Parameters | Name | Type | Description | | --- | --- | --- | | duration | string | Duration in ISO 8601 format (e.g., "P1Y2M3D" for 1 year, 2 months, 3 days) | #### [](#examples-153)Examples Add one year to a timestamp: ```bloblang root.next_year = this.created_at.ts_add_iso8601("P1Y") # In: {"created_at":"2020-08-14T05:54:23Z"} # Out: {"next_year":"2021-08-14T05:54:23Z"} ``` Add a complex duration with multiple units: ```bloblang root.future_date = this.created_at.ts_add_iso8601("P1Y2M3DT4H5M6S") # In: {"created_at":"2020-01-01T00:00:00Z"} # Out: {"future_date":"2021-03-04T04:05:06Z"} ``` ### [](#ts_format)ts\_format Formats a timestamp as a string using Go’s reference time format. Defaults to RFC 3339 if no format specified. The format uses "Mon Jan 2 15:04:05 -0700 MST 2006" as a reference. Accepts unix timestamps (with decimal precision) or RFC 3339 strings. Use ts\_strftime for strftime-style formats. #### [](#parameters-105)Parameters | Name | Type | Description | | --- | --- | --- | | format | string | The output format using Go’s reference time. | | tz (optional) | string | Optional timezone (e.g., 'UTC', 'America/New_York'). Defaults to input timezone or local time for unix timestamps. 
| #### [](#examples-154)Examples Format timestamp with custom format: ```bloblang root.something_at = this.created_at.ts_format("2006-Jan-02 15:04:05") # In: {"created_at":"2020-08-14T11:50:26.371Z"} # Out: {"something_at":"2020-Aug-14 11:50:26"} ``` Format unix timestamp with timezone specification: ```bloblang root.something_at = this.created_at.ts_format(format: "2006-Jan-02 15:04:05", tz: "UTC") # In: {"created_at":1597405526} # Out: {"something_at":"2020-Aug-14 11:45:26"} ``` ### [](#ts_parse)ts\_parse Parses a timestamp string using Go’s reference time format and outputs a timestamp object. The format uses "Mon Jan 2 15:04:05 -0700 MST 2006" as a reference - show how this reference time would appear in your format. Use ts\_strptime for strftime-style formats instead. #### [](#parameters-106)Parameters | Name | Type | Description | | --- | --- | --- | | format | string | The format of the input string using Go’s reference time. | #### [](#examples-155)Examples Parse a date with abbreviated month name: ```bloblang root.doc.timestamp = this.doc.timestamp.ts_parse("2006-Jan-02") # In: {"doc":{"timestamp":"2020-Aug-14"}} # Out: {"doc":{"timestamp":"2020-08-14T00:00:00Z"}} ``` Parse a custom datetime format: ```bloblang root.parsed = this.timestamp.ts_parse("Jan 2, 2006 at 3:04pm (MST)") # In: {"timestamp":"Aug 14, 2020 at 5:54am (UTC)"} # Out: {"parsed":"2020-08-14T05:54:00Z"} ``` ### [](#ts_round)ts\_round Rounds a timestamp to the nearest multiple of the specified duration. Halfway values round up. Accepts unix timestamps (seconds with optional decimal precision) or RFC 3339 formatted strings. #### [](#parameters-107)Parameters | Name | Type | Description | | --- | --- | --- | | duration | integer | A duration measured in nanoseconds to round by. 
| #### [](#examples-156)Examples Round timestamp to the nearest hour: ```bloblang root.created_at_hour = this.created_at.ts_round("1h".parse_duration()) # In: {"created_at":"2020-08-14T05:54:23Z"} # Out: {"created_at_hour":"2020-08-14T06:00:00Z"} ``` Round timestamp to the nearest minute: ```bloblang root.created_at_minute = this.created_at.ts_round("1m".parse_duration()) # In: {"created_at":"2020-08-14T05:54:23Z"} # Out: {"created_at_minute":"2020-08-14T05:54:00Z"} ``` ### [](#ts_strftime)ts\_strftime Formats a timestamp as a string using strptime format specifiers (like %Y, %m, %d). Accepts unix timestamps (with decimal precision) or RFC 3339 strings. Supports %f for microseconds. Use ts\_format for Go-style reference time formats. #### [](#parameters-108)Parameters | Name | Type | Description | | --- | --- | --- | | format | string | The output format using strptime specifiers. | | tz (optional) | string | Optional timezone. Defaults to input timezone or local time for unix timestamps. | #### [](#examples-157)Examples Format timestamp with strftime specifiers: ```bloblang root.something_at = this.created_at.ts_strftime("%Y-%b-%d %H:%M:%S") # In: {"created_at":"2020-08-14T11:50:26.371Z"} # Out: {"something_at":"2020-Aug-14 11:50:26"} ``` Format with microseconds using %f directive: ```bloblang root.something_at = this.created_at.ts_strftime("%Y-%b-%d %H:%M:%S.%f", "UTC") # In: {"created_at":"2020-08-14T11:50:26.371Z"} # Out: {"something_at":"2020-Aug-14 11:50:26.371000"} ``` ### [](#ts_strptime)ts\_strptime Parses a timestamp string using strptime format specifiers (like %Y, %m, %d) and outputs a timestamp object. Use ts\_parse for Go-style reference time formats instead. #### [](#parameters-109)Parameters | Name | Type | Description | | --- | --- | --- | | format | string | The format string using strptime specifiers (e.g., %Y-%m-%d). 
| #### [](#examples-158)Examples Parse date with abbreviated month using strptime format: ```bloblang root.doc.timestamp = this.doc.timestamp.ts_strptime("%Y-%b-%d") # In: {"doc":{"timestamp":"2020-Aug-14"}} # Out: {"doc":{"timestamp":"2020-08-14T00:00:00Z"}} ``` Parse datetime with microseconds using %f directive: ```bloblang root.doc.timestamp = this.doc.timestamp.ts_strptime("%Y-%b-%d %H:%M:%S.%f") # In: {"doc":{"timestamp":"2020-Aug-14 11:50:26.371000"}} # Out: {"doc":{"timestamp":"2020-08-14T11:50:26.371Z"}} ``` ### [](#ts_sub)ts\_sub Calculates the duration in nanoseconds between two timestamps (t1 - t2). Returns a signed integer: positive if t1 is after t2, negative if t1 is before t2. Use .abs() for absolute duration. #### [](#parameters-110)Parameters | Name | Type | Description | | --- | --- | --- | | t2 | timestamp | The timestamp to subtract from the target timestamp. | #### [](#examples-159)Examples Calculate absolute duration between two timestamps: ```bloblang root.between = this.started_at.ts_sub("2020-08-14T05:54:23Z").abs() # In: {"started_at":"2020-08-13T05:54:23Z"} # Out: {"between":86400000000000} ``` Calculate signed duration (can be negative): ```bloblang root.duration_ns = this.end_time.ts_sub(this.start_time) # In: {"start_time":"2020-08-14T10:00:00Z","end_time":"2020-08-14T11:30:00Z"} # Out: {"duration_ns":5400000000000} ``` ### [](#ts_sub_iso8601)ts\_sub\_iso8601 Subtracts an ISO 8601 duration from a timestamp with calendar-aware precision for years, months, and days. Useful when you need to subtract durations that account for variable month lengths or leap years. 
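One way this might be used is to compute a cutoff relative to the current time and compare records against it. This is a minimal sketch, assuming a hypothetical `updated_at` field containing an RFC 3339 string and an arbitrary 30-day window; it relies on the `now()` function and the `ts_unix` method.

```bloblang
# Assumption: `updated_at` is an RFC 3339 timestamp on the input document.
# The 30-day window ("P30D") is an arbitrary illustrative value.
let cutoff = now().ts_sub_iso8601("P30D")

# Compare both timestamps as unix seconds.
root.is_stale = this.updated_at.ts_unix() < $cutoff.ts_unix()
```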
#### [](#parameters-111)Parameters | Name | Type | Description | | --- | --- | --- | | duration | string | Duration in ISO 8601 format (e.g., "P1Y2M3D" for 1 year, 2 months, 3 days) | #### [](#examples-160)Examples Subtract one year from a timestamp: ```bloblang root.last_year = this.created_at.ts_sub_iso8601("P1Y") # In: {"created_at":"2020-08-14T05:54:23Z"} # Out: {"last_year":"2019-08-14T05:54:23Z"} ``` Subtract a complex duration with multiple units: ```bloblang root.past_date = this.created_at.ts_sub_iso8601("P1Y2M3DT4H5M6S") # In: {"created_at":"2021-03-04T04:05:06Z"} # Out: {"past_date":"2020-01-01T00:00:00Z"} ``` ### [](#ts_tz)ts\_tz Converts a timestamp to a different timezone while preserving the moment in time. Accepts unix timestamps (seconds with optional decimal precision) or RFC 3339 formatted strings. #### [](#parameters-112)Parameters | Name | Type | Description | | --- | --- | --- | | tz | string | The timezone to change to. Use "UTC" for UTC, "Local" for local timezone, or an IANA Time Zone database location name like "America/New_York". | #### [](#examples-161)Examples Convert timestamp to UTC timezone: ```bloblang root.created_at_utc = this.created_at.ts_tz("UTC") # In: {"created_at":"2021-02-03T17:05:06+01:00"} # Out: {"created_at_utc":"2021-02-03T16:05:06Z"} ``` Convert timestamp to a specific timezone: ```bloblang root.created_at_ny = this.created_at.ts_tz("America/New_York") # In: {"created_at":"2021-02-03T16:05:06Z"} # Out: {"created_at_ny":"2021-02-03T11:05:06-05:00"} ``` ### [](#ts_unix)ts\_unix Converts a timestamp to a unix timestamp (seconds since epoch). Accepts unix timestamps or RFC 3339 strings. Returns an integer representing seconds. 
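Since the result is a plain integer, it can be combined with ordinary arithmetic. The following sketch, which assumes a hypothetical `created_at` field holding an RFC 3339 string, estimates the age of a record in whole seconds.

```bloblang
# Assumption: `created_at` is an RFC 3339 timestamp on the input document.
# Both sides are integers (seconds since epoch), so the difference is an age in seconds.
root.age_seconds = now().ts_unix() - this.created_at.ts_unix()
```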
#### [](#examples-162)Examples Convert RFC 3339 timestamp to unix seconds: ```bloblang root.created_at_unix = this.created_at.ts_unix() # In: {"created_at":"2009-11-10T23:00:00Z"} # Out: {"created_at_unix":1257894000} ``` Unix timestamp passthrough returns same value: ```bloblang root.timestamp = this.ts.ts_unix() # In: {"ts":1257894000} # Out: {"timestamp":1257894000} ``` ### [](#ts_unix_micro)ts\_unix\_micro Converts a timestamp to a unix timestamp with microsecond precision (microseconds since epoch). Accepts unix timestamps or RFC 3339 strings. Returns an integer representing microseconds. #### [](#examples-163)Examples Convert timestamp to microseconds since epoch: ```bloblang root.created_at_unix = this.created_at.ts_unix_micro() # In: {"created_at":"2009-11-10T23:00:00Z"} # Out: {"created_at_unix":1257894000000000} ``` Preserve microsecond precision from timestamp: ```bloblang root.precise_time = this.timestamp.ts_unix_micro() # In: {"timestamp":"2020-08-14T11:45:26.123456Z"} # Out: {"precise_time":1597405526123456} ``` ### [](#ts_unix_milli)ts\_unix\_milli Converts a timestamp to a unix timestamp with millisecond precision (milliseconds since epoch). Accepts unix timestamps or RFC 3339 strings. Returns an integer representing milliseconds. #### [](#examples-164)Examples Convert timestamp to milliseconds since epoch: ```bloblang root.created_at_unix = this.created_at.ts_unix_milli() # In: {"created_at":"2009-11-10T23:00:00Z"} # Out: {"created_at_unix":1257894000000} ``` Useful for JavaScript timestamp compatibility: ```bloblang root.js_timestamp = this.event_time.ts_unix_milli() # In: {"event_time":"2020-08-14T11:45:26.123Z"} # Out: {"js_timestamp":1597405526123} ``` ### [](#ts_unix_nano)ts\_unix\_nano Converts a timestamp to a unix timestamp with nanosecond precision (nanoseconds since epoch). Accepts unix timestamps or RFC 3339 strings. Returns an integer representing nanoseconds. 
#### [](#examples-165)Examples Convert timestamp to nanoseconds since epoch: ```bloblang root.created_at_unix = this.created_at.ts_unix_nano() # In: {"created_at":"2009-11-10T23:00:00Z"} # Out: {"created_at_unix":1257894000000000000} ``` Preserve full nanosecond precision: ```bloblang root.precise_time = this.timestamp.ts_unix_nano() # In: {"timestamp":"2020-08-14T11:45:26.123456789Z"} # Out: {"precise_time":1597405526123456789} ``` ## [](#type-coercion)Type coercion ### [](#array)array Converts a value to an array. #### [](#examples-166)Examples ```bloblang root.my_array = this.name.array() # In: {"name":"foobar bazson"} # Out: {"my_array":["foobar bazson"]} ``` ### [](#bool)bool Converts a value to a boolean with optional fallback. #### [](#parameters-113)Parameters | Name | Type | Description | | --- | --- | --- | | default (optional) | bool | An optional value to yield if the target cannot be parsed as a boolean. | #### [](#examples-167)Examples ```bloblang root.foo = this.thing.bool() root.bar = this.thing.bool(true) ``` ### [](#bytes)bytes Marshals a value into a byte array. #### [](#examples-168)Examples ```bloblang root.first_byte = this.name.bytes().index(0) # In: {"name":"foobar bazson"} # Out: {"first_byte":102} ``` ### [](#not_empty)not\_empty Ensures a value is not empty. #### [](#examples-169)Examples ```bloblang root.a = this.a.not_empty() # In: {"a":"foo"} # Out: {"a":"foo"} # In: {"a":""} # Out: Error("failed assignment (line 1): field `this.a`: string value is empty") # In: {"a":["foo","bar"]} # Out: {"a":["foo","bar"]} # In: {"a":[]} # Out: Error("failed assignment (line 1): field `this.a`: array value is empty") # In: {"a":{"b":"foo","c":"bar"}} # Out: {"a":{"b":"foo","c":"bar"}} # In: {"a":{}} # Out: Error("failed assignment (line 1): field `this.a`: object value is empty") ``` ### [](#not_null)not\_null Ensures a value is not null. 
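When a null value should not fail the whole mapping, the error raised by `not_null` can be handled with the `catch` method. This is a sketch only: the `user_id` field name and the `"unknown"` fallback are illustrative assumptions.

```bloblang
# Assumption: `user_id` may be missing or null on some inputs.
# not_null() raises an error for null values; catch() substitutes a fallback instead.
root.user_id = this.user_id.not_null().catch("unknown")
```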
#### [](#examples-170)Examples ```bloblang root.a = this.a.not_null() # In: {"a":"foobar","b":"barbaz"} # Out: {"a":"foobar"} # In: {"b":"barbaz"} # Out: Error("failed assignment (line 1): field `this.a`: value is null") ``` ### [](#number)number Converts a value to a number with optional fallback. #### [](#parameters-114)Parameters | Name | Type | Description | | --- | --- | --- | | default (optional) | float | An optional value to yield if the target cannot be parsed as a number. | #### [](#examples-171)Examples ```bloblang root.foo = this.thing.number() + 10 root.bar = this.thing.number(5) * 10 ``` ### [](#string)string Converts a value to a string representation. #### [](#examples-172)Examples ```bloblang root.nested_json = this.string() # In: {"foo":"bar"} # Out: {"nested_json":"{\"foo\":\"bar\"}"} ``` ```bloblang root.id = this.id.string() # In: {"id":228930314431312345} # Out: {"id":"228930314431312345"} ``` ### [](#timestamp)timestamp Converts a value to a timestamp with optional fallback. #### [](#parameters-115)Parameters | Name | Type | Description | | --- | --- | --- | | default (optional) | timestamp | An optional value to yield if the target cannot be parsed as a timestamp. | #### [](#examples-173)Examples ```bloblang root.foo = this.ts.timestamp() root.bar = this.none.timestamp(1234567890.timestamp()) ``` ### [](#type)type Returns the type of a value as a string. 
#### [](#examples-174)Examples ```bloblang root.bar_type = this.bar.type() root.foo_type = this.foo.type() # In: {"bar":10,"foo":"is a string"} # Out: {"bar_type":"number","foo_type":"string"} ``` ```bloblang root.type = this.type() # In: "foobar" # Out: {"type":"string"} # In: 666 # Out: {"type":"number"} # In: false # Out: {"type":"bool"} # In: ["foo", "bar"] # Out: {"type":"array"} # In: {"foo": "bar"} # Out: {"type":"object"} # In: null # Out: {"type":"null"} ``` ```bloblang root.type = content().type() # In: foobar # Out: {"type":"bytes"} ``` ```bloblang root.type = this.ts_parse("2006-01-02").type() # In: "2022-06-06" # Out: {"type":"timestamp"} ``` ## [](#deprecated)Deprecated ### [](#format_timestamp)format\_timestamp > ⚠️ **WARNING** > > This method is deprecated and will be removed in a future version. Formats a timestamp as a string using Go’s reference time format. Defaults to RFC 3339 if no format specified. The format uses "Mon Jan 2 15:04:05 -0700 MST 2006" as a reference. Accepts unix timestamps (with decimal precision) or RFC 3339 strings. Use ts\_strftime for strftime-style formats. #### [](#parameters-116)Parameters | Name | Type | Description | | --- | --- | --- | | format | string | The output format using Go’s reference time. | | tz (optional) | string | Optional timezone (e.g., 'UTC', 'America/New_York'). Defaults to input timezone or local time for unix timestamps. | ### [](#format_timestamp_strftime)format\_timestamp\_strftime > ⚠️ **WARNING** > > This method is deprecated and will be removed in a future version. Formats a timestamp as a string using strptime format specifiers (like %Y, %m, %d). Accepts unix timestamps (with decimal precision) or RFC 3339 strings. Supports %f for microseconds. Use ts\_format for Go-style reference time formats. #### [](#parameters-117)Parameters | Name | Type | Description | | --- | --- | --- | | format | string | The output format using strptime specifiers. | | tz (optional) | string | Optional timezone. 
Defaults to input timezone or local time for unix timestamps. | ### [](#format_timestamp_unix)format\_timestamp\_unix > ⚠️ **WARNING** > > This method is deprecated and will be removed in a future version. Converts a timestamp to a unix timestamp (seconds since epoch). Accepts unix timestamps or RFC 3339 strings. Returns an integer representing seconds. ### [](#format_timestamp_unix_micro)format\_timestamp\_unix\_micro > ⚠️ **WARNING** > > This method is deprecated and will be removed in a future version. Converts a timestamp to a unix timestamp with microsecond precision (microseconds since epoch). Accepts unix timestamps or RFC 3339 strings. Returns an integer representing microseconds. ### [](#format_timestamp_unix_milli)format\_timestamp\_unix\_milli > ⚠️ **WARNING** > > This method is deprecated and will be removed in a future version. Converts a timestamp to a unix timestamp with millisecond precision (milliseconds since epoch). Accepts unix timestamps or RFC 3339 strings. Returns an integer representing milliseconds. ### [](#format_timestamp_unix_nano)format\_timestamp\_unix\_nano > ⚠️ **WARNING** > > This method is deprecated and will be removed in a future version. Converts a timestamp to a unix timestamp with nanosecond precision (nanoseconds since epoch). Accepts unix timestamps or RFC 3339 strings. Returns an integer representing nanoseconds. ### [](#parse_timestamp)parse\_timestamp > ⚠️ **WARNING** > > This method is deprecated and will be removed in a future version. Parses a timestamp string using Go’s reference time format and outputs a timestamp object. The format uses "Mon Jan 2 15:04:05 -0700 MST 2006" as a reference - show how this reference time would appear in your format. Use ts\_strptime for strftime-style formats instead. #### [](#parameters-118)Parameters | Name | Type | Description | | --- | --- | --- | | format | string | The format of the input string using Go’s reference time. 
| ### [](#parse_timestamp_strptime)parse\_timestamp\_strptime > ⚠️ **WARNING** > > This method is deprecated and will be removed in a future version. Parses a timestamp string using strptime format specifiers (like %Y, %m, %d) and outputs a timestamp object. Use ts\_parse for Go-style reference time formats instead. #### [](#parameters-119)Parameters | Name | Type | Description | | --- | --- | --- | | format | string | The format string using strptime specifiers (e.g., %Y-%m-%d). | --- # Page 335: Bloblang Walkthrough **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides/bloblang/walkthrough.md --- # Bloblang Walkthrough --- title: Bloblang Walkthrough latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/guides/bloblang/walkthrough page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/guides/bloblang/walkthrough.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/bloblang/walkthrough.adoc description: A step by step introduction to Bloblang page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- Bloblang is the most advanced mapping language that you’ll learn from this walkthrough (probably). It is designed for readability, the power to shape even the most outrageous input documents, and to easily make erratic schemas bend to your will. Bloblang is the native mapping language of Redpanda Connect, but it has been designed as a general purpose technology ready to be adopted by other tools. In this walkthrough you’ll learn how to make new friends by mapping their documents, and lose old friends as they grow jealous and bitter of your mapping abilities. 
There are a few ways to execute Bloblang but the way we’ll do it in this guide is to pull a Redpanda Connect docker image and run the command `rpk connect blobl server`, which opens up an interactive Bloblang editor: ```sh docker pull docker.redpanda.com/redpandadata/connect:latest docker run -p 4195:4195 --rm docker.redpanda.com/redpandadata/connect blobl server --no-open --host 0.0.0.0 ``` Next, open your browser at `http://localhost:4195` and you should see an app with three panels, the top-left is where you paste an input document, the bottom is your Bloblang mapping and on the top-right is the output. ## [](#your-first-assignment)Your first assignment The primary goal of a Bloblang mapping is to construct a brand new document by using an input document as a reference, which we achieve through a series of assignments. Bloblang is traditionally used to map JSON documents and that’s mostly what we’ll be doing in this walkthrough. The first mapping you’ll see when you open the editor is a single assignment: ```bloblang root = this # In: {"message":"hello world"} # Out: {"message":"hello world"} ``` On the left-hand side of the assignment is our assignment target, where `root` is a keyword referring to the root of the new document being constructed. On the right-hand side is a query which determines the value to be assigned, where `this` is a keyword that refers to the context of the mapping which begins as the root of the input document. As you can see the input document in the editor begins as a JSON object `{"message":"hello world"}`, and the output panel should show the result as: ```json { "message": "hello world" } ``` This output is a (neatly formatted) replica of the input document. This is the result of our mapping because we assigned the entire input document to the root of our new thing. 
Let’s create a brand new document by assigning a fresh object to the root: ```bloblang root = {} root.foo = this.message # In: {"message":"hello world"} # Out: {"foo":"hello world"} ``` Bloblang supports a bunch of [literal types](../about/#literals), and the first line of this mapping assigns an empty object literal to the root. The second line then creates a new field `foo` on that object by assigning it the value of `message` from the input document. You should see that our output has changed to: ```json { "foo": "hello world" } ``` In Bloblang, when the path that we assign to contains fields that are themselves unset then they are created as empty objects. This rule also applies to `root` itself, which means the mapping: ```bloblang root.foo.bar = this.message root.foo."buz me".baz = "I like mapping" # In: {"message":"hello world"} # Out: {"foo":{"bar":"hello world","buz me":{"baz":"I like mapping"}}} ``` Will automatically create the objects required to produce the output document: ```json { "foo": { "bar": "hello world", "buz me": { "baz": "I like mapping" } } } ``` Also note that we can use quotes in order to express path segments that contain symbols or whitespace. Great, let’s move on quick before our self-satisfaction gets in the way of progress. ## [](#basic-methods-and-functions)Basic methods and functions Nothing is ever good enough for you, why should the input document be any different? Usually in our mappings it’s necessary to mutate values whilst we map them over, this is almost always done with methods, of which [there are many](../methods/). 
To demonstrate we’re going to change our mapping to [uppercase](../methods/#uppercase) the field `message` from our input document: ```bloblang root.foo.bar = this.message.uppercase() root.foo."buz me".baz = "I like mapping" # In: {"message":"hello world"} # Out: {"foo":{"bar":"HELLO WORLD","buz me":{"baz":"I like mapping"}}} ``` As you can see the syntax for a method is similar to many languages, simply add a dot on the target value followed by the method name and arguments within brackets. With this method added our output document should look like this: ```json { "foo": { "bar": "HELLO WORLD", "buz me": { "baz": "I like mapping" } } } ``` Since the result of any Bloblang query is a value you can use methods on anything, including other methods. For example, we could expand our mapping of `message` to also replace `WORLD` with `EARTH` using the [`replace_all` method](../methods/#replace_all): ```bloblang root.foo.bar = this.message.uppercase().replace_all("WORLD", "EARTH") root.foo."buz me".baz = "I like mapping" # In: {"message":"hello world"} # Out: {"foo":{"bar":"HELLO EARTH","buz me":{"baz":"I like mapping"}}} ``` As you can see this method required some arguments. Methods support both nameless (like above) and named arguments, which are often literal values but can also be queries themselves. For example try out the following mapping using both named style and a dynamic argument: ```bloblang root.foo.bar = this.message.uppercase().replace_all(old: "WORLD", new: this.message.capitalize()) root.foo."buz me".baz = "I like mapping" # In: {"message":"hello world"} # Out: {"foo":{"bar":"HELLO Hello World","buz me":{"baz":"I like mapping"}}} ``` Woah, I think that’s the plot to Inception, let’s move onto functions. Functions are just boring methods that don’t have a target, and there are [plenty of them as well](../functions/). 
Functions are often used to extract information unrelated to the input document, such as [environment variables](../functions/#env), or to generate data such as [timestamps](../functions/#now) or [UUIDs](../functions/#uuid_v4). Since we’re completionists let’s add one to our mapping: ```bloblang root.foo.bar = this.message.uppercase().replace_all("WORLD", "EARTH") root.foo."buz me".baz = "I like mapping" root.foo.id = uuid_v4() # In: {"message":"hello world"} ``` Now I can’t tell you what the output looks like since it will be different each time it’s mapped, how fun! ### [](#deletions)Deletions Everything in Bloblang is an expression to be assigned, including deletions, which is a [function `deleted()`](../functions/#deleted). To illustrate let’s create a field we want to delete by changing our input to the following: ```json { "name": "fooman barson", "age": 7, "opinions": ["trucks are cool","trains are cool","chores are bad"] } ``` If we wanted a full copy of this document without the field `name` then we can assign `deleted()` to it: ```bloblang root = this root.name = deleted() # In: {"name":"fooman barson","age":7,"opinions":["trucks are cool","trains are cool","chores are bad"]} # Out: {"age":7,"opinions":["trucks are cool","trains are cool","chores are bad"]} ``` And it won’t be included in the output: ```json { "age": 7, "opinions": [ "trucks are cool", "trains are cool", "chores are bad" ] } ``` An alternative way to delete fields is the [method `without`](../methods/#without), our above example could be rewritten as a single assignment `root = this.without("name")`. However, `deleted()` is generally more powerful and will come into play more later on. ## [](#variables)Variables Sometimes it’s necessary to capture a value for later, but we might not want it to be added to the resulting document. 
In Bloblang we can achieve this with variables, which are created using the `let` keyword and can be referenced within subsequent queries with a dollar sign prefix:

```bloblang
let id = uuid_v4()
root.id_sha1 = $id.hash("sha1").encode("hex")
root.id_md5 = $id.hash("md5").encode("hex")

# In: {}
```

Variables can be assigned any value type, including objects and arrays.

## [](#unstructured-and-binary-data)Unstructured and binary data

So far in all of our examples both the input document and our newly mapped document are structured, but this does not need to be so. Try assigning some literal value types directly to the `root`, such as a string `root = "hello world"`, or a number `root = 5`.

You should notice that when a value type is assigned to the root the output is the raw value, and therefore strings are not quoted. This is what makes it possible to output data of any format, including encrypted, encoded or otherwise binary data.

Unstructured mapping is not limited to the output. Rather than referencing the input document with `this`, where it must be structured, it is possible to reference it as a binary string with the [function `content`](../functions/#content). Try changing your mapping to:

```bloblang
root = content().uppercase()

# In: hello world
# Out: HELLO WORLD
```

When you add content to the input panel, it should be the same in the output panel, but in all uppercase.

## [](#conditionals)Conditionals

In order to play around with conditionals let’s set our input to something structured:

```json
{
  "pet": {
    "type": "cat",
    "is_cute": true,
    "treats": 5,
    "toys": 3
  }
}
```

In Bloblang all conditionals are expressions. This is a core principle of Bloblang and will be important later on when we’re mapping deeply nested structures.

### [](#if-expression)If expression

The simplest conditional is the `if` expression, where the boolean condition does not need to be in parentheses.
Let’s create a map that modifies the number of treats our pet receives based on a field: ```bloblang root = this root.pet.treats = if this.pet.is_cute { this.pet.treats + 10 } # In: {"pet":{"type":"cat","is_cute":true,"treats":5,"toys":3}} # Out: {"pet":{"type":"cat","is_cute":true,"treats":15,"toys":3}} ``` Try that mapping out and you should see the number of treats in the output increased to 15. Now try changing the input field `pet.is_cute` to `false` and the output treats count should go back to the original 5. When a conditional expression doesn’t have a branch to execute then the assignment is skipped entirely, which means when the pet is not cute the value of `pet.treats` is unchanged (and remains the value set in the `root = this` assignment). We can add an `else` block to our `if` expression to remove treats entirely when the pet is not cute: ```bloblang root = this root.pet.treats = if this.pet.is_cute { this.pet.treats + 10 } else { deleted() } # In: {"pet":{"type":"cat","is_cute":true,"treats":5,"toys":3}} # Out: {"pet":{"type":"cat","is_cute":true,"treats":15,"toys":3}} ``` This is possible because field deletions are expressed as assigned values created with the `deleted()` function. ### [](#if-statement)If statement The `if` keyword can also be used as a statement in order to conditionally apply a series of mapping assignments, the previous example can be rewritten as: ```bloblang root = this if this.pet.is_cute { root.pet.treats = this.pet.treats + 10 } else { root.pet.treats = deleted() } # In: {"pet":{"type":"cat","is_cute":true,"treats":5,"toys":3}} # Out: {"pet":{"type":"cat","is_cute":true,"treats":15,"toys":3}} ``` Converting this mapping to use a statement has resulted in a more verbose mapping as we had to specify `root.pet.treats` multiple times as an assignment target. 
However, using `if` as a statement can be beneficial when multiple assignments rely on the same logic: ```bloblang root = this if this.pet.is_cute { root.pet.treats = this.pet.treats + 10 root.pet.toys = this.pet.toys + 10 } # In: {"pet":{"type":"cat","is_cute":true,"treats":5,"toys":3}} # Out: {"pet":{"type":"cat","is_cute":true,"treats":15,"toys":13}} ``` More treats _and_ more toys! Lucky Spot! ### [](#match-expression)Match expression Another conditional expression is `match` which allows you to list many branches consisting of a condition and a query to execute separated with `=>`, where the first condition to pass is the one that is executed: ```bloblang root = this root.pet.toys = match { this.pet.treats > 5 => this.pet.treats - 5, this.pet.type == "cat" => 3, this.pet.type == "dog" => this.pet.toys - 3, this.pet.type == "horse" => this.pet.toys + 10, _ => 0, } # In: {"pet":{"type":"cat","is_cute":true,"treats":5,"toys":3}} # Out: {"pet":{"type":"cat","is_cute":true,"treats":5,"toys":3}} ``` Try executing that mapping with different values for `pet.type` and `pet.treats`. Match expressions can also specify a new context for the keyword `this` which can help reduce some of the boilerplate in your boolean conditions. 
The following mapping is equivalent to the previous: ```bloblang root = this root.pet.toys = match this.pet { this.treats > 5 => this.treats - 5, this.type == "cat" => 3, this.type == "dog" => this.toys - 3, this.type == "horse" => this.toys + 10, _ => 0, } # In: {"pet":{"type":"cat","is_cute":true,"treats":5,"toys":3}} # Out: {"pet":{"type":"cat","is_cute":true,"treats":5,"toys":3}} ``` Your boolean conditions can also be expressed as value types, in which case the context being matched will be compared to the value: ```bloblang root = this root.pet.toys = match this.pet.type { "cat" => 3, "dog" => 5, "rabbit" => 8, "horse" => 20, _ => 0, } # In: {"pet":{"type":"cat","is_cute":true,"treats":5,"toys":3}} # Out: {"pet":{"type":"cat","is_cute":true,"treats":5,"toys":3}} ``` ## [](#error-handling)Error handling Bloblang can simplify handling errors. First, let’s take a look at what happens when errors _aren’t_ handled, change your input to the following: ```json { "palace_guards": 10, "angry_peasants": "I couldn't be bothered to ask them" } ``` And change your mapping to something simple like a number comparison: ```bloblang root.in_trouble = this.angry_peasants > this.palace_guards # In: {"palace_guards":10,"angry_peasants":"I couldn't be bothered to ask them"} ``` Uh oh! It looks like our canvasser was too lazy and our `angry_peasants` count was incorrectly set for this document. You should see an error in the output window that mentions something like `cannot compare types string (from field this.angry_peasants) and number (from field this.palace_guards)`, which means the mapping was abandoned. So what if we want to try and map something, but don’t care if it fails? In this case if we are unable to compare our angry peasants with palace guards then I would still consider us in trouble just to be safe. For that we have a special [method `catch`](../methods/#catch), which if we add to any query allows us to specify an argument to be returned when an error occurs. 
Since methods can be added to any query we can surround our comparison with brackets and catch the whole thing:

```bloblang
root.in_trouble = (this.angry_peasants > this.palace_guards).catch(true)

# In: {"palace_guards":10,"angry_peasants":"I couldn't be bothered to ask them"}
# Out: {"in_trouble":true}
```

Now instead of an error we should see an output with `in_trouble` set to `true`. Try changing the value of `angry_peasants` to a few different values, including some numbers.

One of the powerful features of `catch` is that when it is added at the end of a series of expressions and methods it will capture errors at any part of the series, allowing you to capture errors at any granularity. For example, the mapping:

```bloblang
root.abort_mission = if this.mission.type == "impossible" {
  !this.user.motives.contains("must clear name")
} else {
  this.mission.difficulty > 10
}.catch(false)

# In: {"mission":{"type":"impossible","difficulty":5},"user":{"motives":["must clear name"]}}
# Out: {"abort_mission":false}
```

Will catch errors caused by:

- `this.mission.type` not being a string
- `this.user.motives` not being an array
- `this.mission.difficulty` not being a number

But will always return `false` if any of those errors occur. Try it out with this input and play around by breaking some of the fields:

```json
{
  "mission": {
    "type": "impossible",
    "difficulty": 5
  },
  "user": {
    "motives": ["must clear name"]
  }
}
```

Now try out this mapping:

```bloblang
root.abort_mission = if (this.mission.type == "impossible").catch(true) {
  !this.user.motives.contains("must clear name").catch(false)
} else {
  (this.mission.difficulty > 10).catch(true)
}

# In: {"mission":{"type":"impossible","difficulty":5},"user":{"motives":["must clear name"]}}
# Out: {"abort_mission":false}
```

This version is more granular and will capture each of the errors individually, with each error given a unique `true` or `false` fallback.

## [](#validation)Validation

Sometimes errors are what we want.
Failing a mapping with an error allows us to handle the bad document in other ways, such as routing it to a dead-letter queue or filtering it entirely. You can read about common Redpanda Connect error handling patterns for bad data in the [error handling guide](../../../configuration/error_handling/), but the first step is to create the error. Luckily, Bloblang has a range of ways of creating errors under certain circumstances, which can be used in order to validate the data being mapped. There are [a few helper methods](../methods/#type-coercion) that make validating and coercing fields nice and easy, try this mapping out: ```bloblang root.foo = this.foo.number() root.bar = this.bar.not_null() root.baz = this.baz.not_empty() # In: {"foo":5,"bar":"hello world","baz":[1,2,3]} # Out: {"foo":5,"bar":"hello world","baz":[1,2,3]} ``` With some of these sample inputs: ```json {"foo":"nope","bar":"hello world","baz":[1,2,3]} {"foo":5,"baz":[1,2,3]} {"foo":10,"bar":"hello world","baz":[]} ``` However, these methods don’t cover all use cases. The general purpose error throwing technique is the [`throw` function](../functions/#throw), which takes an argument string that describes the error. When it’s called it will throw a mapping error that abandons the mapping. For example, we can check the type of a field with the [method `type`](../methods/#type), and then throw an error if it’s not the type we expected: ```bloblang root.foos = if this.user.foos.type() == "array" { this.user.foos } else { throw("foos must be an array, but it ain't, what gives?") } # In: {"user":{"foos":[1,2,3]}} ``` Try this mapping out with a few sample inputs: ```json {"user":{"foos":[1,2,3]}} {"user":{"foos":"1,2,3"}} ``` ## [](#context)Context In Bloblang, when we refer to the context we’re talking about the value returned with the keyword `this`. 
At the beginning of a mapping the context starts off as a reference to the root of a structured input document, which is why the mapping `root = this` will result in the same document coming out as you put in.

However, in Bloblang there are mechanisms whereby the context might change; we’ve already seen how this can happen within a `match` expression. Another useful way to change the context is by adding a bracketed query expression as a method to a query, which looks like this:

```bloblang
root = this.foo.bar.(this.baz + this.buz)

# In: {"foo":{"bar":{"baz":1,"buz":2}}}
# Out: 3
```

Within the bracketed query expression the context becomes the result of the query that it’s a method of, so within the brackets in the above mapping the value of `this` points to the result of `this.foo.bar`, and the mapping is therefore equivalent to:

```bloblang
root = this.foo.bar.baz + this.foo.bar.buz

# In: {"foo":{"bar":{"baz":1,"buz":2}}}
# Out: 3
```

With this handy trick the `throw` mapping from the validation section above could be rewritten as:

```bloblang
root.foos = this.user.foos.(if this.type() == "array" { this } else {
  throw("foos must be an array, but it ain't, what gives?")
})

# In: {"user":{"foos":[1,2,3]}}
# Out: {"foos":[1,2,3]}
```

### [](#naming-the-context)Naming the context

Shadowing the keyword `this` with new contexts can look confusing in your mappings, and it also limits you to only being able to reference one context at any given time. As an alternative, Bloblang supports context capture expressions that look similar to lambda functions from other languages, where you can name the new context with the syntax `<context name> -> <query>`, which looks like this:

```bloblang
root = this.foo.bar.(thing -> thing.baz + thing.buz)

# In: {"foo":{"bar":{"baz":1,"buz":2}}}
# Out: 3
```

Within the brackets we now have a new field `thing`, which returns the context that would have otherwise been captured as `this`.
This also means the value returned from `this` hasn’t changed and will continue to return the root of the input document. ## [](#coalescing)Coalescing Being able to open up bracketed query expressions on fields leads us onto another cool trick in Bloblang referred to as coalescing. It’s very common in the world of document mapping that due to structural deviations a value that we wish to obtain could come from one of multiple possible paths. To illustrate this problem change the input document to the following: ```json { "thing": { "article": { "id": "foo", "contents": "Some people did some stuff" } } } ``` Let’s say we wish to flatten this structure with the following mapping: ```bloblang root.contents = this.thing.article.contents # In: {"thing":{"article":{"id":"foo","contents":"Some people did some stuff"}}} # Out: {"contents":"Some people did some stuff"} ``` But articles are only one of many document types we expect to receive, where the field `contents` remains the same but the field `article` could instead be `comment` or `share`. In this case we could expand our map of `contents` to use a `match` expression where we check for the existence of `article`, `comment`, etc in the input document. However, a much cleaner way of approaching this is with the pipe operator (`|`), which in Bloblang can be used to join multiple queries, where the first to yield a non-null result is selected. Change your mapping to the following: ```bloblang root.contents = this.thing.article.contents | this.thing.comment.contents # In: {"thing":{"article":{"id":"foo","contents":"Some people did some stuff"}}} # Out: {"contents":"Some people did some stuff"} ``` And now try changing the field `article` in your input document to `comment`. You should see that the value of `contents` remains as `Some people did some stuff` in the output document. 
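For reference, the swapped input can be annotated in the same style as the other examples in this walkthrough. This is only a sketch: the `id` value `bar` is a made-up placeholder, and the expected output assumes that querying the missing `article` path yields null, so the pipe moves on to the second candidate:

```bloblang
root.contents = this.thing.article.contents | this.thing.comment.contents

# In: {"thing":{"comment":{"id":"bar","contents":"Some people did some stuff"}}}
# Out: {"contents":"Some people did some stuff"}
```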
Now, rather than write out the full path prefix `this.thing` each time we can use a bracketed query expression to change the context, giving us more space for adding other fields:

```bloblang
root.contents = this.thing.(this.article | this.comment | this.share).contents

# In: {"thing":{"article":{"id":"foo","contents":"Some people did some stuff"}}}
# Out: {"contents":"Some people did some stuff"}
```

And by the way, the keyword `this` within queries can be omitted and made implicit, which allows us to reduce this even further:

```bloblang
root.contents = this.thing.(article | comment | share).contents

# In: {"thing":{"article":{"id":"foo","contents":"Some people did some stuff"}}}
# Out: {"contents":"Some people did some stuff"}
```

Finally, we can also add a pipe operator at the end to fall back to a literal value when none of our candidates exists:

```bloblang
root.contents = this.thing.(article | comment | share).contents | "nothing"

# In: {"thing":{"article":{"id":"foo","contents":"Some people did some stuff"}}}
# Out: {"contents":"Some people did some stuff"}
```

Neat.

## [](#advanced-methods)Advanced methods

What happens when you need to map all of the elements of an array? Or filter the keys of an object by their values? What if the fellowship just used the eagles to fly to Mount Doom?

Bloblang offers a bunch of advanced methods for [manipulating structured data types](../methods/#object--array-manipulation), let’s take a quick tour of some of the cooler ones. Set your input document to this list of things:

```json
{
  "num_friends": 5,
  "things": [
    {
      "name": "yo-yo",
      "quantity": 10,
      "is_cool": true
    },
    {
      "name": "dish soap",
      "quantity": 50,
      "is_cool": false
    },
    {
      "name": "scooter",
      "quantity": 1,
      "is_cool": true
    },
    {
      "name": "pirate hat",
      "quantity": 7,
      "is_cool": true
    }
  ]
}
```

Let’s say we wanted to reduce the `things` in our input document to only those that are cool and where we have enough of them to share with our friends.
We can do this with a [`filter` method](../methods/#filter):

```bloblang
root = this.things.filter(thing -> thing.is_cool && thing.quantity > this.num_friends)

# In: {"num_friends":5,"things":[{"name":"yo-yo","quantity":10,"is_cool":true},{"name":"dish soap","quantity":50,"is_cool":false},{"name":"scooter","quantity":1,"is_cool":true},{"name":"pirate hat","quantity":7,"is_cool":true}]}
# Out: [{"name":"yo-yo","quantity":10,"is_cool":true},{"name":"pirate hat","quantity":7,"is_cool":true}]
```

Try running that mapping and you’ll see that the output is reduced. What is happening here is that the `filter` method takes an argument that is a query, and that query will be mapped for each individual element of the array (where the context is changed to the element itself). We have captured the context into a field `thing` which allows us to continue referencing the root of the input with `this`.

The `filter` method requires the query parameter to resolve to a boolean `true` or `false`, and if it resolves to `true` the element will be present in the resulting array, otherwise it is removed.

Being able to express a query argument to be applied to a range in this way is one of the more powerful features of Bloblang, and when mapping complex structured data these advanced methods will likely be a common tool that you’ll reach for.

Another such method is [`map_each`](../methods/#map_each), which allows you to mutate each element of an array, or each value of an object. Change your input document to the following:

```json
{
  "talking_heads": [
    "1:E.T. is a bad film,Pokemon corrupted an entire generation",
    "2:Digimon ripped off Pokemon,Cats are boring",
    "3:I'm important",
    "4:Science is just made up,The Pokemon films are good,The weather is good"
  ]
}
```

Here we have an array of talking heads, where each element is a string containing an identifier, a colon, and a comma-separated list of their opinions.
We wish to map each string into a structured object, which we can do with the following mapping: ```bloblang root = this.talking_heads.map_each(raw -> { "id": raw.split(":").index(0), "opinions": raw.split(":").index(1).split(",") }) # In: {"talking_heads":["1:E.T. is a bad film,Pokemon corrupted an entire generation","2:Digimon ripped off Pokemon,Cats are boring","3:I'm important","4:Science is just made up,The Pokemon films are good,The weather is good"]} # Out: [{"id":"1","opinions":["E.T. is a bad film","Pokemon corrupted an entire generation"]},{"id":"2","opinions":["Digimon ripped off Pokemon","Cats are boring"]},{"id":"3","opinions":["I'm important"]},{"id":"4","opinions":["Science is just made up","The Pokemon films are good","The weather is good"]}] ``` The argument to `map_each` is a query where the context is the element, which we capture into the field `raw`. The result of the query argument will become the value of the element in the resulting array, and in this case we return an object literal. In order to separate the identifier from opinions we perform a `split` by colon on the raw string element and get the first substring with the `index` method. We then do the split again and extract the remainder, and split that by comma in order to extract all of the opinions to an array field. However, one problem with this mapping is that the split by colon is written out twice and executed twice. A more efficient way of performing the same thing is with the bracketed query expressions we’ve played with before: ```bloblang root = this.talking_heads.map_each(raw -> raw.split(":").(split_string -> { "id": split_string.index(0), "opinions": split_string.index(1).split(",") })) # In: {"talking_heads":["1:E.T. is a bad film,Pokemon corrupted an entire generation","2:Digimon ripped off Pokemon,Cats are boring","3:I'm important","4:Science is just made up,The Pokemon films are good,The weather is good"]} # Out: [{"id":"1","opinions":["E.T. 
is a bad film","Pokemon corrupted an entire generation"]},{"id":"2","opinions":["Digimon ripped off Pokemon","Cats are boring"]},{"id":"3","opinions":["I'm important"]},{"id":"4","opinions":["Science is just made up","The Pokemon films are good","The weather is good"]}] ``` > 📝 **NOTE: Challenge!** > > Try updating that map so that only opinions that mention Pokemon are kept. To find more methods for manipulating structured data types, check out the [methods page](../methods/#object--array-manipulation). ## [](#reusable-mappings)Reusable mappings Bloblang has cool methods, sure, but there’s nothing cooler than methods you’ve made yourself. When the going gets tough in the mapping world, the best solution is often to create a named mapping, which you can do with the keyword `map`: ```bloblang map parse_talking_head { let split_string = this.split(":") root.id = $split_string.index(0) root.opinions = $split_string.index(1).split(",") } root = this.talking_heads.map_each(raw -> raw.apply("parse_talking_head")) # In: {"talking_heads":["1:E.T. is a bad film,Pokemon corrupted an entire generation","2:Digimon ripped off Pokemon,Cats are boring","3:I'm important","4:Science is just made up,The Pokemon films are good,The weather is good"]} # Out: [{"id":"1","opinions":["E.T. is a bad film","Pokemon corrupted an entire generation"]},{"id":"2","opinions":["Digimon ripped off Pokemon","Cats are boring"]},{"id":"3","opinions":["I'm important"]},{"id":"4","opinions":["Science is just made up","The Pokemon films are good","The weather is good"]}] ``` The body of a named map, encapsulated with squiggly brackets, is a totally isolated mapping where `root` now refers to a new value being created for each invocation of the map, and `this` refers to the root of the context provided to the map.
Named maps are executed with the [method `apply`](../methods/#apply), which has a string parameter identifying the map to execute. This means it’s possible to dynamically select the target map. As you can see in the above example, we were able to use a custom map in order to create our talking head objects without the object literal. Within a named map we can also create variables that exist only within the scope of the map. A nice feature of named mappings is that they can invoke themselves recursively, allowing you to define mappings that walk deeply nested structures. The following mapping will scrub all values from a document that contain the word "Voldemort" (case-insensitive): ```bloblang map remove_naughty_man { root = match { this.type() == "object" => this.map_each(item -> item.value.apply("remove_naughty_man")), this.type() == "array" => this.map_each(ele -> ele.apply("remove_naughty_man")), this.type() == "string" => if this.lowercase().contains("voldemort") { deleted() }, this.type() == "bytes" => if this.lowercase().contains("voldemort") { deleted() }, _ => this, } } root = this.apply("remove_naughty_man") # In: {"summer_party":{"theme":"the woman in black","guests":["Emma Bunton","the seal I spotted in Trebarwith","Voldemort","The cast of Swiss Army Man","Richard"],"notes":{"lisa":"I don't think voldemort eats fish","monty":"Seals hate dance music"}},"crushes":["Richard is nice but he hates pokemon","Victoria Beckham but I think she's taken","Charlie but they're totally into Voldemort"]} ``` Try running that mapping with the following input document: ```json { "summer_party": { "theme": "the woman in black", "guests": [ "Emma Bunton", "the seal I spotted in Trebarwith", "Voldemort", "The cast of Swiss Army Man", "Richard" ], "notes": { "lisa": "I don't think voldemort eats fish", "monty": "Seals hate dance music" } }, "crushes": [ "Richard is nice but he hates pokemon", "Victoria Beckham but I think she's taken", "Charlie but they're totally into
Voldemort" ] } ``` ## [](#unit-testing)Unit testing Redpanda Connect has its own [unit testing capabilities](../../../configuration/unit_testing/) that you can also use for your mappings. To start, save a mapping into a file called something like `naughty_man.blobl`. We can use the example above from the reusable mappings section: ```bloblang map remove_naughty_man { root = match { this.type() == "object" => this.map_each(item -> item.value.apply("remove_naughty_man")), this.type() == "array" => this.map_each(ele -> ele.apply("remove_naughty_man")), this.type() == "string" => if this.lowercase().contains("voldemort") { deleted() }, this.type() == "bytes" => if this.lowercase().contains("voldemort") { deleted() }, _ => this, } } root = this.apply("remove_naughty_man") ``` Next, we can define our unit tests in an accompanying YAML file in the same directory. Let’s call this `naughty_man_test.yaml`: ```yaml tests: - name: test naughty man scrubber target_mapping: './naughty_man.blobl' environment: {} input_batch: - content: | { "summer_party": { "theme": "the woman in black", "guests": [ "Emma Bunton", "the seal I spotted in Trebarwith", "Voldemort", "The cast of Swiss Army Man", "Richard" ] } } output_batches: - - json_equals: { "summer_party": { "theme": "the woman in black", "guests": [ "Emma Bunton", "the dolphin I spotted in Trebarwith", "The cast of Swiss Army Man", "Richard" ] } } ``` As you can see, we’ve defined a single test, where we point to our mapping file which will be executed in our test. We then specify an input message which is a reduced version of the document we tried out before, and finally we specify output predicates, which is a JSON comparison against the output document. We can execute these tests with `rpk connect test ./naughty_man_test.yaml`. Redpanda Connect will also automatically find our tests if you simply run `rpk connect test ./...`.
You should see an output something like: ```text Test 'naughty_man_test.yaml' failed Failures: --- naughty_man_test.yaml --- test naughty man scrubber [line 2]: batch 0 message 0: json_equals: JSON content mismatch { "summer_party": { "guests": [ "Emma Bunton", "the seal I spotted in Trebarwith" => "the dolphin I spotted in Trebarwith", "The cast of Swiss Army Man", "Richard" ], "theme": "the woman in black" } } ``` Because in actual fact our expected output is wrong, I’ll leave it to you to spot the error. Once the test is fixed you should see: ```text Test 'naughty_man_test.yaml' succeeded ``` And now our mapping, should we need to expand it in the future, is better protected against regressions. You can read more about the Redpanda Connect unit test specification, including alternative output predicates, in [this document](../../../configuration/unit_testing/). --- # Page 336: Amazon Web Services **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides/cloud/aws.md --- # Amazon Web Services --- title: Amazon Web Services latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/guides/cloud/aws page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/guides/cloud/aws.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/cloud/aws.adoc description: Find out about AWS components in Redpanda Connect. page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- There are many components within Redpanda Connect which utilize AWS services. 
You will find that each of these components contains a configuration section under the field `credentials`, of the format: ```yml credentials: profile: "" id: "" secret: "" token: "" role: "" role_external_id: "" ``` This section contains many fields and it isn’t immediately clear which of them are compulsory and which aren’t. This document aims to make it clear what each field is responsible for and how it might be used. ## [](#credentials)Credentials By explicitly setting the credentials you are using at the component level it’s possible to connect to components using different accounts within the same Redpanda Connect process. If you are using long term credentials for your account you only need to set the fields `id` and `secret`: ```yml credentials: id: foo # aws_access_key_id secret: bar # aws_secret_access_key ``` If you are using short term credentials then you will also need to set the field `token`: ```yml credentials: id: foo # aws_access_key_id secret: bar # aws_secret_access_key token: baz # aws_session_token ``` ## [](#assume-a-role)Assume a role It’s also possible to configure Redpanda Connect to [assume a role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html) using your credentials by setting the field `role` to your target role ARN. ```yml credentials: role: fooarn # Role ARN ``` This does NOT require explicit credentials, but it’s possible to use both. 
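As a minimal sketch of using both together, the fragment below combines long-term credentials with a target role, so the role is assumed using the explicitly supplied keys. The values `foo`, `bar`, and `fooarn` are the same placeholders used in the examples above, not real credentials or a real ARN.

```yaml
credentials:
  id: foo        # aws_access_key_id (placeholder)
  secret: bar    # aws_secret_access_key (placeholder)
  role: fooarn   # Role ARN to assume using the credentials above (placeholder)
```

Because `credentials` is set per component, a fragment like this applies only to the component it is attached to.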
If you need to assume a role owned by another organization they might require you to [provide an external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html), in which case place it in the field `role_external_id`: ```yml credentials: role: fooarn # Role ARN role_external_id: bar_id ``` --- # Page 337: Ingest Real-Time Sensor Telemetry with the HTTP Gateway **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides/cloud/gateway.md --- # Ingest Real-Time Sensor Telemetry with the HTTP Gateway --- title: Ingest Real-Time Sensor Telemetry with the HTTP Gateway latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/guides/cloud/gateway page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/guides/cloud/gateway.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/cloud/gateway.adoc description: Learn how to stream sensor telemetry data into Redpanda Cloud using the gateway input in Redpanda Connect. page-git-created-date: "2025-06-25" page-git-modified-date: "2025-06-25" --- In this guide, you’ll build a pipeline that uses the `gateway` input to receive real-time telemetry data from sensors over HTTP. Each incoming message is normalized, published to a Redpanda topic, and acknowledged back to the sender. This setup is ideal for IoT, mobile, and embedded systems that need to stream data to Redpanda Cloud without using a Kafka client. The `gateway` input exposes a secure HTTP endpoint, simplifying ingestion from devices. Because HTTP is universally supported, it’s easier to integrate on constrained devices, microcontrollers, or languages that don’t support Kafka natively. 
Additional benefits: - **Simplified security**: Devices authenticate with Redpanda Cloud API tokens (using Bearer headers). No need to embed Kafka credentials, manage TLS, or expose brokers publicly. - **Operational flexibility**: Devices are decoupled from Kafka internals like topics or schemas. You can evolve pipeline logic without touching device code. - **Automatic provisioning**: Redpanda Cloud generates a secure endpoint URL when you deploy the pipeline. ## [](#prerequisites)Prerequisites - A Redpanda Cloud cluster (Serverless, Dedicated, or BYOC) - cURL or another compatible HTTP client ## [](#create-a-sensor-user-in-redpanda-cloud)Create a sensor user in Redpanda Cloud A sensor user is required to securely authenticate and manage access to the `sensor.telemetry` topic, ensuring that only authorized devices can produce messages to the topic. 1. [Log in to Redpanda Cloud](https://cloud.redpanda.com). 2. Go to **Topics** and create a topic named `sensor.telemetry`. This topic will be used to store incoming telemetry messages. 3. Go to **Security** and create a user with the following details: - **Username**: `sensor-sasl-user` - **Password**: choose a secure password - **SASL Mechanism**: `SCRAM-SHA-256` 4. Copy the password and save it securely for the next step. 5. Go to **Secrets Store** and create a new secret named `SENSOR_SASL_PASSWORD` with the value of the password you set for the user. - Set the scope of the secret to Redpanda Cluster and Redpanda Connect. 6. Go to **Security > ACLs** and create an access policy for the `sensor-sasl-user` user. This policy should allow the user to produce messages to the `sensor.telemetry` topic. ## [](#create-a-service-account)Create a service account The service account is used to authenticate requests to the gateway endpoint. It provides a secure way to manage access to the gateway without embedding sensitive credentials in your devices. 1.
[Create a new service account](https://cloud.redpanda.com/service-accounts/new) in Redpanda Cloud named `sensor-ingest` and give it a description like "Service account for sensor telemetry ingestion". 2. Copy the client ID and secret. 3. Request a new API token for the service account. This token will be used to authenticate requests to the gateway. ```bash curl --request POST \ --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \ --header 'content-type: application/x-www-form-urlencoded' \ --data grant_type=client_credentials \ --data client_id= \ --data client_secret= \ --data audience=cloudv2-production.redpanda.cloud ``` Set `client_id` and `client_secret` to the values you copied from the service account. The response provides an access token that remains **valid for one hour**. 4. Set the access token as the value of the `CLOUD_API_TOKEN` environment variable: ```bash export CLOUD_API_TOKEN= ``` ## [](#create-a-redpanda-cloud-pipeline)Create a Redpanda Cloud pipeline 1. Go to **Connect** and click **Create Pipeline**. 2. Name the pipeline `sensor-telemetry-ingest` and give it a description like "Ingest real-time sensor telemetry data". 3. Paste the following pipeline configuration into the editor: ```yaml input: gateway: rate_limit: "limit" rate_limit_resources: - label: limit local: count: 100 interval: 1s pipeline: processors: - bloblang: | root.sensor_id = this.sensor_id root.type = this.type root.value = this.value root.unit = this.unit root.received_at = now() output: broker: pattern: fan_out_sequential outputs: - redpanda: seed_brokers: - ${REDPANDA_BROKERS} topic: sensor.telemetry tls: enabled: true sasl: - mechanism: SCRAM-SHA-256 username: sensor-sasl-user password: ${secrets.SENSOR_SASL_PASSWORD} - sync_response: processors: - mapping: | root = { "status": "ok", "received_at": now() } ``` This pipeline listens for incoming telemetry messages over HTTP and processes each one in real time. Here’s what each section does: - `input.gateway`: Defines the input source.
It exposes a secure HTTP endpoint that devices can post to. The optional `rate_limit` named `limit` is applied to protect the pipeline from overload. - `rate_limit_resources.limit`: Limits traffic to 100 requests per second. If this rate is exceeded, HTTP requests are rejected with a 429 response. - `pipeline.processors.bloblang`: Normalizes the incoming message by copying fields and adding a `received_at` timestamp (using the current time). - `output.broker`: Uses a `fan_out_sequential` pattern to send each message to two outputs: - The first output publishes the normalized message to the `sensor.telemetry` Redpanda topic. - The second output sends a synchronous JSON response back to the sender confirming receipt. 4. Click **Start**. The pipeline starts deploying. When the state changes to "Running", the pipeline is ready to accept incoming messages. 5. Click the pipeline to view its details. When the pipeline is deployed, a URL is displayed. This is the HTTP endpoint to which you’ll post sensor data. 6. Copy the URL. ## [](#send-sensor-data)Send sensor data Send test data using cURL. Pass the URL provided by Redpanda Cloud when you deployed the pipeline as the final argument to the command. ```bash curl -X POST \ -H "Authorization: Bearer $CLOUD_API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sensor_id": "thermo-42", "type": "temperature", "value": 21.7, "unit": "C" }' ``` Expected response: ```json { "received_at":"2025-06-17T09:48:50.986719231Z", "sensor_id":"thermo-42", "type":"temperature", "unit":"C", "value":21.7 } ``` You can verify that the message was successfully ingested by checking the `sensor.telemetry` topic in Redpanda Cloud. To verify that the rate limit is working, try sending more than 100 requests per second. You should receive a 429 response with a `Retry-After` header indicating when to retry.
```bash seq 1 300 | xargs -n1 -P50 -I{} curl -s -o /dev/null -w "%{http_code}\n" \ -X POST \ -H "Authorization: Bearer $CLOUD_API_TOKEN" \ -H "Content-Type: application/json" \ -d '{"sensor_id":"test", "value": 42}' ``` As with the previous command, pass the pipeline URL as the final argument to `curl`. You should see a mixture of `200` and `429` responses, indicating that the rate limit is being enforced. ## [](#monitor-the-pipeline)Monitor the pipeline You can monitor the pipeline’s logs in the Redpanda Cloud UI. 1. Go to **Connect** and select the `sensor-telemetry-ingest` pipeline. 2. Click on the **Logs** tab to view real-time logs of the pipeline’s activity. You can see any errors that occur during processing. ## [](#next-steps)Next steps - Filter or enrich events with conditional Bloblang. - Route messages by `sensor.type` to different topics. ## [](#suggested-reading)Suggested reading - [`gateway` input reference](../../../components/inputs/gateway/) - [Bloblang functions](../../../configuration/interpolation/) - [Redpanda Cloud API authentication](/api/doc/cloud-dataplane/authentication) --- # Page 338: Google Cloud Platform **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides/cloud/gcp.md --- # Google Cloud Platform --- title: Google Cloud Platform latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/guides/cloud/gcp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/guides/cloud/gcp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/cloud/gcp.adoc description: Find out about GCP components in Redpanda Connect. page-git-created-date: "2024-09-09" page-git-modified-date: "2024-09-09" --- There are many components within Redpanda Connect which utilize Google Cloud Platform (GCP) services. You will find that each of these components requires valid credentials.
When running Redpanda Connect inside a Google Cloud environment that has a [default service account](https://cloud.google.com/iam/docs/service-accounts#default), it can automatically retrieve the service account credentials to call Google Cloud APIs through a library called Application Default Credentials (ADC). Otherwise, if your application runs outside Google Cloud environments that provide a default service account, you need to manually create one. Once you have a service account set up which has the required permissions, you can [create](https://console.cloud.google.com/apis/credentials/serviceaccountkey) a new Service Account Key and download it as a JSON file. Then all you need to do is set the path to this JSON file in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. Please refer to [this document](https://cloud.google.com/docs/authentication/production) for details. --- # Page 339: Migrate to the Unified Redpanda Migrator **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides/migrate-unified-redpanda-migrator.md --- # Migrate to the Unified Redpanda Migrator --- title: Migrate to the Unified Redpanda Migrator latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/guides/migrate-unified-redpanda-migrator page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/guides/migrate-unified-redpanda-migrator.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/migrate-unified-redpanda-migrator.adoc description: Learn how to migrate from legacy migrator components to the unified `redpanda_migrator` input/output pair in Redpanda Connect 4.67.5+. page-git-created-date: "2025-10-24" page-git-modified-date: "2025-10-24" --- > ❗ **IMPORTANT** > > This page is about migrating to a newer version of Redpanda Connect.
For information about migrating your data using Redpanda Migrator, see [Redpanda Migrator](../../cookbooks/redpanda_migrator/). This guide explains how to migrate from legacy migrator components (`redpanda_migrator_bundle`, `legacy_redpanda_migrator` and `legacy_redpanda_migrator_offsets`) to the unified `redpanda_migrator` input/output pair introduced in Redpanda Connect 4.67.5+. The unified migrator consolidates all migration logic into a single input/output pair, simplifying configuration and improving reliability. ## [](#overview)Overview | Available in | Redpanda Connect 4.67.5+ | | --- | --- | | Legacy status | Deprecated in 4.67.5, removed in 4.85.0 | | Compatibility | Not backward-compatible | | Configuration model | One input and one output, paired by label | | Primary control | All migration logic resides in the output component | Key concepts: - Components are paired by matching `label` values. - The input defines the source cluster and schema registry. - The output defines the destination cluster, schema registry, and migration behavior. - Topic mapping and consumer group migration are configured in the output. ## [](#architectural-changes)Architectural changes ### [](#legacy-architecture)Legacy architecture A complex bundle (`redpanda_migrator_bundle`) that managed three subcomponents: - `redpanda_migrator`: Data transfer - `schema_registry`: Schema synchronization - `redpanda_migrator_offsets`: Consumer group offsets This design required complex internal routing and sequencing. ### [](#unified-architecture)Unified architecture A single `redpanda_migrator` input/output pair replaces the bundle: - **Input**: Consumes from the source Kafka cluster. - **Output**: Handles topic creation, schema synchronization, ACLs, and consumer group offsets. Benefits: - Simplified setup: all configuration consolidated in one output component. - Improved coordination: no internal routing or wrapper logic. 
- Enhanced control: fine-grained schema and topic options, improved offset handling. ## [](#migration-steps)Migration steps Follow this checklist in order to ensure a safe, low-risk migration. - Back up your existing configurations. - Add new `input.redpanda_migrator` and `output.redpanda_migrator` components with matching labels. - Move source Kafka and Schema Registry settings to the input. - Move destination Kafka and Schema Registry settings to the output. - Replace `topic_prefix` with `topic` using interpolation syntax. - Move offset settings to `output.redpanda_migrator.consumer_groups`. - Remove deprecated fields. - Validate configuration with `rpk connect lint`. - Test using non-production topics first. - Monitor logs and performance during migration. - Remove legacy configuration after successful migration. ## [](#field-mapping-reference)Field mapping reference ### [](#bundle-wrapper-redpanda_migrator_bundle)Bundle wrapper (`redpanda_migrator_bundle`) #### [](#input-mapping)Input mapping | Legacy Field | New Location | Status | Notes | | --- | --- | --- | --- | | redpanda_migrator | input.redpanda_migrator | Moved | Source cluster connection | | schema_registry | input.redpanda_migrator.schema_registry | Moved | Source schema registry | | migrate_schemas_before_data | - | Removed | Controlled by output schema interval | | consumer_group_offsets_poll_interval | output.redpanda_migrator.consumer_groups.interval | Moved | Now controls sync frequency | #### [](#output-mapping)Output mapping | Legacy Field | New Location | Status | Notes | | --- | --- | --- | --- | | redpanda_migrator | output.redpanda_migrator | Moved | Destination cluster configuration | | schema_registry | output.redpanda_migrator.schema_registry | Moved | Destination schema registry | | translate_schema_ids | output.redpanda_migrator.schema_registry.translate_ids | Moved | Schema ID translation | | input_bundle_label | label | Replaced | Input and output paired by label | ### 
[](#data-migration-fields)Data migration fields | Legacy Field | New Location | Status | Notes | | --- | --- | --- | --- | | All (*) | input.redpanda_migrator.* | Moved | Direct mapping | | topics (explicit list) | input.redpanda_migrator.topics | Unchanged | Still supported for explicit lists | | regexp_topics: true | input.redpanda_migrator.regexp_topics_include, regexp_topics_exclude | Deprecated | Use include/exclude arrays for pattern-based selection | | topic_prefix | output.redpanda_migrator.topic | Replaced | Use interpolation, for example 'prefix_${! @kafka_topic }' | | replication_factor_override, replication_factor | output.redpanda_migrator.topic_replication_factor | Replaced | Unified field | | input_resource | label | Replaced | Label pairing replaces internal routing | | - | output.redpanda_migrator.provenance_header | New | Optional header for tracking message source cluster | ### [](#schema-migration-fields)Schema migration fields | Legacy Field | New Location | Status | Notes | | --- | --- | --- | --- | | Connection fields | input.redpanda_migrator.schema_registry.* | Moved | Source schema registry | | subject_filter | output.redpanda_migrator.schema_registry.include, exclude | Replaced | Use regex lists for filtering | | include_deleted | output.redpanda_migrator.schema_registry.include_deleted | Moved | Configured on destination | | backfill_dependencies | output.redpanda_migrator.schema_registry.versions | Replaced | Choose all or latest | ### [](#consumer-group-offset-migration)Consumer group offset migration The `redpanda_migrator_offsets` pair is replaced by the `consumer_groups` block in the output. | Legacy Component | New Location | Status | Notes | | --- | --- | --- | --- | | redpanda_migrator_offsets (input/output) | output.redpanda_migrator.consumer_groups | Replaced | Unified control block | ## [](#migration-example)Migration example The following example demonstrates a complete migration from legacy to unified components. 
Legacy configuration ```yaml input: label: "source_cluster" redpanda_migrator_bundle: legacy_redpanda_migrator: seed_brokers: [ "source-kafka:9092" ] topics: [ "orders", "payments" ] consumer_group: "migration_group" schema_registry: url: "http://source-registry:8081" migrate_schemas_before_data: false consumer_group_offsets_poll_interval: 30s output: redpanda_migrator_bundle: legacy_redpanda_migrator: seed_brokers: [ "destination-redpanda:9092" ] topic_prefix: "migrated_" schema_registry: url: "http://destination-registry:8081" translate_schema_ids: true input_bundle_label: "source_cluster" ``` Unified configuration ```yaml input: label: "migration_pipeline" (1) redpanda_migrator: # Source Kafka settings seed_brokers: [ "source-kafka:9092" ] # Pattern-based topic selection (for migrating all topics except system topics) # Note: You can still use explicit lists: topics: [ "orders", "payments" ] regexp_topics_include: [ '.' ] (2) regexp_topics_exclude: [ '^_' ] (3) consumer_group: "migration_group" # Source Schema Registry settings schema_registry: url: "http://source-registry:8081" output: label: "migration_pipeline" (4) redpanda_migrator: # Destination Redpanda settings seed_brokers: [ "destination-redpanda:9092" ] # Topic mapping (replaces topic_prefix) topic: 'migrated_${! @kafka_topic }' (5) # Add source cluster tracking header provenance_header: "x-source-cluster" (6) # Destination Schema Registry and migration settings schema_registry: url: "http://destination-registry:8081" translate_ids: true # Rename subjects subject: 'migrated_${! metadata("schema_registry_subject") }' # Consumer group migration settings consumer_groups: enabled: true interval: 30s (7) ``` | 1 | Labels are now used for pairing input and output. | | --- | --- | | 2 | Match all topics using regex pattern. | | 3 | Exclude internal/system topics starting with underscore. | | 4 | Matching label pairs the input and output components. 
| | 5 | Use interpolation syntax to replicate topic_prefix behavior. | | 6 | Adds a header to track which cluster messages originated from, useful for debugging and auditing. | | 7 | Replaces consumer_group_offsets_poll_interval. | ## [](#validation)Validation Before running, validate your configuration: ```bash rpk connect lint config.yaml ``` Then test on a small set of topics before running full migrations. ## [](#troubleshooting)Troubleshooting | Problem | Likely Cause | Solution | | --- | --- | --- | | Labels do not match | Input and output labels differ | Use identical, case-sensitive labels. | | Topic interpolation errors | Incorrect syntax | Use topic: 'prefix_${! @kafka_topic }' with quotes and !. | | Schema registry connection fails | Incorrect registry placement | The source registry must be in the input. The destination registry must be in the output. | | Consumer group migration not working | Missing consumer_groups.enabled: true | Ensure consumer group migration is explicitly enabled. | ## [](#after-migration)After migration After verifying that the new migrator works as expected: - Remove legacy configuration files. - Update internal documentation and runbooks. - Train your team on the new configuration model. - See the [`redpanda_migrator` output](../../components/outputs/redpanda_migrator/) reference for advanced configuration options. 
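For the small test run recommended in the validation step, a pared-down configuration might look like the following sketch. It reuses only fields that appear in the migration example, with the explicit `topics` list in place of the regex patterns. The host names, topic names, and label are placeholders, and leaving out the `schema_registry` blocks is an assumption made for brevity; check the component reference before relying on it.

```yaml
input:
  label: "migration_pipeline"          # must match the output label exactly
  redpanda_migrator:
    seed_brokers: [ "source-kafka:9092" ]      # placeholder source broker
    topics: [ "orders", "payments" ]           # explicit list of test topics
    consumer_group: "migration_group"

output:
  label: "migration_pipeline"          # pairs this output with the input above
  redpanda_migrator:
    seed_brokers: [ "destination-redpanda:9092" ]   # placeholder destination broker
    topic: 'migrated_${! @kafka_topic }'            # replaces the legacy topic_prefix
    consumer_groups:
      enabled: true
      interval: 30s
```

Running `rpk connect lint` against a file like this before starting it follows the same validation step described above.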
--- # Page 340: Synchronous Responses **URL**: https://docs.redpanda.com/redpanda-cloud/develop/connect/guides/sync_responses.md --- # Synchronous Responses --- title: Synchronous Responses latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: connect/guides/sync_responses page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: connect/guides/sync_responses.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/guides/sync_responses.adoc description: Understand synchronous response handling in Redpanda Connect, ensuring reliable and efficient data processing. page-git-created-date: "2025-06-25" page-git-modified-date: "2025-06-25" --- In a regular Redpanda Connect pipeline, messages flow in one direction and acknowledgements in the other: ```text ----------- Message -------------> Input (AMQP) -> Processors -> Output (AMQP) <------- Acknowledgement --------- ``` However, Redpanda Connect supports bidirectional protocols like HTTP and WebSocket, which allow responses to be returned directly from the pipeline. For example, HTTP is a request/response protocol, and inputs like `http_server` (Self-Managed) or `gateway` (Redpanda Cloud) support returning response payloads to the requester. ```text --------- Request Body --------> Input (HTTP) -> Processors -> Output (Sync Response) <--- Response Body (and ack) --- ``` ## [](#routing-processed-messages-back)Routing processed messages back To return a processed response, use the [`sync_response`](../../components/outputs/sync_response/) output. 
Use the `gateway` input in Redpanda Cloud:

```yaml
input:
  gateway: {}

pipeline:
  processors:
    - mapping: |
        root = {
          city: json("location"),
          forecast: "Clear skies with light winds",
          temperature_c: 22
        }

output:
  sync_response: {}
```

Sending this request:

```json
{
  "location": "Berlin"
}
```

Returns:

```json
{
  "city": "Berlin",
  "forecast": "Clear skies with light winds",
  "temperature_c": 22
}
```

## [](#combine-with-other-outputs)Combine with other outputs

You can route processed messages to storage and return a response using a [`broker`](../../components/outputs/broker/) output.

```yaml
input:
  gateway: {}

output:
  broker:
    pattern: fan_out
    outputs:
      - redpanda:
          seed_brokers:
            - ${REDPANDA_BROKERS}
          topic: weather.requests
          tls:
            enabled: true
          sasl:
            - mechanism: SCRAM-SHA-256
              username: ${secrets.USERNAME}
              password: ${secrets.PASSWORD}
      - sync_response: {}
        processors:
          - mapping: |
              root = {
                status: "received",
                received_at: now()
              }
```

## [](#returning-partially-processed-messages)Returning partially processed messages

You can return a response before the message is fully processed by using the [`sync_response` processor](../../components/processors/sync_response/). This allows continued processing after the response is set.

```yaml
pipeline:
  processors:
    - mapping: root = "Received weather report for %s".format(json("location"))
    - sync_response: {}
    - mapping: root.reported_at = now()
```

This returns `"Received weather report for Berlin"` to the client, but continues modifying the message before storing or forwarding it.

> 📝 **NOTE**
>
> Due to delivery guarantees, the response is not sent until all downstream processing and acknowledgements are complete.
--- # Page 341: Consume Data **URL**: https://docs.redpanda.com/redpanda-cloud/develop/consume-data.md --- # Consume Data --- title: Consume Data latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: consume-data/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: consume-data/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/consume-data/index.adoc description: Learn about consumer offsets and follower fetching. page-git-created-date: "2024-07-25" page-git-modified-date: "2024-08-01" --- - [Consumer Offsets](consumer-offsets/) Redpanda uses an internal topic, `__consumer_offsets`, to store committed offsets from each Kafka consumer that is attached to Redpanda. - [Follower Fetching](follower-fetching/) Learn about follower fetching and how to configure a Redpanda consumer to fetch records from the closest replica. --- # Page 342: Consumer Offsets **URL**: https://docs.redpanda.com/redpanda-cloud/develop/consume-data/consumer-offsets.md --- # Consumer Offsets --- title: Consumer Offsets latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: consume-data/consumer-offsets page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: consume-data/consumer-offsets.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/consume-data/consumer-offsets.adoc description: Redpanda uses an internal topic, __consumer_offsets, to store committed offsets from each Kafka consumer that is attached to Redpanda. 
page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- In Redpanda, all messages are organized by [topic](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#topic) and distributed across multiple partitions, based on a [partition strategy](https://www.redpanda.com/guides/kafka-tutorial-kafka-partition-strategy). For example, when using the round robin strategy, a producer writing to a topic with five partitions would distribute approximately 20% of the messages to each [partition](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#partition). Within a partition, each message (once accepted and acknowledged by the partition leader) is permanently assigned a unique sequence number called an [offset](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#offset). Offsets enable consumers to resume processing from a specific point, such as after an application outage. If an outage prevents your application from receiving events, you can use the consumer offset to retrieve only the events that occurred during the downtime. By default, the first message in a partition is assigned offset 0, the next is offset 1, and so on. You can manually specify a specific start value for offsets if needed. Once assigned, offsets are immutable, ensuring that the order of messages within a partition is preserved. ## [](#how-consumers-use-offsets)How consumers use offsets As a consumer reads messages from Redpanda, it can save its progress by “committing the offset” (known as an [offset commit](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#offset-commit)), an action initiated by the consumer, not Redpanda. Kafka client libraries provide an API for committing offsets, which communicates with Redpanda using the [consumer group](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#consumer-group) API. 
Each committed offset is stored as a message in the `__consumer_offsets` topic, which is a private Redpanda topic that stores committed offsets from each Kafka consumer attached to Redpanda, allowing the consumer to resume processing from the last committed point. Redpanda exposes the `__consumer_offsets` key to enable the many tools in the Kafka ecosystem that rely on this value for their operation, providing greater ecosystem interoperability with environments and applications.

When a consumer group works together to consume data from topics, the partitions are divided among the consumers in the group. For example, if a topic has 12 partitions, and there are two consumers, each consumer would be assigned six partitions to consume. If a new consumer starts later and joins this consumer group, a rebalance occurs, such that each consumer ends up with four partitions to consume. You specify a consumer group by setting the `group.id` property to a unique name for the group.

Each consumer tracks the maximum offset it has consumed in each partition and can commit offsets so that it can resume processing from the same point after a restart. Kafka allows offsets for a consumer group to be stored on a designated broker, known as the group coordinator. All consumers in the group send their offset commits and offset fetch requests to this group coordinator.

> 📝 **NOTE**
>
> More advanced consumers can read data from Redpanda without using a consumer group by requesting to read a specific topic, partition, and offset range. This pattern is often used by stream processing systems such as Apache Spark and Apache Flink, which have their own mechanisms for assigning work to consumers.

When the group coordinator receives an `OffsetCommitRequest`, it appends the request to the [compacted](https://kafka.apache.org/documentation/#compaction) Kafka topic `__consumer_offsets`.
The broker sends a successful offset commit response to the consumer only after all the replicas of the offsets topic receive the offsets. If the offsets fail to replicate within a configurable timeout, the offset commit fails and the consumer may retry the commit after backing off. The brokers periodically compact the `__consumer_offsets` topic, because it only needs to maintain the most recent offset commit for each partition. The coordinator also caches the offsets in an in-memory table to serve offset fetches quickly.

## [](#commit-strategies)Commit strategies

There are several strategies for managing offset commits:

### [](#automatic-offset-commit)Automatic offset commit

Auto commit is the default commit strategy, where the client automatically commits offsets at regular intervals. This is set with the `enable.auto.commit` property. The client then commits offsets every `auto.commit.interval.ms` milliseconds.

The primary advantage of the auto commit approach is its simplicity. After it is configured, the consumer requires no additional effort. Commits are managed in the background. However, the consumer is unaware of what was committed or when. As a result, after an application restart, some messages may be reprocessed (since consumption resumes from the last committed offset, which may include already-processed messages). The strategy guarantees at-least-once delivery.

> 📝 **NOTE**
>
> If your consumer reads records and writes them to another data store, and a write to that data store fails, auto commit may still commit the offsets of the affected records, so the consumer cannot recover them by resuming from the last committed offset. The result can be not only duplicated messages, but also dropped messages that were intended for the other data store. Make sure you understand the trade-offs associated with this default behavior.

### [](#manual-offset-commit)Manual offset commit

The manual offset commit strategy gives consumers greater control over when commits occur.
This approach is typically used when a consumer needs to align commits with an external system, such as database transactions in an RDBMS. The main advantage of manual commits is that they allow you to decide exactly when a record is considered consumed. You can use two API calls for this: `commitSync` and `commitAsync`, which differ in their blocking behavior.

#### [](#synchronous-commit)Synchronous commit

The advantage of synchronous commits is that consumers can take appropriate action before continuing to consume messages, albeit at the expense of increased latency (while waiting for the commit to return). The commit (`commitSync`) will also retry automatically, until it either succeeds or receives an unrecoverable error.

The following example shows a synchronous commit:

```java
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        // process records here ...
    }
    // ... and at the appropriate point, call commit (not after every message)
    consumer.commitSync();
}
```

#### [](#asynchronous-commit)Asynchronous commit

The advantage of asynchronous commits is lower latency, because the consumer does not pause to wait for the commit response. However, there is no automatic retry of the commit (`commitAsync`) if it fails. There is also increased coding complexity (due to the asynchronous callbacks).

The following example shows an asynchronous commit in which the consumer will not block. Instead, the commit call registers a callback, which is executed once the commit returns:

```java
OffsetCommitCallback callback = (offsets, exception) -> {
    // executed when the commit returns; exception is non-null if the commit failed
};

consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        // process records here ...
    }
    // ... and at the appropriate point, call commit
    consumer.commitAsync(callback);
}
```

### [](#external-offset-management)External offset management

The external offset management strategy allows consumers to manage offsets independently of Redpanda. In this approach:

- Consumers bypass the consumer group API and directly assign partitions instead of subscribing to a topic.
- Offsets are not committed to Redpanda, but are instead stored in an external storage system.

To implement an external offset management strategy:

1. Set `enable.auto.commit` to `false`.
2. Use `assign(Collection)` to assign partitions.
3. Use the offset provided with each `ConsumerRecord` to save your position.
4. Upon restart, use `seek(TopicPartition, long)` to restore the position of the consumer.

### [](#hybrid-offset-management)Hybrid offset management

The hybrid offset management strategy allows consumers to handle their own consumer rebalancing while still leveraging Redpanda’s offset commit functionality. In this approach:

- Consumers bypass the consumer group API and directly assign partitions instead of subscribing to a topic.
- Offsets are committed to Redpanda.

## [](#offset-commit-best-practices)Offset commit best practices

Follow these best practices to optimize offset commits.

### [](#avoid-over-committing)Avoid over-committing

The purpose of a commit is to save consumer progress. More frequent commits reduce the amount of data to re-read after an application restart, as the commit interval directly affects the recovery point objective (RPO). Because a lower RPO is desirable, application designers may believe that committing frequently is a good design choice.

However, committing too frequently can result in adverse consequences. While individually small, each commit still results in a message being written to the `__consumer_offsets` topic, because the position of the consumer against every partition must be recorded.
At high commit rates, this workload can become a bottleneck for both the client and the server. Additionally, many Kafka client implementations do not coalesce offset commits at the client, so if a backlog of commits forms (when using the asynchronous commit API), the earlier commits still need to be processed, even though they are effectively redundant.

**Best practice**: Monitor commit latency to ensure commits are timely. If you notice performance issues, commit less frequently.

### [](#use-unique-consumer-groups)Use unique consumer groups

Like many topics, the consumer group topic has multiple partitions to help with performance. When writing commit messages, Redpanda groups all of the commits for a consumer group into a specific partition to maintain ordering. Reusing a consumer group across multiple applications, even for different topics, forces all commits to use a single partition, negating the benefits of partitioning.

**Best practice**: Assign a unique consumer group to each application to distribute the commit load across all partitions.

### [](#tune-the-consumer-group)Tune the consumer group

In highly parallel applications, frequent consumer group heartbeats can create unnecessary overhead. For example, 3,200 consumers checking every 500 milliseconds generate 6,400 heartbeats per second. You can optimize this behavior by increasing the `heartbeat.interval.ms` (along with `session.timeout.ms`).

**Best practice**: Adjust heartbeat and session timeout settings to reduce unnecessary overhead in large-scale applications.
--- # Page 343: Follower Fetching **URL**: https://docs.redpanda.com/redpanda-cloud/develop/consume-data/follower-fetching.md --- # Follower Fetching --- title: Follower Fetching latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: consume-data/follower-fetching page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: consume-data/follower-fetching.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/consume-data/follower-fetching.adoc description: Learn about follower fetching and how to configure a Redpanda consumer to fetch records from the closest replica. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Learn about follower fetching and how to configure a Redpanda consumer to fetch records from the closest replica. ## [](#about-follower-fetching)About follower fetching **Follower fetching** enables a consumer to fetch records from the closest replica of a topic partition, regardless of whether it’s a leader or a follower. For a Redpanda cluster deployed across different data centers and availability zones (AZs), restricting a consumer to fetch only from the leader of a partition can incur greater costs and have higher latency than fetching from a follower that is geographically closer to the consumer. With follower fetching (proposed in [KIP-392](https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica)), the fetch protocol is extended to support a consumer fetching from any replica. This includes [Remote Read Replicas](../../../get-started/cluster-types/byoc/remote-read-replicas/). The first fetch from a consumer is processed by a Redpanda leader broker. The leader checks for a replica (itself or a follower) that has a rack ID that matches the consumer’s rack ID. 
If a replica with a matching rack ID is found, the fetch request returns records from that replica. Otherwise, the fetch is handled by the leader. ## [](#configure-follower-fetching)Configure follower fetching Redpanda decides which replica a consumer fetches from. If the consumer configures its `client.rack` property, Redpanda by default selects a replica from the same rack as the consumer, if available. For each consumer, set the `client.rack` property to a rack ID. Rack awareness is pre-enabled for cloud-based clusters in multi-AZ environments. ## [](#suggested-videos)Suggested videos - [YouTube - Redpanda Office Hour: Follower Fetching (52 mins)](https://www.youtube.com/watch?v=wV6gH5_yVaw&ab_channel=RedpandaData) --- # Page 344: Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/develop/data-transforms.md --- # Data Transforms --- title: Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/data-transforms/index.adoc description: Learn about WebAssembly data transforms within Redpanda Cloud. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-04-08" --- - [How Data Transforms Work](how-transforms-work/) Learn how Redpanda data transforms work. - [Develop Data Transforms](build/) Learn how to initialize a data transforms project and write transform functions in your chosen language. - [Configure Data Transforms](configure/) Learn how to configure data transforms in Redpanda, including editing the `transform.yaml` file, environment variables, and memory settings. 
This topic covers both the configuration of transform functions and the WebAssembly (Wasm) engine's environment. - [Deploy Data Transforms](deploy/) Learn how to build, deploy, share, and troubleshoot data transforms in Redpanda. - [Write Integration Tests for Transform Functions](test/) Learn how to write integration tests for data transform functions in Redpanda, including setting up unit tests and using testcontainers for integration tests. - [Monitor Data Transforms](monitor/) This topic provides guidelines on how to monitor the health of your data transforms and view logs. - [Manage Data Transforms](data-transforms/) You can monitor the status and performance metrics of your transform functions. You can also view detailed logs and delete transform functions when they are no longer needed. --- # Page 345: Develop Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/develop/data-transforms/build.md --- # Develop Data Transforms --- title: Develop Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/build page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/build.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/data-transforms/build.adoc description: Learn how to initialize a data transforms project and write transform functions in your chosen language. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-08-27" --- > 📝 **NOTE** > > Data transforms are supported on BYOC and Dedicated clusters running Redpanda version 24.3 and later. > 💡 **TIP: When to use Redpanda Connect instead** > > Data transforms do not access external networks or disks, and are best for lightweight data preparation (filtering, scrubbing, schema/format conversion). 
Use [Redpanda Connect](../../connect/about/) when you need any of the following: > > - External integration (HTTP services, databases, cloud storage) for enrichment or fan-out to third-party systems > > - Batching or windowed processing for grouping/aggregation > > - Prebuilt processors and connectors to reduce custom code Learn how to initialize a data transforms project and write transform functions in your chosen language. After reading this page, you will be able to: - Initialize a data transforms project using the rpk CLI - Build transform functions that process records and write to output topics - Implement multi-topic routing patterns with Schema Registry integration ## [](#prerequisites)Prerequisites You must have the following development tools installed on your host machine: - The [`rpk` command-line client](../../../manage/rpk/rpk-install/) installed. - For Golang projects, you must have at least version 1.20 of [Go](https://go.dev/doc/install). - For Rust projects, you must have the latest stable version of [Rust](https://rustup.rs/). - For JavaScript and TypeScript projects, you must have the [latest long-term-support release of Node.js](https://nodejs.org/en/download/package-manager). ## [](#enable-data-transforms)Enable data transforms Data transforms are disabled on all clusters by default. Before you can deploy data transforms to a cluster, you must first enable the feature with the `rpk` command-line tool. To enable data transforms, set the [`data_transforms_enabled`](../../../reference/properties/cluster-properties/#data_transforms_enabled) cluster property to `true`: ```bash rpk cluster config set data_transforms_enabled true ``` > 📝 **NOTE** > > This property requires a rolling restart, and it can take several minutes for the update to complete. ## [](#init)Initialize a data transforms project To initialize a data transforms project, use the following command to set up the project files in your current directory. 
This command adds the latest version of the [SDK](../../../reference/data-transforms/sdks/) as a project dependency:

```bash
rpk transform init --language=<language> --name=<name>
```

If you do not include the `--language` flag, the command prompts you for the language. Supported languages include:

- `tinygo-no-goroutines` (does not include [Goroutines](https://golangdocs.com/goroutines-in-golang))
- `tinygo-with-goroutines`
- `rust`
- `javascript`
- `typescript`

For example, if you choose `tinygo-no-goroutines`, `rpk` creates the following project files:

. ├── go.mod ├── go.sum ├── README.md ├── transform.go └── transform.yaml

The `transform.go` file contains a boilerplate transform function. The `transform.yaml` file specifies the configuration settings for the transform function.

See also: [Configure Data Transforms](../configure/)

## [](#build-transform-functions)Build transform functions

You can develop your transform logic with one of the available SDKs that allow your transform code to interact with a Redpanda cluster.

#### Go

All transform functions must register a callback with the `OnRecordWritten()` method.

You should run any initialization steps in the `main()` function because it’s only run once when the transform function is first deployed. You can also use the standard predefined [`init()` function](https://go.dev/doc/effective_go#init).

```go
package main

import (
	"github.com/redpanda-data/redpanda/src/transform-sdk/go/transform"
)

func main() {
	// Register your transform function.
	// This is a good place to perform other setup too.
	transform.OnRecordWritten(myTransform)
}

// myTransform is where you read the record that was written, and then you can
// output new records that will be written to the destination topic
func myTransform(event transform.WriteEvent, writer transform.RecordWriter) error {
	return writer.Write(event.Record())
}
```

#### Rust

All transform functions must register a callback with the `on_record_written()` method.
You should run any initialization steps in the `main()` function because it’s only run once when the transform function is first deployed.

```rust
use redpanda_transform_sdk::*;

fn main() {
    // Register your transform function.
    // This is a good place to perform other setup too.
    on_record_written(my_transform);
}

// my_transform is where you read the record that was written, and then you can
// return new records that will be written to the output topic
fn my_transform(event: WriteEvent, writer: &mut RecordWriter) -> Result<(), Box<dyn std::error::Error>> {
    writer.write(event.record)?;
    Ok(())
}
```

#### JavaScript

All transform functions must register a callback with the `onRecordWritten()` method.

You should run any initialization steps outside of the callback so that they are only run once when the transform function is first deployed.

```js
// src/index.js
import { onRecordWritten } from "@redpanda-data/transform-sdk";

// This is a good place to perform setup steps.

// Register your transform function.
onRecordWritten((event, writer) => {
  // This is where you read the record that was written, and then you can
  // output new records that will be written to the destination topic
  writer.write(event.record);
});
```

If you need to use Node.js standard modules in your transform function, you must configure the [`polyfillNode` plugin](https://github.com/cyco130/esbuild-plugin-polyfill-node) for [esbuild](https://esbuild.github.io/). This plugin allows you to polyfill Node.js APIs that are not natively available in the Redpanda JavaScript runtime environment.

`esbuild.js`

```js
import * as esbuild from 'esbuild';
import { polyfillNode } from 'esbuild-plugin-polyfill-node';

await esbuild.build({
  plugins: [
    polyfillNode({
      globals: {
        buffer: true, // Allow a global Buffer variable if referenced.
        process: false, // Don't inject the process global, the Redpanda JavaScript runtime does that.
      },
      polyfills: {
        crypto: true, // Enable crypto polyfill
        // Add other polyfills as needed
      },
    }),
  ],
});
```

### [](#errors)Error handling

By distinguishing between recoverable and critical errors, you can ensure that your transform functions are both resilient and robust. Handling recoverable errors internally helps maintain continuous operation, while allowing critical errors to escape ensures that the system can address severe issues effectively.

Redpanda tracks the offsets of records that transform functions have processed. If an error escapes the Wasm virtual machine (VM), the VM will fail. When the Wasm engine detects this failure and starts a new VM, the transform function retries processing the input topics from the last processed offset, potentially leading to repeated failures if the underlying issue is not resolved.

Handling errors internally by logging them and continuing to process subsequent records can help maintain continuous operation. However, this approach can result in silently discarding problematic records, which may lead to unnoticed data loss if the logs are not monitored closely.
#### Go

```go
package main

import (
	"log"

	"github.com/redpanda-data/redpanda/src/transform-sdk/go/transform"
)

func main() {
	transform.OnRecordWritten(myTransform)
}

func myTransform(event transform.WriteEvent, writer transform.RecordWriter) error {
	record := event.Record()
	if record.Key == nil {
		// Handle the error internally by logging it
		log.Println("Error: Record key is nil")
		// Skip this record and continue to process other records
		return nil
	}
	// Allow errors with writes to escape
	return writer.Write(record)
}
```

#### Rust

```rust
use redpanda_transform_sdk::*;
use log::error;

fn main() {
    // Set up logging
    env_logger::init();
    on_record_written(my_transform);
}

fn my_transform(event: WriteEvent, writer: &mut RecordWriter) -> anyhow::Result<()> {
    let record = event.record;
    if record.key().is_none() {
        // Handle the error internally by logging it
        error!("Error: Record key is nil");
        // Skip this record and continue to process other records
        return Ok(());
    }
    // Allow errors with writes to escape
    writer.write(record)?;
    Ok(())
}
```

#### JavaScript

```js
import { onRecordWritten } from "@redpanda-data/transform-sdk";

// Register your transform function.
onRecordWritten((event, writer) => {
  const record = event.record;
  if (!record.key) {
    // Handle the error internally by logging it
    console.error("Error: Record key is nil");
    // Skip this record and continue to process other records
    return;
  }
  // Allow errors with writes to escape
  writer.write(record);
});
```

When you deploy this transform function, and produce a message without a key, you’ll get the following in the logs:

```js
{
  "body": {
    "stringValue": "2024/06/20 08:17:33 Error: Record key is nil\n"
  },
  "timeUnixNano": 1718871455235337000,
  "severityNumber": 13,
  "attributes": [
    {
      "key": "transform_name",
      "value": {
        "stringValue": "test"
      }
    },
    {
      "key": "node",
      "value": {
        "intValue": 0
      }
    }
  ]
}
```

You can view logs for transform functions using the `rpk transform logs <transform-name>` command.
To ensure that you are notified of any errors or issues in your data transforms, Redpanda provides metrics that you can use to monitor the state of your data transforms. See also: - [View logs for transform functions](../monitor/#logs) - [Monitor data transforms](../monitor/) - [Configure transform logging](../configure/#log) - [`rpk transform logs` reference](../../../reference/rpk/rpk-transform/rpk-transform-logs/) ### [](#avoid-state-management)Avoid state management Relying on in-memory state across transform invocations can lead to inconsistencies and unpredictable behavior. Data transforms operate with at-least-once semantics, meaning a transform function might be executed more than once for a given record. Redpanda may also restart a transform function at any point, which causes its state to be lost. ### [](#env-vars)Access environment variables You can access both [built-in and custom environment variables](../configure/#environment-variables) in your transform function. In this example, environment variables are checked once during initialization: #### Go ```go package main import ( "fmt" "os" "github.com/redpanda-data/redpanda/src/transform-sdk/go/transform" ) func main() { // Check environment variables before registering the transform function. outputTopic1, ok := os.LookupEnv("REDPANDA_OUTPUT_TOPIC_1") if ok { fmt.Printf("Output topic 1: %s\n", outputTopic1) } else { fmt.Println("Only one output topic is set") } // Register your transform function. transform.OnRecordWritten(myTransform) } func myTransform(event transform.WriteEvent, writer transform.RecordWriter) error { return writer.Write(event.Record()) } ``` #### Rust ```rust use redpanda_transform_sdk::*; use std::env; use log::error; fn main() { // Set up logging env_logger::init(); // Check environment variables before registering the transform function. 
match env::var("REDPANDA_OUTPUT_TOPIC_1") { Ok(output_topic_1) => println!("Output topic 1: {}", output_topic_1), Err(_) => println!("Only one output topic is set"), } // Register your transform function. on_record_written(my_transform); } fn my_transform(_event: WriteEvent, _writer: &mut RecordWriter) -> anyhow::Result<()> { Ok(()) } ``` #### JavaScript ```js import { onRecordWritten } from "@redpanda-data/transform-sdk"; // Check environment variables before registering the transform function. const outputTopic1 = process.env.REDPANDA_OUTPUT_TOPIC_1; if (outputTopic1) { console.log(`Output topic 1: ${outputTopic1}`); } else { console.log("Only one output topic is set"); } // Register your transform function. onRecordWritten((event, writer) => { return writer.write(event.record); }); ``` ### [](#write-to-specific-output-topics)Write to specific output topics You can configure your transform function to write records to specific output topics based on message content, enabling powerful routing and fan-out patterns. This capability is useful for: - Filtering messages by criteria and routing to different topics - Fan-out patterns that distribute data from one input topic to multiple output topics - Event routing based on message type or schema - Data distribution for downstream consumers Wasm transforms provide a simpler alternative to external connectors like Kafka Connect for in-broker data routing, with lower latency and no additional infrastructure to manage. #### [](#basic-json-validation-example)Basic JSON validation example The following example shows a filter that outputs only valid JSON from the input topic into the output topic. The transform writes invalid JSON to a different output topic. 
##### Go

```go
package main

import (
	"encoding/json"

	"github.com/redpanda-data/redpanda/src/transform-sdk/go/transform"
)

func main() {
	transform.OnRecordWritten(filterValidJson)
}

func filterValidJson(event transform.WriteEvent, writer transform.RecordWriter) error {
	if json.Valid(event.Record().Value) {
		return writer.Write(event.Record())
	}
	// Send invalid records to separate topic
	return writer.Write(event.Record(), transform.ToTopic("invalid-json"))
}
```

##### Rust

```rust
use anyhow::Result;
use redpanda_transform_sdk::*;

fn main() {
    on_record_written(filter_valid_json);
}

fn filter_valid_json(event: WriteEvent, writer: &mut RecordWriter) -> Result<()> {
    let value = event.record.value().unwrap_or_default();
    if serde_json::from_slice::<serde_json::Value>(value).is_ok() {
        writer.write(event.record)?;
    } else {
        // Send invalid records to separate topic
        writer.write_with_options(event.record, WriteOptions::to_topic("invalid-json"))?;
    }
    Ok(())
}
```

##### JavaScript

The JavaScript SDK does not support writing records to a specific output topic.

#### [](#multi-topic-fanout)Multi-topic fan-out with Schema Registry

This example shows how to route batched updates from a single input topic to multiple output topics based on a routing field in each message. Messages are encoded with the [Schema Registry wire format](../../../manage/schema-reg/schema-reg-overview/#wire-format) for validation against the output topic schema. Consider using this pattern with Iceberg-enabled topics to fan out data directly into lakehouse tables.
Input message example:

```json
{
  "updates": [
    {"table": "orders", "data": {"order_id": "123", "amount": 99.99}},
    {"table": "inventory", "data": {"product_id": "P456", "quantity": 50}},
    {"table": "customers", "data": {"customer_id": "C789", "name": "Jane"}}
  ]
}
```

[Configure the transform](../configure/) with multiple output topics:

```yaml
name: event-router
input-topic: events
output-topics:
  - orders
  - inventory
  - customers
```

The transform extracts each update and routes it to the appropriate topic based on the `table` field. Schemas are registered dynamically in the `main()` function using the Schema Registry client, which returns the schema IDs needed for encoding messages in the wire format.

> 📝 **NOTE**
>
> This example assumes that you have created the output topics and have the schema definitions ready. The transform registers the schemas dynamically on startup using the `{topic-name}-value` naming convention for schema subjects (for example, `orders-value`, `inventory-value`).
##### Go

`go.mod`:

```go
module fanout-example

go 1.20

require github.com/redpanda-data/redpanda/src/transform-sdk/go/transform v1.1.0 // v1.1.0+ required
```

`transform.go`:

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"log"

	"github.com/redpanda-data/redpanda/src/transform-sdk/go/transform"
	"github.com/redpanda-data/redpanda/src/transform-sdk/go/transform/sr"
)

// Input message structure with array of updates
type BatchMessage struct {
	Updates []TableUpdate `json:"updates"`
}

// Individual table update with routing field
type TableUpdate struct {
	Table string          `json:"table"` // Routing field - determines output topic
	Data  json.RawMessage `json:"data"`  // The actual data to write
}

// Schema IDs for each output topic, registered dynamically at startup
var schemaIDs = make(map[string]int)

func main() {
	// Create Schema Registry client
	client := sr.NewClient()

	// Define schemas for each output topic
	schemas := map[string]string{
		"orders":    `{"type":"record","name":"Order","fields":[{"name":"order_id","type":"string"},{"name":"amount","type":"double"}]}`,
		"inventory": `{"type":"record","name":"Inventory","fields":[{"name":"product_id","type":"string"},{"name":"quantity","type":"int"}]}`,
		"customers": `{"type":"record","name":"Customer","fields":[{"name":"customer_id","type":"string"},{"name":"name","type":"string"}]}`,
	}

	// Register schemas and store their IDs
	for topic, schemaStr := range schemas {
		subject := topic + "-value"
		schema := sr.Schema{
			Schema: schemaStr,
			Type:   sr.TypeAvro,
		}
		result, err := client.CreateSchema(subject, schema)
		if err != nil {
			log.Fatalf("Failed to register schema for %s: %v", topic, err)
		}
		schemaIDs[topic] = result.ID
		log.Printf("Registered schema for %s with ID %d", topic, result.ID)
	}

	log.Printf("Starting fanout transform with schema IDs: %v", schemaIDs)
	transform.OnRecordWritten(routeUpdates)
}

func routeUpdates(event transform.WriteEvent, writer transform.RecordWriter) error {
	var batch BatchMessage
	if err := json.Unmarshal(event.Record().Value, &batch); err != nil {
		log.Printf("Failed to parse batch message: %v", err)
		return nil // Skip invalid records
	}

	// Process each update in the batch
	for i, update := range batch.Updates {
		schemaID, exists := schemaIDs[update.Table]
		if !exists {
			log.Printf("Unknown table in update %d: %s", i, update.Table)
			continue
		}
		if err := writeUpdate(update, schemaID, writer, event); err != nil {
			log.Printf("Failed to write update %d to %s: %v", i, update.Table, err)
		}
	}
	return nil
}

func writeUpdate(update TableUpdate, schemaID int, writer transform.RecordWriter, event transform.WriteEvent) error {
	// Create Schema Registry wire format: [magic_byte, schema_id (4 bytes BE), data...]
	value := make([]byte, 5)
	value[0] = 0 // magic byte
	binary.BigEndian.PutUint32(value[1:5], uint32(schemaID))
	value = append(value, update.Data...)

	record := transform.Record{
		Key:   event.Record().Key,
		Value: value,
	}
	return writer.Write(record, transform.ToTopic(update.Table))
}
```

##### Rust

`Cargo.toml`:

```toml
[package]
name = "fanout-rust-example"
version = "0.1.0"
edition = "2021"

[dependencies]
redpanda-transform-sdk = "1.1.0" # v1.1.0+ required for WriteOptions API
redpanda-transform-sdk-sr = "1.1.0"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
log = "0.4"
env_logger = "0.11"

[profile.release]
opt-level = "z"
lto = true
strip = true
```

`src/main.rs`:

```rust
use redpanda_transform_sdk::*;
use redpanda_transform_sdk_sr::{SchemaRegistryClient, Schema, SchemaFormat};
use serde::Deserialize;
use std::collections::HashMap;
use std::error::Error;
use std::sync::OnceLock;
use log::{info, error};

#[derive(Deserialize)]
struct BatchMessage {
    updates: Vec<TableUpdate>,
}

#[derive(Deserialize)]
struct TableUpdate {
    table: String,
    data: serde_json::Value,
}

// Schema IDs for each output topic, registered dynamically at startup
static SCHEMA_IDS: OnceLock<HashMap<String, i32>> = OnceLock::new();

fn main() {
    // Initialize logging
    env_logger::init();

    // Create Schema Registry client
    let mut client = SchemaRegistryClient::new();

    // Define schemas for each output topic
    let schemas = [
        ("orders", r#"{"type":"record","name":"Order","fields":[{"name":"order_id","type":"string"},{"name":"amount","type":"double"}]}"#),
        ("inventory", r#"{"type":"record","name":"Inventory","fields":[{"name":"product_id","type":"string"},{"name":"quantity","type":"int"}]}"#),
        ("customers", r#"{"type":"record","name":"Customer","fields":[{"name":"customer_id","type":"string"},{"name":"name","type":"string"}]}"#),
    ];

    let mut schema_ids = HashMap::new();

    // Register schemas and store their IDs
    for (topic, schema_str) in schemas {
        let subject = format!("{}-value", topic);
        let schema = Schema::new(schema_str.to_string(), SchemaFormat::Avro, vec![]);
        match client.create_schema(&subject, schema) {
            Ok(result) => {
                let id = result.id(); // SchemaId type
                schema_ids.insert(topic.to_string(), id.0); // Extract i32 from SchemaId wrapper
                info!("Registered schema for {} with ID {}", topic, id.0);
            }
            Err(e) => {
                error!("Failed to register schema for {}: {}", topic, e);
                panic!("Schema registration failed");
            }
        }
    }

    let _ = SCHEMA_IDS.set(schema_ids);
    info!("Starting fanout transform with schema IDs");
    on_record_written(route_updates);
}

fn write_update(
    update: &TableUpdate,
    schema_id: i32,
    writer: &mut RecordWriter,
    event: &WriteEvent,
) -> Result<(), Box<dyn Error>> {
    // Create Schema Registry wire format: [magic_byte, schema_id (4 bytes BE), data...]
    let mut value = vec![0u8; 5];
    value[0] = 0; // magic byte
    value[1..5].copy_from_slice(&schema_id.to_be_bytes());
    let data_bytes = serde_json::to_vec(&update.data)?;
    value.extend_from_slice(&data_bytes);

    let key = event.record.key().map(|k| k.to_vec());
    let record = BorrowedRecord::new(key.as_deref(), Some(&value));
    writer.write_with_options(record, WriteOptions::to_topic(&update.table))?;
    Ok(())
}

fn route_updates(event: WriteEvent, writer: &mut RecordWriter) -> Result<(), Box<dyn Error>> {
    let batch: BatchMessage = serde_json::from_slice(event.record.value().unwrap_or_default())?;
    let schema_ids = SCHEMA_IDS.get().unwrap();
    for update in batch.updates.iter() {
        if let Some(&schema_id) = schema_ids.get(&update.table) {
            write_update(update, schema_id, writer, &event)?;
        }
    }
    Ok(())
}
```

##### JavaScript

The JavaScript SDK does not support writing records to specific output topics. For multi-topic fan-out, use the Go or Rust SDK.

### [](#connect-to-the-schema-registry)Connect to the Schema Registry

You can use the Schema Registry client library to read and write schemas as well as serialize and deserialize records. This client library is useful when working with schema-based topics in your data transforms.
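When a transform reads or writes schema-based records, each value carries a small header in the Schema Registry wire format: a zero magic byte, a 4-byte big-endian schema ID, and then the serialized payload. The following is a minimal sketch of that framing using only the Go standard library. The helper names `encodeWireFormat` and `decodeWireFormat` are hypothetical and are not part of the transform SDK; the payload is assumed to be already serialized.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// wireHeaderLen is the magic byte plus the 4-byte big-endian schema ID.
const wireHeaderLen = 5

// encodeWireFormat prepends the Schema Registry header to an
// already-serialized payload. Hypothetical helper for illustration.
func encodeWireFormat(schemaID int, payload []byte) []byte {
	out := make([]byte, wireHeaderLen, wireHeaderLen+len(payload))
	out[0] = 0 // magic byte
	binary.BigEndian.PutUint32(out[1:wireHeaderLen], uint32(schemaID))
	return append(out, payload...)
}

// decodeWireFormat splits a framed value into its schema ID and payload.
// Hypothetical helper for illustration.
func decodeWireFormat(value []byte) (int, []byte, error) {
	if len(value) < wireHeaderLen {
		return 0, nil, errors.New("value is shorter than the 5-byte header")
	}
	if value[0] != 0 {
		return 0, nil, fmt.Errorf("unexpected magic byte %d", value[0])
	}
	id := int(binary.BigEndian.Uint32(value[1:wireHeaderLen]))
	return id, value[wireHeaderLen:], nil
}

func main() {
	framed := encodeWireFormat(7, []byte(`{"order_id":"123"}`))
	id, payload, err := decodeWireFormat(framed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("schema ID %d, payload %s\n", id, payload)
}
```

Checking the magic byte before reading the ID lets a transform skip or reroute records that were produced without a schema instead of misreading their first bytes as a header.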
See also: - [Redpanda Schema Registry](../../../manage/schema-reg/schema-reg-overview/) - [Go Schema Registry client reference](../../../reference/data-transforms/golang-sdk/) - [Rust Schema Registry client reference](../../../reference/data-transforms/rust-sdk/) - [JavaScript Schema Registry client reference](../../../reference/data-transforms/js/js-sdk-sr/) ## [](#next-steps)Next steps [Configure Data Transforms](../configure/) ## [](#suggested-reading)Suggested reading - [How Data Transforms Work](../how-transforms-work/) - [Data Transforms SDKs](../../../reference/data-transforms/sdks/) - [`rpk transform` commands](../../../reference/rpk/rpk-transform/rpk-transform/) --- # Page 346: Configure Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/develop/data-transforms/configure.md --- # Configure Data Transforms --- title: Configure Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/configure page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/configure.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/data-transforms/configure.adoc description: Learn how to configure data transforms in Redpanda, including editing the transform.yaml file, environment variables, and memory settings. This topic covers both the configuration of transform functions and the WebAssembly (Wasm) engine's environment. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-05-07" --- Learn how to configure data transforms in Redpanda, including editing the `transform.yaml` file, environment variables, and memory settings. This topic covers both the configuration of transform functions and the WebAssembly (Wasm) engine’s environment. 
## [](#configure-transform-functions)Configure transform functions

This section covers how to configure transform functions using the `transform.yaml` configuration file, command-line overrides, and environment variables.

### [](#config-file)Transform configuration file

When you [initialize](../build/#init) a data transforms project, a `transform.yaml` file is generated in the provided directory. You can use this configuration file to configure the transform function with settings, including input and output topics, the language used for the data transform, and any environment variables.

- `name`: The name of the transform function.
- `description`: A description of what the transform function does.
- `input-topic`: The topic from which data is read.
- `output-topics`: A list of up to eight topics to which the transformed data is written.
- `language`: The language used for the transform function. The language is set to the one you defined during [initialization](../build/#init).
- `env`: A dictionary of custom environment variables that are passed to the transform function. Do not prefix keys with `REDPANDA_`. Check the list of all [limitations](../how-transforms-work/#limitations).

Here is an example of a `transform.yaml` file:

```yaml
name: redpanda-example
description: |
  This transform function is an example to demonstrate how to configure data transforms in Redpanda.
input-topic: example-input-topic
output-topics:
  - example-output-topic-1
  - example-output-topic-2
language: tinygo-no-goroutines
env:
  DATA_TRANSFORMS_ARE_AWESOME: 'true'
```

### [](#cl)Override configurations with command-line options

You can set the name of the transform function, environment variables, and input and output topics on the command line when you deploy the transform. These command-line settings take precedence over those specified in the `transform.yaml` file.
See [Deploy Data Transforms](../deploy/).

### [](#built-in)Built-in environment variables

In addition to custom environment variables set on the [command line](#cl) or in the [configuration file](#config-file), Redpanda makes some built-in environment variables available to your transform functions. These variables include:

- `REDPANDA_INPUT_TOPIC`: The input topic specified.
- `REDPANDA_OUTPUT_TOPIC_0..REDPANDA_OUTPUT_TOPIC_N`: The output topics in the order specified on the command line or in the configuration file. For example, `REDPANDA_OUTPUT_TOPIC_0` is the first variable, `REDPANDA_OUTPUT_TOPIC_1` is the second variable, and so on.

Transform functions are isolated from the broker’s internal environment variables to maintain security and encapsulation. Each transform function only uses the environment variables explicitly provided to it.

## [](#configure-the-wasm-engine)Configure the Wasm engine

This section covers how to configure the Wasm engine environment using Redpanda cluster configuration properties.

### [](#enable-transforms)Enable data transforms

To use data transforms, you must enable them for a Redpanda cluster using the [`data_transforms_enabled`](../../../reference/properties/cluster-properties/#data_transforms_enabled) property.

### [](#log)Configure transform logging

The following properties configure logging for data transforms:

- [`data_transforms_logging_line_max_bytes`](../../../reference/properties/cluster-properties/#data_transforms_logging_line_max_bytes): Increase this value if your log messages are frequently truncated. Setting this value too low may truncate important log information.
## [](#next-steps)Next steps [Deploy Data Transforms](../deploy/) --- # Page 347: Manage Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/develop/data-transforms/data-transforms.md --- # Manage Data Transforms --- title: Manage Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/data-transforms page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/data-transforms.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/data-transforms/data-transforms.adoc description: You can monitor the status and performance metrics of your transform functions. You can also view detailed logs and delete transform functions when they are no longer needed. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-04-08" --- You can monitor the status and performance metrics of your transform functions. You can also view detailed logs and delete transform functions when they are no longer needed. ## [](#prerequisites)Prerequisites Before you begin, ensure that you have the following: - [Data transforms enabled](../configure/#enable-transforms) in your Redpanda cluster. - At least one transform function deployed to your Redpanda cluster. ## [](#monitor)Monitor transform functions To monitor transform functions: 1. Navigate to the **Transforms** menu. 2. Click the name of a transform function to view detailed information: - The partitions that the function is running on - The broker (node) ID - Any lag (the amount of pending records on the input topic that have yet to be processed by the transform) ## [](#logs)View logs To view logs for a transform function: 1. Navigate to the **Transforms** menu. 2. Click on the name of a transform function. 3. Click the **Logs** tab to see the logs. 
Redpanda Cloud displays a limited number of logs for transform functions. To view the full history of logs, use the [`rpk` command-line tool](../monitor/#logs). ## [](#delete)Delete transform functions To delete a transform function: 1. Navigate to the **Transforms** menu. 2. Find the transform function you want to delete from the list. 3. Click the delete icon at the end of the row. 4. Confirm the deletion when prompted. Deleting a transform function will remove it from the cluster and stop any further processing. ## [](#suggested-reading)Suggested reading - [How Data Transforms Work](../how-transforms-work/) - [Deploy Data Transforms](../deploy/) - [Monitor Data Transforms](../monitor/) --- # Page 348: Deploy Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/develop/data-transforms/deploy.md --- # Deploy Data Transforms --- title: Deploy Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/deploy page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/deploy.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/data-transforms/deploy.adoc description: Learn how to build, deploy, share, and troubleshoot data transforms in Redpanda. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-05-07" --- Learn how to build, deploy, share, and troubleshoot data transforms in Redpanda. ## [](#prerequisites)Prerequisites Before you begin, ensure that you have the following: - [Data transforms enabled](../configure/#enable-transforms) in your Redpanda cluster. - The [`rpk` command-line client](../../../manage/rpk/rpk-install/). - A [data transform](../build/) project. ## [](#build)Build the Wasm binary To build a Wasm binary: 1. Ensure your project directory contains a `transform.yaml` file. 2. 
Build the Wasm binary using the [`rpk transform build`](../../../reference/rpk/rpk-transform/rpk-transform-build/) command.

```bash
rpk transform build
```

You should now have a Wasm binary named `<name>.wasm`, where `<name>` is the name specified in your `transform.yaml` file. This binary is your data transform function, ready to be deployed to a Redpanda cluster or hosted on a network for others to use.

## [](#deploy)Deploy the Wasm binary

You can deploy your transform function using the [`rpk transform deploy`](../../../reference/rpk/rpk-transform/rpk-transform-deploy/) command.

1. Validate your setup against the pre-deployment checklist:
   - Do you meet the [Prerequisites](#prerequisites)?
   - Does your transform function access any environment variables? If so, make sure to set them in the `transform.yaml` file or on the command line when you deploy the binary.
   - Do your configured input and output topics already exist? Input and output topics must exist in your Redpanda cluster before you deploy the Wasm binary.
2. Deploy the Wasm binary:

   ```bash
   rpk transform deploy
   ```

When the transform function reaches Redpanda, it starts processing new records that are written to the input topic.

### [](#reprocess)Reprocess records

In some cases, you may need to reprocess records from an input topic that already contains data. Processing existing records can be useful, for example, to process historical data into a different format for a new consumer, to re-create lost data from a deleted topic, or to resolve issues with a previous version of a transform that processed data incorrectly.

To reprocess records, you can specify the starting point from which the transform function should process records in each partition of the input topic. The starting point can be either a partition offset or a timestamp.

> 📝 **NOTE**
>
> The `--from-offset` flag is only effective the first time you deploy a transform function. On subsequent deployments of the same function, Redpanda resumes processing from the last committed offset. To reprocess existing records using an existing function, [delete the function](#delete) and redeploy it with the `--from-offset` flag.

To deploy a transform function and start processing records from a specific partition offset, use the following syntax:

```bash
rpk transform deploy --from-offset +/-<offset>
```

In this example, the transform function will start processing records from the beginning of each partition of the input topic:

```bash
rpk transform deploy --from-offset +0
```

To deploy a transform function and start processing records from a specific timestamp, use the following syntax:

```bash
rpk transform deploy --from-timestamp @<unix-timestamp>
```

In this example, the transform function will start processing from the first record in each partition of the input topic that was committed after the given timestamp:

```bash
rpk transform deploy --from-timestamp @1617181723
```

### [](#share-wasm-binaries)Share Wasm binaries

You can also deploy data transforms on a Redpanda cluster by providing an addressable path to the Wasm binary. This is useful for sharing transform functions across multiple clusters or teams within your organization.

For example, if the Wasm binary is hosted at `https://my-site/my-transform.wasm`, use the following command to deploy it:

```bash
rpk transform deploy --file=https://my-site/my-transform.wasm
```

## [](#edit-existing-transform-functions)Edit existing transform functions

To make changes to an existing transform function:

1. [Make your changes to the code](../build/).
2. [Rebuild](#build) the Wasm binary.
3. [Redeploy](#deploy) the Wasm binary to the same Redpanda cluster.

When you redeploy a Wasm binary with the same name, it will resume processing from the last offset it had previously processed. If you need to [reprocess existing records](#reprocess), you must delete the transform function and redeploy it with the `--from-offset` flag.
Deploy-time configuration overrides must be provided each time you redeploy a Wasm binary. Otherwise, they will be overwritten by default values or the configuration file’s contents.

## [](#delete)Delete a transform function

To delete a transform function, use the following command:

```bash
rpk transform delete <transform-name>
```

For more details about this command, see [rpk transform delete](../../../reference/rpk/rpk-transform/rpk-transform-delete/).

> 💡 **TIP**
>
> You can also delete transform functions in Redpanda Cloud.

## [](#troubleshoot)Troubleshoot

This section provides guidance on how to diagnose and troubleshoot issues with building or deploying data transforms.

### [](#invalid-transform-environment)Invalid transform environment

This error means that one or more of your configured custom environment variables are invalid.

Check your custom environment variables against the list of [limitations](../how-transforms-work/#limitations).

### [](#invalid-webassembly)Invalid WebAssembly

This error indicates that the binary is missing a required callback function:

Invalid WebAssembly - the binary is missing required transform functions. Check the broker support for the version of the data transforms SDK being used.

All transform functions must register a callback with the `OnRecordWritten()` method. For more details, see [Develop Data Transforms](../build/).

## [](#next-steps)Next steps

[Set up monitoring](../monitor/) for data transforms.
--- # Page 349: How Data Transforms Work **URL**: https://docs.redpanda.com/redpanda-cloud/develop/data-transforms/how-transforms-work.md --- # How Data Transforms Work --- title: How Data Transforms Work latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/how-transforms-work page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/how-transforms-work.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/data-transforms/how-transforms-work.adoc description: Learn how Redpanda data transforms work. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-05-07" --- > 📝 **NOTE** > > Data transforms are supported on BYOC and Dedicated clusters running Redpanda version 24.3 and later. Redpanda provides the framework to build and deploy inline transformations (data transforms) on data written to Redpanda topics, delivering processed and validated data to consumers in the format they expect. Redpanda does this directly inside the broker, eliminating the need to manage a separate stream processing environment or use third-party tools. ![Data transforms in a broker](../../../shared/_images/wasm1.png) Data transforms let you run common data streaming tasks, like filtering, scrubbing, and transcoding, within Redpanda. For example, you may have consumers that require you to redact credit card numbers or convert JSON to Avro. Data transforms can also interact with the Redpanda Schema Registry to work with encoded data types. To learn how to build and deploy data transforms, see [How Data Transforms Work](./). 
## [](#data-transforms-with-webassembly)Data transforms with WebAssembly Data transforms use [WebAssembly](https://webassembly.org/) (Wasm) engines inside a Redpanda broker, allowing Redpanda to control the entire transform lifecycle. For example, Redpanda can stop and start transforms when partitions are moved or to free up system resources for other tasks. Data transforms take data from an input topic and map it to one or more output topics. For each topic partition, a leader is responsible for handling the data. Redpanda runs a Wasm virtual machine (VM) on the same CPU core (shard) as these partition leaders to execute the transform function. Transform functions are the specific implementations of code that carry out the transformations. They read data from input topics, apply the necessary processing logic, and write the transformed data to output topics. To execute a transform function, Redpanda uses just-in-time (JIT) compilation to compile the bytecode in memory, write it to an executable space, then run the directly translated machine code. This JIT compilation ensures efficient execution of the machine code, as it is tailored to the specific hardware it runs on. When you deploy a data transform to a Redpanda broker, it stores the Wasm bytecode and associated metadata, such as input and output topics and environment variables. The broker then replicates this data across the cluster using internal Kafka topics. When the data is distributed, each shard runs its own instance of the transform function. This process includes several resource management features: - Each shard can run only one instance of the transform function at a time to ensure efficient resource utilization and prevent overload. - CPU time is dynamically allocated to the Wasm runtime to ensure that the code does not run forever and cannot block the broker from handling traffic or doing other work, such as Tiered Storage uploads. 
## [](#flow-of-data-transforms)Flow of data transforms When a shard becomes the leader of a given partition on the input topic of one or more active transforms, Redpanda does the following: 1. Spins up a Wasm VM using the JIT-compiled Wasm module. 2. Pushes records from the input partition into the Wasm VM. 3. Writes the output. The output partition may exist on the same broker or on another broker in the cluster. Within Redpanda, a single Raft controller manages cluster information, including data transforms. On every shard, Redpanda knows what data transforms exist in the cluster, as well as metadata about the transform function, such as input and output topics and environment variables. ![Wasm architecture in Redpanda](../../../shared/_images/wasm_architecture.png) Each transform function reads from a specified input topic and writes to a specified output topic. The transform function processes every record produced to an input topic and returns zero or more records that are then produced to the specified output topic. Data transforms are applied to all partitions on an input topic. A record is processed after it has been successfully written to disk on the input topic. Because the transform happens in the background after the write finishes, the transform doesn’t affect the original produced record, doesn’t block writes to the input topic, and doesn’t block produce and consume requests. A new transform function reads the input topic from the latest offset. That is, it only reads new data produced to the input topic: it does not read records produced to the input topic before the transform was deployed. If a partition leader moves from one broker to another, then the instance of the transform function assigned to that partition moves with it. 
When a partition replica [loses leadership](../../../get-started/architecture/#partition-leadership-elections), the broker hosting that partition replica stops the instance of the transform function running on the same shard. The broker that is now hosting the partition’s new leader starts the transform function on the same shard as that leader, and the transform function resumes from the last committed offset.

If the previous instance of the transform function failed to commit its latest offsets before moving with the partition leader (for example, if the broker crashed), then it’s likely that the new instance will reprocess some events. For broker failures, transform functions have at-least-once semantics, because offsets are committed periodically and records are retried from the last committed offset.

For more information, see [How Data Transforms Work](./).

## [](#limitations)Limitations

This section outlines the limitations of data transforms. These constraints are categorized into general limitations affecting the overall functionality and specific limitations related to giving data transforms access to custom environment variables.

### [](#general)General

- **No external access**: Transform functions have no external access to disk or network resources.
- **Single message transforms**: Only single record transforms are supported, but multiple output records from a single input record are supported. For aggregations, joins, or complex transformations, consider using [Redpanda Connect](../../../../redpanda-connect/get-started/about/) or [Apache Flink](https://flink.apache.org/).
- **Output topic limit**: Up to eight output topics are supported.
- **Delivery semantics**: Transform functions have at-least-once delivery.
- **Transactions API**: When clients use the Kafka Transactions API on partitions of an input topic, transform functions process only committed records.
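Because delivery is at-least-once, a consumer of an output topic may see the same transformed record again after a leadership change. One way to tolerate this is to deduplicate in the consumer rather than in the transform. The sketch below assumes, hypothetically, that each output record carries the input partition and offset it was derived from (for example, in its key), that each input record produces at most one output record, and that the consumer sees a given input partition's outputs in order. The `sourcePosition` and `deduper` types are illustrative only and are not part of any Redpanda API.

```go
package main

import "fmt"

// sourcePosition identifies the input record an output record came from.
// Hypothetical: assumes the transform copies this into each output record.
type sourcePosition struct {
	Partition int32
	Offset    int64
}

// deduper remembers the highest input offset applied per input partition.
type deduper struct {
	highWater map[int32]int64
}

func newDeduper() *deduper {
	return &deduper{highWater: make(map[int32]int64)}
}

// shouldApply reports whether pos has not been applied yet and, if so,
// records it. Replayed records arrive at or below the stored offset.
func (d *deduper) shouldApply(pos sourcePosition) bool {
	if last, seen := d.highWater[pos.Partition]; seen && pos.Offset <= last {
		return false
	}
	d.highWater[pos.Partition] = pos.Offset
	return true
}

func main() {
	d := newDeduper()
	// Offsets 11 and 12 are replayed, as might happen after a leadership change.
	for _, off := range []int64{10, 11, 12, 11, 12, 13} {
		pos := sourcePosition{Partition: 0, Offset: off}
		fmt.Printf("offset %d apply=%v\n", off, d.shouldApply(pos))
	}
}
```

The map here lives in memory, so a real consumer would need to persist the high-water marks alongside its own results to survive restarts.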
### [](#javascript)JavaScript - **No native extensions**: Native Node.js extensions are not supported. Packages that require compiling native code or interacting with low-level system features cannot be used. - **Limited Node.js standard modules**: Only modules that can be polyfilled by the [esbuild plugin](https://www.npmjs.com/package/esbuild-plugin-polyfill-node#implemented-polyfills) can be used. Even if a module can be polyfilled, certain functionalities, such as network connections, will not work because the necessary browser APIs are not exposed in the Redpanda JavaScript runtime environment. For example, while the plugin can provide stubs for some Node.js modules such as `http` and `process`, these stubs will not work in the Redpanda JavaScript runtime environment. - **No write options**: The JavaScript SDK does not support write options, such as specifying which output topic to write to. ### [](#environment-variables)Environment variables - **Maximum number of variables**: You can set up to 128 custom environment variables. - **Reserved prefix**: Variable keys must not start with `REDPANDA_`. This prefix is reserved for [built-in environment variables](../configure/#built-in). - **Key length**: Each key must be less than 128 bytes in length. - **Total value length**: The combined length of all values for the environment variables must be less than 2000 bytes. - **Encoding**: All keys and values must be encoded in UTF-8. - **Control characters**: Keys and values must not contain any control characters, such as null bytes. 
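The environment variable limits above can be checked locally before deploying, which may save a round trip to an "Invalid transform environment" error. The following is a rough sketch using only the Go standard library. The function name `validateEnv` is hypothetical, and the check treats every Unicode control character as disallowed, which is an assumption; the broker's exact definition may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

const (
	maxVars          = 128         // maximum number of custom variables
	maxKeyBytes      = 128         // each key must be shorter than this
	maxTotalValBytes = 2000        // combined values must be shorter than this
	reservedPrefix   = "REDPANDA_" // reserved for built-in variables
)

// hasControl reports whether s contains any Unicode control character.
func hasControl(s string) bool {
	return strings.IndexFunc(s, unicode.IsControl) >= 0
}

// validateEnv checks custom variables against the documented limits.
// Hypothetical helper; it is not part of rpk or the transform SDK.
func validateEnv(env map[string]string) error {
	if len(env) > maxVars {
		return fmt.Errorf("%d variables set, limit is %d", len(env), maxVars)
	}
	total := 0
	for k, v := range env {
		switch {
		case strings.HasPrefix(k, reservedPrefix):
			return fmt.Errorf("key %q uses the reserved prefix %s", k, reservedPrefix)
		case len(k) >= maxKeyBytes:
			return fmt.Errorf("key %q is %d bytes, must be under %d", k, len(k), maxKeyBytes)
		case !utf8.ValidString(k) || !utf8.ValidString(v):
			return fmt.Errorf("key %q or its value is not valid UTF-8", k)
		case hasControl(k) || hasControl(v):
			return fmt.Errorf("key %q or its value contains a control character", k)
		}
		total += len(v)
	}
	if total >= maxTotalValBytes {
		return fmt.Errorf("values total %d bytes, must be under %d", total, maxTotalValBytes)
	}
	return nil
}

func main() {
	env := map[string]string{"DATA_TRANSFORMS_ARE_AWESOME": "true"}
	if err := validateEnv(env); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("environment looks valid")
}
```

Lengths are measured in bytes with `len`, not in characters, because the limits are stated in bytes and keys and values are UTF-8 encoded.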
## [](#suggested-reading)Suggested reading - [Golang SDK for Data Transforms](../../../reference/data-transforms/golang-sdk/) - [Rust SDK for Data Transforms](../../../reference/data-transforms/rust-sdk/) - [`rpk transform` commands](../../../reference/rpk/rpk-transform/rpk-transform/) --- # Page 350: Monitor Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/develop/data-transforms/monitor.md --- # Monitor Data Transforms --- title: Monitor Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/monitor page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/monitor.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/data-transforms/monitor.adoc description: This topic provides guidelines on how to monitor the health of your data transforms and view logs. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-05-07" --- This topic provides guidelines on how to monitor the health of your data transforms and view logs. ## [](#prerequisites)Prerequisites [Set up monitoring](../../../manage/monitor-cloud/) for your cluster. ## [](#performance)Performance You can identify performance bottlenecks by monitoring latency and CPU usage: - [`redpanda_transform_execution_latency_sec`](../../../reference/public-metrics-reference/#redpanda_transform_execution_latency_sec) - [`redpanda_wasm_engine_cpu_seconds_total`](../../../reference/public-metrics-reference/#redpanda_wasm_engine_cpu_seconds_total) If latency is high, investigate the transform logic for inefficiencies or consider scaling the resources. High CPU usage might indicate the need for optimization in the code or an increase in [allocated CPU resources](../configure/). 
## [](#reliability)Reliability Tracking execution errors and error states helps in maintaining the reliability of your data transforms: - [`redpanda_transform_execution_errors`](../../../reference/public-metrics-reference/#redpanda_transform_execution_errors) - [`redpanda_transform_failures`](../../../reference/public-metrics-reference/#redpanda_transform_failures) - [`redpanda_transform_state`](../../../reference/public-metrics-reference/#redpanda_transform_state) Make sure to [implement robust error handling and logging](../build/#errors) within your transform functions to help with troubleshooting. ## [](#resource-usage)Resource usage Monitoring memory usage metrics and total execution time ensures that the Wasm engine does not exceed allocated resources, helping in efficient resource management: - [`redpanda_wasm_engine_memory_usage`](../../../reference/public-metrics-reference/#redpanda_wasm_engine_memory_usage) - [`redpanda_wasm_engine_max_memory`](../../../reference/public-metrics-reference/#redpanda_wasm_engine_max_memory) - [`redpanda_wasm_binary_executable_memory_usage`](../../../reference/public-metrics-reference/#redpanda_wasm_binary_executable_memory_usage) If memory usage is consistently high or exceeds the maximum allocated memory: - Review and optimize your transform functions to reduce memory consumption. This step can involve optimizing data structures, reducing memory allocations, and ensuring efficient handling of records. 
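As a rough way to act on the memory metrics above, the following Python sketch computes the ratio of used to maximum Wasm engine memory from metrics text in the Prometheus exposition format. It assumes you have already fetched that text from your monitoring setup. The sample values and the `function_name` label are illustrative, and summing across all series is a simplification that can hide a single overloaded shard.

```python
def parse_sample(line):
    """Parse one Prometheus sample line into (metric_name, value)."""
    if "{" in line:
        name, rest = line.split("{", 1)
        rest = rest.rsplit("}", 1)[1]  # drop the label set
    else:
        name, rest = line.split(None, 1)
    fields = rest.split()  # value, optionally followed by a timestamp
    return name.strip(), float(fields[0])


def sum_metric(metrics_text, metric_name):
    """Sum every sample of metric_name, ignoring comments and blank lines."""
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, value = parse_sample(line)
        if name == metric_name:
            total += value
    return total


def memory_usage_ratio(metrics_text):
    """Return used/max Wasm engine memory, or None if no maximum is reported."""
    used = sum_metric(metrics_text, "redpanda_wasm_engine_memory_usage")
    limit = sum_metric(metrics_text, "redpanda_wasm_engine_max_memory")
    return used / limit if limit else None


# Hypothetical sample data for illustration only.
SAMPLE = """\
# TYPE redpanda_wasm_engine_memory_usage gauge
redpanda_wasm_engine_memory_usage{function_name="demo"} 1500000
# TYPE redpanda_wasm_engine_max_memory gauge
redpanda_wasm_engine_max_memory{function_name="demo"} 2000000
"""

print(memory_usage_ratio(SAMPLE))  # 0.75
```

A ratio that stays close to 1.0 over time suggests that the transform is near its memory limit and is a candidate for the optimizations described above.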
## [](#throughput)Throughput Keeping track of read and write bytes and processor lag helps in understanding the data flow through your transforms, enabling better capacity planning and scaling: - [`redpanda_transform_read_bytes`](../../../reference/public-metrics-reference/#redpanda_transform_read_bytes) - [`redpanda_transform_write_bytes`](../../../reference/public-metrics-reference/#redpanda_transform_write_bytes) - [`redpanda_transform_processor_lag`](../../../reference/public-metrics-reference/#redpanda_transform_processor_lag) If there is a significant lag or low throughput, investigate potential bottlenecks in the data flow or consider scaling your infrastructure to handle higher throughput. ## [](#logs)View logs for data transforms Runtime logs for transform functions are written to an internal topic called `_redpanda.transform_logs`. You can read these logs by using the [`rpk transform logs`](../../../reference/rpk/rpk-transform/rpk-transform-logs/) command. ```bash rpk transform logs ``` Replace `` with the [configured name](../configure/) of the transform function. > 💡 **TIP** > > You can also view logs in the UI. By default, Redpanda provides several settings to manage logging for data transforms, such as buffer capacity, flush interval, and maximum log line length. These settings ensure that logging operates efficiently without overwhelming the system. However, you may need to adjust these settings based on your specific requirements and workloads. For information on how to configure logging, see the [Configure transform logging](../configure/#log) section of the configuration guide. 
## [](#suggested-reading)Suggested reading - [Data transforms metrics](../../../reference/public-metrics-reference/#data_transform_metrics) --- # Page 351: Write Integration Tests for Transform Functions **URL**: https://docs.redpanda.com/redpanda-cloud/develop/data-transforms/test.md --- # Write Integration Tests for Transform Functions --- title: Write Integration Tests for Transform Functions latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/test page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/test.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/data-transforms/test.adoc description: Learn how to write integration tests for data transform functions in Redpanda, including setting up unit tests and using testcontainers for integration tests. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-04-08" --- Learn how to write integration tests for data transform functions in Redpanda, including setting up unit tests and using testcontainers for integration tests. This guide covers how to write both unit tests and integration tests for your transform functions. While unit tests focus on testing individual components in isolation, integration tests verify that the components work together as expected in a real environment. ## [](#unit-tests)Unit tests You can create unit tests for transform functions by mocking the interfaces injected into the transform function and asserting that the input and output work correctly. This typically includes mocking the `WriteEvent` and `RecordWriter` interfaces. 
```go package main import ( "testing" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/mock" "github.com/redpanda-data/redpanda/src/transform-sdk/go/transform" ) // MockWriteEvent is a mock implementation of the WriteEvent interface. type MockWriteEvent struct { mock.Mock } func (m *MockWriteEvent) Record() transform.Record { args := m.Called() return args.Get(0).(transform.Record) } // MockRecordWriter is a mock implementation of the RecordWriter interface. type MockRecordWriter struct { mock.Mock } func (m *MockRecordWriter) Write(record transform.Record) error { args := m.Called(record) return args.Error(0) } // copyRecord copies the record to the output topic. func copyRecord(event transform.WriteEvent, writer transform.RecordWriter) error { record := event.Record() return writer.Write(record) } // TestCopyRecord tests the copyRecord function. func TestCopyRecord(t *testing.T) { // Create mocks for the WriteEvent and RecordWriter event := new(MockWriteEvent) writer := new(MockRecordWriter) // Set up the expected behavior record := transform.Record{Value: []byte("test")} event.On("Record").Return(record) writer.On("Write", record).Return(nil) // Call the function under test err := copyRecord(event, writer) // Assert that no error occurred and that the expectations were met assert.NoError(t, err) event.AssertExpectations(t) writer.AssertExpectations(t) } ``` To run your unit tests, use the following command: ```bash go test ``` This will execute all tests in the current directory. ## [](#integration-tests)Integration tests Integration tests verify that your transform functions work correctly in a real Redpanda environment. You can use [testcontainers](https://github.com/testcontainers/testcontainers-go/tree/main) to set up and manage a Redpanda instance for testing. 
For more detailed examples and helper code for setting up integration tests, refer to the SDK integration tests on [GitHub](https://github.com/redpanda-data/redpanda/tree/dev/src/transform-sdk/tests). --- # Page 352: Use Redpanda with the HTTP Proxy API **URL**: https://docs.redpanda.com/redpanda-cloud/develop/http-proxy.md --- # Use Redpanda with the HTTP Proxy API --- title: Use Redpanda with the HTTP Proxy API latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: http-proxy page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: http-proxy.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/http-proxy.adoc description: HTTP Proxy exposes a REST API to list topics, produce events, and subscribe to events from topics using consumer groups. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Redpanda HTTP Proxy (`pandaproxy`) allows access to your data through a REST API. For example, you can list topics or brokers, get events, produce events, subscribe to events from topics using consumer groups, and commit offsets for a consumer. See the [HTTP Proxy API reference](/api/doc/http-proxy/) for a full list of available endpoints. > 📝 **NOTE** > > The HTTP Proxy API is supported for BYOC and Dedicated clusters only. ## [](#prerequisites)Prerequisites ### [](#start-redpanda)Start Redpanda To log in to your Redpanda Cloud account, run `rpk cloud login`. HTTP Proxy is enabled by default on port 30082. For clusters with private connectivity (AWS PrivateLink, GCP Private Service Connect, and Azure Private Link) enabled, the default seed port for HTTP Proxy is 30282. You can find the HTTP Proxy endpoint on the **How to connect** section of the cluster overview in the Cloud UI. 
> 📝 **NOTE** > > The rest of this guide assumes that the HTTP Proxy port is `30082`. ## [](#authenticate-with-http-proxy)Authenticate with HTTP Proxy HTTP Proxy supports authentication using SCRAM credentials or OIDC tokens. The authentication method depends on the cluster’s [`http_authentication`](../../reference/properties/cluster-properties/#http_authentication) settings. ### [](#scram-authentication)SCRAM Authentication If HTTP Proxy is configured to support SASL, you can provide the SCRAM username and password as part of the Basic Authentication header in your request. For example, to list topics as an authenticated user: #### curl ```bash curl -s -u ":" "http://:30082/topics" ``` #### NodeJS ```javascript let options = { auth: { username: "", password: "" }, }; axios .get("http://:30082/topics", options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` #### Python ```python auth = ("", "") res = requests.get("http://:30082/topics", auth=auth).json() pretty(res) ``` ### [](#oidc-authentication)OIDC Authentication If HTTP Proxy is configured to support OIDC, you can provide an OIDC token in the Authorization header. For example: #### curl ```bash curl -s -H "Authorization: Bearer " "http://:30082/topics" ``` #### NodeJS ```javascript let options = { headers: { Authorization: `Bearer ` }, }; axios .get("http://:30082/topics", options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` #### Python ```python headers = {"Authorization": "Bearer "} res = requests.get("http://:30082/topics", headers=headers).json() pretty(res) ``` ## [](#set-up-libraries)Set up libraries You need an app that calls the HTTP Proxy endpoint. This app can be curl (or a similar CLI), or it could be your own custom app written in any language. Below are curl, JavaScript and Python examples. > 📝 **NOTE** > > In the examples, `` refers to your Redpanda cluster’s hostname or IP address. 
All following examples use a `base_uri` variable that combines the protocol, host, and port for consistency across curl, JavaScript, and Python examples. ### curl Curl is likely already installed on your system. If not, see [curl download instructions](https://curl.se/download.html). Set the base URI for your HTTP Proxy: ```bash base_uri="http://:30082" ``` ### NodeJS > 📝 **NOTE** > > This is based on the assumption that you’re in the root directory of an existing NodeJS project. See [Build a Chat Room Application with Redpanda and Node.js](../../../redpanda-labs/clients/docker-nodejs/) for an example of a NodeJS project. In a terminal window, run: ```bash npm install axios ``` Import the library into your code: ```javascript const axios = require('axios'); const base_uri = 'http://:30082'; ``` ### Python In a terminal window, run: ```bash pip install requests ``` Import the library into your code: ```python import requests import json def pretty(text): print(json.dumps(text, indent=2)) base_uri = "http://:30082" ``` ## [](#create-a-topic)Create a topic To create a test topic for this guide, use [`rpk`](../../manage/rpk/rpk-install/). You can configure `rpk` for your Redpanda deployment, using [profiles](../../manage/rpk/config-rpk-profile/), flags, or [environment variables](../../reference/rpk/rpk-x-options/#environment-variables). To create a topic named `test_topic` with three partitions, run: ```bash rpk topic create test_topic -p 3 ``` For more information, see the [rpk topic create](../../reference/rpk/rpk-topic/rpk-topic-create/) reference. ## [](#access-your-data)Access your data Here are some sample commands to produce and consume streams: ### [](#get-list-of-topics)Get list of topics #### curl ```bash curl -s "$base_uri/topics" ``` #### NodeJS ```javascript axios .get(`${base_uri}/topics`) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` Run the application. 
If your file name is `index.js`, for example, run the following command: ```bash node index.js ``` #### Python ```python res = requests.get(f"{base_uri}/topics").json() pretty(res) ``` Expected output: ```bash ["test_topic"] ``` ### [](#send-events-to-a-topic)Send events to a topic Use a POST request to send events to a topic. The request must include the following header: `Content-Type: application/vnd.kafka.json.v2+json` The following commands show how to send events to `test_topic`: #### curl ```bash curl -s \ -X POST \ "$base_uri/topics/test_topic" \ -H "Content-Type: application/vnd.kafka.json.v2+json" \ -d '{ "records":[ { "value":"Redpanda", "partition":0 }, { "value":"HTTP proxy", "partition":1 }, { "value":"Test event", "partition":2 } ] }' ``` #### NodeJS ```javascript let payload = { records: [ { "value":"Redpanda", "partition": 0 }, { "value":"HTTP proxy", "partition": 1 }, { "value":"Test event", "partition": 2 } ]}; let options = { headers: { "Content-Type" : "application/vnd.kafka.json.v2+json" }}; axios .post(`${base_uri}/topics/test_topic`, payload, options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` Run the application: ```bash node index.js ``` #### Python ```python res = requests.post( url=f"{base_uri}/topics/test_topic", data=json.dumps( dict(records=[ dict(value="Redpanda", partition=0), dict(value="HTTP proxy", partition=1), dict(value="Test event", partition=2) ])), headers={"Content-Type": "application/vnd.kafka.json.v2+json"}).json() pretty(res) ``` Expected output (may be formatted differently depending on the chosen application): ```bash {"offsets":[{"partition":0,"offset":0},{"partition":2,"offset":0},{"partition":1,"offset":0}]} ``` ### [](#get-events-from-a-topic)Get events from a topic After events have been sent to the topic, you can retrieve these same events.
#### curl ```bash curl -s \ "$base_uri/topics/test_topic/partitions/0/records?offset=0&timeout=1000&max_bytes=100000"\ -H "Accept: application/vnd.kafka.json.v2+json" ``` #### NodeJS ```javascript let options = { headers: { accept: "application/vnd.kafka.json.v2+json" }, params: { offset: 0, timeout: "1000", max_bytes: "100000", }, }; axios .get(`${base_uri}/topics/test_topic/partitions/0/records`, options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` Run the application: ```bash node index.js ``` #### Python ```python res = requests.get( url=f"{base_uri}/topics/test_topic/partitions/0/records", params={"offset": 0, "timeout":1000,"max_bytes":100000}, headers={"Accept": "application/vnd.kafka.json.v2+json"}).json() pretty(res) ``` Expected output: ```bash [{"topic":"test_topic","key":null,"value":"Redpanda","partition":0,"offset":0}] ``` ### [](#get-list-of-brokers)Get list of brokers #### curl ```bash curl "$base_uri/brokers" ``` #### NodeJS ```javascript axios .get(`${base_uri}/brokers`) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` #### Python ```python res = requests.get(f"{base_uri}/brokers").json() pretty(res) ``` Expected output: ```bash {brokers: [0]} ``` ### [](#create-a-consumer)Create a consumer To retrieve events from a topic using consumers, you must create a consumer and a consumer group, and then subscribe the consumer instance to a topic. Each action involves a different endpoint and method. The first endpoint is: `/consumers/`. For this REST call, the payload is the group information. 
#### curl ```bash curl -s \ -X POST \ "$base_uri/consumers/test_group" \ -H "Content-Type: application/vnd.kafka.v2+json" \ -d '{ "format":"json", "name":"test_consumer", "auto.offset.reset":"earliest", "auto.commit.enable":"false", "fetch.min.bytes": "1", "consumer.request.timeout.ms": "10000" }' ``` #### NodeJS ```javascript let payload = { "name": "test_consumer", "format": "json", "auto.offset.reset": "earliest", "auto.commit.enable": "false", "fetch.min.bytes": "1", "consumer.request.timeout.ms": "10000" }; let options = { headers: { "Content-Type": "application/vnd.kafka.v2+json" }}; axios .post(`${base_uri}/consumers/test_group`, payload, options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` Run the application: ```bash node index.js ``` #### Python ```python res = requests.post( url=f"{base_uri}/consumers/test_group", data=json.dumps({ "name": "test_consumer", "format": "json", "auto.offset.reset": "earliest", "auto.commit.enable": "false", "fetch.min.bytes": "1", "consumer.request.timeout.ms": "10000" }), headers={"Content-Type": "application/vnd.kafka.v2+json"}).json() pretty(res) ``` Expected output: ```bash {"instance_id":"test_consumer","base_uri":"http://:30082/consumers/test_group/instances/test_consumer"} ``` > 📝 **NOTE** > > - Consumers expire after five minutes of inactivity. To prevent this from happening, try consuming events within a loop. If the consumer has expired, you can create a new one with the same name. > > - The output `base_uri` is the full URL path for this specific consumer instance and differs from the `base_uri` variable used in the code examples. ### [](#subscribe-to-the-topic)Subscribe to the topic After creating the consumer, subscribe to the topic that you created. 
#### curl ```bash curl -s -o /dev/null -w "%{http_code}" \ -X POST \ "$base_uri/consumers/test_group/instances/test_consumer/subscription"\ -H "Content-Type: application/vnd.kafka.v2+json" \ -d '{ "topics": [ "test_topic" ] }' ``` #### NodeJS ```javascript let payload = { topics: ["test_topic"]}; let options = { headers: { "Content-Type": "application/vnd.kafka.v2+json" }}; axios .post(`${base_uri}/consumers/test_group/instances/test_consumer/subscription`, payload, options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` Run the application: ```bash node index.js ``` #### Python ```python res = requests.post( url=f"{base_uri}/consumers/test_group/instances/test_consumer/subscription", data=json.dumps({"topics": ["test_topic"]}), headers={"Content-Type": "application/vnd.kafka.v2+json"}) ``` Expected response is an HTTP 204, without a body. Now you can get the events from `test_topic`. ### [](#retrieve-events)Retrieve events Retrieve the events from the topic: #### curl ```bash curl -s \ "$base_uri/consumers/test_group/instances/test_consumer/records?timeout=1000&max_bytes=100000"\ -H "Accept: application/vnd.kafka.json.v2+json" ``` #### NodeJS ```javascript let options = { headers: { Accept: "application/vnd.kafka.json.v2+json" }, params: { timeout: "1000", max_bytes: "100000", }, }; axios .get(`${base_uri}/consumers/test_group/instances/test_consumer/records`, options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` Run the application: ```bash node index.js ``` #### Python ```python res = requests.get( url=f"{base_uri}/consumers/test_group/instances/test_consumer/records", params={"timeout":1000,"max_bytes":100000}, headers={"Accept": "application/vnd.kafka.json.v2+json"}).json() pretty(res) ``` Expected output: ```bash [{"topic":"test_topic","key":null,"value":"Redpanda","partition":0,"offset":0},{"topic":"test_topic","key":null,"value":"HTTP 
proxy","partition":1,"offset":0},{"topic":"test_topic","key":null,"value":"Test event","partition":2,"offset":0}] ``` ### [](#get-offsets-from-consumer)Get offsets from consumer #### curl ```bash curl -s \ -X 'GET' \ "$base_uri/consumers/test_group/instances/test_consumer/offsets" \ -H 'accept: application/vnd.kafka.v2+json' \ -H 'Content-Type: application/vnd.kafka.v2+json' \ -d '{ "partitions": [ { "topic": "test_topic", "partition": 0 }, { "topic": "test_topic", "partition": 1 }, { "topic": "test_topic", "partition": 2 } ] }' ``` #### Python ```python res = requests.get( url=f"{base_uri}/consumers/test_group/instances/test_consumer/offsets", data=json.dumps( dict(partitions=[ dict(topic="test_topic", partition=p) for p in [0, 1, 2] ])), headers={"Content-Type": "application/vnd.kafka.v2+json"}).json() pretty(res) ``` Expected output: ```bash { "offsets": [{ "topic": "test_topic", "partition": 0, "offset": 0, "metadata": "" },{ "topic": "test_topic", "partition": 1, "offset": 0, "metadata": "" }, { "topic": "test_topic", "partition": 2, "offset": 0, "metadata": "" }] } ``` ### [](#commit-offsets-for-consumer)Commit offsets for consumer After events have been handled by a consumer, the offsets can be committed, so that the consumer group won’t retrieve them again.
#### curl ```bash curl -s -o /dev/null -w "%{http_code}" \ -X 'POST' \ "$base_uri/consumers/test_group/instances/test_consumer/offsets" \ -H 'accept: application/vnd.kafka.v2+json' \ -H 'Content-Type: application/vnd.kafka.v2+json' \ -d '{ "partitions": [ { "topic": "test_topic", "partition": 0, "offset": 0 }, { "topic": "test_topic", "partition": 1, "offset": 0 }, { "topic": "test_topic", "partition": 2, "offset": 0 } ] }' ``` #### NodeJS ```javascript let options = { headers: { accept: "application/vnd.kafka.v2+json", "Content-Type": "application/vnd.kafka.v2+json", } }; let payload = { partitions: [ { topic: "test_topic", partition: 0, offset: 0 }, { topic: "test_topic", partition: 1, offset: 0 }, { topic: "test_topic", partition: 2, offset: 0 }, ]}; axios .post(`${base_uri}/consumers/test_group/instances/test_consumer/offsets`, payload, options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` Run the application: ```bash node index.js ``` #### Python ```python res = requests.post( url=f"{base_uri}/consumers/test_group/instances/test_consumer/offsets", data=json.dumps( dict(partitions=[ dict(topic="test_topic", partition=p, offset=0) for p in [0, 1, 2] ])), headers={"Content-Type": "application/vnd.kafka.v2+json"}) ``` Expected output: none. 
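Rather than hard-coding offsets, you can derive the commit payload from the records that the consumer retrieved. The following Python sketch shows one way to do that. The helper `build_commit_payload` is hypothetical, and it follows the example in this guide by committing the highest offset seen for each partition. This guide does not state whether the API expects the last processed offset or the next offset to read, so check the HTTP Proxy API reference before relying on this behavior.

```python
import json


def build_commit_payload(records):
    """Build the body for POST .../offsets from records returned by .../records.

    Assumption: the highest offset seen per topic and partition is committed,
    mirroring the hard-coded example in this guide.
    """
    highest = {}
    for rec in records:
        key = (rec["topic"], rec["partition"])
        if key not in highest or rec["offset"] > highest[key]:
            highest[key] = rec["offset"]
    return {
        "partitions": [
            {"topic": topic, "partition": partition, "offset": offset}
            for (topic, partition), offset in sorted(highest.items())
        ]
    }


# Records in the shape shown in the "Retrieve events" expected output.
records = [
    {"topic": "test_topic", "key": None, "value": "Redpanda", "partition": 0, "offset": 0},
    {"topic": "test_topic", "key": None, "value": "HTTP proxy", "partition": 1, "offset": 0},
    {"topic": "test_topic", "key": None, "value": "Test event", "partition": 2, "offset": 0},
]

payload = build_commit_payload(records)
print(json.dumps(payload, indent=2))
```

You can pass `json.dumps(payload)` as the `data` argument of the `requests.post` call shown in the Python example above.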
### [](#delete-a-consumer)Delete a consumer To remove a consumer from a group, send a DELETE request as shown below: #### curl ```bash curl -s -o /dev/null -w "%{http_code}" \ -X 'DELETE' \ "$base_uri/consumers/test_group/instances/test_consumer" \ -H 'Content-Type: application/vnd.kafka.v2+json' ``` #### NodeJS ```javascript let options = { headers: { "Content-Type": "application/vnd.kafka.v2+json" }}; axios .delete(`${base_uri}/consumers/test_group/instances/test_consumer`, options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` #### Python ```python res = requests.delete( url=f"{base_uri}/consumers/test_group/instances/test_consumer", headers={"Content-Type": "application/vnd.kafka.v2+json"}) ``` ## [](#authenticate-with-http-proxy-2)Authenticate with HTTP Proxy HTTP Proxy supports authentication using SCRAM credentials or OIDC tokens. The authentication method depends on the cluster’s [`http_authentication`](../../reference/properties/cluster-properties/#http_authentication) settings. ### [](#scram-authentication-2)SCRAM Authentication If HTTP Proxy is configured to support SASL, you can provide the SCRAM username and password as part of the Basic Authentication header in your request. For example, to list topics as an authenticated user: #### curl ```bash curl -s -u ":" "$base_uri/topics" ``` #### NodeJS ```javascript let options = { auth: { username: "", password: "" }, }; axios .get(`${base_uri}/topics`, options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` #### Python ```python auth = ("", "") res = requests.get(f"{base_uri}/topics", auth=auth).json() pretty(res) ``` ### [](#oidc-authentication-2)OIDC Authentication If HTTP Proxy is configured to support OIDC, you can provide an OIDC token in the Authorization header.
For example: #### curl ```bash curl -s -H "Authorization: Bearer " "$base_uri/topics" ``` #### NodeJS ```javascript let options = { headers: { Authorization: `Bearer ` }, }; axios .get(`${base_uri}/topics`, options) .then(response => console.log(response.data)) .catch(error => console.error(error)); ``` #### Python ```python headers = {"Authorization": "Bearer "} res = requests.get(f"{base_uri}/topics", headers=headers).json() pretty(res) ``` ## [](#use-swagger-with-http-proxy)Use Swagger with HTTP Proxy You can use Swagger UI to test and interact with Redpanda HTTP Proxy endpoints. Use Docker to start Swagger UI: ```bash docker run -p 80:8080 -d swaggerapi/swagger-ui ``` Verify that the Swagger container is available: ```bash docker ps ``` The output should list the `swaggerapi/swagger-ui` container with an `Up…` status. In a browser, enter `` in the address bar to open the Swagger console. Change the URL to `http://:30082/v1`, and click `Explore` to update the page with Redpanda HTTP Proxy endpoints. You can call the endpoints in any application and language that supports web interactions. --- # Page 353: Kafka Compatibility **URL**: https://docs.redpanda.com/redpanda-cloud/develop/kafka-clients.md --- # Kafka Compatibility --- title: Kafka Compatibility latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: kafka-clients page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: kafka-clients.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/kafka-clients.adoc description: Kafka clients, version 0.11 or later, are compatible with Redpanda. Validations and exceptions are listed.
page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Redpanda is compatible with Apache Kafka versions 0.11 and later, with specific exceptions noted on this page. ## [](#kafka-client-compatibility)Kafka client compatibility Clients developed for Kafka versions 0.11 or later are compatible with Redpanda. Modern clients auto-negotiate protocol versions or use an earlier protocol version accepted by Redpanda brokers. > 💡 **TIP** > > Redpanda Data recommends always using the latest supported version of a client. The following clients have been validated with Redpanda. | Language | Client | | --- | --- | | Java | Apache Kafka Java Client | | C/C++ | librdkafka | | Go | franz-go | | Python | kafka-python-ng | | Rust | kafka-rust | | Node.js | KafkaJS, confluent-kafka-javascript | Clients that have not been validated by Redpanda Data, but use the Kafka protocol, remain compatible with Redpanda subject to the limitations below (particularly those based on librdkafka, such as confluent-kafka-dotnet or confluent-kafka-python). If you find a client that is not supported, reach out to the Redpanda team in the community [Slack](https://redpanda.com/slack). ## [](#unsupported-kafka-features)Unsupported Kafka features Redpanda does not currently support the following Apache Kafka features: - Multiple SCRAM mechanisms simultaneously for SASL users; for example, a user having both a `SCRAM-SHA-256` and a `SCRAM-SHA-512` credential. Redpanda supports only one SASL/SCRAM mechanism per user, either `SCRAM-SHA-256` or `SCRAM-SHA-512`. See the [Authentication](../../security/cloud-authentication/) guide for details. - HTTP Proxy (pandaproxy): Unlike other REST proxy implementations in the Kafka ecosystem, Redpanda HTTP Proxy does not support create, read, update, or delete (CRUD) operations on topics and ACLs. HTTP Proxy is designed for clients that produce and consume data and do not perform administrative functions.
- The `delete.retention.ms` topic configuration in Kafka is not supported for Tiered Storage topics. Cloud Topics and local storage topics support Tombstone marker deletion using `delete.retention.ms`, but in Tiered Storage topics, Tombstone markers are only removed in accordance with normal topic retention, and only if the cleanup policy is `delete` or `compact, delete`. If you have any issues while working with a Kafka tool, you can [file an issue](https://github.com/redpanda-data/redpanda/issues/new). --- # Page 354: Kafka Connect **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors.md --- # Kafka Connect --- title: Kafka Connect latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/index.adoc description: Use Kafka Connect to stream data into and out of Redpanda. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-07" --- Use Kafka Connect to integrate your Redpanda data with different data systems. As managed solutions, connectors offer a simpler way to integrate your data than manually creating a solution with the Kafka API. You can set up and manage these connectors for BYOC and Dedicated clusters in the Redpanda Cloud UI or Cloud API. > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../connect/about/). 
> > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. Each connector is either a source or a sink: - A source connector imports data from a source system into a Redpanda cluster. The source connector’s main task is to fetch data from these sources and convert them into a format suitable for Redpanda. - A sink connector exports data from a Redpanda cluster and pushes it into a target system. Sink connectors read the data from Redpanda and transform it into a format that the target system can use. These sources and sinks work together to create a data pipeline that can move and transform data from one system to another. > ⚠️ **WARNING** > > Modifying the properties of topics that are created and managed by Redpanda applications can cause unexpected errors. This may lead to connector and cluster failures. - [Converters and Serialization](converters-and-serialization/) Use converters to handle the serialization and deserialization of data between a Redpanda topic and an external system with Kafka Connect. - [Monitor Kafka Connect](monitor-connectors/) Use metrics to monitor the health of Kafka Connect. - [Disable Kafka Connect](disable-kc/) Learn how to disable Kafka Connect using the Cloud API. - [Single Message Transforms](transforms/) Single Message Transforms (SMTs) let you modify the data and its characteristics as it passes through a connector. - [Sizing Connectors](sizing-connectors/) How to choose number of tasks to set for a connector. - [Create an S3 Sink Connector](create-s3-sink-connector/) Use the Redpanda Cloud UI to create an AWS S3 Sink Connector. - [Create a Google BigQuery Sink Connector](create-gcp-bigquery-connector/) Use the Redpanda Cloud UI to create a Google BigQuery Sink Connector. - [Create a GCS Sink Connector](create-gcs-connector/) Use the Redpanda Cloud UI to create a GCS Sink Connector. 
- [Create an Iceberg Sink Connector](create-iceberg-sink-connector/) Use the Redpanda Cloud UI to create an Iceberg Sink Connector. - [Create a JDBC Sink Connector](create-jdbc-sink-connector/) Use the Redpanda Cloud UI to create a JDBC Sink Connector. - [Create a JDBC Source Connector](create-jdbc-source-connector/) Use the Redpanda Cloud UI to create a JDBC Source Connector. - [Create a MirrorMaker2 Source Connector](create-mmaker-source-connector/) Use the Redpanda Cloud UI to create a MirrorMaker2 Source Connector. - [Create a MirrorMaker2 Checkpoint Connector](create-mmaker-checkpoint-connector/) Use the Redpanda Cloud UI to create a MirrorMaker2 Checkpoint Connector. - [Create a MirrorMaker2 Heartbeat Connector](create-mmaker-heartbeat-connector/) Use the Redpanda Cloud UI to create a MirrorMaker2 Heartbeat Connector. - [Create a MongoDB Sink Connector](create-mongodb-sink-connector/) Use the Redpanda Cloud UI to create a MongoDB Sink Connector. - [Create a MongoDB Source Connector](create-mongodb-source-connector/) Use the Redpanda Cloud UI to create a MongoDB Source Connector. - [Create a MySQL (Debezium) Source Connector](create-mysql-source-connector/) Use the Redpanda Cloud UI to create a MySQL (Debezium) Source Connector. - [Create a PostgreSQL (Debezium) Source Connector](create-postgresql-connector/) Use the Redpanda Cloud UI to create a PostgreSQL (Debezium) Source Connector. - [Create a SQL Server (Debezium) Source Connector](create-sqlserver-connector/) Use the Redpanda Cloud UI to create a SQL Server (Debezium) Source Connector. - [Create a Snowflake Sink Connector](create-snowflake-connector/) Use the Redpanda Cloud UI to create a Snowflake Sink Connector. 
--- # Page 355: Converters and Serialization **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/converters-and-serialization.md --- # Converters and Serialization --- title: Converters and Serialization latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/converters-and-serialization page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/converters-and-serialization.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/converters-and-serialization.adoc description: Use converters to handle the serialization and deserialization of data between a Redpanda topic and an external system with Kafka Connect. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-09-26" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. Connectors are a translation layer working between Redpanda and the remote system. For **sink** connectors the translation happens in the following phases: 1. Converter deserializes data from Redpanda message format (for example JSON or Avro) to a universal in-memory connect data format. 2. The in-memory connect data structure is translated by the connector to the data model of the remote system. For **source** connectors it is vice versa, the phases are: 1. Connector translates the data model from remote system format to the in-memory connect data structure. 2. 
Converter serializes the data from a universal in-memory connect format to a Redpanda message. Each Redpanda message is a key and value record. Record key and value converters are configured separately with the `Redpanda message key format` and `Redpanda message value format` properties. Key and value converters can be different. > 📝 **NOTE** > > If an external system requires structured data (like BigQuery or a SQL database), then you must provide data with a schema. Use the Avro, Protobuf, or JSON converter with a schema. ## [](#bytearray-converter)ByteArray converter The ByteArray converter is the most primitive and high-throughput converter. Schema is ignored. This is the default converter type for managed connectors. To use the converter, select the `ByteArray` option as a key or value message format. ## [](#string-converter)String converter The String converter is a high-throughput converter. Schema is ignored. All data is converted to a string. To use the converter, select the `String` option as a key or value message format. ## [](#json-converter)JSON converter The JSON converter supports a JSON schema embedded in the message, where each message contains a schema. It results in a bigger message size. The connector needs a message schema to check message format. To use the converter, select the `JSON` option as a key or value message format. Example JSON message with embedded schema: ```json { "schema": { "type": "struct", "fields": [ { "type": "int64", "optional": false, "field": "person_id" }, { "type": "string", "optional": false, "field": "name" } ] }, "payload": { "person_id": 1, "name": "Redpanda" } } ``` If you consume JSON data with no message schema, the schema check for the connector must be disabled with the `Message key JSON contains schema` or `Message value JSON contains schema` option. ## [](#avro-converter)Avro converter The Avro converter requires a schema in Schema Registry. 
Avro supports primitive types and complex types, like records, enums, arrays, maps, and unions. To specify a timestamp in an Avro schema for use with Kafka Connect, use: ```json { "name": "time1", "type": [ "null", { "type": "long", "connect.version": 1, "connect.name": "org.apache.kafka.connect.data.Timestamp", "logicalType": "timestamp-millis" } ], "default": null } ``` See also: - [Redpanda Schema Registry](../../../manage/schema-reg/schema-reg-overview/) - [Avro specification](https://avro.apache.org/docs/1.11.1/specification) ## [](#cloudevents-converter)CloudEvents converter The CloudEvents converter is specific to Debezium PostgreSQL and MySQL source connectors. See also: [CloudEvents Converter documentation](https://debezium.io/documentation/reference/2.2/integrations/cloudevents.html) ## [](#protobuf-converter)Protobuf converter ![Beta](https://img.shields.io/badge/Beta-red.svg) The Protobuf converter requires a schema in Schema Registry. The converter only supports sink connectors. Source connectors are not supported. To use the converter, select the `Protobuf` option as a key or value message format. See also: [Redpanda Schema Registry](../../../manage/schema-reg/schema-reg-overview/) ## [](#set-property-keys)Set property keys Kafka Connect connectors use a set of `key=value` pairs to set up properties. For example, if you want to set the property `topic.creation.enable` to `true`, use `topic.creation.enable=true` in the property settings page.
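
As a rough illustration of two ideas on this page, the following Python sketch wraps a record in the `schema` and `payload` envelope shown in the JSON converter section, and renders connector properties as `key=value` lines. The helper names `wrap_with_schema` and `to_property_lines`, and the mapping from Python types to Connect type names, are assumptions made for this example. They are not part of Kafka Connect or Redpanda.

```python
import json

# Illustrative assumption: a small mapping from Python types to the
# Connect type names used in the example message on this page.
_CONNECT_TYPES = {bool: "boolean", int: "int64", float: "double", str: "string"}


def wrap_with_schema(payload: dict) -> dict:
    """Wrap a flat record in a {"schema": ..., "payload": ...} envelope."""
    fields = []
    for name, value in payload.items():
        # Exact type lookup, so True maps to "boolean" rather than "int64".
        connect_type = _CONNECT_TYPES[type(value)]
        fields.append({"type": connect_type, "optional": False, "field": name})
    return {"schema": {"type": "struct", "fields": fields}, "payload": payload}


def to_property_lines(properties: dict) -> list:
    """Render connector properties as key=value lines."""
    lines = []
    for key, value in properties.items():
        # Booleans are written in lowercase, as in topic.creation.enable=true.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key}={text}")
    return lines


message = wrap_with_schema({"person_id": 1, "name": "Redpanda"})
print(json.dumps(message, indent=2))
print(to_property_lines({"topic.creation.enable": True}))
```

The sketch handles only flat records with a few primitive types. Nested structures, optional fields, and logical types such as timestamps would need a fuller schema description.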
--- # Page 356: Create a Google BigQuery Sink Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-gcp-bigquery-connector.md --- # Create a Google BigQuery Sink Connector --- title: Create a Google BigQuery Sink Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-gcp-bigquery-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-gcp-bigquery-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-gcp-bigquery-connector.adoc description: Use the Redpanda Cloud UI to create a Google BigQuery Sink Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. The Google BigQuery Sink connector enables you to stream any structured data from Redpanda to BigQuery for advanced analytics. ## [](#prerequisites)Prerequisites Before you can create a Google BigQuery Sink connector in the Redpanda Cloud, you must: 1. Create a [Google Cloud](https://cloud.google.com/) account. 2. In the **Google home** page: 1. [Select an existing project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#get_an_existing_project) or [create a new one](https://cloud.google.com/resource-manager/docs/creating-managing-projects#creating_a_project). 2. 
[Create a new dataset](https://cloud.google.com/bigquery/docs/datasets) for the project. 3. (_Optional if your data has a schema_) After creating the dataset, [create a new table](https://cloud.google.com/bigquery/docs/tables) to hold the data you intend to stream from Redpanda Cloud topics. Specify a structure for the table using schema values that align with your Redpanda topic data. > 📝 **NOTE** > > This step is mandatory only if the data in Redpanda does not have a schema. If the data in Redpanda includes a schema, then the connector automatically creates the tables in BigQuery. 3. Create a [custom role](https://cloud.google.com/iam/docs/creating-custom-roles). The role must have the following permissions: bigquery.datasets.get bigquery.tables.create bigquery.tables.get bigquery.tables.getData bigquery.tables.list bigquery.tables.update bigquery.tables.updateData 4. Create a [service account](https://cloud.google.com/iam/docs/service-accounts-create). 5. [Add the custom role to your service account](https://cloud.google.com/iam/docs/granting-changing-revoking-access). 6. [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete), and then download it. ## [](#limitations)Limitations The Google BigQuery Sink connector doesn’t support schemas with recursion. ## [](#create-a-google-bigquery-sink-connector)Create a Google BigQuery Sink connector To create the Google BigQuery Sink connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Export to Google BigQuery**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topics to export | topics | A comma-separated list of the cluster topics you want to replicate to Google BigQuery. | | Topics regex | topics.regex | A Java regular expression of topics to replicate. 
For example, specify .* to replicate all available topics in the cluster. Applicable only when Use regular expressions is selected. | | Credentials JSON | keyfile | A JSON key with BigQuery service account credentials. | | Project | project | The BigQuery project to which topic data will be written. | | Default dataset | defaultDataset | The default Google BigQuery dataset to be used. | | Kafka message value format | value.converter | The format of the value in the Redpanda topic. The default is JSON. | | Max Tasks | tasks.max | Maximum number of tasks to use for this connector. The default is 1. Each task replicates an exclusive set of partitions assigned to it. | | Connector name | name | Globally unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ### [](#advanced-google-bigquery-sink-connector-configuration)Advanced Google BigQuery Sink connector configuration In most instances, the preceding basic configuration properties are sufficient. If you require any additional property settings (for example, to automatically create BigQuery tables or to map topics to tables), then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | Auto create tables | autoCreateTables | Automatically create BigQuery tables if they don’t already exist. If the table does not exist, then it is created based on the record schema. | | Topic to table map | topic2TableMap | Map of topics to tables. Format: comma-separated tuples, for example topic1:table1,topic2:table2. | | Allow new BigQuery fields | allowNewBigQueryFields | If true, new fields can be added to BigQuery tables during subsequent schema updates.
| | Allow BigQuery required field relaxation | allowBigQueryRequiredFieldRelaxation | If true, fields in the BigQuery schema can be changed from REQUIRED to NULLABLE. | | Upsert enabled | upsertEnabled | Enables upsert functionality on the connector. | | Delete enabled | deleteEnabled | Enables delete functionality on the connector. | | Kafka key field name | kafkaKeyFieldName | The name of the BigQuery table field for the Kafka key. Must be set when upsert or delete is enabled. | | Time partitioning type | timePartitioningType | The time partitioning type to use when creating tables. | | BigQuery retry attempts | bigQueryRetry | The number of retry attempts made for each BigQuery request that fails with a backend or quota exceeded error. | | BigQuery retry attempts interval | bigQueryRetryWait | The minimum amount of time, in milliseconds, to wait between BigQuery backend or quota exceeded error retry attempts. | | Error tolerance | errors.tolerance | Error tolerance response during connector operation. The default value is none, which means that any error results in an immediate connector task failure. A value of all skips over problematic records instead. | | Dead letter queue topic name | errors.deadletterqueue.topic.name | The name of the topic to be used as the dead letter queue (DLQ) for messages that result in an error when processed by this sink connector, its transformations, or converters. The topic name is blank by default, which means that no messages are recorded in the DLQ. | | Dead letter queue topic replication factor | errors.deadletterqueue.topic.replication.factor | Replication factor used to create the dead letter queue topic when it doesn’t already exist. | | Enable error context headers | errors.deadletterqueue.context.headers.enable | When true, adds a header containing error context to the messages written to the dead letter queue.
To avoid clashing with headers from the original record, all error context header keys start with __connect.errors. | ## [](#map-data)Map data Use the appropriate key or value converter (input data format) for your data as follows: - `JSON` (`org.apache.kafka.connect.json.JsonConverter`) when your messages are JSON-encoded. Select `Message JSON contains schema`, with the `schema` and `payload` fields. If your messages do not contain a schema, manually create tables in BigQuery. - `AVRO` (`io.confluent.connect.avro.AvroConverter`) when your messages are Avro-encoded, with the schema stored in the Schema Registry. ## [](#topic-name-to-table-name-mapping)Topic name to table name mapping By default, the table name is the name of the topic. Use the `Topic to table map` (`topic2TableMap`) configuration property to remap topic names. For example, `topic1:table1,topic2:table2`. ## [](#test-the-connection)Test the connection After the connector is created, go to your BigQuery worksheets and query your table: ```sql SELECT * FROM `project.dataset.table` ``` It may take a couple of minutes for the records to be visible in BigQuery. ## [](#troubleshoot)Troubleshoot Google credentials are checked for validity during connector creation, upon clicking **Finish**. In cases where there are invalid credentials, the connector is not created. Other issues are reported using a failed task error message. Select **Show Logs** to view error details. | Message | Action | | --- | --- | | Not found: Project invalid-project-name | Check to make sure Project contains a valid BigQuery project. | | Not found: Dataset project:invalid-dataset | Check to make sure Default dataset contains a valid BigQuery dataset. | | An unexpected error occurred while validating credentials for BigQuery: Failed to create credentials from input stream | The credentials given as a JSON file in the Credentials JSON property are incorrect. Copy a valid key from the Google Cloud service account.
| | JsonConverter with schemas.enable requires "schema" and "payload" fields | The connector encountered an incorrect message format when reading from a topic. | | JsonParseException: Unrecognized token 'test': was expecting JSON | During reading from a topic the connector encountered a message that is invalid JSON. | | Streaming to metadata partition of column-based partitioning table {table_name} is disallowed. | Check to confirm that the bigQueryPartitionDecorator property is set to false. You can check the property in the connector configuration JSON view. | | Caused by: table: GenericData{classInfo=…​ insertion failed for the following rows:…​ no such field: | The Redpanda message contains a property that does not exist in a BigQuery table schema. | | BigQueryConnectException …​ insertion failed for the following rows: …​ [row index 0] (location fieldname[0], reason: invalid): This field: fieldname is not a record. | The Redpanda message contains an array of records, but the BigQuery table expects an array of strings. | | BigQueryConnectException: Failed to unionize schemas of records for the table…​ Could not convert to BigQuery schema with a batch of tombstone records. | The Redpanda message does not contain a schema, so the connector cannot create a BigQuery table. Create the BigQuery table manually. 
| --- # Page 357: Create a GCS Sink Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-gcs-connector.md --- # Create a GCS Sink Connector --- title: Create a GCS Sink Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-gcs-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-gcs-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-gcs-connector.adoc description: Use the Redpanda Cloud UI to create a GCS Sink Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. The Google Cloud Storage (GCS) Sink connector stores Redpanda messages in a Google Cloud Storage bucket. ## [](#prerequisites)Prerequisites Before you can create a GCS Sink connector in the Redpanda Cloud, you must: 1. Create a [Google Cloud](https://cloud.google.com/) account. 2. [Create a service account](https://cloud.google.com/iam/docs/service-accounts-create) that will be used to connect to the GCS service. 3. [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete) and download it. 4. 
Create a [custom role](https://cloud.google.com/iam/docs/creating-custom-roles), which must have the following permissions: - `storage.objects.create` to create items in the GCS bucket - `storage.objects.delete` to overwrite items in the GCS bucket 5. [Create a GCS bucket](https://cloud.google.com/storage/docs/creating-buckets) to which to send data. 6. [Grant permissions](https://cloud.google.com/storage/docs/access-control/using-iam-permissions) to the bucket you created for your service account. Use the role created in step 4. ## [](#limitations)Limitations The GCS Sink connector has the following limitations: - You can use only the `STRING` and `BYTES` input formats with the `CSV` output format. - You can use the `PARQUET` format only when your messages contain a schema. ## [](#create-a-gcs-sink-connector)Create a GCS Sink connector To create the GCS Sink connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Export to Google Cloud Storage**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topics to export | topics | Comma-separated list of the cluster topics you want to replicate to GCS. | | Topics regex | topics.regex | Java regular expression of topics to replicate. For example: specify .* to replicate all available topics in the cluster. Applicable only when Use regular expressions is selected. | | GCS Credentials JSON | gcs.credentials.json | JSON object with GCS credentials. | | GCS bucket name | gcs.bucket.name | Name of an existing GCS bucket to store output files in. | | Kafka message key format | key.converter | Format of the key in the Redpanda topic. Use BYTES for no conversion. | | Kafka message value format | value.converter | Format of the value in the Redpanda topic. Use BYTES for no conversion.
| | GCS file format | format.output.type | Format of the files created in GCS: CSV (the default), JSON, JSONL, AVRO, or PARQUET. You can use the CSV format output only with BYTES and STRING. | | Avro codec | avro.codec | The Avro compression codec to be used for Avro output files. Available values: null (the default), deflate, snappy, and bzip2. | | Max Tasks | tasks.max | Maximum number of tasks to use for this connector. The default is 1. Each task replicates an exclusive set of partitions assigned to it. | | Connector name | name | Globally unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ### [](#advanced-gcs-sink-connector-configuration)Advanced GCS Sink connector configuration In most instances, the preceding basic configuration properties are sufficient. If you require any additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | File name template | file.name.template | The template for file names on GCS. Supports {{ variable }} placeholders for substituting variables. Supported placeholders are: topic; partition; start_offset (the offset of the first record in the file); timestamp:unit=yyyy|MM|dd|HH (the timestamp of the record); and key (when used, other placeholders are not substituted). | | File name prefix | file.name.prefix | The prefix to be added to the name of each file put in GCS. | | Output fields | format.output.fields | Fields to place into output files. Supported values are: 'key', 'value', 'offset', 'timestamp', and 'headers'. | | Value field encoding | format.output.fields.value.encoding | The type of encoding to be used for the value field. Supported values are: 'none' and 'base64'.
| | Envelope for primitives | format.output.envelope | Specifies whether or not to enable additional JSON object wrapping of the actual value. | | Output file compression | file.compression.type | The compression type to be used for files put into GCS. Supported values are: 'none', 'gzip', 'snappy', and 'zstd'. | | Max records per file | file.max.records | The maximum number of records to put in a single file. Must be a non-negative number. 0 is interpreted as "unlimited", which is the default. In this case files are only flushed after file.flush.interval.ms. | | File flush interval milliseconds | file.flush.interval.ms | The time interval to periodically flush files and commit offsets. Value specified must be a non-negative number. Default is 60 seconds. 0 indicates that it is disabled. In this case, files are only flushed after reaching file.max.records record size. | | GCS bucket check | gcs.bucket.check | If set to true, the connector will attempt to put a test file to the GCS bucket to validate access. Default is true. | | GCS retry backoff initial delay milliseconds | gcs.retry.backoff.initial.delay.ms | Initial retry delay in milliseconds. The default value is 1000. | | GCS retry backoff max delay milliseconds | gcs.retry.backoff.max.delay.ms | Maximum retry delay in milliseconds. The default value is 32000. | | GCS retry backoff delay multiplier | gcs.retry.backoff.delay.multiplier | Retry delay multiplier. The default value is 2.0. | | GCS retry backoff max attempts | gcs.retry.backoff.max.attempts | Retry max attempts. The default value is 6. | | GCS retry backoff total timeout milliseconds | gcs.retry.backoff.total.timeout.ms | Retry total timeout in milliseconds. The default value is 50000. | | Retry back-off | kafka.retry.backoff.ms | Retry backoff in milliseconds. In case of transient exceptions, useful for performing recovery. Maximum value is 86400000 (24 hours). 
| | Error tolerance | errors.tolerance | Error tolerance response during connector operation. The default value is none, which means that any error results in an immediate connector task failure. A value of all skips over problematic records instead. | | Dead letter queue topic name | errors.deadletterqueue.topic.name | The name of the topic to be used as the dead letter queue (DLQ) for messages that result in an error when processed by this sink connector, its transformations, or converters. The topic name is blank by default, which means that no messages are recorded in the DLQ. | | Dead letter queue topic replication factor | errors.deadletterqueue.topic.replication.factor | Replication factor used to create the dead letter queue topic when it doesn’t already exist. | | Enable error context headers | errors.deadletterqueue.context.headers.enable | When true, adds a header containing error context to the messages written to the dead letter queue. To avoid clashing with headers from the original record, all error context header keys start with __connect.errors. | ## [](#map-data)Map data Use the appropriate key or value converter (input data format) for your data as follows: - `JSON` (`org.apache.kafka.connect.json.JsonConverter`) when your messages are JSON-encoded. Select `Message JSON contains schema`, with the `schema` and `payload` fields. - `AVRO` (`io.confluent.connect.avro.AvroConverter`) when your messages are Avro-encoded, with the schema stored in the Schema Registry. - `STRING` (`org.apache.kafka.connect.storage.StringConverter`) when your messages contain textual data. - `BYTES` (`org.apache.kafka.connect.converters.ByteArrayConverter`) when your messages contain arbitrary data. You can also select the output data format for your GCS files as follows: - `CSV` to produce data in the `CSV` format. With `CSV`, you can use only the `STRING` and `BYTES` input formats.
- `JSON` to produce data in the `JSON` format as an array of record objects. - `JSONL` to produce data in the `JSON` format, each message as a separate JSON, one per line. - `PARQUET` to produce data in the `PARQUET` format when your messages contain schema. - `AVRO` to produce data in the `AVRO` format when your messages contain schema. ## [](#test-the-connection)Test the connection After the connector is created, check the GCS bucket for a new file. Files should appear after the file flush interval (default is 60 seconds). ## [](#troubleshoot)Troubleshoot If there are any connection issues, an error message is returned. Depending on the `GCS bucket check` property value, the error results in a failed connector (`GCS bucket check = true`) or a failed task (`GCS bucket check = false`). Select **Show Logs** to view error details. Additional errors and corrective actions follow. | Message | Action | | --- | --- | | Failed to read credentials from JSON string | The credentials given as JSON file in the GCS credentials JSON property are incorrect. Copy a valid key from the Google Cloud service account. | | The specified bucket does not exist | Create the bucket if the bucket does not exist, or correct the bucket name if the bucket exists, but the specified GCS bucket name value is incorrect. | | No files in the GCS bucket | Be sure to wait until the connector performs the first file flush (default is 60 seconds). 
| --- # Page 358: Create an Iceberg Sink Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-iceberg-sink-connector.md --- # Create an Iceberg Sink Connector --- title: Create an Iceberg Sink Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-iceberg-sink-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-iceberg-sink-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-iceberg-sink-connector.adoc description: Use the Redpanda Cloud UI to create an Iceberg Sink Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2026-03-31" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. You can use the Iceberg Sink connector to accomplish the following: - Write data into Iceberg tables - Commit coordination for centralized Iceberg commits - Exactly-once delivery semantics - Multi-table fan-out - Row mutations (update/delete rows), upsert mode - Automatic table creation and schema evolution - Field name mapping via Iceberg’s column mapping functionality ## [](#prerequisites)Prerequisites Before you can create an Iceberg Sink connector in Redpanda Cloud, you must: 1. [Set up an Iceberg catalog](https://iceberg.apache.org/concepts/catalog/). 2. 
Create the Iceberg connector control topic, which cannot be used by other connectors. For details, see [Create a Topic](../../topics/create-topic/). ## [](#limitations)Limitations - Each Iceberg sink connector must have its own control topic, which you should create before creating the connector. ## [](#create-an-iceberg-sink-connector)Create an Iceberg Sink connector To create the Iceberg Sink connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu and then click **Create Connector**. 2. Select **Export to Iceberg**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topics to export | topics | Comma-separated list of the cluster topics you want to replicate. | | Topics regex | topics.regex | Java regular expression of topics to replicate. For example: specify .* to replicate all available topics in the cluster. Applicable only when Use regular expressions is selected. | | Iceberg control topic | iceberg.control.topic | The name of the control topic. You must create this topic before creating the Iceberg connector. It cannot be used by other Iceberg connectors. | | Iceberg catalog type | iceberg.catalog.type | The type of Iceberg catalog. Allowed options are: REST, HIVE, HADOOP. | | Iceberg tables | iceberg.tables | Comma-separated list of Iceberg table names, which are specified using the format {namespace}.{table}. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ### [](#advanced-iceberg-sink-connector-configuration)Advanced Iceberg Sink connector configuration In most instances, the preceding basic configuration properties are sufficient. 
If you require additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | Iceberg commit timeout | iceberg.control.commit.timeout-ms | Commit timeout interval in ms. The default is 30000 (30 sec). | | Iceberg tables route field | iceberg.tables.route-field | For multi-table fan-out, the name of the field used to route records to tables. | | Iceberg tables CDC field | iceberg.tables.cdc-field | Name of the field containing the CDC operation, I, U, or D. Default is none. | ## [](#map-data)Map data Use the appropriate key or value converter (input data format) for your data as follows: - `JSON` when your messages are JSON-encoded. Select `Message JSON contains schema` if each message includes the `schema` and `payload` fields. If your messages do not contain a schema, create Iceberg tables manually. - `AVRO` when your messages are Avro-encoded, with the schema stored in the Schema Registry. An Iceberg table’s schema is a list of named columns. All data types are either primitives or nested types, which are maps, lists, or structs. A table schema is also a struct type. See also: [Schemas and Data Types](https://iceberg.apache.org/spec/#schemas-and-data-types) ## [](#sinking-data-produced-by-debezium-source-connector)Sinking data produced by Debezium source connector Debezium connectors produce data in CDC format. The message structure can be flattened by using Debezium’s built-in New Record State Extraction Single Message Transformation (SMT). Add the following properties to the Debezium connector configuration to make it produce flat messages: ```json { ... "transforms": "unwrap", "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState", "transforms.unwrap.drop.tombstones": "false", ...
} ``` Depending on your particular use case, you can apply the SMT to a Debezium connector, or to a sink connector that consumes messages that the Debezium connector produces. To enable Apache Kafka to retain the Debezium change event messages in their original format, configure the SMT for a sink connector. See also: [Debezium New Record State Extraction SMT](https://debezium.io/documentation/reference/stable/transformations/event-flattening.html) ## [](#use-analytical-tools-with-iceberg)Use analytical tools with Iceberg Iceberg serves as a single storage solution for analytical data. It is inexpensive to read from various tools such as AWS Athena, Snowflake, or Apache Spark. Traditionally, data import involved pushing data to every tool, incurring high costs for data transfer and storage. Alternatively, you could use plain S3 buckets with Avro or CSV files, but this struggles with schema evolution. [Apache Iceberg](https://iceberg.apache.org) addresses all of these challenges: cost of data transfer, multiple data copies in storage, and support for schema evolution. 
![Iceberg sink connector diagram](../../../shared/_images/iceberg_sink_connector_diagram.png) The following example uses: - Iceberg REST catalog - AWS S3 bucket as the storage for Iceberg files - Apache Spark, which reads the Iceberg data from an S3 bucket ```yaml version: '3' services: redpanda: image: docker.redpanda.com/redpandadata/redpanda:latest command: - redpanda start - --smp 1 - --overprovisioned - --node-id 0 - --reserve-memory 0M - --check=false - --set redpanda.auto_create_topics_enabled=false - --kafka-addr PLAINTEXT://0.0.0.0:29092,OUTSIDE://0.0.0.0:9092 - --advertise-kafka-addr PLAINTEXT://redpanda:29092,OUTSIDE://localhost:9092 - --pandaproxy-addr 0.0.0.0:8082 - --advertise-pandaproxy-addr localhost:8082 ports: - 8081:8081 - 8082:8082 - 9092:9092 - 9644:9644 - 29092:29092 console: image: docker.redpanda.com/redpandadata/console:latest restart: on-failure entrypoint: /bin/sh command: -c "echo \"$$CONSOLE_CONFIG_FILE\" > /tmp/config.yml; /app/console" environment: CONFIG_FILEPATH: /tmp/config.yml CONSOLE_CONFIG_FILE: | kafka: brokers: ["redpanda:29092"] schemaRegistry: enabled: true urls: ["http://redpanda:8081"] connect: enabled: true clusters: - name: connectors url: http://connect:8083 ports: - "8090:8080" depends_on: - redpanda connect: image: docker.redpanda.com/redpandadata/connectors:latest hostname: connect depends_on: - redpanda - spark-iceberg ports: - "8083:8083" - "9404:9404" environment: CONNECT_CONFIGURATION: | key.converter=org.apache.kafka.connect.converters.ByteArrayConverter value.converter=org.apache.kafka.connect.converters.ByteArrayConverter group.id=connectors-cluster offset.storage.topic=_internal_connectors_offsets config.storage.topic=_internal_connectors_configs status.storage.topic=_internal_connectors_status config.storage.replication.factor=-1 offset.storage.replication.factor=-1 status.storage.replication.factor=-1 producer.linger.ms=1 producer.batch.size=131072 config.providers=file 
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider CONNECT_BOOTSTRAP_SERVERS: redpanda:29092 SCHEMA_REGISTRY_URL: http://redpanda:8081 CONNECT_GC_LOG_ENABLED: "false" CONNECT_HEAP_OPTS: -Xms512M -Xmx512M CONNECT_LOG_LEVEL: info CONNECT_TOPIC_LOG_ENABLED: "true" CONNECT_PLUGIN_PATH: "/opt/kafka/connect-plugins" spark-iceberg: image: tabulario/spark-iceberg:3.4.1_1.3.1 build: spark/ depends_on: - rest volumes: - ./warehouse:/home/iceberg/warehouse environment: - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} - AWS_REGION=${AWS_REGION} ports: - 8888:8888 - 8080:8080 - 10000:10000 - 10001:10001 rest: image: tabulario/iceberg-rest:0.6.0 ports: - 8181:8181 environment: - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} - AWS_REGION=${AWS_REGION} - CATALOG_WAREHOUSE=s3://bucket-name/ - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO ``` Use Spark-SQL to: - List databases: ```none spark-sql ()> show databases; testdb ``` - Show tables in database: ```none spark-sql ()> show tables in testdb; testtable ``` - Select data from table: ```none spark-sql ()> select * from testdb.testtable; ``` ## [](#use-with-aws-glue-data-catalog-and-aws-lake-formation)Use with AWS Glue Data Catalog and AWS Lake Formation The connector can be used with the AWS Glue Data Catalog and the AWS Lake Formation service. AWS Lake Formation only lets you use the role form of authentication. The connectors UI does not support Lake Formation-specific properties. Use the JSON editor instead. Sample configuration: ```json { ... 
"iceberg.catalog.client.assume-role.region": "the-region", "iceberg.catalog.client.assume-role.arn": "arn:aws:iam::account-number:role/role-name", "iceberg.catalog.glue.account-id": "NNN", "iceberg.catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog", "iceberg.catalog.client.assume-role.tags.LakeFormationAuthorizedCaller": "iceberg-connect", "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO", "iceberg.catalog": "catalog_name", "iceberg.catalog.warehouse": "s3://bucket-name/my/data", "iceberg.catalog.s3.path-style-access": "true" } ``` ## [](#test-the-connection)Test the connection After the connector is created, execute SELECT query on the Iceberg table to verify data. It may take a couple of minutes for the records to be visible in Iceberg. Check connector state and logs for errors. ## [](#troubleshoot)Troubleshoot Iceberg connection settings are checked for validity during first data processing. The connector can be successfully created with incorrect configuration and fail only when there are messages in source topic to process. | Message | Action | | --- | --- | | NoSuchTableException: Table does not exist | Make sure Iceberg table exists and the connector iceberg.tables configuration contains correct table name in {namespace}.{table} format. | | UnknownHostException: incorrectcatalog: Name or service not known | Cannot connect to Iceberg catalog. Check if Iceberg catalog URI is correct and accessible. | | DataException: An error occurred converting record, topic: topicName, partition, 0, offset: 0 | The connector cannot read the message format. Ensure the connector mapping configuration and data format are correct. | | NullPointerException: Cannot invoke "java.lang.Long.longValue()" because "value" is null | The connector cannot read the message format. Ensure the connector mapping configuration and data format are correct. 
| ## [](#suggested-reading)Suggested reading - For details about the Iceberg Sink connector configuration properties, see [Iceberg-Kafka-Connect](https://github.com/tabular-io/iceberg-kafka-connect) - For details about the Iceberg Sink connector internals, see [Iceberg-Kafka-Connect documentation](https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs) --- # Page 359: Create a JDBC Sink Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-jdbc-sink-connector.md --- # Create a JDBC Sink Connector --- title: Create a JDBC Sink Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-jdbc-sink-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-jdbc-sink-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-jdbc-sink-connector.adoc description: Use the Redpanda Cloud UI to create a JDBC Sink Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. You can use a JDBC Sink connector to export structured data from Redpanda to a relational database. 
## [](#prerequisites)Prerequisites Before you can create a JDBC Sink connector in Redpanda Cloud, you must have: - A relational database instance that is accessible from the JDBC Sink connector instance - A database user ## [](#limitations)Limitations The JDBC Sink connector has the following limitations: - Only `JSON` or `AVRO` formats can be used as a value converter. - Only the following databases are supported: - MySQL 5.7 and 8.0 - PostgreSQL 8.2 and higher using version 3.0 of the PostgreSQL® protocol - SQLite - SQL Server - Microsoft SQL versions: Azure SQL Database, Azure Synapse Analytics, Azure SQL Managed Instance, SQL Server 2014, SQL Server 2016, SQL Server 2017, SQL Server 2019 ## [](#create-a-jdbc-sink-connector)Create a JDBC Sink connector To create the JDBC Sink connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Export to JDBC**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topics to export | topics | Comma-separated list of the cluster topics you want to replicate. | | Topics regex | topics.regex | Java regular expression of topics to replicate. For example: specify .* to replicate all available topics in the cluster. Applicable only when Use regular expressions is selected. | | JDBC URL | connection.url | The database connection JDBC URL. | | User | connection.user | Name of the database user to be used when connecting to the database. | | Password | connection.password | Password of the database user to be used when connecting to the database. | | Redpanda message key format | key.converter | Format of the key in the Redpanda topic. BYTES is the default. | | Redpanda message value format | value.converter | Format of the value in the Redpanda topic. JSON is the default.
| | Auto-create | auto.create | When enabled, automatically creates the destination table (if it is missing) based on the record schema (issues a CREATE). The default is disabled. | | Max Tasks | tasks.max | Maximum number of tasks to use for this connector. The default is 1. Each task replicates an exclusive set of partitions assigned to it. | | Connector name | name | Globally-unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ### [](#advanced-jdbc-sink-connector-configuration)Advanced JDBC Sink connector configuration In most instances, the preceding basic configuration properties are sufficient. If you require additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | Include fields | fields.whitelist | List of comma-separated record value field names. If the value of this property is empty, the connector uses all fields from the record to migrate to a database. Otherwise, the connector uses only the record fields that are specified (in a comma-separated format). Note that Primary key fields independently determines which fields form the primary key columns in the destination database, while this property applies to the other columns. | | Topics to tables mapping | topics.to.tables.mapping | Kafka topics to database tables mapping. Comma-separated list of topic to table mapping in the format: topic_name:table_name. If the destination table is found in the mapping, then it overrides the generated one defined in table.name.format. | | Table name format | table.name.format | A format string for the destination table name, which may contain ${topic} as a placeholder for the original topic name.
For example, kafka_${topic} for the topic orders maps to the table name kafka_orders. The default is ${topic}. | | Table name normalize | table.name.normalize | Specifies whether or not to normalize destination table names for topics. When enabled, the alphanumeric characters (a-z, A-Z, 0-9) and `_` remain as is, and other characters (such as `.`) are replaced with `_`. Disabled by default. | | Quote SQL identifiers | sql.quote.identifiers | Specifies whether or not to delimit (in most databases, a quote with double quotation marks) identifiers (for example, table names and column names) in SQL statements. Enabled by default. | | Auto-evolve | auto.evolve | Specifies whether to automatically add columns to the table schema, by issuing ALTER, when they are found to be missing relative to the record schema. | | Batch size | batch.size | Specifies how many records to attempt to batch together for insertion into the destination table, when possible. The default is 3000. | | DB time zone | db.timezone | Name of the JDBC timezone that should be used in the connector when querying with time-based criteria. Default is UTC. | | Insert mode | insert.mode | The insertion mode to use. The supported modes are: INSERT: standard SQL INSERT statements. MULTI: multi-row INSERT statements. UPSERT: use the appropriate upsert semantics for the target database if it is supported by the connector (for example, INSERT .. ON CONFLICT .. DO UPDATE SET ..). UPDATE: use the appropriate update semantics for the target database if it is supported by the connector (for example, UPDATE). | | Primary key mode | pk.mode | The primary key mode to use. Supported modes are: NONE: no keys are used. kafka: Kafka coordinates (the topic, partition, and offset) are used as the primary key. RECORD_KEY: fields from the record key are used, which may be a primitive or a struct. RECORD_VALUE: fields from the record value are used, which must be a struct. | | Primary key fields | pk.fields | Comma-separated list of primary key field names.
The runtime interpretation of this configuration depends on the pk.mode. Supported modes are: none: ignored because no fields are used as primary key in this mode. kafka: must be a trio representing the Kafka coordinates (the topic, partition, and offset). Defaults to `__connect_topic,__connect_partition,__connect_offset` if empty. record_key: if empty, all fields from the key struct will be used, otherwise used to extract the desired fields. For a primitive key, only a single field name must be configured. record_value: if empty, all fields from the value struct will be used, otherwise used to extract the desired fields. | | Maximum retries | max.retries | The maximum number of times to retry on errors before failing the task. The default is 10. | | Retry backoff (ms) | retry.backoff.ms | The time in milliseconds to wait before a retry attempt is made following an error. The default is 3000. | | Database dialect | dialect.name | The name of the database dialect that should be used for this connector. By default, the connector automatically determines the dialect based upon the JDBC connection URL. Use this property if you want to override that behavior and specify a dialect. | | Error tolerance | errors.tolerance | Error tolerance response during connector operation. The default value is none, which signals that any error results in an immediate connector task failure. A value of all changes the behavior to skip over problematic records. | | Dead letter queue topic name | errors.deadletterqueue.topic.name | The name of the topic to be used as the dead letter queue (DLQ) for messages that result in an error when processed by this sink connector, its transformations, or converters. The topic name is blank by default, which means that no messages are recorded in the DLQ. | | Dead letter queue topic replication factor | errors.deadletterqueue.topic.replication.factor | Replication factor used to create the dead letter queue topic when it doesn’t already exist.
| | Enable error context headers | errors.deadletterqueue.context.headers.enable | When true, adds a header containing error context to the messages written to the dead letter queue. To avoid clashing with headers from the original record, all error context header keys start with `__connect.errors.` | ## [](#map-data)Map data Use the appropriate key or value converter (input data format) for your data as follows: - Use the default `Redpanda message value format` = `JSON` (`org.apache.kafka.connect.json.JsonConverter`) property in your configuration. - Topics should contain data in JSON format with a defined JSON schema. For example: ```json { "schema": { "type": "struct", "fields": [ ] }, "payload": { } } ``` ## [](#test-the-connection)Test the connection After the connector is created, ensure that: - There are no errors in the logs or in Redpanda Console. - Database tables contain data from Redpanda topics. ## [](#troubleshoot)Troubleshoot JDBC Sink connector issues are reported as failed tasks. Select **Show Logs** to view error details. | Message | Action | | --- | --- | | PSQLException: FATAL: database "invalid-database" does not exist | Make sure the JDBC URL specifies an existing database name. | | UnknownHostException: invalid-host | Make sure the JDBC URL specifies a valid database host name. | | PSQLException: Connection to postgres:1234 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections | Make sure the JDBC URL specifies a valid database host name and port, and that the port is accessible. | | PSQLException: FATAL: password authentication failed for user "postgres" | Verify that the User and Password are correct. | | ConnectException: topic_name.Value (STRUCT) type doesn’t have a mapping to the SQL database column type | The JDBC Sink connector is not compatible with the Debezium PostgreSQL Source connector.
Kafka Connect JSON produced by the Debezium Connector is not compatible with what the JDBC Sink Connector is expecting. Try changing a topic name. The JDBC Source connector is compatible with the JDBC Sink connector, and can be used as an alternative for a Debezium PostgreSQL source connector. | --- # Page 360: Create a JDBC Source Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-jdbc-source-connector.md --- # Create a JDBC Source Connector --- title: Create a JDBC Source Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-jdbc-source-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-jdbc-source-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-jdbc-source-connector.adoc description: Use the Redpanda Cloud UI to create a JDBC Source Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. You can use a JDBC Source connector to import batches of rows from MySQL, PostgreSQL, SQLite, and SQL Server relational databases into Redpanda topics. ## [](#prerequisites)Prerequisites - Relational database instance that is accessible from the JDBC Source connector instance. - Database user has been created. 
## [](#limitations)Limitations The JDBC Source connector has the following limitations: - Only `JSON` or `AVRO` formats can be used as a value converter. - Only the following databases are supported: - MySQL 5.7 and 8.0 - PostgreSQL 8.2 and higher using version 3.0 of the PostgreSQL® protocol - SQLite - SQL Server - Microsoft SQL versions: Azure SQL Database, Azure Synapse Analytics, Azure SQL Managed Instance, SQL Server 2014, SQL Server 2016, SQL Server 2017, SQL Server 2019 ## [](#create-a-jdbc-source-connector)Create a JDBC Source connector To create the JDBC Source connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Import from JDBC**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topic prefix | topic.prefix | Prefix to prepend to table names to generate the name of the Kafka topic to which to publish data, or in the case of a custom query, the full name of the topic to publish to. | | JDBC URL | connection.url | The database connection JDBC URL. | | User | connection.user | Name of the database user to be used when connecting to the database. | | Password | connection.password | Password of the database user to be used when connecting to the database. | | Redpanda message value format | value.converter | Format of the value in the Redpanda topic. JSON is the default. | | Max Tasks | tasks.max | Maximum number of tasks to use for this connector. The default is 1. Each task replicates an exclusive set of partitions assigned to it. | | Connector name | name | Globally-unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**.
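Expressed as a connector configuration, the required properties above look roughly like the following. This is a minimal sketch: the property keys come from the table, but every value (connector name, topic prefix, JDBC URL, user, and password) is a placeholder chosen for illustration, and the `value.converter` class is the standard Kafka Connect JSON converter, assumed here to correspond to the JSON option. Settings that this page does not list, such as the connector class, are left out.

```json
{
  "name": "jdbc-source-example",
  "topic.prefix": "pg-",
  "connection.url": "jdbc:postgresql://db.example.com:5432/exampledb",
  "connection.user": "example_user",
  "connection.password": "example_password",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "tasks.max": "1"
}
```

Because the topic prefix is prepended to each table name, a table named `orders` would be published to the topic `pg-orders` under this placeholder configuration.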
### [](#advanced-jdbc-source-connector-configuration)Advanced JDBC Source connector configuration In most instances, the preceding basic configuration properties are sufficient. If you require additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | JDBC connection attempts | connection.attempts | Maximum number of attempts to retrieve a valid JDBC connection. The default is 3. | | JDBC connection backoff (ms) | connection.backoff.ms | Backoff time in milliseconds between connection attempts. The default is 10000. | | Kafka message key format | key.converter | Format of the key in the Redpanda topic. BYTES is the default. | | Kafka message headers format | header.converter | Format of the headers in the Kafka topic. The default is SIMPLE. | | Include tables | table.whitelist | List of tables to include when copying. If specified, you cannot specify the Exclude tables property. | | Exclude tables | table.blacklist | List of tables to exclude when copying. If specified, you cannot specify the Include tables property. | | Qualify table names | table.names.qualify | Specifies whether or not to use fully-qualified table names when querying the database. If disabled, queries are performed with unqualified table names. This property may be useful if the database has been configured with a search path that automatically directs unqualified queries to the correct table when there are multiple tables available with the same unqualified name. | | Catalog pattern | catalog.pattern | Catalog pattern used to fetch table metadata from the database. null (default) specifies that the catalog name is not to be used to narrow the search, so that all table metadata is fetched, regardless of the catalog. `""` retrieves those without a catalog.
| | Schema pattern | schema.pattern | Schema pattern used to fetch table metadata from the database. `""` retrieves those without a schema. null (default) specifies that the schema name is not to be used to narrow the search, so that all table metadata is fetched, regardless of the schema. | | DB time zone | db.timezone | Name of the JDBC timezone that should be used in the connector when querying with time-based criteria. Default is UTC. | | Max rows per batch | batch.max.rows | Maximum number of rows to include in a single batch when polling for new data. You can use this property to limit the amount of data buffered internally in the connector. The default is 100. | | Incrementing column name | incrementing.column.name | The name of the strictly incrementing column to use to detect new rows. An empty value indicates the column should be autodetected by looking for an auto-incrementing column. This column cannot be nullable. | | Incrementing column initial value | incrementing.initial | For the incrementing column, consider only the rows that have a value greater than this. Specify if you need to pick up rows with negative or zero value, or if you want to skip rows. The default is -1. To avoid excessive memory usage from loading a large data set, carefully select the initial value. | | Table loading mode | mode | The mode for updating a table each time it is polled. Options include: bulk: perform a bulk load of the entire table each time it is polled. incrementing: use a strictly incrementing column on each table to detect only new rows. Note that this does not detect modifications or deletions of existing rows. timestamp: use a timestamp (or timestamp-like) column to detect new and modified rows.
This mode assumes that the column is updated with each write, and that values are monotonically incrementing, but not necessarily unique. timestamp+incrementing: use two columns, a timestamp column that detects new and modified rows, and a strictly incrementing column, which provides a globally unique ID for updates so that each row can be assigned a unique stream offset. | | Map Numeric Values, Integral or Decimal, By Precision and Scale | numeric.mapping | Map NUMERIC values by precision and optionally scale to integral or decimal types. none (default): use if all NUMERIC columns are to be represented by Connect’s DECIMAL logical type. This may lead to serialization issues with Avro because Connect’s DECIMAL type is mapped to its binary representation. best_fit: use if NUMERIC columns should be cast to Connect’s INT8, INT16, INT32, INT64, or FLOAT64 based upon the column’s precision and scale. This option is often preferred because it maps to the most appropriate primitive type. precision_only: use to map NUMERIC columns based only on the column’s precision (assuming that column’s scale is 0). | | Poll interval (ms) | poll.interval.ms | Frequency used to poll for new data in each table. The default is 5000. | | Query | query | Specifies the query to use to select new or updated rows. Use to join tables, select subsets of columns in a table, or to filter data. When specified, this connector will only copy data using this query, and whole-table copying will be disabled. Different query modes may still be used for incremental updates, but to properly construct the incremental query, it must be possible to append a WHERE clause to this query (that is, no WHERE clauses can be used). If you use a WHERE clause, it must handle incremental queries itself. | | Quote SQL identifiers | sql.quote.identifiers | Specifies whether or not to delimit (in most databases, a quote with double quotation marks) identifiers (for example, table names and column names) in SQL statements.
| | Metadata change monitoring interval (ms) | table.poll.interval.ms | Frequency to poll for new or removed tables, which may result in updated task configurations to start polling for data in added tables, or stop polling for data in removed tables. The default is 60000. | | Table types | table.types | By default, the JDBC connector only detects tables with type TABLE from the source database. This property allows a comma-separated list of table types to extract. Options include: TABLE (default), VIEW, SYSTEM TABLE, GLOBAL TEMPORARY, LOCAL TEMPORARY, ALIAS, SYNONYM. In most cases, it is best to specify TABLE or VIEW. | | Timestamp column name | timestamp.column.name | Comma-separated list of one or more timestamp columns to detect new or modified rows using the COALESCE SQL function. Rows whose first non-null timestamp value is greater than the largest previous timestamp value seen are discovered with each poll. At least one column should not be nullable. | | Delay interval (ms) | timestamp.delay.interval.ms | The amount of time to wait after a row with a certain timestamp appears before including it in the result. You can add a delay to allow transactions with an earlier timestamp to complete. The first execution fetches all available records (that is, starting at a timestamp greater than 0) until current time minus the delay. Every following execution will get data from the last time fetched until the current time, minus the delay. | | Initial timestamp (ms) since epoch | timestamp.initial.ms | The initial value of the timestamp when selecting records. Value can be negative. The records having a timestamp greater than the value are included in the result. To avoid excessive memory usage from loading a large data set, carefully select the initial timestamp. | | Validate non null | validate.non.null | By default, the JDBC connector validates that all incrementing and timestamp tables have NOT NULL set for the columns being used as their ID/timestamp.
If the tables don’t, then the JDBC connector will fail to start. Setting this property to false disables these checks. | | Database dialect | dialect.name | The name of the database dialect that should be used for this connector. By default, the connector automatically determines the dialect based upon the JDBC connection URL. Use this property if you want to override that behavior and specify a dialect. | | Topic creation enabled | topic.creation.enable | Specifies whether or not to allow automatic creation of topics. Default is enabled. | | Topic creation partitions | topic.creation.default.partitions | Specifies the number of partitions for the created topics. The default is 1. | | Topic creation replication factor | topic.creation.default.replication.factor | Specifies the replication factor for the created topics. The default is -1. | ## [](#map-data)Map data Use the appropriate key or value converter (input data format) for your data as follows: - You can use Schema Registry as an alternative to the JSON schema. - Use `Kafka message value format` = `AVRO` (`io.confluent.connect.avro.AvroConverter`) to use Schema Registry with `AvroConverter`. Use the following properties to select the database data set to read from: - `Include tables` - `Exclude tables` - `Catalog pattern` - `Schema pattern` ## [](#test-the-connection)Test the connection After the connector is created, check to ensure that: - There are no errors in the logs or in Redpanda Console. - Redpanda topics contain data from relational database tables. ## [](#troubleshoot)Troubleshoot Most JDBC Source connector issues are identified in the connector creation phase. Invalid `Include tables` values are reported in logs. Select **Show Logs** to view error details. | Message | Action | | --- | --- | | PSQLException: FATAL: database "invalid-database" does not exist | Make sure the JDBC URL specifies an existing database name. | | PSQLException: The connection attempt failed.
for configuration Couldn’t open connection / PSQLException: Connection to postgres:1234 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections | Make sure the JDBC URL specifies a valid database host name and port, and that the port is accessible. | | PSQLException: FATAL: password authentication failed for user "postgres" | Verify that the User and Password are correct. | | IllegalArgumentException: Number of groups must be positive. | Make sure Include tables contains a valid list of tables. The Include tables setting is case-sensitive, even though the underlying database isn’t. For example, revise Include tables = tablename to Include tables = tableName. Postgres occasionally refuses a connection on the first attempt. If that happens, retry creating the connector. | --- # Page 361: Create a MirrorMaker2 Checkpoint Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-mmaker-checkpoint-connector.md --- # Create a MirrorMaker2 Checkpoint Connector --- title: Create a MirrorMaker2 Checkpoint Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-mmaker-checkpoint-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-mmaker-checkpoint-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-mmaker-checkpoint-connector.adoc description: Use the Redpanda Cloud UI to create a MirrorMaker2 Checkpoint Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). 
> > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. You can use the MirrorMaker2 Checkpoint connector to import consumer group offsets from other Kafka clusters. ## [](#prerequisites)Prerequisites - The external Kafka cluster is accessible. - A service account with read-only access to the external cluster is available. - The Kafka cluster topics connector is running for the same source cluster, with a matching configuration. ## [](#limitations)Limitations The MirrorMaker2 Checkpoint connector does not migrate consumer group offsets that are lower than the highest offsets synced by the MirrorMaker2 Source connector by the time the MirrorMaker2 Checkpoint connector is started. ## [](#create-a-mirrormaker2-checkpoint-connector)Create a MirrorMaker2 Checkpoint connector To create the MirrorMaker2 Checkpoint connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Import from Kafka cluster offsets**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topics to replicate | topics | Comma-separated topic names and regexes you want to replicate. | | Source cluster broker list | source.cluster.bootstrap.servers | A comma-separated list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers regardless of which servers are specified here for bootstrapping. | | Source cluster security protocol | source.cluster.security.protocol | The protocol used to communicate with source brokers. The default is PLAINTEXT. 
| | Source cluster SASL mechanism | source.cluster.sasl.mechanism | SASL mechanism used for connections to source cluster. Default is PLAIN. | | Source cluster SASL username | source.cluster.sasl.username | SASL username used for connections to source cluster. | | Source cluster SASL password | source.cluster.sasl.password | SASL password used for connections to source cluster. | | Groups | groups | Consumer groups to replicate. Supports comma-separated group IDs and regexes. | | Connector name | name | Globally-unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ### [](#advanced-mirrormaker2-checkpoint-connector-configuration)Advanced MirrorMaker2 Checkpoint connector configuration In most instances, the preceding basic configuration properties are sufficient. If you require additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | Source cluster SSL custom certificate | source.cluster.ssl.truststore.certificates | Trusted certificates in the PEM format. | | Source cluster SSL keystore key | source.cluster.ssl.keystore.key | Private key in the PEM format. | | Source cluster SSL keystore certificate chain | source.cluster.ssl.keystore.certificate.chain | Certificate chain in the PEM format. | | Topics exclude | topics.exclude | Excluded topics. Supports comma-separated topic names and regexes. | | Source cluster alias | source.cluster.alias | When using DefaultReplicationPolicy, topic names will be prefixed with it. | | Replication policy class | replication.policy.class | Class that defines the remote topic naming convention. Use IdentityReplicationPolicy to preserve topic names. DefaultReplicationPolicy prefixes the topic with the source cluster alias. 
| | Emit checkpoints interval seconds | emit.checkpoints.interval.seconds | Frequency of checkpoints. The default is 60. | | Sync group offsets enabled | sync.group.offsets.enabled | Specifies whether or not to periodically write the translated offsets to the __consumer_offsets topic in the target cluster, as long as no active consumers in that group are connected to the target cluster. | | Sync group offsets interval seconds | sync.group.offsets.interval.seconds | Frequency of consumer group offset sync. The default is 60. | | Refresh groups interval seconds | refresh.groups.interval.seconds | Frequency of group refreshes. The default is 600. | | Offset-Syncs topic location | offset-syncs.topic.location | The location (source or target) of the offset-syncs topic. The default is source. | | Checkpoints topic replication factor | checkpoints.topic.replication.factor | Replication factor for checkpoints topic. The default is -1. | ## [](#test-the-connection)Test the connection After the connector is created: - Ensure that there are no errors in logs and in Redpanda Console. - Wait for the Kafka cluster topics connector to catch up. Then check to confirm that the consumer groups are replicated. ## [](#use-the-connectors-api)Use the Connectors API When using the Connectors API, instead of specifying a value for `source.cluster.sasl.username` and `source.cluster.sasl.password`, you can specify a value for `source.cluster.sasl.jaas.config`. ## [](#troubleshoot)Troubleshoot Most MirrorMaker2 Checkpoint connector issues are reported as a failed task at the time of creation. Select **Show Logs** to view error details. | Message | Action | | --- | --- | | Connection to node -1 (/127.0.0.1:9092) could not be established. Broker may not be available. / LOGS: Timed out while checking for or creating topic 'mm2-offset-syncs.target.internal'. 
This could indicate a connectivity issue / TimeoutException: Timed out waiting for a node assignment | Make sure broker URLs are correct and that the source cluster security protocol is correct. | | SaslAuthenticationException: SASL authentication failed: security: Invalid credentials | Confirm that the username and password specified are correct. | | java.lang.IllegalArgumentException: No serviceName defined in either JAAS or Kafka config | Confirm that the username and password specified are correct. | | Client SASL mechanism 'PLAIN' not enabled in the server, enabled mechanisms are [SCRAM-SHA-256, SCRAM-SHA-512] | Confirm that the Source cluster SASL mechanism is correct. | | SaslAuthenticationException: SASL authentication failed: security: Invalid credentials | Make sure the Source cluster SASL mechanism is correct (for example, SCRAM-SHA-256 instead of SCRAM-SHA-512). | | terminated during authentication. This may happen due to any of the following reasons: (1) Authentication failed due to invalid credentials with brokers older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow HTTPS traffic), (3) Transient network issue | Enable SSL using the Source cluster security protocol property (specify SSL or SASL_SSL). 
| --- # Page 362: Create a MirrorMaker2 Heartbeat Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-mmaker-heartbeat-connector.md --- # Create a MirrorMaker2 Heartbeat Connector --- title: Create a MirrorMaker2 Heartbeat Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-mmaker-heartbeat-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-mmaker-heartbeat-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-mmaker-heartbeat-connector.adoc description: Use the Redpanda Cloud UI to create a MirrorMaker2 Heartbeat Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. You can use a MirrorMaker2 Heartbeat connector to generate heartbeat messages to a local cluster’s `heartbeat` topic. There are no prerequisites or limitations associated with this connector. ## [](#create-a-mirrormaker2-heartbeat-connector)Create a MirrorMaker2 Heartbeat connector To create the MirrorMaker2 Heartbeat connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Import from Heartbeat**. 3. 
On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Emit heartbeats interval seconds | emit.heartbeats.interval.seconds | Frequency of heartbeats. The default is 1. | | Connector name | name | Globally unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ### [](#advanced-mirrormaker2-heartbeat-connector-configuration)Advanced MirrorMaker2 Heartbeat connector configuration In most instances, the preceding basic configuration properties are sufficient. If you require additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | Source cluster alias | source.cluster.alias | Used to generate the heartbeat topic key. The default is source. | | Target cluster alias | target.cluster.alias | Used to generate the heartbeat topic key. The default is target. | | Heartbeats topic replication factor | heartbeats.topic.replication.factor | Replication factor for heartbeats topic. The default is -1. | ## [](#test-the-connection)Test the connection After the connector is created, check to ensure that: - There are no errors in logs and in Redpanda Console. - The `heartbeat` topic contains heartbeat messages. 
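The Heartbeat connector properties listed above can also be assembled as a configuration payload. The following Python sketch is illustrative only: the property keys and defaults come from the preceding tables, while the `{"name": ..., "config": ...}` envelope follows the general Kafka Connect REST convention and the class name `org.apache.kafka.connect.mirror.MirrorHeartbeatConnector` is the Apache Kafka MirrorMaker2 class, both of which are assumptions about what the managed service expects. The helper name `build_heartbeat_config` and the connector name are hypothetical.

```python
import json


def build_heartbeat_config(name, interval_seconds=1, source_alias="source",
                           target_alias="target", replication_factor=-1):
    """Build a Kafka Connect style payload for a MirrorMaker2 Heartbeat connector.

    Defaults mirror the values documented in the property tables.
    """
    return {
        "name": name,
        "config": {
            # Assumption: the standard Apache Kafka MirrorMaker2 heartbeat class.
            "connector.class": "org.apache.kafka.connect.mirror.MirrorHeartbeatConnector",
            # Kafka Connect configuration values are conventionally strings.
            "emit.heartbeats.interval.seconds": str(interval_seconds),
            "source.cluster.alias": source_alias,
            "target.cluster.alias": target_alias,
            "heartbeats.topic.replication.factor": str(replication_factor),
        },
    }


if __name__ == "__main__":
    # "my-heartbeat" is a placeholder connector name.
    print(json.dumps(build_heartbeat_config("my-heartbeat"), indent=2))
```

Keeping the defaults in one function makes it easier to see which values differ from the documented defaults before you submit the configuration.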
--- # Page 363: Create a MirrorMaker2 Source Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-mmaker-source-connector.md --- # Create a MirrorMaker2 Source Connector --- title: Create a MirrorMaker2 Source Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-mmaker-source-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-mmaker-source-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-mmaker-source-connector.adoc description: Use the Redpanda Cloud UI to create a MirrorMaker2 Source Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. You can use a MirrorMaker2 Source connector to import messages from another Kafka cluster. You can also use it to: - Replicate messages from an external Kafka or Redpanda cluster. - Create topics on the local cluster, with a configuration matching external topics. - Replicate topic access-control lists (ACLs). ## [](#prerequisites)Prerequisites - The external Kafka cluster must be accessible. - A service account with full access to the external cluster must be available. You can also use a service account with read-only ACLs when the `offset-syncs` topic location is set to `target`. 
You must have describe and/or describe-configs ACLs for the connector to read topic configurations on the source cluster and create the topics on the target cluster, unless you create the topics yourself. ## [](#limitations)Limitations - ACLs are copied, but service accounts are not created. - Only topic ACLs are copied (group ACLs are not). - Only ACLs for topics matching the connector configuration are copied (write ACLs are not copied). - All permissions ACLs are downgraded to read-only. ## [](#create-a-mirrormaker2-source-connector)Create a MirrorMaker2 Source connector To create the MirrorMaker2 Source connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Import from Kafka cluster topics**. 3. On the **Create Connector** form page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Regexes of topics to import | topics | Comma-separated topic names and regexes you want to replicate. | | Source cluster broker list | source.cluster.bootstrap.servers | A comma-separated list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers regardless of which servers are specified here for bootstrapping. This list only impacts the initial hosts used to discover the full set of servers, and should be in the form host1:port1,host2:port2,.... Because these servers are only used for the initial connection to discover the full cluster membership (which may change dynamically), it need not contain the full set of servers (you may want more than one, though, in case a server is down). | | Source cluster security protocol | source.cluster.security.protocol | The protocol to use to communicate with source brokers. Default is PLAINTEXT. | | Source cluster SASL mechanism | source.cluster.sasl.mechanism | SASL mechanism used for connections to source cluster. 
Default is PLAIN. | | Source cluster SASL username | source.cluster.sasl.username | SASL username used for connections to source cluster. | | Source cluster SASL password | source.cluster.sasl.password | SASL password used for connections to source cluster. | | Sync topic configs enabled | sync.topic.configs.enabled | Specifies whether to periodically configure remote topics to match their corresponding upstream topics. | | Sync topic ACLs enabled | sync.topic.acls.enabled | Specifies whether or not to periodically configure remote topic ACLs to match their corresponding upstream topics. | | Connector name | name | Globally-unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. > 📝 **NOTE** > > Offsets are not guaranteed to match between the source and target. For example, if data-retention deletes occur on the source topic and the earliest offset is `#5000`, then when that event is created on the target topic the offset for that event will be `#0`. > > Events written on the target topic use the timestamp that was set on the source event. For example, if the source event has a timestamp `2023-05-22 17:00`, then this would also be the timestamp on the target event. ### [](#advanced-mirrormaker2-source-connector-configuration)Advanced MirrorMaker2 Source connector configuration In most instances, the preceding basic configuration properties are sufficient. If you require additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | Source cluster SSL custom certificate | source.cluster.ssl.truststore.certificates | Trusted certificates in the PEM format. | | Source cluster SSL keystore key | source.cluster.ssl.keystore.key | Private key in the PEM format. 
| | Source cluster SSL keystore certificate chain | source.cluster.ssl.keystore.certificate.chain | Certificate chain in the PEM format. | | Sync topic configs interval seconds | sync.topic.configs.interval.seconds | Frequency of topic config sync. | | Sync topic ACLs interval seconds | sync.topic.acls.interval.seconds | Frequency of topic ACL sync. | | Topics exclude | topics.exclude | Excluded topics. Supports comma-separated topic names and regexes. | | Source cluster alias | source.cluster.alias | When using DefaultReplicationPolicy, topic names will be prefixed with it. | | Replication policy class | replication.policy.class | Class that defines the remote topic naming convention. Use IdentityReplicationPolicy to preserve topic names. DefaultReplicationPolicy prefixes the topic with the source cluster alias. | | Replication factor | replication.factor | Replication factor for newly created remote topics. Set -1 for cluster default. | | Refresh topics interval seconds | refresh.topics.interval.seconds | Frequency of topic refresh. | | Offset-Syncs topic location | offset-syncs.topic.location | The location (source or target) of the offset-syncs topic. The default is source. | | Offset-Syncs topic replication factor | offset-syncs.topic.replication.factor | Replication factor for offset-syncs topic. The default is -1. | | Config properties exclude | config.properties.exclude | Topic config properties that should not be replicated. Supports comma-separated property names and regexes. | | Compression type | producer.override.compression.type | The compression type for all data generated by the producer. The default is none (no compression). | | Max size of a request | producer.override.max.request.size | The maximum size of a request in bytes. The default is 1048576. 
| | Auto offset reset | consumer.auto.offset.reset | What to do when there is no initial offset in Kafka, or if the current offset does not exist any more on the server (for example, because that data has been deleted). 'earliest' - automatically reset the offset to the earliest offset. 'latest' - automatically reset the offset to the latest offset. 'none' - throw exception to the consumer if no previous offset is found for the consumer’s group. | | Offset lag max | offset.lag.max | How out-of-sync a remote partition can be before it is resynced. This setting impacts the MirrorMaker2 Checkpoint connector as it is the maximum lag for syncing consumer groups. The default is 100 records. | ## [](#map-data)Map data The value converter does not require any schema; it copies data as bytes. ## [](#test-the-connection)Test the connection After the connector is created: - Ensure that there are no errors in logs and in Redpanda Console. - Confirm that Redpanda topics are being replicated. You should see messages coming into the topics. ## [](#use-the-connectors-api)Use the Connectors API When using the Connectors API, instead of specifying a value for `source.cluster.sasl.username` and `source.cluster.sasl.password`, you can specify a value for `source.cluster.sasl.jaas.config`. ## [](#troubleshoot)Troubleshoot Most MirrorMaker2 Source connector issues are reported as a failed task at the time of creation. Select **Show Logs** to view error details. | Message | Action | | --- | --- | | Connection to node -1 (/127.0.0.1:9092) could not be established. Broker may not be available. / LOGS: Timed out while checking for or creating topic 'mm2-offset-syncs.target.internal'. This could indicate a connectivity issue / TimeoutException: Timed out waiting for a node assignment | Make sure broker URLs are correct and that the security.protocol is correct. 
| | SaslAuthenticationException: SASL authentication failed: security: Invalid credentials | Confirm that the username and password specified are correct. | | Terminated during authentication. This may happen due to any of the following reasons: (1) Authentication failed due to invalid credentials with brokers older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow HTTPS traffic), (3) Transient network issue | This error indicates that SSL should be enabled using the Source cluster security protocol property (use SSL or SASL_SSL). | | RecordTooLargeException: The message is N bytes (…) | Use the producer.override.max.request.size property to change the maximum request size. | | RecordTooLargeException: The request included (…) | The target server cannot receive the messages because they are too large. Disabled compression can be a root cause. Consider enabling compression, for example by setting Compression type to snappy. | | Scheduler for MirrorSourceConnector caught exception in scheduled task: syncing topic ACLs | MirrorMaker2 requires an authorizer to be configured on the broker side, but none is configured. Change the Sync topic ACLs enabled MirrorMaker2 property to false (default is true) to disable ACL syncing. | | TopicAuthorizationException: Topic authorization failed | Confirm that the service account for the source cluster has describe and/or describe-configs ACLs. | | OffsetOutOfRangeException Fetch position FetchPosition{offset=0, … ] | If the 0 offset for your topic does not exist in the source cluster, set Auto offset reset to either earliest or latest. 
| --- # Page 364: Create a MongoDB Sink Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-mongodb-sink-connector.md --- # Create a MongoDB Sink Connector --- title: Create a MongoDB Sink Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-mongodb-sink-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-mongodb-sink-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-mongodb-sink-connector.adoc description: Use the Redpanda Cloud UI to create a MongoDB Sink Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. The MongoDB Sink managed connector exports Redpanda structured data to a MongoDB database. ## [](#prerequisites)Prerequisites - Valid credentials with the `readWrite` role to access the MongoDB database. For more granular access, you need to allow `insert`, `remove` and `update` actions for specific databases or collections. ## [](#limitations)Limitations If you want to use the MongoDB sink connector with the `MongoDB` CDC handler for data sourced from MongoDB (using the MongoDB source connector), you must select `STRING` or `BYTES` as the value converter for both the source and sink connectors. 
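The converter limitation above can be checked before you create the connectors. The following Python sketch assumes you track the UI selections as plain strings (`MongoDB` for the CDC handler, `STRING` or `BYTES` for the converters). The function name `check_mongodb_cdc_converters` is hypothetical and is not part of any Redpanda or MongoDB API.

```python
# Value converters that the limitation allows when the MongoDB CDC handler is used.
ALLOWED_CDC_CONVERTERS = {"STRING", "BYTES"}


def check_mongodb_cdc_converters(cdc_handler, source_value_converter,
                                 sink_value_converter):
    """Return a list of problems; an empty list means the limitation is satisfied."""
    problems = []
    # The limitation only applies when the MongoDB CDC handler is selected.
    if cdc_handler != "MongoDB":
        return problems
    for role, converter in (("source", source_value_converter),
                            ("sink", sink_value_converter)):
        if converter not in ALLOWED_CDC_CONVERTERS:
            problems.append(
                f"{role} value converter is {converter}; expected STRING or BYTES"
            )
    return problems


if __name__ == "__main__":
    # Example: an AVRO sink converter violates the limitation.
    for problem in check_mongodb_cdc_converters("MongoDB", "STRING", "AVRO"):
        print(problem)
```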
## [](#create-a-mongodb-sink-connector)Create a MongoDB Sink connector To create a MongoDB Sink connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Export to MongoDB Sink**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topics to export | topics | A comma-separated list of the cluster topics you want to export to MongoDB. | | Topics regex | topics.regex | Java regular expression of topics to replicate. For example, specify .* to replicate all available topics in the cluster. Applicable only when Use regular expressions is selected. | | MongoDB Connection URL | connection.url | The MongoDB connection URI string to connect to your MongoDB instance or cluster. For example, mongodb://localhost/. | | MongoDB username | connection.username | A valid MongoDB user. | | MongoDB password | connection.password | The password for the account associated with the MongoDB user. | | MongoDB database name | database | The name of an existing MongoDB database to store output files in. | | Kafka message key format | key.converter | Format of the key in the Redpanda topic. Default is STRING. | | Kafka message value format | value.converter | Format of the value in the Redpanda topic. Default is STRING. | | Default MongoDB collection name | collection | (Optional) Single sink collection name to write to. If the connector consumes from multiple topics, this is the default collection to which they are mapped. | | Max Tasks | tasks.max | Maximum number of tasks to use for this connector. The default is 1. Each task replicates an exclusive set of partitions assigned to it. | | Connector name | name | Globally unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. 
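The required properties in the preceding table can be collected into a single configuration payload. The following Python sketch is a minimal illustration under stated assumptions: the property keys come from the table, the `{"name": ..., "config": ...}` envelope follows the general Kafka Connect REST convention, and `com.mongodb.kafka.connect.MongoSinkConnector` is the class name of the MongoDB Kafka sink connector. Whether the managed service accepts converter class names rather than the UI labels is also an assumption. The helper name `build_mongodb_sink_config` and all example values are hypothetical.

```python
import json

# Converter classes for the UI labels STRING, BYTES, JSON, and AVRO.
CONVERTERS = {
    "STRING": "org.apache.kafka.connect.storage.StringConverter",
    "BYTES": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "JSON": "org.apache.kafka.connect.json.JsonConverter",
    "AVRO": "io.confluent.connect.avro.AvroConverter",
}


def build_mongodb_sink_config(name, topics, connection_url, username, password,
                              database, collection=None, key_format="STRING",
                              value_format="STRING", tasks_max=1):
    """Build a Kafka Connect style payload for a MongoDB Sink connector."""
    config = {
        # Assumption: the MongoDB Kafka sink connector class.
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "topics": ",".join(topics),
        "connection.url": connection_url,
        "connection.username": username,
        "connection.password": password,
        "database": database,
        "key.converter": CONVERTERS[key_format],
        "value.converter": CONVERTERS[value_format],
        "tasks.max": str(tasks_max),
    }
    # The default collection is optional, so include it only when provided.
    if collection is not None:
        config["collection"] = collection
    return {"name": name, "config": config}


if __name__ == "__main__":
    payload = build_mongodb_sink_config(
        name="orders-to-mongodb",            # placeholder connector name
        topics=["orders", "payments"],       # placeholder topics
        connection_url="mongodb://localhost/",
        username="example-user",
        password="example-password",         # load from a secret store in practice
        database="shop",
    )
    print(json.dumps(payload, indent=2))
```

An unknown format label raises `KeyError`, which surfaces a mistyped converter before the configuration is submitted.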
### [](#advanced-mongodb-sink-connector-configuration)Advanced MongoDB Sink connector configuration In most instances, the preceding basic configuration properties are sufficient. If you require additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | CDC handler | change.data.capture.handler | The CDC (change data capture) handler to use for processing. The MongoDB handler requires plain JSON or BSON format. The default is NONE. | | Key projection type | key.projection.type | The type of key projection to use: either AllowList or BlockList. | | Key projection list | key.projection.list | A comma-separated list of field names for key projection. | | Value projection type | value.projection.type | Only use with Value projection list. The type of value projection to use: AllowList or BlockList. The default is NONE. | | Value projection list | value.projection.list | A comma-separated list of field names for value projection. | | Field renamer mapping | field.renamer.mapping | An inline JSON array with objects describing field name mappings. For example: [{"oldName":"key.fieldA","newName":"field1"},{"oldName":"value.xyz","newName":"abc"}]. | | Field used for time | timeseries.timefield | Name of the top level field used for time. Inserted documents must specify this field, and it must be of the BSON datetime type. | | Field describing the series | timeseries.metafield | The name of the top-level field that contains metadata in each time series document. The metadata in the specified field should be data that is used to label a unique series of documents. The metadata should rarely, if ever, change. This field is used to group related data and may be of any BSON type, except for array. The metadata field may not be the same as the timeField or _id. 
| | Convert the field to a BSON datetime type | timeseries.timefield.auto.convert | Converts the timeseries field to a BSON datetime type. If the value is numeric, it is interpreted as milliseconds since epoch. Any fractional parts are discarded. If the value is a string, the timeseries.timefield.auto.convert.date.format property is used to parse the date. | | DateTimeFormatter pattern for the date | timeseries.timefield.auto.convert.date.format | The DateTimeFormatter pattern to use when converting string dates. Defaults to support ISO style date times. A string is expected to contain both the date and time. If the string only contains date information, then the time since epoch is taken from the start of that day. If a string representation does not contain a timezone offset, then the extracted date and time are interpreted as UTC. | | Data expiry time in seconds | timeseries.expire.after.seconds | The amount of time in seconds that the data will be kept in MongoDB before being automatically deleted. | | Data expiry time | timeseries.granularity | The expected interval between subsequent measurements for a time series. Possible values are "seconds", "minutes" or "hours". | | Error tolerance | errors.tolerance | Error tolerance response during connector operation. The default value is none, which signals that any error results in an immediate connector task failure. A value of all changes the behavior to skip over problematic records. | | Dead letter queue topic name | errors.deadletterqueue.topic.name | The name of the topic to be used as the dead letter queue (DLQ) for messages that result in an error when processed by this sink connector, its transformations, or converters. The topic name is blank by default, which means that no messages are recorded in the DLQ. | | Dead letter queue topic replication factor | errors.deadletterqueue.topic.replication.factor | Replication factor used to create the dead letter queue topic when it doesn’t already exist. 
| | Enable error context headers | errors.deadletterqueue.context.headers.enable | When true, adds a header containing error context to the messages written to the dead letter queue. To avoid clashing with headers from the original record, all error context header keys start with __connect.errors. | ## [](#map-data)Map data Use the appropriate key or value converter (input data format) for your data as follows: - `JSON` (`org.apache.kafka.connect.json.JsonConverter`) when your messages are structured JSON. Select `Message JSON contains schema` if the messages include the `schema` and `payload` fields. - `AVRO` (`io.confluent.connect.avro.AvroConverter`) when your messages are AVRO-encoded, with the schema stored in the Schema Registry. - `STRING` (`org.apache.kafka.connect.storage.StringConverter`) when your messages contain plaintext JSON. - `BYTES` (`org.apache.kafka.connect.converters.ByteArrayConverter`) when your messages contain BSON. ## [](#test-the-connection)Test the connection After the connector is created, verify that your new collections appear in your MongoDB database by running `show collections`. ## [](#use-the-connectors-api)Use the Connectors API When using the Connectors API, instead of specifying a value for `connection.url`, `connection.username`, and `connection.password`, you can specify a value for `connection.uri` in the form `mongodb+srv://username:password@cluster0.xxx.mongodb.net`. ## [](#troubleshoot)Troubleshoot Issues are reported using a failed task error message. Select **Show Logs** to view error details. | Message | Action | | --- | --- | | Invalid value wrong_uri for configuration connection.uri: The connection string is invalid. Connection strings must start with either 'mongodb://' or 'mongodb+srv:// | Make sure the Connection URI is a valid MongoDB URL. | | Unable to connect to the server. | Make sure that the Connection URI is valid and that the MongoDB server accepts connections. | | Invalid user permissions authentication failed. 
Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='user', source='admin', password=, mechanismProperties=}. | Check to ensure that you specified valid username and password credentials. | | DataException: Could not convert key into a BsonDocument. | Make sure your message keys are valid JSONs or skip configuration for fields that require valid JSON keys. | | DataException: Error: operationType field doc is missing. | Make sure the input record format is correct (produced by a MongoDB source connector if you use MongoDB CDC handler). | | DataException: Value document is missing or CDC operation is not a string | Make sure the input record format is correct (produced by a Debezium source connector if you use Debezium CDC handler). | | JsonParseException: Unrecognized token 'text': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false') | Make sure the input record format is JSON. | | Unexpected documentKey field type, expecting a document but found BsonString…​: {…​} | Make sure the source data is in the plain JSON or BSON format (value converter STRING or BYTES). 
| ## [](#suggested-reading)Suggested reading - [MongoDB Kafka Sink Connector](https://www.mongodb.com/docs/kafka-connector/current/sink-connector/) --- # Page 365: Create a MongoDB Source Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-mongodb-source-connector.md --- # Create a MongoDB Source Connector --- title: Create a MongoDB Source Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-mongodb-source-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-mongodb-source-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-mongodb-source-connector.adoc description: Use the Redpanda Cloud UI to create a MongoDB Source Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. The MongoDB Source managed connector imports collections from a MongoDB database into Redpanda topics. ## [](#prerequisites)Prerequisites - Valid credentials with the `read` role to access the MongoDB database. For more granular access, you need to allow `find` and `changeStream` actions for specific databases or collections. ## [](#create-a-mongodb-source-connector)Create a MongoDB Source connector To create a MongoDB Source connector: 1. 
In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Import from MongoDB**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topic prefix | topic.prefix | Prefix to prepend to database and collection names to generate the name of the Kafka topic to which to publish data. Used by the DefaultTopicMapper. | | MongoDB Connection URL | connection.url | The MongoDB connection URL string as supported by the official drivers. For example, mongodb://localhost/. | | MongoDB username | connection.username | A valid MongoDB user. | | MongoDB password | connection.password | The password for the account associated with the MongoDB user. | | Database to watch | database | The MongoDB database from which the connector imports data into Redpanda topics. The connector monitors changes in this database. Leave the field empty to watch all databases. | | Kafka message key format | key.converter | Format of the key in the Redpanda topic. Default is STRING. Use AVRO or JSON for schematic output, STRING for plain JSON, or BYTES for BSON. | | Kafka message value format | value.converter | Format of the value in the Redpanda topic. Default is STRING. Use AVRO or JSON for schematic output, STRING for plain JSON, or BYTES for BSON. | | Collection to watch | collection | The collection in the MongoDB database to watch. If not set, then all collections are watched. | | Start up behavior when there is no source offset available | startup.mode | Specifies how the connector should start up when there is no source offset available. Resuming a change stream requires a resume token, which the connector stores in, and reads from, the source offset. 
If no source offset is available, the connector may either ignore all or some existing source data, or may at first copy all existing source data and then continue with processing new data. Possible values are: (1) latest (default): The connector creates a new change stream, processes change events from it and stores resume tokens from them, thus ignoring all existing source data. (2) timestamp: Activates the startup.mode.timestamp.* properties. If no such properties are configured, then timestamp is equivalent to latest. (3) copy_existing: Activates the startup.mode.copy.existing.* properties. The connector creates a new change stream and stores its resume token, copies all existing data from all the collections being used as the source, then processes new data starting from the stored resume token. Note that reads of all the data during the copy and subsequent change stream events may produce duplicated events. During the copy, clients can make changes to the source data, which may be represented both by the copying process and the change stream. However, as the change stream events are idempotent, it's possible to apply them multiple times with the same effect as if they were applied once. Renaming a collection during the copying process is not supported. | | Connector name | name | Globally-unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ### [](#advanced-mongodb-source-connector-configuration)Advanced MongoDB Source connector configuration In most instances, the preceding basic configuration properties are sufficient. 
If you require additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | Enable Infer Schemas for the value | output.schema.infer.value | Specifies whether or not to infer the schema for the value. Each document is processed in isolation, which may lead to multiple schema definitions for the data. Only enable when Kafka message value format is set to AVRO or JSON. | | startAtOperationTime | startup.mode.timestamp.start.at.operation.time | Applies only if startup.mode = timestamp. Specifies the starting point for the change stream. Must be either an integer number of seconds since the Epoch in decimal format (for example: 30), an instant in the ISO-8601 format with one-second precision (for example: 1970-01-01T00:00:30Z), or a BSON timestamp in the canonical extended JSON (v2) format (for example: {"$timestamp": {"t": 30, "i": 0}}). You can specify 0 to start at the beginning of the oplog. Requires MongoDB 4.0 or above. For more detail, see the $changeStream definition. | | Copy existing namespace regex | startup.mode.copy.existing.namespace.regex | Use a regular expression to define the existing namespaces from which data should be copied. A namespace is the database name and collection, separated by a period (for example, database.collection). Example: The following regular expression only includes collections starting with a in the demo database: demo\.a.*. | | Copy existing initial pipeline | startup.mode.copy.existing.pipeline | An inline JSON array with objects describing the pipeline operations to run when copying existing data. Specifying this property can improve the use of indexes by the copying manager and make copying more efficient. 
Use this property if there is any filtering of collection data in the pipeline configuration to speed up the copying process. For example: [{"$match": {"closed": "false"}}]. | | Pipeline to apply to the change stream | pipeline | An inline JSON array with objects describing the pipeline operations to run. For example: [{"$match": {"operationType": "insert"}}, {"$addFields": {"Kafka": "Rules!"}}]. | | fullDocument | change.stream.full.document | Specifies what to return for update operations when using a change stream. When set to updateLookup, the change stream for partial updates will include both a delta describing the changes to the document, and a copy of the entire document that was changed _at some point_ after the change occurred. See db.collection.watch for more detail. | | fullDocumentBeforeChange | change.stream.full.document.before.change | Specifies the pre-image configuration when creating a change stream. The pre-image is not available in source records published while copying existing data as a result of enabling copy.existing. The pre-image configuration has no effect on copying. Requires MongoDB 6.0 or above. For details, see possible values. | | Publish only the fullDocument | publish.full.document.only | When enabled, only publishes the actual changed document (rather than the full change stream document). Automatically sets change.stream.full.document=updateLookup so updated documents will be included. | | Send a null value on a delete event | publish.full.document.only.tombstone.on.delete | When enabled, sends a null value (tombstone) on a delete event. Requires publish.full.document.only=true. Default is false (disabled). | | Error tolerance | mongo.errors.tolerance | Error tolerance response during connector operation. The default value, none, means that any error results in an immediate connector task failure. Set to all to skip over problematic records. 
| | Heartbeat interval milliseconds | heartbeat.interval.ms | The interval, in milliseconds, between heartbeat messages. Heartbeats record the post-batch resume token when no source records have been published, which improves the resumability of the connector for low-volume namespaces. Specify 0 to disable. | | Heartbeat topic name | heartbeat.topic.name | The name of the topic to publish heartbeats to. Defaults to __mongodb_heartbeats. | | Offset partition name | offset.partition.name | Use to specify a custom offset partition name. If blank, the default partition name based on the connection details is used. | | Topic creation enabled | topic.creation.enable | Specifies whether or not to allow automatic creation of topics. Default is true. | | Topic creation partitions | topic.creation.default.partitions | Specifies the number of partitions for the created topics. The default is 1. | | Topic creation replication factor | topic.creation.default.replication.factor | Specifies the replication factor for the created topics. The default is -1. | ## [](#map-data)Map data - `AVRO` (`io.confluent.connect.avro.AvroConverter`) or `JSON` (`org.apache.kafka.connect.json.JsonConverter`) for output with a preset schema. Additionally, you can set `Enable Infer Schemas` for the value. Each document will be processed in isolation, which may lead to multiple schema definitions for the data. - `STRING` (`org.apache.kafka.connect.storage.StringConverter`) when your messages contain plaintext JSON. - `BYTES` (`org.apache.kafka.connect.converters.ByteArrayConverter`) when your messages contain BSON. After the connector is created, check to ensure that: - There are no errors in logs and in Redpanda Console. - Redpanda topics contain data from the MongoDB collections.
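
The format choices in the Map data list above can be read as a lookup from the UI format name to a converter class. The following Python sketch illustrates that lookup. It only builds a dictionary of `key.converter` and `value.converter` entries from the class names listed above; the helper name `converter_settings` is hypothetical, it does not call any Redpanda API, and passing the class name as the property value is an assumption based on standard Kafka Connect configuration.

```python
# Converter classes as listed in the Map data section.
CONVERTERS = {
    "AVRO": "io.confluent.connect.avro.AvroConverter",
    "JSON": "org.apache.kafka.connect.json.JsonConverter",
    "STRING": "org.apache.kafka.connect.storage.StringConverter",
    "BYTES": "org.apache.kafka.connect.converters.ByteArrayConverter",
}

# Schema inference is documented as valid only for these value formats.
SCHEMA_FORMATS = {"AVRO", "JSON"}


def converter_settings(key_format="STRING", value_format="STRING",
                       infer_value_schema=False):
    """Hypothetical helper: build converter-related connector properties."""
    for fmt in (key_format, value_format):
        if fmt not in CONVERTERS:
            raise ValueError(f"unknown format: {fmt}")
    if infer_value_schema and value_format not in SCHEMA_FORMATS:
        raise ValueError("schema inference applies only to AVRO or JSON values")

    settings = {
        "key.converter": CONVERTERS[key_format],
        "value.converter": CONVERTERS[value_format],
    }
    if infer_value_schema:
        settings["output.schema.infer.value"] = "true"
    return settings


print(converter_settings("STRING", "JSON", infer_value_schema=True))
```

Rejecting schema inference for `STRING` and `BYTES` values mirrors the note that `Enable Infer Schemas for the value` should only be enabled with AVRO or JSON.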
## [](#use-the-connectors-api)Use the Connectors API When using the Connectors API, instead of specifying a value for `connection.url`, `connection.username`, and `connection.password`, you can specify a value for `connection.uri` in the form `mongodb+srv://username:password@cluster0.xxx.mongodb.net`. ## [](#troubleshoot)Troubleshoot Most MongoDB Source connector issues are identified in the connector creation phase. Invalid Include Tables are reported in logs. Select **Show Logs** to view error details. | Message | Action | | --- | --- | | Invalid value wrong_uri for configuration connection.uri: The connection string is invalid. Connection strings must start with either 'mongodb://' or 'mongodb+srv:// | Check to make sure the MongoDB Connection URL is a valid MongoDB URL. | | Unable to connect to the server. | Check to ensure that the MongoDB Connection URL is valid and that the MongoDB server accepts connections. | | Invalid user permissions authentication failed. Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='user', source='admin', password=, mechanismProperties=}. | Check to ensure that you specified valid username and password credentials. | | MongoCommandException: Command failed with error 8000 (AtlasError): 'user is not allowed to do action [find] on [db1.characters]' on server ac-nboibsg-shard-00-01.4hagsz0.mongodb.net:27017. The full response is {"ok": 0, "errmsg": "user is not allowed to do action [find] on [db1.characters]", "code": 8000, "codeName": "AtlasError"} | Check the permissions of the MongoDB user. Also confirm that the MongoDB server accepts connections. 
| | Command failed with error 286 (ChangeStreamHistoryLost): 'PlanExecutor error during aggregation :: caused by :: Resume of change stream was not possible, as the resume point may no longer be in the oplog | See Troubleshoot invalid resume token. | ## [](#suggested-reading)Suggested reading - [MongoDB Kafka Source Connector](https://www.mongodb.com/docs/kafka-connector/current/source-connector/) --- # Page 366: Create a MySQL (Debezium) Source Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-mysql-source-connector.md --- # Create a MySQL (Debezium) Source Connector --- title: Create a MySQL (Debezium) Source Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-mysql-source-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-mysql-source-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-mysql-source-connector.adoc description: Use the Redpanda Cloud UI to create a MySQL (Debezium) Source Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. You can use a MySQL (Debezium) Source connector to import a stream of changes from MySQL, Amazon RDS, and Amazon Aurora. 
## [](#prerequisites)Prerequisites - A MySQL database that is accessible from the connector instance. - A MySQL user for the Debezium connector. This user must have LOCK TABLES privileges. For details, see [MySQL Creating a user](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-creating-user). - A [binlog must be enabled](https://debezium.io/documentation/reference/stable/connectors/mysql.html#enable-mysql-binlog) for the source MySQL cluster. ## [](#limitations)Limitations - Only `JSON`, `CloudEvents`, or `AVRO` formats can be used as a Kafka message key and value format. - The MySQL (Debezium) Source connector can work with only a single task at a time. ## [](#create-a-mysql-debezium-source-connector)Create a MySQL (Debezium) Source connector To create the MySQL (Debezium) Source connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Import from MySQL (Debezium)**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topic prefix | topic.prefix | A topic prefix that identifies and provides a namespace for the particular database server/cluster that is capturing changes. The topic prefix should be unique across all other connectors because it is used as a prefix for all Kafka topic names that receive events emitted by this connector. Only alphanumeric characters, hyphens, dots, and underscores are accepted. | | Hostname | database.hostname | A resolvable hostname or IP address of the MySQL database server. | | Port | database.port | Integer port number of the MySQL database server. | | User | database.user | Name of the MySQL user to be used when connecting to the MySQL database. | | Password | database.password | The password of the MySQL database user who will be connecting to the MySQL database. 
| | SSL mode | database.ssl.mode | Specifies whether to use an encrypted connection to the MySQL server. Select disable to use an unencrypted connection. Select preferred to use an encrypted connection if the server supports secure connections. If the server does not support secure connections, the connector falls back to an unencrypted connection. Select require to use a secure, or encrypted, connection. If a secure connection cannot be established when require is selected, then the connector fails. | | Kafka message key format | key.converter | Format of the key in the Redpanda topic. | | Message key JSON contains schema | key.converter.schemas.enable | Enable to specify that the message key contains schema in the schema field. | | Kafka message value format | value.converter | Format of the value in the Redpanda topic. | | Message value JSON contains schema | value.converter.schemas.enable | Enable to specify that the message value contains schema in the schema field. | | Connector name | name | Globally-unique name to use for this connector. |

4. Click **Next**. Review the connector properties specified, then click **Create**.

## [](#map-data)Map data

Use `Include databases`, `Include tables`, and `Include columns` to define data mapping. Alternatively, use `Exclude databases`, `Exclude tables`, and `Exclude columns`.

The following is an example table in the `db` database:

```sql
CREATE TABLE IF NOT EXISTS Persons (
  Id int PRIMARY KEY,
  FirstName varchar(255),
  LastName varchar(255)
);
```

The table has one record:

```sql
INSERT INTO Persons (Id, FirstName, LastName) VALUES (1, 'Winnie', 'the Pooh');
```

The connector configuration for the table:

```bash
column.include.list = db\\.Persons\\.(Id|FirstName|LastName)
table.include.list = db\\.Persons
database.include.list = db
topic.prefix = frommysql
```

This configuration creates the Redpanda topic `frommysql.db.Persons`.
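
To see how the include lists and the topic prefix in the preceding configuration fit together, consider the following Python sketch. It rests on two assumptions that are labeled in the code: each include-list entry is a regular expression matched against the entire fully qualified name, and the topic name is the prefix, database, and table joined with dots, as in the `frommysql.db.Persons` example. The helper names are hypothetical, and the doubled backslashes from the configuration are assumed to be escaping, so they appear as single backslashes in Python raw strings.

```python
import re


def is_included(include_list, qualified_name):
    """Hypothetical helper: check a name against a comma-separated include list.

    Assumption: each entry is a regular expression that must match the
    whole qualified name. Splitting on commas is a simplification that
    breaks for patterns that themselves contain commas.
    """
    patterns = [p.strip() for p in include_list.split(",") if p.strip()]
    return any(re.fullmatch(p, qualified_name) for p in patterns)


def topic_name(prefix, database, table):
    """Assumption: topics are named <topic.prefix>.<database>.<table>."""
    return f"{prefix}.{database}.{table}"


# Values taken from the example configuration, with escaping removed.
COLUMN_INCLUDE = r"db\.Persons\.(Id|FirstName|LastName)"
TABLE_INCLUDE = r"db\.Persons"

print(topic_name("frommysql", "db", "Persons"))
print(is_included(TABLE_INCLUDE, "db.Persons"))
print(is_included(COLUMN_INCLUDE, "db.Persons.Id"))
print(is_included(COLUMN_INCLUDE, "db.Persons.Age"))
```

A column such as `db.Persons.Age` is not matched by the pattern, so under these assumptions it would be left out of the change events.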
For `Kafka message value format` = `JSON` (`org.apache.kafka.connect.json.JsonConverter`), the connector produces JSON messages with a schema like the following:

```json
{
  "payload": {
    "schema": {
      // schema definition
    },
    "payload": {
      "before": null,
      "after": {
        "Id": 1,
        "FirstName": "Winnie",
        "LastName": "the Pooh"
      },
      ...
    }
  },
  "encoding": "json",
  "schemaId": 0
}
```

For `Kafka message value format` = `AVRO` (`io.confluent.connect.avro.AvroConverter`), the connector creates the `frommysql.db.Persons-value` subject in the Schema Registry and produces messages like the following:

```js
{
  "payload": {
    "before": null,
    "after": {
      "frommysql.db.Persons.Value": {
        "Id": 1,
        "FirstName": { "string": "Winnie" },
        "LastName": { "string": "the Pooh" }
      }
    },
    ...
  },
  "encoding": "avro",
  "schemaId": 2
}
```

For `Kafka message value format` = `CloudEvents` (`io.debezium.converters.CloudEventsConverter`), the connector uses either the `JSON` or the `AVRO` data serializer.

- For the `JSON` data serializer, enable `Message value CloudEvents JSON contains schema` to include the JSON schema in the message.
- For the `AVRO` data serializer, the connector creates the schema in the Schema Registry and produces messages in the CloudEvents data format.

## [](#test-the-connection)Test the connection

After the connector is created:

- Check the connector status and confirm that there are no errors in logs and in Redpanda Console.
- Review the Redpanda topic to confirm that it contains the expected data.

## [](#troubleshoot)Troubleshoot

If the connector configuration is invalid, an error appears upon clicking **Finish**. If the connector fails, check the error message or select **Show Logs** to view error details.
- **Topics not created by the connector**

  Create the topic manually, or let the connector create it by setting the following properties (use your desired number of partitions and replication factor):

  - Topic creation enabled: true
  - Topic creation partitions: 1
  - Topic creation replication factor: -1

  Or in JSON:

  ```json
  "topic.creation.enable": true,
  "topic.creation.default.partitions": "1",
  "topic.creation.default.replication.factor": "-1"
  ```

- **Connector requires binlog file 'mysql-bin-changelog.257116', but MySQL only has mysql-bin-changelog.257123**

  Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Connector requires binlog file 'mysql-bin-changelog.257116', but MySQL only has mysql-bin-changelog.257123, mysql-bin-changelog.257124, mysql-bin-changelog.257125

  The connector needs a binlog file that was already purged. Change the `Snapshot mode` property from the default to `when_needed`.

Additional errors and corrective actions follow.

| Message | Action | | --- | --- | | Unable to connect: Public Key Retrieval is not allowed | Set the Allow public key retrieval property to true. | | Unable to connect: Communications link failure | Confirm that Hostname and Port are correct. | | Access denied for user | Confirm that User and Password credentials are valid. | | Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Invalid schema Invalid namespace: from-mysql.db.Persons; error code: 422 | The Schema Registry namespace is incorrect. Change the Topic prefix value to remove the disallowed characters. 
| ## [](#suggested-reading)Suggested reading - [Debezium connector for MySQL](https://debezium.io/documentation/reference/stable/connectors/mysql.html) --- # Page 367: Create a PostgreSQL (Debezium) Source Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-postgresql-connector.md --- # Create a PostgreSQL (Debezium) Source Connector --- title: Create a PostgreSQL (Debezium) Source Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-postgresql-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-postgresql-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-postgresql-connector.adoc description: Use the Redpanda Cloud UI to create a PostgreSQL (Debezium) Source Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. You can use a PostgreSQL (Debezium) Source connector to import updates to Redpanda from PostgreSQL. ## [](#prerequisites)Prerequisites Before you can create a PostgreSQL (Debezium) Source connector in the Redpanda Cloud, you must: - [Make the PostgreSQL (Debezium) database accessible](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-security) from connectors instance. 
- [Create a PostgreSQL (Debezium) user](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-permissions) with the necessary permissions. ## [](#limitations)Limitations The PostgreSQL (Debezium) Source connector has the following limitations: - Only `JSON`, `CloudEvents` or `AVRO` formats can be used for a Kafka message key and value format. - PostgreSQL (Debezium) connector can work with only a single task at a time. ## [](#create-a-postgresql-debezium-source-connector)Create a PostgreSQL (Debezium) Source connector To create the PostgreSQL (Debezium) Source connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Import from PostgreSQL (Debezium)**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topic prefix | topic.prefix | A topic prefix that identifies and provides a namespace for the particular database server/cluster that is capturing changes. The topic prefix should be unique across all other connectors because it is used as a prefix for all Kafka topic names that receive events emitted by this connector. Only alphanumeric characters, hyphens, dots, and underscores are accepted. | | Hostname | database.hostname | A resolvable hostname or IP address of the PostgreSQL database server. | | Port | database.port | Integer port number of the PostgreSQL database server. | | User | database.user | Name of the PostgreSQL user to be used when connecting to the PostgreSQL database. | | Password | database.password | The password of the PostgreSQL database user who will be connecting to the PostgreSQL database. | | Database | database.dbname | The name of the database from which the connector will import changes. | | SSL mode | database.sslmode | Specifies whether to use an encrypted connection to the PostgreSQL server. 
Select disable to use an unencrypted connection. Select require to use a secure, or encrypted, connection. If a secure connection cannot be established when require is selected, then the connector fails. | | Kafka message key format | key.converter | Format of the key in the Redpanda topic. | | Message key JSON contains schema | key.converter.schemas.enable | Enable to specify that the message key contains schema in the schema field. | | Kafka message value format | value.converter | Format of the value in the Redpanda topic. | | Message value JSON contains schema | value.converter.schemas.enable | Enable to specify that the message value contains schema in the schema field. | | Connector name | name | Globally-unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ## [](#map-data)Map data Define the data to read and the message format as follows: - Use the `Include Schemas`, `Include Tables`, and `Include Columns` properties to define lists of schemas, tables, and columns to read from. Alternatively, use `Exclude Schemas`, `Exclude Tables`, and `Exclude Columns` to define lists of schemas, tables, and columns to exclude. - Use only the `JSON` (`org.apache.kafka.connect.json.JsonConverter`), `AVRO` (`io.confluent.connect.avro.AvroConverter`), and `CloudEvents` (`io.debezium.converters.CloudEventsConverter`) formats for the Kafka message key and value format. ## [](#test-the-connection)Test the connection After the connector is created: 1. Open Redpanda Console, click the **Topics** tab, and select a topic. Confirm that it contains data migrated from PostgreSQL. Alternatively, use `rpk topic consume` to check the topic. 2. Click the **Connectors** tab to confirm no issues have been reported for the connector. ## [](#troubleshoot)Troubleshoot If the connector configuration is invalid, an error appears upon clicking **Finish**. 
Select **Show Logs** to view error details. Additional errors and corrective actions follow. | Message | Action | | --- | --- | | Missing tables or topics | The Debezium connector replicates tables one by one. Wait for other tables to be replicated. If the database is quite large, then replication takes longer to complete. | | non-existing-db | Make sure the provided database name in Database is correct, and that the database exists. | | The connection attempt failed / Connection to postgres:9999 refused | Check to make sure that hostname and port are correct. | | Password authentication failed for user | Make sure that the User and Password credentials are valid. | | The Plugin name value is invalid | Make sure that Plugin contains a valid value, either decoderbufs or pgoutput. | | Postgres server wal_level property is replica | Specify wal_level as logical for your database. | | RecordTooLargeException: The message is 1050766 bytes when serialized, which is larger than 1048576, the value of the max.request.size configuration. | Increase the max request size to unblock the connector and allow large messages to pass: "producer.override.max.request.size": "209715200". The connector may be reaching memory limits and failing if the amount of data to pass or your messages are too large. 
| ## [](#suggested-reading)Suggested reading - [Debezium connector for PostgreSQL](https://debezium.io/documentation/reference/stable/connectors/postgresql.html) --- # Page 368: Create an S3 Sink Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-s3-sink-connector.md --- # Create an S3 Sink Connector --- title: Create an S3 Sink Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-s3-sink-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-s3-sink-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-s3-sink-connector.adoc description: Use the Redpanda Cloud UI to create an AWS S3 Sink Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. The Amazon S3 Sink connector exports Apache Kafka messages to files in AWS S3 buckets. ## [](#prerequisites)Prerequisites Before you can create an AWS S3 sink connector in the Redpanda Cloud, you must complete these tasks: 1. [Create an AWS account](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-creating.html). 2. [Create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html) that you will send data to. 3. 
[Create an IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html) that will be used to connect to the S3 service. 4. [Attach the following policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_change-permissions.html) to the user, replacing `bucket-name` with the name you specified in step 2. ```js { "Version": "2012-10-17", "Statement": [ { "Principal": "*", "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject", "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts", "s3:ListBucketMultipartUploads" ], "Resource": [ "arn:aws:s3:::bucket-name/*", "arn:aws:s3:::bucket-name" ] } ] } ``` 5. [Create access keys](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html) for the user created in step 3. 6. Copy the access key ID and the secret access key. You will need them to configure the connector. ## [](#limitations)Limitations - You can use only the `STRING` and `BYTES` input formats for `CSV` output format. - You can use only the `PARQUET` format when your messages contain schema. ## [](#create-an-aws-s3-sink-connector)Create an AWS S3 Sink connector To create the AWS S3 Sink connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Export to S3**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topics to export | topics | Comma-separated list of the cluster topics whose records will be exported to the S3 bucket. | | Topics regex | topics.regex | Java regular expression of topics to replicate. For example: specify .* to replicate all available topics in the cluster. Applicable only when Use regular expressions is selected. | | AWS access key ID | aws.access.key.id | Enter the AWS access key ID. | | AWS secret access key | aws.secret.access.key | Enter the AWS secret access key. 
| | AWS S3 bucket name | aws.s3.bucket.name | Specify the name of the AWS S3 bucket to which the connector is to send data. | | AWS S3 region | aws.s3.region | Select the region for the S3 bucket used for storing the records. The default is us-east-1. | | Kafka message key format | key.converter | Format of the key in the Redpanda topic. The default is BYTES. | | Kafka message value format | value.converter | Format of the value in the Redpanda topic. The default is BYTES. | | S3 file format | format.output.type | Format of the files created in S3: CSV (the default), AVRO, JSON, JSONL, or PARQUET. You can use the CSV output format only with the BYTES and STRING input formats. | | Avro codec | avro.codec | The Avro compression codec to be used for Avro output files. Available values: null (the default), deflate, snappy, and bzip2. | | Max Tasks | tasks.max | Maximum number of tasks to use for this connector. The default is 1. Each task replicates an exclusive set of partitions assigned to it. | | Connector name | name | Globally unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ### [](#advanced-aws-s3-sink-connector-configuration)Advanced AWS S3 Sink connector configuration In most instances, the preceding basic configuration properties are sufficient. If you require additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | File name template | file.name.template | The template for file names on S3. Supports {{ variable }} placeholders for substituting variables. 
Supported placeholders are: topic, partition, start_offset (the offset of the first record in the file), timestamp:unit=yyyy|MM|dd|HH (the timestamp of the record), and key (when used, other placeholders are not substituted). | | File name prefix | file.name.prefix | The prefix to be added to the name of each file put in S3. | | Output fields | format.output.fields | Fields to place into output files. Supported values are: 'key', 'value', 'offset', 'timestamp', and 'headers'. | | Value field encoding | format.output.fields.value.encoding | The type of encoding to be used for the value field. Supported values are: 'none' and 'base64'. | | Envelope for primitives | format.output.envelope | Specifies whether or not to enable additional JSON object wrapping of the actual value. | | Output file compression | file.compression.type | The compression type to be used for files put into S3. Supported values are: 'none' (default), 'gzip', 'snappy', and 'zstd'. | | Max records per file | file.max.records | The maximum number of records to put in a single file. Must be a non-negative number. 0 is interpreted as "unlimited", which is the default. In this case, files are flushed only after file.flush.interval.ms elapses. | | File flush interval milliseconds | file.flush.interval.ms | The time interval to periodically flush files and commit offsets. Value specified must be a non-negative number. Default is 60 seconds. 0 indicates that it is disabled. In this case, files are flushed only after reaching file.max.records records. | | AWS S3 bucket check | aws.s3.bucket.check | If set to true (default), the connector will attempt to put a test file to the S3 bucket to validate access. | | AWS S3 part size bytes | s3.part.size | The part size in S3 multi-part uploads in bytes. Maximum is 2147483647 (2GB) and default is 5242880 (5MB). | | S3 retry backoff | aws.s3.backoff.delay.ms | S3 default base sleep time (in milliseconds) for non-throttled exceptions. Default is 100. 
| | S3 maximum back-off | aws.s3.backoff.max.delay.ms | S3 maximum back-off time (in milliseconds) before retrying a request. Default is 20000. | | S3 max retries | aws.s3.backoff.max.retries | Maximum retry limit (if the value is greater than 30, there can be integer overflow issues during delay calculation). Default is 3. | | Error tolerance | errors.tolerance | Error tolerance response during connector operation. The default value, none, means that any error results in an immediate connector task failure. A value of all skips problematic records instead. | | Dead letter queue topic name | errors.deadletterqueue.topic.name | The name of the topic to be used as the dead letter queue (DLQ) for messages that result in an error when processed by this sink connector, its transformations, or converters. The topic name is blank by default, which means that no messages are recorded in the DLQ. | | Dead letter queue topic replication factor | errors.deadletterqueue.topic.replication.factor | Replication factor used to create the dead letter queue topic when it doesn’t already exist. | | Enable error context headers | errors.deadletterqueue.context.headers.enable | When true, adds a header containing error context to the messages written to the dead letter queue. To avoid clashing with headers from the original record, all error context header keys start with __connect.errors. | ## [](#map-data)Map data Use the appropriate key or value converter (input data format) for your data as follows: - `JSON` (`org.apache.kafka.connect.json.JsonConverter`) when your messages are JSON-encoded. Select `Message JSON contains schema` if your messages include the `schema` and `payload` fields. - `AVRO` (`io.confluent.connect.avro.AvroConverter`) when your messages are AVRO-encoded, with the schema stored in the Schema Registry. - `STRING` (`org.apache.kafka.connect.storage.StringConverter`) when your messages contain textual data. 
- `BYTES` (`org.apache.kafka.connect.converters.ByteArrayConverter`) when your messages contain arbitrary data. You can also select the output data format for your S3 files as follows: - `CSV` to produce data in the `CSV` format. With `CSV`, you can use only the `STRING` and `BYTES` input formats. - `JSON` to produce data in the `JSON` format as an array of record objects. - `JSONL` to produce data in the `JSON` format, with each message as a separate JSON object, one per line. - `PARQUET` to produce data in the `PARQUET` format when your messages contain a schema. - `AVRO` to produce data in the `AVRO` format when your messages contain a schema. ## [](#test-the-connection)Test the connection After the connector is created, test the connection by writing to one of your topics, then checking the contents of the S3 bucket in the AWS management console. Files should appear after the file flush interval (default is 60 seconds). ## [](#troubleshoot)Troubleshoot If there are any connection issues, an error message is returned. Depending on the `AWS S3 bucket check` property value, the error results in a failed connector (`AWS S3 bucket check = true`) or a failed task (`AWS S3 bucket check = false`). Select **Show Logs** to view error details. Additional errors and corrective actions follow. | Message | Action | | --- | --- | | The AWS Access Key Id you provided does not exist in our records | AWS access key ID is invalid. Check to confirm that a valid existing AWS access key is specified. | | The authorization header is malformed; the region us-east-1 is wrong; expecting us-east-2 | The selected region (AWS S3 region) of the AWS bucket is incorrect. Check to confirm that you have specified the region in which the bucket was created. | | The specified bucket does not exist | Create the bucket specified in the AWS S3 bucket name property, or provide the correct name of the existing bucket. 
| | No files in the S3 bucket | Be sure to wait until the connector completes the first file flush (default 60 seconds). Verify that the topics specified are correct. Then verify that the topics contain messages to be pushed to S3. | --- # Page 369: Create a Snowflake Sink Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-snowflake-connector.md --- # Create a Snowflake Sink Connector --- title: Create a Snowflake Sink Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-snowflake-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-snowflake-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-snowflake-connector.adoc description: Use the Redpanda Cloud UI to create a Snowflake Sink Connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. You can use the Snowflake Sink connector to ingest and store Redpanda structured data into a Snowflake database for analytics and decision-making. ## [](#prerequisites)Prerequisites Before you can create a Snowflake Sink connector in the Redpanda Cloud, you must: 1. 
[Create a role](https://docs.snowflake.com/en/user-guide/kafka-connector-install#creating-a-role-to-use-the-kafka-connector) for use by Kafka Connect. 2. [Create a key pair](https://docs.snowflake.com/en/user-guide/key-pair-auth#configuring-key-pair-authentication) for authentication. 3. [Create a database](https://docs.snowflake.com/en/user-guide/getting-started-tutorial-create-objects#creating-a-database) to hold the data you intend to stream from Redpanda Cloud messages. ## [](#limitations)Limitations Refer to the [Snowflake Kafka Connector Limitations](https://docs.snowflake.com/en/user-guide/kafka-connector-overview#kafka-connector-limitations) documentation for details. ## [](#create-a-snowflake-sink-connector)Create a Snowflake Sink connector To create a Snowflake Sink connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Export to Snowflake**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topics to export | topics | A comma-separated list of the cluster topics you want to export to Snowflake. | | Topics regex | topics.regex | Java regular expression of topics to replicate. For example: specify .* to replicate all available topics in the cluster. Applicable only when Use regular expressions is selected. | | Snowflake URL name | snowflake.url.name | The Snowflake URL to be used for the connection. | | Snowflake database name | snowflake.database.name | The Snowflake database name to be used for the exported data. | | Snowflake user name | snowflake.user.name | The name of the user who created the key pair. | | Snowflake private key | snowflake.private.key | The private key name for the Snowflake user. | | Snowflake private key passphrase | snowflake.private.key.passphrase | (Optional) If created and encrypted, the passphrase of the private key. 
| | Snowflake role name | snowflake.role.name | The name of the role created in Prerequisites. | | Kafka message value format | value.converter | The format of the value in the Redpanda topic. The default is SNOWFLAKE_JSON. | | Max Tasks | tasks.max | Maximum number of tasks to use for this connector. The default is 1. Each task replicates an exclusive set of partitions assigned to it. | | Connector name | name | Globally unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ### [](#advanced-snowflake-sink-connector-configuration)Advanced Snowflake Sink connector configuration In most instances, the preceding basic configuration properties are sufficient. If you require additional property settings, then specify any of the following _optional_ advanced connector configuration properties by selecting **Show advanced options** on the **Create Connector** page: | Property name | Property key | Description | | --- | --- | --- | | Snowflake schema name | snowflake.schema.name | The Snowflake database schema name. The default is PUBLIC. | | Snowflake ingestion method | snowflake.ingestion.method | The default, SNOWPIPE, allows for structured data, while SNOWPIPE_STREAMING is a lower-latency option. | | Snowflake topic2table map | snowflake.topic2table.map | (Optional) Map of topics to tables. Format is comma-separated tuples. For example, topic1:table1,topic2:table2. | | Buffer count records | buffer.count.records | Number of records buffered in memory per partition before triggering Snowflake ingestion. Default is 10000. | | Buffer flush time | buffer.flush.time | The time in seconds to flush cached data. Default is 120. | | Buffer size bytes | buffer.size.bytes | Cumulative size of records buffered in memory per partition before triggering Snowflake ingestion. Default is 5000000. | | Error tolerance | errors.tolerance | Error tolerance response during connector operation. 
The default value, none, means that any error results in an immediate connector task failure. A value of all skips problematic records instead. | | Dead letter queue topic name | errors.deadletterqueue.topic.name | The name of the topic to be used as the dead letter queue (DLQ) for messages that result in an error when processed by this sink connector, its transformations, or converters. The topic name is blank by default, which means that no messages are recorded in the DLQ. | | Dead letter queue topic replication factor | errors.deadletterqueue.topic.replication.factor | Replication factor used to create the dead letter queue topic when it doesn’t already exist. | | Enable error context headers | errors.deadletterqueue.context.headers.enable | When true, adds a header containing error context to the messages written to the dead letter queue. To avoid clashing with headers from the original record, all error context header keys start with __connect.errors. | ## [](#map-data)Map data Use the appropriate key or value converter (input data format) for your data as follows: - `JSON` formatted records should use `SNOWFLAKE_JSON` (`com.snowflake.kafka.connector.records.SnowflakeJsonConverter`). - `AVRO` formatted records that use Kafka’s Schema Registry Service should use `SNOWFLAKE_AVRO` (`com.snowflake.kafka.connector.records.SnowflakeAvroConverter`). - `AVRO` formatted records that contain the schema (and therefore do not need Kafka’s Schema Registry Service) should use `SNOWFLAKE_AVRO_WITHOUT_SCHEMA_REGISTRY` (`com.snowflake.kafka.connector.records.SnowflakeAvroConverterWithoutSchemaRegistry`). - Plain text formatted records should use `STRING` (`org.apache.kafka.connect.storage.StringConverter`). 
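The converter choices above can be summarized as a small lookup. The following is a minimal sketch, not part of `rpk` or Redpanda Cloud: the function name `snowflake_value_converter` and the format labels (`json`, `avro-registry`, `avro-embedded`, `text`) are hypothetical and only illustrate the mapping to the converter classes listed in this section.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a Redpanda or Snowflake tool): maps an illustrative
# record-format label to the converter class named in the list above.
snowflake_value_converter() {
  local format="$1"
  case "$format" in
    json)          echo "com.snowflake.kafka.connector.records.SnowflakeJsonConverter" ;;
    avro-registry) echo "com.snowflake.kafka.connector.records.SnowflakeAvroConverter" ;;
    avro-embedded) echo "com.snowflake.kafka.connector.records.SnowflakeAvroConverterWithoutSchemaRegistry" ;;
    text)          echo "org.apache.kafka.connect.storage.StringConverter" ;;
    *)             echo "unknown format: $format" >&2; return 1 ;;
  esac
}

# Example: look up the converter class for JSON-formatted records.
snowflake_value_converter json
```

The printed class name is the value that corresponds to the `value.converter` property key for that record format.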
## [](#test-the-connection)Test the connection After the connector is created, verify in your Snowflake worksheet that your table is populated: `SELECT * FROM TEST.PUBLIC.TABLE_NAME;` It may take a couple of minutes for the records to be visible in Snowflake. ## [](#troubleshoot)Troubleshoot After you submit the connector for creation in Redpanda Console, the Snowflake Sink connector attempts to authenticate to the Snowflake database to validate the configuration. This validation must be successful before the connector is created. It can take 10 seconds or more to respond. If the connector fails, check the error message or select **Show Logs** to view error details. Additional errors and corrective actions follow. | Message | Action | | --- | --- | | snowflake.url.name is not a valid snowflake url | Check to make sure Snowflake URL name contains a valid Snowflake URL. | | snowflake.user.name: Cannot connect to Snowflake | Check to make sure Snowflake user name contains a valid Snowflake user. | | snowflake.private.key must be a valid PEM RSA private key / java.lang.IllegalArgumentException: Last encoded character (before the padding, if any) is a valid base 64 alphabet but not a possible value. Expect the discarded bits to be zero. | Snowflake private key is invalid. Provide a valid key. | | snowflake.database.name+ database does not exist | Specify a valid database name in snowflake.database.name. | | Object does not exist, or operation cannot be performed | Snowflake error that can have several causes: an invalid role is being used, there is no existing Snowflake table, or an incorrect schema name is specified. Verify that the connector configuration and Snowflake settings are valid. | | Config:value.converter has provided value:com.snowflake.kafka.connector.records.SnowflakeJsonConverter. If ingestionMethod is:snowpipe_streaming, Snowflake Custom Converters are not allowed. | Use STRING for the Kafka message value format. 
| ## [](#suggested-reading)Suggested reading - For more about limitations, see [Kafka Connector Limitations](https://docs.snowflake.com/en/user-guide/kafka-connector-overview#kafka-connector-limitations) - For testing the connection, see [Using Worksheets for Queries / DML / DDL](https://docs.snowflake.com/en/user-guide/ui-worksheet) - For details about all Snowflake Sink connector properties, see [Kafka Configuration Properties](https://docs.snowflake.com/en/user-guide/kafka-connector-install#required-properties) --- # Page 370: Create a SQL Server (Debezium) Source Connector **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/create-sqlserver-connector.md --- # Create a SQL Server (Debezium) Source Connector --- title: Create a SQL Server (Debezium) Source Connector latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/create-sqlserver-connector page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/create-sqlserver-connector.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/create-sqlserver-connector.adoc description: Use the Redpanda Cloud UI to create a SQL Server (Debezium) Source Connector. page-git-created-date: "2024-10-03" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. 
You can use an SQL Server (Debezium) Source connector to import updates to Redpanda from SQL Server. ## [](#prerequisites)Prerequisites Before you can create an SQL Server (Debezium) Source connector in the Redpanda Cloud, you must: - Make the SQL Server (Debezium) database accessible from the connector instance. - Create a SQL Server (Debezium) user with the necessary permissions. ## [](#limitations)Limitations The SQL Server (Debezium) Source connector has the following limitations: - Only `JSON`, `CloudEvents` or `AVRO` formats can be used for a Kafka message key and value format. - SQL Server (Debezium) connector can work with only a single task at a time per database name. ## [](#create-an-sql-server-debezium-source-connector)Create an SQL Server (Debezium) Source connector To create the SQL Server (Debezium) Source connector: 1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**. 2. Select **Import from SQL Server (Debezium)**. 3. On the **Create Connector** page, specify the following required connector configuration options: | Property name | Property key | Description | | --- | --- | --- | | Topic prefix | topic.prefix | A topic prefix that identifies and provides a namespace for the particular database server/cluster that is capturing changes. The topic prefix should be unique across all other connectors because it is used as a prefix for all Kafka topic names that receive events emitted by this connector. Only alphanumeric characters, hyphens, dots, and underscores are accepted. | | Hostname | database.hostname | A resolvable hostname or IP address of the SQL Server database server. | | Port | database.port | Integer port number of the SQL Server database server. | | User | database.user | Name of the SQL Server user to be used when connecting to the SQL Server database. | | Password | database.password | The password of the SQL Server database user who will be connecting to the SQL Server database. 
| | Database instance | database.instance | Specifies the instance name of the SQL Server named instance. If both database.port and database.instance are specified, database.instance is ignored. | | Databases | database.names | The comma-separated list of the SQL Server database names from which to stream the changes. | | Kafka message key format | key.converter | Format of the key in the Redpanda topic. | | Message key JSON contains schema | key.converter.schemas.enable | Enable to specify that the message key contains schema in the schema field. | | Kafka message value format | value.converter | Format of the value in the Redpanda topic. | | Message value JSON contains schema | value.converter.schemas.enable | Enable to specify that the message value contains schema in the schema field. | | Max tasks | tasks.max | The maximum number of tasks that the connector can use to capture data from the database instance. If the Databases list contains more than one element, you can increase the value of this property to a number less than or equal to the number of elements in the list. Default: 1 | | Connector name | name | Globally-unique name to use for this connector. | 4. Click **Next**. Review the connector properties specified, then click **Create**. ## [](#map-data)Map data Use the appropriate key or value converter (input data format) for your data as follows: - Use the `Include Schemas`, `Include Tables`, and `Include Columns` properties to define lists of columns, tables, and schemas to read from. Alternatively, use `Exclude Schemas`, `Exclude Tables`, and `Exclude Columns` to define lists of columns, tables, and schemas to exclude from sources list. - Use only `JSON` (`org.apache.kafka.connect.json.JsonConverter`), `AVRO` (`io.confluent.connect.avro.AvroConverter`), and `CloudEvents` (`io.debezium.converters.CloudEventsConverter`) formats for the Kafka message key and value format. ## [](#test-the-connection)Test the connection After the connector is created: 1. 
Open Redpanda Console, click the **Topics** tab, and select a topic. Check to confirm that it contains data migrated from SQL Server. Alternatively, run `rpk consume` to check the topic. 2. Click the **Connectors** tab to confirm that no issues have been reported for the connector. ## [](#suggested-reading)Suggested reading - [Debezium connector for SQL Server](https://debezium.io/documentation/reference/stable/connectors/sqlserver.html) --- # Page 371: Disable Kafka Connect **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/disable-kc.md --- # Disable Kafka Connect --- title: Disable Kafka Connect latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/disable-kc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/disable-kc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/disable-kc.adoc description: Learn how to disable Kafka Connect using the Cloud API. page-git-created-date: "2025-08-07" page-git-modified-date: "2025-08-20" --- Kafka Connect is disabled by default on new clusters. If you previously enabled Kafka Connect on a cluster and want to disable it, you can use the [Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview). > 📝 **NOTE** > > Redpanda Support does not manage or monitor Kafka Connect, but Support can enable the feature for your account. ## [](#verify-kafka-connect-is-enabled)Verify Kafka Connect is enabled If Kafka Connect is enabled on your cluster, you will see it configured on the **Connect** page in the Redpanda Cloud UI. 
You can also verify with the Cloud API:

```bash
curl -sX GET "https://api.redpanda.com/v1/clusters/{cluster.id}" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H 'accept: application/json' | jq -r '.cluster.kafka_connect'
```

Replace `{cluster.id}` with your actual cluster ID. You can find the cluster ID in the Redpanda Cloud UI. Look in the **Details** section of the cluster overview.

If Kafka Connect is enabled, the response will show:

```bash
"enabled": true
```

## [](#prerequisites)Prerequisites

- You have the cluster ID of a cluster that has Kafka Connect enabled.
- You have a valid bearer token for the Cloud API. For details, see [Authenticate to the API](/api/doc/cloud-controlplane/authentication).

> ❗ **IMPORTANT**
>
> Make sure to stop any active connectors gracefully before disabling Kafka Connect to avoid data loss or incomplete processing.

## [](#disable-kafka-connect)Disable Kafka Connect

After you are authenticated to the Cloud API, make a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request, replacing `{cluster.id}` with your actual cluster ID.

```bash
curl -X PATCH "https://api.redpanda.com/v1/clusters/{cluster.id}" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"kafka_connect":{"enabled":false}}'
```

The `PATCH` request returns the ID of a long-running operation. You can check the status of the operation by polling the [`GET /operations/{id}`](/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation) endpoint:

```bash
curl -X GET "https://api.redpanda.com/v1/operations/{operation.id}" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json"
```

When the operation is complete, the status will show `"state": "STATE_COMPLETED"`. You can verify that Kafka Connect has been disabled by running the verification command from the previous section.
The response should show: ```bash "enabled": false ``` --- # Page 372: Monitor Kafka Connect **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/monitor-connectors.md --- # Monitor Kafka Connect --- title: Monitor Kafka Connect latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/monitor-connectors page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/monitor-connectors.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/monitor-connectors.adoc description: Use metrics to monitor the health of Kafka Connect. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-07" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. You can monitor the health of Kafka Connect with metrics that Redpanda exports through a Prometheus HTTPS endpoint. You can use Grafana to visualize the metrics and set up alerts. The most important metrics to be monitored by alerts are: - connector failed tasks - connector lag / connector lag rate ## [](#view-connector-logs)View connector logs Connector logs are written to the system topic `__redpanda.connectors_logs`. You can view logs in Redpanda Cloud on the Topics page for your cluster, or you can download logs with `rpk`. 
For example:

```bash
# Last 100 messages (most recent)
rpk topic consume __redpanda.connectors_logs -o -100 -n 100

# Last 10 minutes
rpk topic consume __redpanda.connectors_logs -o @-10m:end

# Stream new logs only (like tail -f)
rpk topic consume __redpanda.connectors_logs -o end

# Filter by connector name (replace <connector-name> with the name of your connector)
rpk topic consume __redpanda.connectors_logs -o @-10m:end -O json \
  | jq -r 'select(.message | test("<connector-name>"; "i"))'
```

> 📝 **NOTE**
>
> Access to system topics may be restricted by organization/project roles. Log retention follows cluster/system-topic policies and messages may expire.

## [](#limitations)Limitations

The connectors dashboard renders metrics that are exported by managed connectors. However, when a connector does not create a task (for example, an empty topic list), the dashboard will not show metrics for that connector.

--- # Page 373: Sizing Connectors **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/sizing-connectors.md --- # Sizing Connectors --- title: Sizing Connectors latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/sizing-connectors page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/sizing-connectors.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/sizing-connectors.adoc description: How to choose number of tasks to set for a connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). 
> > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. ## [](#connector-tasks)Connector tasks A connector's main responsibility is to validate the configuration and spawn _connector tasks_, which perform the work. Setting up multiple tasks for a connector allows for parallelization of the work, resulting in higher throughput. Before setting up connector tasks, consider the following: - For source connectors, the ability to add tasks to achieve higher throughput depends on the connector implementation and configuration. For many connectors, only a single connector task is allowed (for example, Debezium allows a single task only). When Redpanda Cloud does not offer an option to set the number of tasks, the source connector runs only one task. - For sink connectors, parallelism is achieved by evenly distributing configured topic partitions for the connector amongst connector tasks. The number of partitions must be equal to or greater than the number of tasks. ## [](#single-task-throughput)Single task throughput Connector throughput depends on many factors, including converters used, compression, message size, and the performance of external systems. As a rule of thumb, expect a single connector task to provide 1-2 MB/s of throughput. ## [](#specify-number-of-connector-tasks-for-a-sink-connector)Specify number of connector tasks for a sink connector It can be a challenge to determine the number of connector tasks to use for a given workload, so you must experiment to find the right number. Start with a low number of connector tasks and wait a couple of minutes to view performance. Keep increasing the number of tasks until satisfactory throughput is achieved. Keep in mind that the underlying infrastructure must scale to provide room for additional connector tasks. Waiting roughly 10 minutes after each change should provide sufficient time for the system to scale up. 
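As a rough planning aid, the rule of thumb above (1-2 MB/s per task) and the partition limit can be combined into a quick estimate of how many tasks you might eventually need. This is only a sketch: the function name `estimate_sink_tasks` and the example figures (a 10 MB/s target and 12 partitions) are hypothetical, and real throughput depends on the factors listed above, so it does not replace experimenting with a low task count first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate how many sink connector tasks a workload may need.
# Arguments: target throughput (MB/s), assumed per-task throughput (MB/s),
# and the number of topic partitions the connector reads from.
estimate_sink_tasks() {
  local target_mbps="$1" per_task_mbps="$2" partitions="$3"
  # Ceiling division: tasks needed to reach the target at the assumed per-task rate.
  local tasks=$(( (target_mbps + per_task_mbps - 1) / per_task_mbps ))
  # Always run at least one task.
  (( tasks < 1 )) && tasks=1
  # The number of tasks must not exceed the number of partitions.
  (( tasks > partitions )) && tasks=$partitions
  echo "$tasks"
}

estimate_sink_tasks 10 1 12   # conservative assumption (1 MB/s per task): prints 10
estimate_sink_tasks 10 2 12   # optimistic assumption (2 MB/s per task): prints 5
```

Treat the two results as a range to work toward: increase the task count step by step, and give the infrastructure time to scale between changes.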
--- # Page 374: Single Message Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/develop/managed-connectors/transforms.md --- # Single Message Transforms --- title: Single Message Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: managed-connectors/transforms page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: managed-connectors/transforms.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/managed-connectors/transforms.adoc description: Single Message Transforms (SMTs) let you modify the data and its characteristics as it passes through a connector. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-05" --- > ❗ **IMPORTANT** > > - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../disable-kc/). > > - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../connect/about/). > > - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed. Single Message Transforms (SMTs) help you modify data and its characteristics as it passes through a connector, without needing additional stream processors. Prior to using an SMT with production data, test the configuration on a smaller subset of data to verify the behavior of the SMT. ## [](#cast)Cast Cast SMT lets you change the data type of fields in a Redpanda message, updating the schema if one is present. Use the concrete transformation type designed for the record key (`org.apache.kafka.connect.transforms.Cast$Key`) or value (`org.apache.kafka.connect.transforms.Cast$Value`). 
### [](#configuration)Configuration | Property key | Description | | --- | --- | | spec | Comma-separated list of field names and the type to which they should be cast; for example: `my-field1:int32,my-field2:string`. Allowed types are: `int8`, `int16`, `int32`, `int64`, `float32`, `float64`, `boolean`, and `string`. | ### [](#example)Example "transforms": "Cast", "transforms.Cast.type": "org.apache.kafka.connect.transforms.Cast$Value", "transforms.Cast.spec": "price:float64" Before: {"price": 1234, "product\_id": "9987"} After: {"price": 1234.0, "product\_id": "9987"} ## [](#dropheaders)DropHeaders DropHeaders SMT removes one or more headers from each record. ### [](#configuration-2)Configuration | Property key | Description | | --- | --- | | headers | Comma-separated list of header names to drop. | ### [](#example-2)Example Sample configuration: "transforms": "DropHeader", "transforms.DropHeader.type": "org.apache.kafka.connect.transforms.DropHeaders", "transforms.DropHeader.headers": "source-id,conv-id" ## [](#eventrouter-debezium)EventRouter (Debezium) The outbox pattern is a way to safely and reliably exchange data between multiple (micro) services. An outbox pattern implementation avoids inconsistencies between a service’s internal state (as typically persisted in its database) and state in events consumed by services that need the same data. To implement the outbox pattern in a Debezium application, configure a Debezium connector to: - Capture changes in an outbox table - Apply the Debezium outbox EventRouter Single Message Transformation > 📝 **NOTE** > > EventRouter SMT is available for managed Debezium connectors only. ### [](#configuration-3)Configuration | Property key | Description | | --- | --- | | route.by.field | Specifies the name of a column in the outbox table. The default behavior is that the value in this column becomes a part of the name of the topic to which the connector emits the outbox messages. 
| | route.topic.replacement | Specifies the name of the topic to which the connector emits outbox messages. The default topic name is outbox.event. followed by the aggregatetype column value in the outbox table record. | | table.expand.json.payload | Specifies whether the JSON expansion of a String payload should be done. If no content is found, or if there’s a parsing error, the content is kept "as is". | | table.fields.additional.placement | Specifies one or more outbox table columns to add to outbox message headers or envelopes. Specify a comma-separated list of pairs. In each pair, specify the name of a column and whether you want the value to be in the header or the envelope. | | table.field.event.key | Specifies the outbox table column that contains the event key. When this column contains a value, the SMT uses that value as the key in the emitted outbox message. This is important for maintaining the correct order in Kafka partitions. | ### [](#example-3)Example Sample JSON configuration: "transforms": "outbox", "transforms.outbox.route.by.field": "type", "transforms.outbox.route.topic.replacement": "my-topic.${routedByValue}", "transforms.outbox.table.expand.json.payload": "true", "transforms.outbox.table.field.event.key": "aggregate\_id", "transforms.outbox.table.fields.additional.placement": "before:envelope", "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter" ### [](#suggested-reading)Suggested reading - [Debezium Outbox Event Router SMT](https://debezium.io/documentation/reference/stable/transformations/outbox-event-router.html) ## [](#extractfield)ExtractField ExtractField SMT pulls the specified field from a Struct when a schema is present, or a Map for schemaless data. Any null values are passed through unmodified. Use the concrete transformation type designed for the record key (`org.apache.kafka.connect.transforms.ExtractField$Key`) or value (`org.apache.kafka.connect.transforms.ExtractField$Value`). 
### [](#configuration-4)Configuration | Property key | Description | | --- | --- | | field | Field name to extract. | ### [](#example-4)Example Sample configuration: "transforms": "ExtractField", "transforms.ExtractField.type": "org.apache.kafka.connect.transforms.ExtractField$Value", "transforms.ExtractField.field": "product\_id" Before: ```json {"product_id":9987,"price":1234} ``` After: ```json {"value":9987} ``` ## [](#filter)Filter Filter SMT drops all records, filtering them from subsequent transformations in the chain. This is intended to be used conditionally to filter out records matching (or not matching) a particular predicate. ### [](#configuration-5)Configuration | Property key | Description | | --- | --- | | predicate | Name of predicate filtering records. | ### [](#example-5)Example Sample configuration: "transforms": "Filter", "transforms.Filter.type": "org.apache.kafka.connect.transforms.Filter", "transforms.Filter.predicate": "IsMyTopic", "predicates": "IsMyTopic", "predicates.IsMyTopic.type": "org.apache.kafka.connect.transforms.predicates.TopicNameMatches", "predicates.IsMyTopic.pattern": "my-topic" ### [](#predicates)Predicates Managed connectors support the following predicates: #### [](#topicnamematches)TopicNameMatches `org.apache.kafka.connect.transforms.predicates.TopicNameMatches` - A predicate that is true for records with a topic name that matches the configured regular expression. | Property key | Description | | --- | --- | | pattern | A Java regular expression for matching against the name of a record’s topic. | #### [](#hasheaderkey)HasHeaderKey `org.apache.kafka.connect.transforms.predicates.HasHeaderKey` - A predicate that is true for records with at least one header with the configured name. | Property key | Description | | --- | --- | | name | The header name. 
| #### [](#recordistombstone)RecordIsTombstone `org.apache.kafka.connect.transforms.predicates.RecordIsTombstone` - A predicate that is true for records that are tombstones (that is, they have null values). ## [](#flatten)Flatten Flatten SMT flattens a nested data structure, generating names for each field by concatenating the field names at each level with a configurable delimiter character. Applies to Struct when a schema is present, or a Map for schemaless data. Array fields and their contents are not modified. The default delimiter is `.`. Use the concrete transformation type designed for the record key (`org.apache.kafka.connect.transforms.Flatten$Key`) or value (`org.apache.kafka.connect.transforms.Flatten$Value`). ### [](#configuration-6)Configuration | Property key | Description | | --- | --- | | delimiter | Delimiter to insert between field names from the input record when generating field names for the output record. | ### [](#example-6)Example "transforms": "flatten", "transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value", "transforms.flatten.delimiter": "." Before: ```json { "user": { "id": 10, "name": { "first": "Red", "last": "Panda" } } } ``` After: ```json { "user.id": 10, "user.name.first": "Red", "user.name.last": "Panda" } ``` ## [](#headerfrom)HeaderFrom HeaderFrom SMT moves or copies fields in the key or value of a record into that record’s headers. Corresponding elements of `fields` and `headers` together identify a field and the header it should be moved or copied to. Use the concrete transformation type designed for the record key (`org.apache.kafka.connect.transforms.HeaderFrom$Key`) or value (`org.apache.kafka.connect.transforms.HeaderFrom$Value`). ### [](#configuration-7)Configuration | Property key | Description | | --- | --- | | fields | Comma-separated list of field names in the record whose values are to be copied or moved to headers. 
| | headers | Comma-separated list of header names, in the same order as the field names listed in the fields configuration property. | | operation | Either move if the fields are to be moved to the headers (removed from the key/value), or copy if the fields are to be copied to the headers (retained in the key/value). | ### [](#example-7)Example "transforms": "HeaderFrom", "transforms.HeaderFrom.type": "org.apache.kafka.connect.transforms.HeaderFrom$Value", "transforms.HeaderFrom.fields": "id,last\_login\_ts", "transforms.HeaderFrom.headers": "user\_id,timestamp", "transforms.HeaderFrom.operation": "move" Before: - Record value: { "id": 11, "name": "Harry Wilson", "last\_login\_ts": 1715242380 } - Record header: { "conv\_id": "uier923" } After: - Record value: { "name": "Harry Wilson" } - Record header: { "conv\_id": "uier923", "user\_id": 11, "timestamp": 1715242380 } ## [](#hoistfield)HoistField HoistField SMT wraps data using the specified field name in a Struct when schema present, or a Map in the case of schemaless data. Use the concrete transformation type designed for the record key (`org.apache.kafka.connect.transforms.HoistField$Key`) or value (`org.apache.kafka.connect.transforms.HoistField$Value`). ### [](#configuration-8)Configuration | Property key | Description | | --- | --- | | field | Field name for the single field that will be created in the resulting Struct or Map. | ### [](#example-8)Example "transforms": "HoistField", "transforms.HoistField.type": "org.apache.kafka.connect.transforms.HoistField$Value", "transforms.HoistField.field": "name" Message: ```none Red Panda ``` After: ```none {"name":"Red"} {"name":"Panda"} ``` ## [](#insertfield)InsertField InsertField SMT inserts field(s) using attributes from the record metadata or a configured static value. Use the concrete transformation type designed for the record key (`org.apache.kafka.connect.transforms.InsertField$Key`) or value (`org.apache.kafka.connect.transforms.InsertField$Value`). 
### [](#configuration-9)Configuration | Property key | Description | | --- | --- | | offset.field | Field name for Redpanda offset. | | partition.field | Field name for Redpanda partition. | | static.field | Field name for static data field. | | static.value | The static field value. | | timestamp.field | Field name for record timestamp. | | topic.field | Field name for Redpanda topic. | ### [](#example-9)Example Sample configuration: "transforms": "InsertField", "transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Value", "transforms.InsertField.static.field": "cluster\_id", "transforms.InsertField.static.value": "19423" Before: ```json {"product_id":9987,"price":1234} ``` After: ```json {"price":1234,"cluster_id":"19423","product_id":9987} ``` ## [](#maskfield)MaskField MaskField SMT replaces the contents of fields in a record. Use the concrete transformation type designed for the record key (`org.apache.kafka.connect.transforms.MaskField$Key`) or value (`org.apache.kafka.connect.transforms.MaskField$Value`). ### [](#configuration-10)Configuration | Property key | Description | | --- | --- | | fields | Comma-separated list of fields to mask. | | replacement | Custom value replacement used to mask field values. | ### [](#example-10)Example "transforms": "MaskField", "transforms.MaskField.type": "org.apache.kafka.connect.transforms.MaskField$Value", "transforms.MaskField.fields": "metadata", "transforms.MaskField.replacement": "\*\*\*" Before: {"product\_id":9987,"price":1234,"metadata":"test"} After: {"metadata":"\*\*\*","price":1234,"product\_id":9987} ## [](#regexrouter)RegexRouter RegexRouter SMT updates the record topic using the configured regular expression and replacement string. Under the hood, the regex is compiled to a `java.util.regex.Pattern`. If the pattern matches the input topic, `java.util.regex.Matcher#replaceFirst()` is used with the replacement string to obtain the new topic. 
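The match-then-replace behavior described above can be tried outside of a connector. The following Python sketch only approximates it: the function `route_topic` is hypothetical, it assumes the pattern must match the whole topic name, and Python’s `re` module stands in for the Java regex classes (Python writes the whole-match reference as `\g<0>` where Java writes `$0`).

```python
import re


def route_topic(topic, regex, replacement):
    """Return the rewritten topic name, or the original name when the regex does not match."""
    pattern = re.compile(regex)
    # Assumption: the pattern has to match the entire topic name, not just part of it.
    if pattern.fullmatch(topic) is None:
        return topic
    # count=1 mirrors replaceFirst(): only the first match is replaced.
    return pattern.sub(replacement, topic, count=1)


# Adds a prefix to every topic name.
print(route_topic("topic-name", r".*", r"prefix_\g<0>"))
# A non-matching topic name is left unchanged.
print(route_topic("orders", r"logs-(.*)", r"archive-\1"))
```

Because Java and Python regex dialects differ in some details, validate the final pattern against a test topic in the connector itself.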
### [](#configuration-11)Configuration | Property key | Description | | --- | --- | | regex | Regular expression to use for matching. | | replacement | Replacement string. | ### [](#example-11)Example This configuration snippet shows how to add the prefix `prefix_` to the beginning of a topic. "transforms": "AppendPrefix", "transforms.AppendPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter", "transforms.AppendPrefix.regex": ".\*", "transforms.AppendPrefix.replacement": "prefix\_$0" Before: `topic-name` After: `prefix_topic-name` ## [](#replacefield)ReplaceField ReplaceField SMT filters or renames fields in a Redpanda record. Use the concrete transformation type designed for the record key (`org.apache.kafka.connect.transforms.ReplaceField$Key`) or value (`org.apache.kafka.connect.transforms.ReplaceField$Value`). ### [](#configuration-12)Configuration | Property key | Description | | --- | --- | | exclude | Fields to exclude. This takes precedence over the fields to include. | | include | Fields to include. If specified, only these fields are used. | | renames | List of comma-separated pairs. For example: foo:bar,abc:xyz | ### [](#example-12)Example Sample configuration: "transforms": "ReplaceField", "transforms.ReplaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value", "transforms.ReplaceField.renames": "product\_id:item\_number" Before: ```json {"product_id":9987,"price":1234} ``` After: ```json {"item_number":9987,"price":1234} ``` ## [](#replacetimestamp-redpanda)ReplaceTimestamp (Redpanda) ReplaceTimestamp (Redpanda) SMT is designed to support using a record key/value field as a record timestamp, which then can be used to partition data with an S3 connector. Use the concrete transformation type designed for the record key (`com.redpanda.connectors.transforms.ReplaceTimestamp$Key`) or value (`com.redpanda.connectors.transforms.ReplaceTimestamp$Value`). > 📝 **NOTE** > > ReplaceTimestamp is available for Sink connector only. 
### [](#configuration-13)Configuration | Property key | Description | | --- | --- | | field | Specifies the name of the field to use as the source of the timestamp. | ### [](#example-13)Example To use the `my-timestamp` field as the source of the record timestamp, update the connector configuration with: "transforms": "ReplaceTimestamp", "transforms.ReplaceTimestamp.type": "com.redpanda.connectors.transforms.ReplaceTimestamp$Value", "transforms.ReplaceTimestamp.field": "my-timestamp" for messages in the following format: { "name": "my-name", ... "my-timestamp": 1707928150868, ... } The SMT needs structured data to be able to extract the field from it, which means either a Map in the case of schemaless data, or a Struct when a schema is present. The timestamp value should be of a numeric type (epoch millis), or a Java Date object (which is the case when using `"connect.name":"org.apache.kafka.connect.data.Timestamp"` in schema). ## [](#schemaregistryreplicator-redpanda)SchemaRegistryReplicator (Redpanda) SchemaRegistryReplicator (Redpanda) SMT is a transform to replicate schemas. > 📝 **NOTE** > > SchemaRegistryReplicator SMT is designed to be used with the MirrorMaker2 connector only. To use it, remove the `_schemas` topic from the topic exclude list. ### [](#example-14)Example Sample configuration: "transforms": "schema-replicator", "transforms.schema-replicator.type": "com.redpanda.connectors.transforms.SchemaRegistryReplicator" ## [](#setschemametadata)SetSchemaMetadata SetSchemaMetadata SMT sets the schema name, version, or both on the record’s key (`org.apache.kafka.connect.transforms.SetSchemaMetadata$Key`) or value (`org.apache.kafka.connect.transforms.SetSchemaMetadata$Value`) schema. ### [](#configuration-14)Configuration | Property key | Description | | --- | --- | | schema.name | Schema name to set. | | schema.version | Schema version to set. 
| ### [](#example-15)Example Sample configuration: "transforms": "SetSchemaMetadata", "transforms.SetSchemaMetadata.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value", "transforms.SetSchemaMetadata.schema.name": "transaction-value", "transforms.SetSchemaMetadata.schema.version": "3" ## [](#timestampconverter)TimestampConverter TimestampConverter SMT converts timestamps between different formats, such as Unix epoch, strings, and Connect Date/Timestamp types. It applies to individual fields or to the entire value. Use the concrete transformation type designed for the record key (`org.apache.kafka.connect.transforms.TimestampConverter$Key`) or value (`org.apache.kafka.connect.transforms.TimestampConverter$Value`). ### [](#configuration-15)Configuration | Property key | Description | | --- | --- | | field | The field containing the timestamp, or empty if the entire value is a timestamp. Default: "". | | target.type | The desired timestamp representation: string, unix, Date, Time, or Timestamp. | | format | A SimpleDateFormat-compatible format for the timestamp. Used to generate the output when target.type=string or used to parse the input if the input is a string. Default: "". | | unix.precision | The desired Unix precision for the timestamp: seconds, milliseconds, microseconds, or nanoseconds. Used to generate the output when target.type=unix or used to parse the input if the input is a Long. Note: This SMT causes precision loss during conversions from, and to, values with sub-millisecond components. Default: milliseconds. 
| ### [](#example-16)Example Sample configuration: "transforms": "TimestampConverter", "transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value", "transforms.TimestampConverter.field": "last\_login\_date", "transforms.TimestampConverter.format": "yyyy-MM-dd", "transforms.TimestampConverter.target.type": "string" Before (epoch milliseconds, the default `unix.precision`): `1702041416000` After: `2023-12-08` ## [](#timestamprouter)TimestampRouter TimestampRouter SMT updates the record’s topic field as a function of the original topic value and the record timestamp. This is mainly useful for sink connectors, because the topic field is often used to determine the equivalent entity name in the destination system (for example, a database table or search index name). > 📝 **NOTE** > > TimestampRouter SMT should be used with sink connectors only. ### [](#configuration-16)Configuration | Property key | Description | | --- | --- | | topic.format | Format string that can contain ${topic} and ${timestamp} as placeholders for the topic and timestamp, respectively. | | timestamp.format | Format string for the timestamp that is compatible with java.text.SimpleDateFormat. | ### [](#example-17)Example Sample configuration: "transforms": "router", "transforms.router.type": "org.apache.kafka.connect.transforms.TimestampRouter", "transforms.router.topic.format": "${topic}\_${timestamp}", "transforms.router.timestamp.format": "yyyy-MM-dd" ## [](#valuetokey)ValueToKey ValueToKey SMT replaces the record key with a new key formed from a subset of fields in the record value. ### [](#configuration-17)Configuration | Property key | Description | | --- | --- | | fields | Comma-separated list of field names on the record value to extract as the record key. 
| ### [](#example-18)Example Sample configuration: "transforms": "valueToKey", "transforms.valueToKey.type": "org.apache.kafka.connect.transforms.ValueToKey", "transforms.valueToKey.fields": "txn-id" ## [](#error-handling)Error handling By default, `Error tolerance` is set to `NONE`, so SMTs fail for any exception (notably, data parsing or data processing errors). To avoid the connector crashing for data issues, set `Error tolerance` to `ALL`, and specify `Dead Letter Queue Topic Name` as a place where failed messages are redirected. --- # Page 375: Produce Data **URL**: https://docs.redpanda.com/redpanda-cloud/develop/produce-data.md --- # Produce Data --- title: Produce Data latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: produce-data/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: produce-data/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/produce-data/index.adoc description: Learn how to configure producers and idempotent producers. page-git-created-date: "2024-07-25" page-git-modified-date: "2024-08-01" --- - [Configure Producers](configure-producers/) Learn about configuration options for producers, including write caching and acknowledgment settings. - [Idempotent Producers](idempotent-producers/) Idempotent producers assign a unique ID to every write request, guaranteeing that each message is recorded only once in the order in which it was sent. - [Configure Leader Pinning](leader-pinning/) Learn about Leader Pinning and how to configure a preferred partition leader location based on cloud availability zones or regions. 
--- # Page 376: Configure Producers **URL**: https://docs.redpanda.com/redpanda-cloud/develop/produce-data/configure-producers.md --- # Configure Producers --- title: Configure Producers latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: produce-data/configure-producers page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: produce-data/configure-producers.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/produce-data/configure-producers.adoc description: Learn about configuration options for producers, including write caching and acknowledgment settings. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Producers are client applications that write data to Redpanda in the form of events. Producers communicate with Redpanda through the Kafka API. When a producer publishes a message to a Redpanda cluster, it sends it to a specific partition. Every event consists of a key and value. When selecting which partition to produce to, if the key is blank, then the producer publishes in a round-robin fashion between the topic’s partitions. If a key is provided, then the producer hashes the key using the murmur2 algorithm and takes the result modulo the number of partitions to select the partition. ## [](#producer-acknowledgment-settings)Producer acknowledgment settings The `acks` property sets the number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the durability of records that are sent. Redpanda guarantees data safety with fsync, which means flushing to disk. - With `acks=all`, every write is fsynced by default. - With `write.caching` enabled at the topic level, Redpanda fsyncs to disk according to `flush.ms` and `flush.bytes`, whichever is reached first. 
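The "whichever is reached first" rule for `flush.ms` and `flush.bytes` can be illustrated with a small model. This Python sketch is a simplification and not Redpanda’s implementation: the function `flush_points` is hypothetical, the threshold values are arbitrary, and thresholds are only evaluated when a write arrives.

```python
def flush_points(writes, flush_ms, flush_bytes):
    """Return the indexes of the writes that trigger a flush.

    writes is a list of (timestamp_ms, size_bytes) tuples in arrival order.
    """
    triggered = []
    last_flush_ms = writes[0][0] if writes else 0
    pending_bytes = 0
    for index, (timestamp_ms, size_bytes) in enumerate(writes):
        pending_bytes += size_bytes
        time_reached = timestamp_ms - last_flush_ms >= flush_ms
        size_reached = pending_bytes >= flush_bytes
        # Flush as soon as either threshold is reached, then reset both counters.
        if time_reached or size_reached:
            triggered.append(index)
            last_flush_ms = timestamp_ms
            pending_bytes = 0
    return triggered


# Illustrative values only: flush every 100 ms or every 1000 bytes.
writes = [(0, 400), (10, 400), (20, 400), (200, 100), (210, 100)]
print(flush_points(writes, flush_ms=100, flush_bytes=1000))
```

In this run, the third write triggers a flush on size and the fourth triggers one on elapsed time. Data acknowledged between flushes is the data at risk if brokers fail before the next fsync.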
### [](#acks0)`acks=0` The producer doesn’t wait for acknowledgments from the leader and doesn’t retry sending messages. This increases throughput and lowers latency of the system at the expense of durability, so data loss is possible. This option allows a producer to immediately consider a message acknowledged when it is sent to the Redpanda broker. This means that a producer does not have to wait for any response from the Redpanda broker. This is the least safe option, because a leader-broker crash can cause data loss if the data has not yet replicated to the other brokers in the replica set. However, this setting is useful when you want to optimize for the highest throughput and are willing to risk some data loss. Because of the lack of guarantees, this setting is the most network bandwidth-efficient. This is helpful for use cases like IoT/sensor data collection, where updates are periodic or stateless and you can afford some degree of data loss, but you want to gather as much data as possible in a given time interval. ### [](#acks1)`acks=1` The producer waits for an acknowledgment from the leader, but it doesn’t wait for the leader to get acknowledgments from followers. This setting doesn’t prioritize throughput, latency, or durability. Instead, `acks=1` attempts to provide a balance among all of them. Replication is not guaranteed with this setting because it happens in the background, after the leader broker sends an acknowledgment to the producer. This setting could result in data loss if the leader broker crashes before any followers manage to replicate the message or if a majority of replicas go down at the same time before fsyncing the message to the disk. ### [](#acksall)`acks=all` The producer receives an acknowledgment after the majority of (implicitly, all) replicas acknowledge the message. Redpanda guarantees data safety by fsyncing every message to disk before sending an acknowledgment back to clients. 
This increases durability at the expense of lower throughput and increased latency. Sometimes referred to as `acks=-1`, this option instructs the broker that replication is considered complete when the message has been replicated (and fsynced) to the majority of the brokers responsible for the partition in the cluster. As soon as the fsync call is complete, the message is considered acknowledged and is made visible to readers. > 📝 **NOTE** > > This property has an important distinction compared to Kafka’s behavior. In Kafka, a message is considered acknowledged without the requirement that it has been fsynced. Messages that have not been fsynced to disk may be lost in the event of a broker crash. So when using `acks=all`, the Redpanda default configuration is more resilient than Kafka’s. You can also consider using write caching, which is a relaxed mode of `acks=all` that acknowledges a message as soon as it is received and acknowledged on a majority of brokers, without waiting for it to fsync to disk. This provides lower latency while still ensuring that a majority of brokers acknowledge the write. ### [](#retries)`retries` This property controls the number of times a message is re-sent to the broker if the broker fails to acknowledge it. This is essentially the same as if the client application resends the erroneous message after receiving an error response. The default value of `retries` in most client libraries is 0. This means that if the send fails, the message is not re-sent at all. If you increase this to a higher value, check the `max.in.flight.requests.per.connection` value as well, because leaving that property at its default value can potentially cause ordering issues in the target topic where the messages arrive. This occurs if two batches are sent to a single partition and the first fails and is retried, but the second succeeds, so the records in the second batch may appear first. 
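The reordering scenario above can be reproduced with a toy model. The following Python sketch is not a real producer: the function `deliver` and its parameters are hypothetical, and it assumes that a failed batch is re-sent only after the other in-flight batches have already been written.

```python
def deliver(batches, fail_first_attempt, max_in_flight):
    """Return the order in which batches end up in the partition log."""
    log = []
    queue = list(batches)
    already_failed = set()
    while queue:
        # Send up to max_in_flight batches without waiting for acknowledgments.
        window, queue = queue[:max_in_flight], queue[max_in_flight:]
        retries = []
        for batch in window:
            if batch in fail_first_attempt and batch not in already_failed:
                # The first attempt fails; the batch is scheduled for a retry.
                already_failed.add(batch)
                retries.append(batch)
            else:
                log.append(batch)
        # Retried batches are sent before batches that have not been sent yet.
        queue = retries + queue
    return log


# Two batches in flight: batch A fails once, so B lands in the log before A.
print(deliver(["A", "B"], fail_first_attempt={"A"}, max_in_flight=2))
# One batch in flight: A is retried before B is sent, so order is preserved.
print(deliver(["A", "B"], fail_first_attempt={"A"}, max_in_flight=1))
```

The model shows why retries combined with more than one in-flight request can reorder records unless idempotence is enabled.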
### [](#max-in-flight-requests-per-connection)`max.in.flight.requests.per.connection` This property controls how many unacknowledged messages can be sent to the broker simultaneously at any given time. The default value is 5 in most client libraries. If you set this to 1, then the producer does not send any more messages until the previous one is either acknowledged or an error happens, which can prompt a retry. If you set this to a value higher than 1, then the producer sends more messages at the same time, which can help increase throughput but adds a risk of message reordering if retries are enabled. When you configure the producer to be [idempotent](../idempotent-producers/), up to five requests can be guaranteed to be in flight with the order preserved. ### [](#enable-idempotence)`enable.idempotence` To enable idempotence, set `enable.idempotence` to `true` (the default) in your Redpanda configuration. When idempotence is enabled, the producer ensures that exactly one copy of every message is written to the broker. When set to `false`, the producer retries sending a message for any reason (such as transient errors like brokers not being available or not enough replicas exception), and it can lead to duplicates. In most client libraries `enable.idempotence` is set to true by default. Internally, this is implemented using a special identifier that is assigned to every producer (the producer ID or PID). This ID, along with a sequence number, is included in every message sent to the broker. The broker checks if the PID/sequence number combination is larger than the previous one and, if not, it discards the message. To guarantee true idempotent behavior, you must also set `acks=all` to ensure that all brokers record messages in order, even in the event of node failures. In this configuration, both the producer and the broker prefer safety and durability over throughput. Idempotence is only guaranteed within a session. 
A session starts after the producer is instantiated and a connection is established between the client and the Redpanda broker. When the connection is closed, the session ends. If your application code retries a request, the producer client assigns a new ID to that request, which may lead to duplicate messages. ## [](#message-batching)Message batching Batching is an efficient way to save on both network bandwidth and disk space, because messages compress better as a group than individually. When a producer prepares to send messages to a broker, it first fills up a buffer. When this buffer is full, the producer compresses (if instructed to do so) and sends out this batch of messages to the broker. The number of batches that can be sent in a single request to the broker is limited by the `max.request.size` property. The number of requests that can simultaneously be in this sending state is controlled by the `max.in.flight.requests.per.connection` value, which defaults to 5 in most client libraries. Tune the batching configuration with the following properties: ### [](#buffer-memory)`buffer.memory` This property controls the total amount of memory available to the producer for buffering. If messages are sent faster than they can be delivered to the broker, the producer application may run out of buffer memory, which causes it to either block subsequent send calls or throw an exception. The `max.block.ms` property controls the amount of time the producer blocks before throwing an exception if it cannot immediately send messages to the broker. ### [](#batch-size)`batch.size` This property controls the maximum size of a group of messages that can be batched together in one request. The producer automatically puts messages being sent to the same partition into one batch. This configuration property is given in bytes, as opposed to the number of messages. 
When the producer is gathering messages to assign to a batch, at some point it hits this byte-size limit, which triggers it to send the batch to the broker. However, the producer does not necessarily wait (for as much time as set using `linger.ms`) until the batch is full. Sometimes, it can even send single-message batches. This means that setting the batch size too large is not necessarily undesirable, because it won’t cause throttling when sending messages; rather, it only causes increased memory usage. Conversely, setting the batch size too small can cause the producer to send batches of messages faster, which can cause network overhead, meaning a reduced throughput. The default value is usually 16384, but you can set this as low as 0, which turns off batching entirely. ### [](#linger-ms)`linger.ms` This property controls the maximum amount of time the producer waits before sending out a batch of messages, if it is not already full. This means you can somewhat force the producer to make sure that batches are filled as efficiently as possible. If you’re willing to tolerate some latency, setting this value to a number larger than the default of `0` causes the producer to send fewer, more efficient batches of messages. If you set the value to `0`, there is still a high chance messages arrive around the same time to be batched together. ## [](#common-producer-configurations)Common producer configurations ### [](#compression-type)`compression.type` This property controls how the producer should compress a batch of messages before sending it to the broker. The default is `none`, which means the batch of messages is not compressed at all. Compression occurs on full batches, so you can improve batching throughput by setting this property to use one of the available compression algorithms (along with increasing batch size). The available options are: `zstd`, `lz4`, `gzip`, and `snappy`. 
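
The interaction between `batch.size` and `linger.ms` described above can be sketched with a toy model. This is a simplified illustration under stated assumptions, not how any real client is implemented: the function name `simulate_batches` and its arguments are hypothetical, all messages go to one partition, and a batch closes either when the next message would push it past the byte limit or when the next message arrives after the linger window of the batch's first message.

```python
def simulate_batches(messages, batch_size, linger_ms):
    """Group messages for one partition into batches (toy model).

    messages: list of (arrival_ms, size_bytes), sorted by arrival time.
    Returns a list of batches, each a list of message sizes.
    """
    batches = []
    current = []          # sizes of the messages in the open batch
    first_arrival = None  # arrival time of the first message in the open batch
    for arrival_ms, size in messages:
        if current:
            linger_expired = arrival_ms - first_arrival > linger_ms
            would_overflow = sum(current) + size > batch_size
            if linger_expired or would_overflow:
                batches.append(current)  # close and "send" the open batch
                current = []
        if not current:
            first_arrival = arrival_ms
        current.append(size)
    if current:
        batches.append(current)
    return batches


# Four 100-byte messages: three in a burst, one much later.
msgs = [(0, 100), (1, 100), (2, 100), (50, 100)]
print(simulate_batches(msgs, batch_size=16384, linger_ms=0))
print(simulate_batches(msgs, batch_size=16384, linger_ms=5))
print(simulate_batches(msgs, batch_size=250, linger_ms=5))
```

In this model, raising `linger_ms` lets the burst share one batch, while a small `batch_size` splits the burst regardless of the linger window. Real clients also send early for other reasons, so treat the output as an intuition aid only.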
### [](#serializers)Serializers Serializers are responsible for converting a message to a byte array. You can influence the speed/memory efficiency of your streaming setup by choosing one of the built-in serializers or writing a custom one. The performance consequences of using serializers are not typically significant. For example, if you opt for the JSON serializer, you have more data to transport with each message because every record contains its schema in a verbose format, which impacts your compression speeds and network throughput. Alternatively, going with Avro or Protobuf allows you to define the schema in only one place, while also enabling features like schema evolution. ## [](#broker-timestamps)Broker timestamps Redpanda employs a unique strategy to help ensure the accuracy of retention operations. In this strategy, closed segments are only eligible for deletion when the age of all messages in the segment exceeds a configured threshold. However, when a producer sends a message to a topic, the timestamp set by the producer may not accurately reflect the time the message reaches the broker. To address this time skew, each time a producer sends a message to a topic, Redpanda records the broker’s system date and time in the `broker_timestamp` property of the message. This property helps maintain accurate retention policies, even when the message’s creation timestamp deviates from the broker’s time. > 📝 **NOTE** > > Clock synchronization should be monitored by the server owner, as Redpanda does not monitor clock synchronization. While Redpanda does not rely on clocks for correctness, if you are using `LogAppendTime` (server timestamp set by Redpanda), server clocks may affect the time your application sees. ## [](#producer-optimization-strategies)Producer optimization strategies You can optimize for speed (throughput and latency) or safety (durability and availability) by adjusting properties. Finding the optimal configuration depends on your use case.
There are many configuration options within Redpanda. The configuration options mentioned here work best when combined with other broker and consumer configuration options. See also: - [Consumer Offsets](../../consume-data/consumer-offsets/) ### [](#optimize-for-speed)Optimize for speed To get data into Redpanda as quickly as possible, you can minimize latency and maximize throughput in a variety of ways: - Experiment with [acks](#producer-acknowledgment-settings) settings. The quicker a producer receives a reply from the broker that the message has been committed, the sooner it can send the next message, which generally results in higher throughput. Hence, if you set `acks=1`, then the leader broker does not need to wait for replication to occur, and it can reply as soon as it finishes committing the message. This can result in less durability overall. - Enable [write caching](../../topics/config-topics/#configure-write-caching), which acknowledges a message as soon as it is received and acknowledged on a majority of brokers, without waiting for it to fsync to disk. This provides lower latency while still ensuring that a majority of brokers acknowledge the write. - Experiment with other components’ properties, like the topic partition size. - Explore how the producer batches messages. Increasing the value of `batch.size` and `linger.ms` can increase throughput by making the producer add more messages into one batch before sending it to the broker and waiting until the batches can properly fill up. This approach negatively impacts latency though. By contrast, if you set `linger.ms` to `0` and `batch.size` to `1`, you can achieve lower latency, but sacrifice throughput. ### [](#optimize-for-safety)Optimize for safety For applications where you must guarantee that there are no lost messages, duplicates, or service downtime, you can use higher durability `acks` settings.
If you set `acks=all`, then the producer waits for a majority of replicas to acknowledge the message before it can send the next message, resulting in higher latency, because there is more communication required between brokers. This approach can guarantee higher durability because the message is replicated to a majority of replicas before it is acknowledged. You can also increase durability by increasing the number of retries the producer can make in case messages are not delivered successfully. The trade-off is that duplicates may enter the system and potentially alter the ordering of messages. --- # Page 377: Idempotent Producers **URL**: https://docs.redpanda.com/redpanda-cloud/develop/produce-data/idempotent-producers.md --- # Idempotent Producers --- title: Idempotent Producers latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: produce-data/idempotent-producers page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: produce-data/idempotent-producers.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/produce-data/idempotent-producers.adoc description: Idempotent producers assign a unique ID to every write request, guaranteeing that each message is recorded only once in the order in which it was sent. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- When a producer writes messages to a topic, each message should be recorded only once in the order in which it was sent. However, network issues such as a connection failure can result in a timeout, which prevents a write request from succeeding. In such cases, the client retries the write request until one of these events occurs: - The client receives an acknowledgment from the broker that the write was successful. - The retry limit is reached. - The message delivery timeout limit is reached.
Since there is no way to tell if the initial write request succeeded before the disruption, a retry can result in a duplicate message. A retry can also cause subsequent messages to be written out of order. Idempotent producers prevent this problem by assigning a unique ID to every write request. The request ID consists of the producer ID and a sequence number. The sequence number identifies the order in which each write request was sent. If a retry results in a duplicate message, Redpanda detects and rejects the duplicate message and maintains the original order of the messages. If new write requests continue while a previous request is being retried, the new requests are stored in the client’s memory in the order in which they were sent. The client must also retry these requests once the previous request is successful. ## [](#enable-idempotence-for-producers)Enable idempotence for producers To make producers idempotent, the `enable.idempotence` property must be set to `true` in your producer configuration, as well as in the Redpanda cluster configuration, where it is set to `true` by default. Some Kafka clients have `enable.idempotence` set to `false` by default. In this case, set the property to `true` by following the instructions for your particular client. Idempotence is guaranteed within a session. A session starts once a producer is created and a connection is established between the client and the Kafka broker. > 📝 **NOTE** > > Idempotent producers retry unsuccessful write requests automatically. If you manually retry a write request, the client will assign a new ID to that request, which may lead to duplicate messages. 
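
The duplicate detection described on this page can be illustrated with a toy model. This is a minimal sketch, assuming a single partition and a broker that only remembers the highest sequence number accepted per producer ID; the class `PartitionLog` and its methods are hypothetical names for illustration and do not reflect Redpanda's internal implementation.

```python
class PartitionLog:
    """Toy partition that de-duplicates writes by (producer ID, sequence)."""

    def __init__(self):
        self.records = []
        self._last_seq = {}  # producer_id -> highest sequence accepted so far

    def append(self, producer_id, sequence, value):
        """Return True if the write is stored, False if rejected as a duplicate."""
        last = self._last_seq.get(producer_id, -1)
        if sequence <= last:
            return False  # same request ID seen before: drop the retry
        self._last_seq[producer_id] = sequence
        self.records.append(value)
        return True


log = PartitionLog()
print(log.append(7, 0, "order-1"))  # first attempt is stored
print(log.append(7, 0, "order-1"))  # automatic retry reuses the ID and is rejected
print(log.append(7, 1, "order-1"))  # manual resend gets a new ID and is stored again
print(log.records)
```

The last call mirrors the note above: a manual retry is a new request with a new sequence number, so the model stores the payload a second time.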
--- # Page 378: Configure Leader Pinning **URL**: https://docs.redpanda.com/redpanda-cloud/develop/produce-data/leader-pinning.md --- # Configure Leader Pinning --- title: Configure Leader Pinning latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: produce-data/leader-pinning page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: produce-data/leader-pinning.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/produce-data/leader-pinning.adoc description: Learn about Leader Pinning and how to configure a preferred partition leader location based on cloud availability zones or regions. learning-objective-1: Configure preferred partition leader placement using rack labels learning-objective-2: Configure ordered rack preference for priority-based leader failover learning-objective-3: Identify conditions where Leader Pinning cannot place leaders in preferred racks page-git-created-date: "2024-12-04" page-git-modified-date: "2026-03-31" --- Produce requests that write data to Redpanda topics are routed through the topic partition leader, which syncs messages across its follower replicas. For a Redpanda cluster deployed across multiple availability zones (AZs), Leader Pinning ensures that a topic’s partition leaders are geographically closer to clients, which helps decrease networking costs and guarantees lower latency. If consumers are located in the same preferred region or AZ for Leader Pinning, and you have not set up [follower fetching](../../consume-data/follower-fetching/), Leader Pinning can also help reduce networking costs on consume requests. 
After reading this page, you will be able to: - Configure preferred partition leader placement using rack labels - Configure ordered rack preference for priority-based leader failover - Identify conditions where Leader Pinning cannot place leaders in preferred racks ## [](#set-leader-rack-preferences)Set leader rack preferences Configure Leader Pinning if you have Redpanda deployed in a multi-AZ or multi-region cluster and your ingress is concentrated in a particular AZ or region. Use the topic configuration property `redpanda.leaders.preference` to configure Leader Pinning for individual topics. The property accepts the following string values: - `none`: Disable Leader Pinning for the topic. - `racks:<rack1>[,<rack2>,…]`: Specify the preferred location (rack) of all topic partition leaders. The list can contain one or more racks, and you can list the racks in any order. Spaces in the list are ignored, for example: `racks:rack1,rack2` and `racks: rack1, rack2` are equivalent. You cannot specify empty racks, for example: `racks: rack1,,rack2`. If you specify multiple racks, Redpanda tries to distribute the partition leader locations equally across brokers in these racks. - `ordered_racks:<rack1>[,<rack2>,…]`: Supported in Redpanda v26.1 or later. Specify the preferred racks in priority order. Redpanda places leaders in the first listed rack when available, failing over to each subsequent rack when higher-priority racks are unavailable. If all listed racks are unavailable, leaders fall back to any other available brokers. Brokers with no rack assignment are treated as lowest priority.
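
The value rules listed above can be checked before you apply a setting. The following is a minimal sketch that mirrors only the rules stated in this section (accepted prefixes, ignored spaces, no empty racks); the function `parse_leaders_preference` is a hypothetical helper, not part of rpk or Redpanda, and Redpanda's own validation may differ.

```python
def parse_leaders_preference(value):
    """Return (mode, racks) for a `redpanda.leaders.preference` string.

    Raises ValueError for an unknown mode or an empty rack entry.
    """
    value = value.strip()
    if value == "none":
        return ("none", [])
    mode, sep, rack_list = value.partition(":")
    if sep != ":" or mode not in ("racks", "ordered_racks"):
        raise ValueError(f"unsupported preference: {value!r}")
    # Spaces around rack names are ignored.
    racks = [rack.strip() for rack in rack_list.split(",")]
    if any(rack == "" for rack in racks):
        raise ValueError(f"empty rack in preference: {value!r}")
    return (mode, racks)


print(parse_leaders_preference("racks: rack1, rack2"))
print(parse_leaders_preference("ordered_racks:us-west-2a,us-west-2b"))
```

For `ordered_racks`, the returned list keeps the order given in the string, because that order is the failover priority.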
To find the rack identifiers of all brokers, run: ```bash rpk cluster info ``` Expected output ```bash CLUSTER ======= redpanda.be267958-279d-49cd-ae86-98fc7ed2de48 BROKERS ======= ID HOST PORT RACK 0* 54.70.51.189 9092 us-west-2a 1 35.93.178.18 9092 us-west-2b 2 35.91.121.126 9092 us-west-2c ``` To set the topic property: ```bash rpk topic alter-config <topic-name> --set redpanda.leaders.preference=ordered_racks:<rack1>,<rack2> ``` If there is more than one broker in the preferred AZ (or AZs), Leader Pinning distributes partition leaders uniformly across brokers in the AZ. ## [](#limitations)Limitations Leader Pinning controls which replica is elected as leader, and does not move replicas to different brokers. If all of a topic’s replicas are on brokers in non-preferred racks, no replica exists in the preferred racks to elect as leader, and Redpanda may elect a non-preferred leader indefinitely. For example, consider a cluster deployed across four racks (A, B, C, D) with Leader Pinning configured as `ordered_racks:A,B,C,D`. With a replication factor of 3, rack awareness can only place replicas in three of the four racks. If the highest-priority rack (A) does not receive a replica, no replica exists there to elect as leader, and Redpanda may elect a non-preferred leader indefinitely. To prevent this scenario, ensure the topic’s replication factor at least equals the total number of racks in the cluster, so every rack, including the highest-priority rack, receives a replica. ## [](#leader-pinning-failover-across-availability-zones)Leader Pinning failover across availability zones If there are three AZs: A, B, and C, and A becomes unavailable, the failover behavior with `racks` is as follows: - The topic with `A` as the preferred leader AZ will have its partition leaders uniformly distributed across B and C. - The topic with `A,B` as the preferred leader AZs will have its partition leaders in B. - The topic with `B` as the preferred leader AZ will have its partition leaders in B as well.
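
The three `racks` failover cases above follow one rule, which can be sketched as a small function. This is an illustrative model only: `leader_racks` is a hypothetical helper, and it assumes every available rack holds a replica of the topic (see the limitations earlier on this page).

```python
def leader_racks(preferred, available):
    """Racks that can host leaders under a `racks:` preference (toy model).

    preferred: rack names listed in the preference.
    available: set of rack names that are currently reachable.
    """
    usable = [rack for rack in preferred if rack in available]
    # If no preferred rack is available, leaders spread over all available racks.
    return sorted(usable) if usable else sorted(available)


available = {"B", "C"}  # AZ A is down
print(leader_racks(["A"], available))       # leaders spread across B and C
print(leader_racks(["A", "B"], available))  # leaders in B
print(leader_racks(["B"], available))       # leaders in B
```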
### [](#failover-with-ordered-rack-preference)Failover with ordered rack preference With `ordered_racks`, the failover order follows the configured priority list. Leaders move to the next available rack in the list when higher-priority racks become unavailable. For a topic configured with `ordered_racks:A,B,C`: - The topic with `A` as the first-priority rack will have its partition leaders in A. - If A becomes unavailable, leaders move to B. - If A and B become unavailable, leaders move to C. - If A, B, and C all become unavailable, leaders fall back to any available brokers. If a higher-priority rack recovers and the topic’s replication factor ensures that rack receives a replica, Redpanda automatically moves leaders back to the highest available preferred rack. ## [](#suggested-reading)Suggested reading - For latency-tolerant, high-throughput workloads where cross-AZ networking charges are a major cost driver, also consider [Cloud Topics](../../topics/cloud-topics/) - [Follower Fetching](../../consume-data/follower-fetching/) --- # Page 379: Topics **URL**: https://docs.redpanda.com/redpanda-cloud/develop/topics.md --- # Topics --- title: Topics latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: topics/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: topics/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/topics/index.adoc description: Overview of standard topics in Redpanda Cloud. page-git-created-date: "2026-03-31" page-git-modified-date: "2026-03-31" --- - [Topics Overview](create-topic/) Learn how to create a topic for a Redpanda Cloud cluster. - [Manage Topics](config-topics/) Learn how to create topics, update topic configurations, and delete topics or records. 
- [Manage Cloud Topics](cloud-topics/) Cloud Topics are Redpanda topics that enable users to trade off latency for lower costs. --- # Page 380: Manage Cloud Topics **URL**: https://docs.redpanda.com/redpanda-cloud/develop/topics/cloud-topics.md --- # Manage Cloud Topics --- title: Manage Cloud Topics latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: topics/cloud-topics page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: topics/cloud-topics.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/topics/cloud-topics.adoc description: Cloud Topics are Redpanda topics that enable users to trade off latency for lower costs. page-git-created-date: "2026-03-31" page-git-modified-date: "2026-03-31" --- Starting in v26.1, Redpanda provides [Cloud Topics](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#cloud-topic) to support multi-modal streaming workloads in the most cost-effective way possible: as a per-topic configuration running mixed latency workloads. While standard Redpanda [topics](../config-topics/) that use local storage or Tiered Storage are ideal for latency-sensitive workloads (for example, for audit logs or analytics), Cloud Topics are optimized for latency-tolerant, high-throughput workloads where cross-AZ networking charges are a major consideration that can become the dominant cost driver at high throughput. These workloads can include observability streams, offline analytics, AI/ML model training data feeds, or development environments that have flexible latency requirements. Instead of replicating every byte across expensive network links, Cloud Topics leverage durable, inexpensive cloud storage (S3, ADLS, GCS, MinIO) as the primary mechanism to both replicate data and serve it to consumers. 
This eliminates over 90% of the cost of replicating data over network links in multi-AZ clusters. The end-to-end latency experienced when using Cloud Topics can range from 500 ms to as high as a few seconds with different object stores. Lower latencies may be achievable in certain environments, but Cloud Topics is optimized for throughput rather than low latency or tightly constrained tail latency. This latency profile is often acceptable for many streaming workloads, and can unlock new streaming use cases that previously were not cost effective. With Cloud Topics, data from the client is not acknowledged until it is uploaded to object storage. This maintains durability in the face of infrastructure failures, but results in an increase in both produce latency and end-to-end latency, driven by both batching of produced data and the inherent latency of the underlying object store. You should generally expect end-to-end latencies of 1-2 seconds with public cloud stores. ## [](#prerequisites)Prerequisites - [Install rpk](../../../manage/rpk/rpk-install/) v26.1 or later. ## [](#limitations)Limitations - Shadow links do not currently support Cloud Topics. - Once created, a Cloud Topic cannot be converted back to a standard Redpanda topic that uses local or Tiered Storage. Conversely, existing topics created as local or Tiered Storage topics cannot be converted to Cloud Topics. ## [](#enable-cloud-topics)Enable Cloud Topics To enable Cloud Topics for a cluster: ```bash rpk cluster config set cloud_topics_enabled=true ``` > 📝 **NOTE** > > This configuration update requires a restart to take effect. After enabling Cloud Topics, you can proceed to create new Cloud Topics: ```bash rpk topic create -c redpanda.storage.mode=cloud ``` ```console TOPIC STATUS audit.analytics.may2025 OK ``` You can make a topic a Cloud Topic only at topic creation time. 
In addition to replication, cross-AZ ingress (producer) and egress (consumer) traffic can also contribute substantially to cloud networking costs. When running multi-AZ clusters in general, Redpanda strongly recommends using [Follower Fetching](../../consume-data/follower-fetching/), which allows consumers to avoid crossing network zones. When possible, you can use [leader pinning](../../produce-data/leader-pinning/), which positions a topic’s partition leader close to the producers, providing a similar benefit for ingress traffic. These features can add additional savings to the replication cost savings of Cloud Topics. --- # Page 381: Manage Topics **URL**: https://docs.redpanda.com/redpanda-cloud/develop/topics/config-topics.md --- # Manage Topics --- title: Manage Topics latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: topics/config-topics page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: topics/config-topics.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/topics/config-topics.adoc description: Learn how to create topics, update topic configurations, and delete topics or records. page-git-created-date: "2026-03-31" page-git-modified-date: "2026-03-31" --- Topics provide a way to organize events in a data streaming platform. ## [](#create-a-topic)Create a topic Creating a topic can be as simple as specifying a name for your topic on the command line. For example, to create a topic named `xyz`, run: ```bash rpk topic create xyz ``` This command creates a topic named `xyz` with one partition and three replicas, because these are the default values set in the cluster configuration file. Replicas are copies of partitions that are distributed across different brokers, so if one broker goes down, other brokers still have a copy of the data. 
Redpanda Cloud supports 40,000 topics per cluster. ### [](#choose-the-number-of-partitions)Choose the number of partitions A partition acts as a log file where topic data is written. Dividing topics into partitions allows producers to write messages in parallel and consumers to read messages in parallel. The higher the number of partitions, the greater the throughput. > 💡 **TIP** > > As a general rule, select a number of partitions that corresponds to the maximum number of consumers in any consumer group that will consume the data. For example, suppose you plan to create a consumer group with 10 consumers. To create topic `xyz` with 10 partitions, run: ```bash rpk topic create xyz -p 10 ``` ## [](#update-topic-configurations)Update topic configurations After you create a topic, you can update the topic property settings for all new data written to it. For example, you can add partitions or change the cleanup policy. ### [](#add-partitions)Add partitions You can assign a certain number of partitions when you create a topic, and add partitions later. For example, suppose you add brokers to your cluster, and you want to take advantage of the additional processing power. To increase the number of partitions for existing topics, run: ```bash rpk topic add-partitions [TOPICS...] --num [#] ``` Note that `--num <#>` is the number of partitions to _add_, not the total number of partitions. > 📝 **NOTE** > > If a topic already has messages and you add partitions, the existing messages won’t be redistributed to the new partitions. If you require messages to be redistributed, then you must create a new topic with the new partition count, then stream the messages from the old topic to the new topic so they are appropriately distributed according to the new partition hashing. 
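
The note above says existing messages are not redistributed when you add partitions. The reason is that keyed messages are usually assigned by hashing the key modulo the partition count, so changing the count changes the mapping for many keys. The sketch below illustrates this under a loud assumption: it uses CRC32 from the Python standard library as a stand-in hash, whereas real Kafka clients typically use a different hash function, and `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with a stand-in hash (CRC32, for illustration)."""
    return zlib.crc32(key) % num_partitions


keys = [f"user-{i}".encode() for i in range(100)]
# Keys whose partition changes when the topic grows from 3 to 4 partitions.
moved = [k for k in keys if partition_for(k, 3) != partition_for(k, 4)]
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Records already written stay in their original partition, so after the change a key's newer records can land in a different partition than its older ones. Streaming the data into a new topic re-applies the mapping consistently.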
### [](#change-the-cleanup-policy)Change the cleanup policy The cleanup policy determines how to clean up the partition log files when they reach a certain size: - `delete` deletes data based on age or log size. Topics retain all records until then. - `compact` compacts the data by only keeping the latest values for each KEY. - `compact,delete` combines both methods. Unlike compacted topics, which keep only the most recent message for a given key, topics configured with a `delete` cleanup policy provide a running history of all changes for those topics. > ⚠️ **WARNING** > > All topic properties take effect immediately after being set. Do not modify properties on internal Redpanda topics (such as `__consumer_offsets`, `_schemas`, or other system topics) as this can cause cluster instability. For example, to change a topic’s policy to `compact`, run: ```bash rpk topic alter-config [TOPICS...] --set cleanup.policy=compact ``` ### [](#configure-write-caching)Configure write caching Write caching is a relaxed mode of [`acks=all`](../../produce-data/configure-producers/#acksall) that provides better performance at the expense of durability. It acknowledges a message as soon as it is received and acknowledged on a majority of brokers, without waiting for it to be written to disk. This provides lower latency while still ensuring that a majority of brokers acknowledge the write. Write caching applies to user topics. It does not apply to transactions or consumer offsets: data written in the context of a transaction and consumer offset commits are always written to disk and fsynced before being acknowledged to the client. Only enable write caching on workloads that can tolerate some data loss in the case of multiple, simultaneous broker failures. Leaving write caching disabled safeguards your data against complete data center or availability zone failures.
#### [](#configure-at-topic-level)Configure at topic level To override the cluster-level setting at the topic level, set the topic-level property `write.caching`: `rpk topic alter-config my_topic --set write.caching=true` With `write.caching` enabled at the topic level, Redpanda fsyncs to disk according to `flush.ms` and `flush.bytes`, whichever is reached first. ### [](#remove-a-configuration-setting)Remove a configuration setting You can remove a configuration that overrides the default setting, and the setting will use the default value again. For example, suppose you altered the cleanup policy to use `compact` instead of the default, `delete`. Now you want to return the policy setting to the default. To remove the configuration setting `cleanup.policy=compact`, run `rpk topic alter-config` with the `--delete` flag: ```bash rpk topic alter-config [TOPICS...] --delete cleanup.policy ``` ## [](#list-topic-configuration-settings)List topic configuration settings To display all the configuration settings for a topic, run: ```bash rpk topic describe -c ``` The `-c` flag limits the command output to just the topic configurations. This command is useful for checking the default configuration settings before you make any changes and for verifying changes after you make them. 
The following command output displays after running `rpk topic describe test_topic`, where `test_topic` was created with default settings: ```bash rpk topic describe test_topic SUMMARY ======= NAME test_topic PARTITIONS 1 REPLICAS 3 CONFIGS ======= KEY VALUE SOURCE cleanup.policy delete DYNAMIC_TOPIC_CONFIG compression.type producer DEFAULT_CONFIG max.message.bytes 20971520 DEFAULT_CONFIG message.timestamp.type CreateTime DEFAULT_CONFIG redpanda.datapolicy function_name: script_name: DEFAULT_CONFIG redpanda.remote.delete true DEFAULT_CONFIG redpanda.remote.read false DEFAULT_CONFIG redpanda.remote.write false DEFAULT_CONFIG retention.bytes -1 DEFAULT_CONFIG retention.local.target.bytes -1 DEFAULT_CONFIG retention.local.target.ms 86400000 DEFAULT_CONFIG retention.ms 604800000 DEFAULT_CONFIG segment.bytes 1073741824 DEFAULT_CONFIG ``` ## [](#delete-a-topic)Delete a topic To delete a topic, run: ```bash rpk topic delete <topic-name> ``` When a topic is deleted, its underlying data is deleted, too. To delete multiple topics at a time, provide a space-separated list. For example, to delete two topics named `topic1` and `topic2`, run: ```bash rpk topic delete topic1 topic2 ``` You can also use the `-r` flag to specify one or more regular expressions; then, any topic names that match the pattern you specify are deleted. For example, to delete topics with names that start with “f” and end with “r”, run: ```bash rpk topic delete -r '^f.*' '.*r$' ``` Note that the first regular expression must start with the `^` symbol, and the last expression must end with the `$` symbol. This requirement helps prevent accidental deletions. ## [](#delete-records-from-a-topic)Delete records from a topic Redpanda allows you to delete data from the beginning of a partition up to a specific offset (a monotonically increasing sequence number for records in a partition).
Deleting records frees up disk space, which is especially helpful if your producers are pushing more data than anticipated in your retention plan. Delete records when you know that all consumers have read up to that given offset, and the data is no longer needed. There are different ways to delete records from a topic, including using the [`rpk topic trim-prefix`](../../../reference/rpk/rpk-topic/rpk-topic-trim-prefix/) command, using the `DeleteRecords` Kafka API with Kafka clients, or using Redpanda Cloud. > 📝 **NOTE** > > - To delete records, `cleanup.policy` must be set to `delete` or `compact,delete`. > > - Object storage is deleted asynchronously. After messages are deleted, the partition’s start offset will have advanced, but garbage collection of deleted segments may not be complete. > > - Similar to Kafka, after deleting records, local storage and object storage may still contain data for deleted offsets. (Redpanda does not truncate segments. Instead, it bumps the start offset, then it attempts to delete as many whole segments as possible.) Data before the new start offset is not visible to clients but could be read by someone with access to the local disk of a Redpanda node. > ⚠️ **WARNING** > > When you delete records from a topic with a timestamp, Redpanda advances the partition start offset to the first record whose timestamp is after the threshold. If record timestamps are not in order with respect to offsets, this may result in unintended deletion of data. Before using a timestamp, verify that timestamps increase in the same order as offsets in the topic to avoid accidental data loss. 
For example: > > ```bash > rpk topic consume <topic-name> -n 50 --format '%o %d{go[2006-01-02T15:04:05Z07:00]} %k %v' > ``` ## [](#next-steps)Next steps [Configure Producers](../../produce-data/configure-producers/) --- # Page 382: Topics Overview **URL**: https://docs.redpanda.com/redpanda-cloud/develop/topics/create-topic.md --- # Topics Overview --- title: Topics Overview latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: topics/create-topic page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: topics/create-topic.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/topics/create-topic.adoc description: Learn how to create a topic for a Redpanda Cloud cluster. page-git-created-date: "2026-03-31" page-git-modified-date: "2026-03-31" --- Topics provide a way to organize events. After creating a cluster, you can create a topic in it. Each cluster can have up to 40,000 topics. Topic properties are populated from information stored in the broker. Redpanda features, such as Tiered Storage, are enabled and configured by default in Redpanda Cloud. You can optionally overwrite some settings. > ⚠️ **WARNING** > > Modifying the properties of topics that are created and managed by Redpanda applications can cause unexpected errors. This may lead to connector and cluster failures. | Property | Description | | --- | --- | | Partitions | The number of partitions for the topic. | | Replication factor | The number of partition replicas for the topic. Redpanda Cloud requires a minimum of 3 topic replicas. If a topic is created with a replication factor of 1, Redpanda resets the replication factor to 3. | | Cleanup policy | The policy that determines how to clean up old log segments. The default is `delete`.
| | Retention time | The maximum length of time to keep messages in a topic. The default is 7 days. | | Retention size | The maximum size of each partition. If a partition reaches this size and more messages are added, the oldest messages are deleted. The default is infinite. | | Message size | The maximum size of a message or batch for a newly created topic. The default is 20 MiB for BYOC and Dedicated clusters, and 8 MiB for Serverless clusters. You can increase this value up to 32 MiB for BYOC and Dedicated clusters, and 20 MiB for Serverless clusters, with the `max.message.bytes` topic property. | ## [](#next-steps)Next steps - [Manage Topics](../config-topics/) - [Manage Cloud Topics](../cloud-topics/) --- # Page 383: Transactions **URL**: https://docs.redpanda.com/redpanda-cloud/develop/transactions.md --- # Transactions --- title: Transactions latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: transactions page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: transactions.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/transactions.adoc description: Learn how to use transactions; for example, you can fetch messages starting from the last consumed offset and transactionally process them one by one, updating the last consumed offset and producing events at the same time. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Redpanda supports Apache Kafka®-compatible transaction semantics and APIs. For example, you can fetch messages starting from the last consumed offset and transactionally process them one by one, updating the last consumed offset and producing events at the same time.
A transaction can span partitions from different topics, and a topic can be deleted while there are active transactions on one or more of its partitions. In-flight transactions can detect deletion events, remove the deleted partitions (and related messages) from the transaction scope, and commit changes to the remaining partitions. If a producer is sending multiple messages to the same or different partitions, and network connectivity or broker failure cause the transaction to fail, then it’s guaranteed that either all messages are written to the partitions or none. This is important for applications that require strict guarantees, like financial services transactions. Transactions guarantee both exactly-once semantics (EOS) and atomicity: - EOS helps developers avoid the anomalies of at-most-once processing (with potential lost events) and at-least-once processing (with potential duplicated events). Redpanda supports EOS when transactions are used in combination with [idempotent producers](../produce-data/idempotent-producers/). - Atomicity additionally commits a set of messages across partitions as a unit: either all messages are committed or none. Encapsulated data received or sent across multiple topics in a single operation can only succeed or fail globally. ## [](#use-transactions)Use transactions By default, the `[enable_transactions](../../reference/properties/cluster-properties/#enable_transactions)` cluster configuration property is set to true. However, in the following use cases, clients must explicitly use the Transactions API to perform operations within a transaction: - [Atomic (all or nothing) publishing of multiple messages](#atomic-publishing-of-multiple-messages) - [Exactly-once stream processing](#exactly-once-stream-processing) When you use transactions, you must set the [`transactional.id`](https://kafka.apache.org/documentation/#producerconfigs_transactional.id) property in the producer configuration. 
This property uniquely identifies the producer and enables reliable semantics across multiple producer sessions. It ensures that all transactions issued by a given producer are completed before any new transactions are started. ### [](#atomic-publishing-of-multiple-messages)Atomic publishing of multiple messages A banking IT system with an event-sourcing microservice architecture illustrates why transactions are necessary. In this system, each bank branch is implemented as an independent microservice that manages its own distinct set of accounts. Every branch maintains its own transaction history, stored as a Redpanda partition. When a branch starts, it replays the transaction history to reconstruct its current state. Financial transactions such as money transfers require the following guarantees: - A sender can’t withdraw more than the account withdrawal limit. - A recipient receives exactly the same amount sent. - A transaction is fast and is run at most once. - If a transaction fails, the system rolls back to the initial state. - Without withdrawals and deposits, the amount of money in the system remains constant with any history of money transfers. These requirements are easy to satisfy when the sender and the recipient of a financial transaction are hosted by the same branch. The operation doesn’t leave the consistency domain, and all checks and locks can be performed within a single service (ledger). Things get more complex with cross-branch financial transactions, because they involve several ledgers, and the operations should be performed atomically (all or nothing). The default approach (saga pattern) breaks a transaction into a sequence of reversible idempotent steps; however, this violates the isolation principle and adds complexity, making the application responsible for orchestrating the steps. Redpanda natively supports transactions, so it’s possible to atomically update several ledgers at the same time. 
For example:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "...");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "app-id");
// KafkaProducer requires serializers; String keys and values are assumed here.
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

Producer<String, String> producer = null;

while (true) {
    // waiting for somebody to initiate a financial transaction
    var sender_branch = ...;
    var sender_account = ...;
    var recipient_branch = ...;
    var recipient_account = ...;
    var amount = 42;

    if (producer == null) {
        try {
            producer = new KafkaProducer<>(props);
            producer.initTransactions();
        } catch (Exception e1) {
            // TIP: log error for further analysis
            try {
                if (producer != null) {
                    producer.close();
                }
            } catch (Exception e2) { }
            producer = null;
            // TIP: notify the initiator of a transaction about the failure
            continue;
        }
    }

    producer.beginTransaction();

    try {
        // The four-argument ProducerRecord constructor takes (topic, partition, key, value):
        // the branch selects the partition and the account is the record key.
        var f1 = producer.send(new ProducerRecord<>("ledger", sender_branch, sender_account, "" + (-amount)));
        var f2 = producer.send(new ProducerRecord<>("ledger", recipient_branch, recipient_account, "" + amount));
        // Wait for both writes to be acknowledged before committing.
        f1.get();
        f2.get();
    } catch (Exception e1) {
        // TIP: log error for further analysis
        try {
            producer.abortTransaction();
        } catch (Exception e2) {
            // TIP: log error for further analysis
            try {
                producer.close();
            } catch (Exception e3) { }
            producer = null;
        }
        // TIP: notify the initiator of a transaction about the failure
        continue;
    }

    try {
        producer.commitTransaction();
    } catch (Exception e1) {
        try {
            producer.close();
        } catch (Exception e3) { }
        producer = null;
        // TIP: notify the initiator of a transaction about the failure
        continue;
    }

    // TIP: notify the initiator of a transaction about the success
}
```

When a transaction fails before a `commitTransaction` attempt completes, you can assume that it is not executed. When a transaction fails after a `commitTransaction` attempt completes, the true transaction status is unknown.
Redpanda only guarantees that there isn’t a partial result: either the transaction is committed and complete, or it is fully rolled back. ### [](#exactly-once-stream-processing)Exactly-once stream processing Redpanda is commonly used as a pipe connecting different applications and storage systems. An application could use an OLTP database and then rely on change data capture to deliver the changes to a data warehouse. Redpanda transactions let you use streams as a smart pipe in your applications, building complex atomic operations that transform, aggregate, or otherwise process data transiting between external applications and storage systems. For example, here is the regular pipe flow: Postgresql -> topic -> warehouse Here is the smart pipe flow, with a transformation in `topic(1) -> topic(2)`: Postgresql -> topic(1) transform topic(2) -> warehouse The transformation reads a record from `topic(1)`, processes it, and writes it to `topic(2)`. Without transactions, an intermittent error can cause a message to be lost or processed several times. With transactions, Redpanda guarantees exactly-once semantics. 
For example:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Properties;
import java.util.UUID;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

var source = "source-topic";
var target = "target-topic";

// String keys and values are assumed for both the producer and the consumer.
Properties pprops = new Properties();
pprops.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "...");
pprops.put(ProducerConfig.ACKS_CONFIG, "all");
pprops.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
pprops.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, UUID.randomUUID().toString());
pprops.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
pprops.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

Properties cprops = new Properties();
cprops.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "...");
cprops.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
cprops.put(ConsumerConfig.GROUP_ID_CONFIG, "app-id");
cprops.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
cprops.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
cprops.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
cprops.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

Consumer<String, String> consumer = null;
Producer<String, String> producer = null;

boolean should_reset = false;

while (true) {
    // After any failure, drop both clients and rebuild them from scratch.
    if (should_reset) {
        should_reset = false;

        if (consumer != null) {
            try {
                consumer.close();
            } catch (Exception e) { }
            consumer = null;
        }

        if (producer != null) {
            try {
                producer.close();
            } catch (Exception e2) { }
            producer = null;
        }
    }

    try {
        if (consumer == null) {
            consumer = new KafkaConsumer<>(cprops);
            consumer.subscribe(Collections.singleton(source));
        }
    } catch (Exception e1) {
        // TIP: log error for further analysis
        should_reset = true;
        continue;
    }

    try {
        if (producer == null) {
            producer = new KafkaProducer<>(pprops);
            producer.initTransactions();
        }
    } catch (Exception e1) {
        // TIP: log error for further analysis
        should_reset = true;
        continue;
    }

    ConsumerRecords<String, String> records = null;

    try {
        records = consumer.poll(Duration.ofMillis(10000));
    } catch (Exception e1) {
        // TIP: log error for further analysis
        should_reset = true;
        continue;
    }

    var it = records.iterator();
    while (it.hasNext()) {
        var record = it.next();
        // transformation
        var old_value = record.value();
        var new_value = old_value.toUpperCase();

        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>(target, record.key(), new_value));
            // Commit the consumed offset in the same transaction as the produced record.
            var offsets = new HashMap<TopicPartition, OffsetAndMetadata>();
            offsets.put(new TopicPartition(source, record.partition()), new OffsetAndMetadata(record.offset() + 1));
            producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
        } catch (Exception e1) {
            // TIP: log error for further analysis
            try {
                producer.abortTransaction();
            } catch (Exception e2) { }
            should_reset = true;
            break;
        }

        try {
            producer.commitTransaction();
        } catch (Exception e1) {
            // TIP: log error for further analysis
            should_reset = true;
            break;
        }
    }
}
```

#### [](#exactly-once-processing-configuration-requirements)Exactly-once processing configuration requirements

Redpanda’s default configuration supports exactly-once processing. To preserve this capability, ensure the following settings are maintained:

- `enable_idempotence = true`
- `enable_transactions = true`
- `transaction_coordinator_delete_retention_ms` is greater than or equal to `transactional_id_expiration_ms`

## [](#best-practices)Best practices

To help avoid common pitfalls and optimize performance, consider the following when configuring transactional workloads in Redpanda:

### [](#tune-producer-id-limits)Tune producer ID limits

For production environments with heavy producer usage, configure both [`max_concurrent_producer_ids`](../../reference/properties/cluster-properties/#max_concurrent_producer_ids) and [`transactional_id_expiration_ms`](../../reference/properties/cluster-properties/#transactional_id_expiration_ms) to prevent out-of-memory (OOM) crashes. Setting limits on producer IDs helps manage memory usage in high-throughput environments, particularly when using transactions or idempotent producers.

If you have `kafka_connections_max` configured, you can determine an appropriate value for `max_concurrent_producer_ids` based on your connection patterns.

- Lower bound: `kafka_connections_max` / `number_of_shards`, assuming each producer connects to only one shard.
- Upper bound: `topic_partitions_per_shard` \* `kafka_connections_max`, assuming producers connect to all shards.

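
As a rough worked example of the two bounds above, the following sketch computes both estimates. The class name, method names, and sample inputs (1,000 connections, 8 shards, 500 partitions per shard) are hypothetical values chosen for illustration, not Redpanda defaults or recommendations.

```java
/** Illustrative arithmetic only; all names and input values here are hypothetical. */
final class ProducerIdBounds {

    /** Lower bound: assumes each producer connects to only one shard. */
    static long lowerBound(long kafkaConnectionsMax, long numberOfShards) {
        return kafkaConnectionsMax / numberOfShards;
    }

    /** Upper bound: assumes producers connect to all shards. */
    static long upperBound(long topicPartitionsPerShard, long kafkaConnectionsMax) {
        return topicPartitionsPerShard * kafkaConnectionsMax;
    }

    public static void main(String[] args) {
        long kafkaConnectionsMax = 1_000;     // hypothetical kafka_connections_max
        long numberOfShards = 8;              // hypothetical shard (core) count
        long topicPartitionsPerShard = 500;   // hypothetical partitions per shard

        // prints "lower bound: 125"
        System.out.println("lower bound: " + lowerBound(kafkaConnectionsMax, numberOfShards));
        // prints "upper bound: 500000"
        System.out.println("upper bound: " + upperBound(topicPartitionsPerShard, kafkaConnectionsMax));
    }
}
```

The gap between the two results is wide, so treat them as a starting range and refine the value by monitoring actual producer ID usage.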
If `kafka_connections_max` is not configured, estimate the value for `max_concurrent_producer_ids` based on your application patterns. A conservative approach is to start with 1000-5000 per shard, then monitor and adjust as needed. Applications with many partitions per producer typically require higher values, such as 10000 or more per shard. Tune `transactional_id_expiration_ms` based on your application’s transaction patterns. Calculate this value by taking your longest expected transaction time and adding a safety buffer. For example, if transactions typically run for 30 minutes, consider setting this to 2-4 hours. Short-lived transactions can use values between 1-4 hours, while batch processing applications should match their batch interval plus buffer time. Interactive applications may benefit from shorter values to free up memory faster. Client applications should minimize producer ID churn. Reuse producer instances when possible, instead of creating new ones for each operation. Avoid using random transactional IDs, as some Flink configurations do, because this creates excessive producer ID churn. Instead, use consistent transactional IDs that can be resumed across application restarts. ### [](#configure-transaction-timeouts-and-limits)Configure transaction timeouts and limits - If a consumer is configured to use the read\_committed isolation level, it can only process successfully committed transactions. As a result, an ongoing transaction with a large timeout that becomes stuck could prevent the consumer from processing other committed transactions. To avoid this, don’t set the transaction timeout client setting (`transaction.timeout.ms` in the Kafka Java client implementation) to a value that is too high. The longer the timeout, the longer consumers may be blocked. ## [](#handle-transaction-failures)Handle transaction failures Different transactions require different approaches to handling failures within the application. 
Consider the approaches to failed or timed-out transactions in the provided use cases: - Publishing of multiple messages: The request came from outside the system, and it is the application’s responsibility to discover the true status of a timed-out transaction. (This example doesn’t use consumer groups to distribute partitions between consumers.) - Exactly-once streaming (consume-transform-loop): This is a closed system. Upon re-initialization of the consumer and producer, the system automatically discovers the moment it was interrupted and continues from that place. Additionally, this automatically scales by the number of partitions. Run another instance of the application, and it starts processing its share of partitions in the source topic. ## [](#transactions-with-compacted-segments)Transactions with compacted segments Transactions are supported on topics with compaction configured. The compaction process removes aborted transaction data from the log. The resulting compacted segment contains only committed data batches (and potentially harmless gaps in the offsets due to skipped batches). ## [](#suggested-reading)Suggested reading - [Kafka-compatible fast distributed transactions](https://redpanda.com/blog/fast-transactions) --- # Page 384: Get Started **URL**: https://docs.redpanda.com/redpanda-cloud/get-started.md --- # Get Started --- title: Get Started latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/index.adoc description: Get Started index page. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-06-07" --- - [What’s New in Redpanda Cloud](whats-new-cloud/) Summary of new features in Redpanda Cloud. 
- [Redpanda Cloud Overview](cloud-overview/) Learn about the Redpanda Agentic Data Plane (ADP) and deployment options including BYOC, Dedicated, and Serverless clusters. - [BYOC Architecture](byoc-arch/) Learn about the control plane - data plane architecture in BYOC. --- # Page 385: How Redpanda Works **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/architecture.md --- # How Redpanda Works --- title: How Redpanda Works latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: architecture page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: architecture.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/architecture.adoc description: Learn specifics about Redpanda architecture. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- At its core, Redpanda is a fault-tolerant transaction log for storing event streams. Producers and consumers interact with Redpanda using the Kafka API. To achieve high scalability, producers and consumers are fully decoupled. Redpanda provides strong guarantees to producers that events are stored durably within the system, and consumers can subscribe to Redpanda and read the events asynchronously. Redpanda achieves this decoupling by organizing events into topics. Topics represent a logical grouping of events that are written to the same log. A topic can have multiple producers writing events to it and multiple consumers reading events from it. This page provides details about how Redpanda works. For a high-level overview, see [Introduction to Redpanda](../intro-to-events/). ## [](#tiered-storage)Tiered Storage Redpanda Tiered Storage is a multi-tiered object storage solution that provides the ability to offload log segments to object storage in near real time. 
Tiered Storage can be combined with local storage to provide long-term data retention and disaster recovery on a per-topic basis. Consumers that read from more recent offsets continue to read from local storage, and consumers that read from historical offsets read from object storage, all with the same API. Consumers can read and reread events from any point within the maximum retention period, whether the events reside on local or object storage. As data in object storage grows, the metadata for it grows. To support efficient long-term data retention, Redpanda splits the metadata in object storage, maintaining metadata of only recently-updated segments in memory or local disk, while safely archiving the remaining metadata in object storage and caching it locally on disk. Archived metadata is then loaded only when historical data is accessed. This allows Tiered Storage to handle partitions of virtually any size or retention length. ## [](#partitions)Partitions To scale topics, Redpanda shards them into one or more partitions that are distributed across the nodes in a cluster. This allows for concurrent writing and reading from multiple nodes. When producers write to a topic, they route events to one of the topic’s partitions. Events with the same key (like a stock ticker) are always routed to the same partition, and Redpanda guarantees the order of events at the partition level. Consumers read events from a partition in the order that they were written. If a key is not specified, then events are sent to all topic partitions in a round-robin fashion. ## [](#raft-consensus-algorithm)Raft consensus algorithm Redpanda provides strong guarantees for data safety and fault tolerance. Events written to a topic partition are appended to a log file on disk. They can be replicated to other nodes in the cluster and appended to their copies of the log file on disk to prevent data loss in the event of failure. 
The [Raft consensus algorithm](https://raft.github.io/) is used for data replication. Every topic partition forms a Raft group consisting of a single elected leader and zero or more followers (as specified by the topic’s replication factor). A Raft group can tolerate ƒ failures given 2ƒ+1 nodes. For example, in a cluster with five nodes and a topic with a replication factor of five, the topic remains fully operational if two nodes fail. Raft is a majority vote algorithm. For a leader to acknowledge that an event has been committed to a partition, a majority of its replicas must have written that event to their copy of the log. When a majority (quorum) of responses has been received, the leader can make the event available to consumers and acknowledge receipt of the event when `acks=all (-1)`. [Producer acknowledgement settings](../../develop/produce-data/configure-producers/#producer-acknowledgement-settings) define how producers and leaders communicate their status while transferring data. As long as the leader and a majority of the replicas are stable, Redpanda can tolerate disturbances in a minority of the replicas. If [gray failures](https://blog.acolyer.org/2017/06/15/gray-failure-the-achilles-heel-of-cloud-scale-systems/) cause a minority of replicas to respond more slowly than normal, then the leader does not have to wait for their responses to progress, and any additional latency is not passed on to the clients. The result is that Redpanda is less sensitive to faults and can deliver predictable performance. ## [](#partition-leadership-elections)Partition leadership elections [Raft](https://raft.github.io/) uses a heartbeat mechanism to maintain leader authority and to trigger leader elections. The partition leader sends a periodic heartbeat (default = 150 milliseconds) to all followers to assert its leadership in the current term. A term is an arbitrary period of time that starts when a leader election is triggered.
If a follower does not receive a heartbeat over a period of time (default = 1.5 seconds), then it triggers an election to choose a new partition leader. The follower increments its term and votes for itself to be the leader for that term. It then sends a vote request to the other nodes and waits for one of the following scenarios: - It receives a majority of votes and becomes the leader. Raft guarantees that at most one candidate can be elected the leader for a given term. - Another follower establishes itself as the leader. While waiting for votes, the candidate may receive communication from another node in the group claiming to be the leader. The candidate only accepts the claim if the claimed leader’s term is greater than or equal to the candidate’s own term; otherwise, the communication is rejected and the candidate continues to wait for votes. - No leader is elected over a period of time. If multiple followers time out and become election candidates at the same time, it’s possible that no candidate gets a majority of votes. When this happens, each candidate increments its term and triggers a new election round. Raft uses a randomized timeout between 150 and 300 milliseconds to ensure that split votes are rare and resolved quickly. As long as the heartbeat time is much shorter than the election timeout, and the election timeout is much shorter than the mean time between node failures (MTBF), Raft can elect and maintain a steady leader and make progress. A leader can maintain its position as long as one of the ten heartbeat messages it sends to all of its followers every 1.5 seconds is received; otherwise, a new leader is elected. If a follower triggers an election, but the incumbent leader subsequently springs back to life and starts sending data again, then it’s too late. As part of the election process, the follower (now an election candidate) incremented the term and rejects requests from the previous term, essentially forcing a leadership change.
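
The quorum and timing figures quoted in this section and the previous one reduce to simple integer arithmetic. The following is a minimal sketch for checking them, not Redpanda code: the class and method names are hypothetical, and the 150-millisecond and 1.5-second inputs are the defaults stated above.

```java
/** Illustrative Raft arithmetic; class and method names are hypothetical. */
final class RaftArithmetic {

    /** Replicas that must acknowledge a write (or grant a vote) to form a majority. */
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    /** Largest f such that 2f + 1 <= replicas, that is, the failures the group can tolerate. */
    static int toleratedFailures(int replicas) {
        return (replicas - 1) / 2;
    }

    /** Number of heartbeats a leader sends within one election timeout window. */
    static long heartbeatsPerTimeout(long heartbeatMs, long electionTimeoutMs) {
        return electionTimeoutMs / heartbeatMs;
    }

    public static void main(String[] args) {
        for (int replicas : new int[] {1, 3, 5}) {
            System.out.println("replicas=" + replicas
                    + " quorum=" + quorum(replicas)
                    + " toleratedFailures=" + toleratedFailures(replicas));
        }
        // With a 150 ms heartbeat and a 1.5 s timeout, a follower expects ten heartbeats per window.
        System.out.println("heartbeats per timeout: " + heartbeatsPerTimeout(150, 1500));
    }
}
```

For a replication factor of five, this gives a quorum of three and two tolerated failures, which matches the five-node example in the Raft consensus section.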
If a cluster is experiencing wider network infrastructure problems that result in latencies above the heartbeat timeout, then back-to-back election rounds can be triggered. During this period, unstable Raft groups may not be able to form a quorum. This results in partitions rejecting writes, but data previously written to disk is not lost. Redpanda has a Raft-priority implementation that allows the system to settle quickly after network outages. ## [](#controller-partition-and-snapshots)Controller partition and snapshots Redpanda stores metadata update commands (such as creating and deleting topics or users) in a system partition called the controller partition. A new snapshot is created after each controller command is added, or, with rapid updates, after a set period of time (default is 60 seconds). Controller snapshots save the current cluster metadata state to disk, so startup is fast. For example, with a partition that has moved several times, a snapshot can restore the latest state without replaying every move command. Each broker has a snapshot file stored in the controller log directory, such as `/var/lib/redpanda/data/redpanda/controller/0_0/snapshot`. The controller partition is replicated by a Raft group that includes all cluster brokers, and the controller snapshot is the Raft snapshot for this group. Snapshots are hydrated when a broker joins the cluster or restarts. Snapshots are enabled by default for all clusters, both new and upgraded. ## [](#optimized-platform-performance)Optimized platform performance Redpanda is designed to exploit advances in modern hardware, from the network down to the disks. Network bandwidth has increased considerably, especially in object storage, and spinning disks have been replaced by SSD devices that deliver better I/O performance. CPUs are faster too, but this is largely due to the increased core counts as opposed to the increase in single-core speeds. 
Redpanda has tuners that detect your hardware configuration to automatically optimize itself. Examples of platform and kernel features that Redpanda uses to optimize its performance: - Direct Memory Access (DMA) for disk I/O - Sparse file system support with XFS - Distribution of interrupt request (IRQ) processing between CPU cores - Isolated processes with control groups (cgroups) - Disabled CPU power-saving modes - Upfront memory allocation, partitioned and pinned to CPU cores ## [](#tpc)Thread-per-core model Redpanda implements a thread-per-core programming model through its use of the [Seastar](https://seastar.io/) library. This allows Redpanda to pin each of its application threads to a CPU core to avoid context switching and blocking. It combines this with structured message passing (SMP) to asynchronously communicate between the pinned threads. With this, Redpanda avoids the overhead of context switching and expensive locking operations to improve processing performance and efficiency. From a sizing perspective, Redpanda’s ability to efficiently use all available hardware enables it to scale up to get the most out of your infrastructure, before you’re forced to scale out to meet the demands of your workload. Redpanda delivers better performance with a smaller footprint, resulting in reduced operational costs and complexity. --- # Page 386: BYOC Architecture **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/byoc-arch.md --- # BYOC Architecture --- title: BYOC Architecture latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: byoc-arch page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: byoc-arch.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/byoc-arch.adoc description: Learn about the control plane - data plane architecture in BYOC. 
page-git-created-date: "2025-04-01" page-git-modified-date: "2026-04-07" --- With Bring Your Own Cloud (BYOC) clusters, you deploy Redpanda in your own cloud (AWS, Azure, or GCP), and all data is contained in your own environment. This provides an additional layer of security and isolation. Redpanda handles provisioning, operations, and maintenance of the underlying infrastructure, including Kubernetes. ## [](#control-plane-data-plane)Control plane - data plane For high availability, Redpanda Cloud uses the following control plane - data plane architecture: ![Control plane and data plane](../../shared/_images/control_d_plane.png) - **Control plane**: This is a Redpanda Cloud managed service that manages provisioning, operations, and maintenance of clusters with Kubernetes under the hood, including Kubernetes version upgrades and infrastructure maintenance. The control plane enforces rules in the data plane. You can use [RBAC](../../security/authorization/rbac/rbac/) or [GBAC](../../security/authorization/gbac/gbac/) in the control plane to manage access to organization-level resources like clusters, resource groups, and networks. - **Data plane**: This is where your cluster lives. The term _data plane_ is sometimes used interchangeably with _cluster_. The data plane is where you manage topics, consumer groups, connectors, and schemas. You can use [RBAC](../../security/authorization/rbac/rbac_dp/) or [GBAC](../../security/authorization/gbac/gbac_dp/) in the data plane to configure cluster-level permissions for provisioned users at scale. IAM permissions allow the Redpanda Cloud agent to access the cloud provider API to create and manage cluster resources. The permissions follow the principle of least privilege, limiting access to only what is necessary. Clusters are configured and maintained in the control plane, but they remain available even if the network connection to the control plane is lost. 
> 💡 **TIP** > > In the Redpanda Cloud UI, you can identify which plane you’re in by the side navigation: > > - **Control Plane:** Visible after login at the organization level. Here you can select, create, and delete clusters, networks, and resource groups. > > - **Data Plane:** Visible after selecting a specific cluster. Here you can work with topics, consumer groups, connectors, and schemas. ## [](#byoc-setup)BYOC setup In a BYOC architecture, you deploy the data plane in your own VPC. All network connections into the data plane take place through either a public endpoint, or for private clusters, through Redpanda Cloud network connections such as VPC peering, AWS PrivateLink, Azure Private Link, or GCP Private Service Connect. Customer data never leaves the data plane. A BYOC cluster is initially set up from the control plane. This is a two-step process performed by `rpk cloud byoc apply`: 1. You bootstrap a virtual machine (VM) in your VPC. This VM launches the agent and bootstraps the necessary infrastructure. Redpanda then assigns fine-grained IAM policies following least privilege, creating dedicated IAM roles per workload with only the permissions each requires. 2. The agent communicates with the control plane to pull the cluster specifications. After the agent is up and running, it connects to the control plane and starts dequeuing and applying cluster specifications that provision, configure, and maintain clusters. The agent is in constant communication with the control plane, receiving and applying cluster specifications and exchanging cluster metadata. Agents are authenticated and authorized through opaque and ephemeral tokens, and they have dedicated job queues in the control plane. Agents also manage VPC peering networks. ![cloud_byoc_apply](../../shared/_images/byoc_apply.png) > 📝 **NOTE** > > To create a Redpanda cluster in your virtual private cloud (VPC), follow the instructions in the Redpanda Cloud UI. 
The UI contains the parameters necessary to successfully run `rpk cloud byoc apply` with your cloud provider. > 📝 **NOTE** > > Redpanda Cloud does not support customer access or modifications to any of the internal data plane resources. This restriction allows Redpanda Data to manage all configuration changes internally to ensure a 99.99% service level agreement (SLA) for BYOC clusters. --- # Page 387: Redpanda Cloud Overview **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cloud-overview.md --- # Redpanda Cloud Overview --- title: Redpanda Cloud Overview latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cloud-overview page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cloud-overview.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cloud-overview.adoc description: Learn about the Redpanda Agentic Data Plane (ADP) and deployment options including BYOC, Dedicated, and Serverless clusters. page-git-created-date: "2024-06-06" page-git-modified-date: "2026-04-07" --- Redpanda Cloud is a complete data streaming and agentic data plane platform delivered as a fully-managed service. It provides automated upgrades and patching, data balancing, and support while continuously monitoring your data to meet strict performance, availability, reliability, and security requirements. All Redpanda Cloud clusters are deployed with an integrated [Redpanda Console](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#redpanda-console), and all clusters have access to unlimited retention and 300+ data connectors with [Redpanda Connect](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#redpanda-connect). 
## [](#redpanda-agentic-data-plane-adp)Redpanda Agentic Data Plane (ADP) Redpanda ADP is enterprise-grade infrastructure for building, deploying, and governing [AI agents](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#ai-agent) at scale. It combines Redpanda’s streaming-native immutable log, 300+ data connectors, and declarative agent definitions into a unified platform with built-in governance, cost controls, and compliance-grade audit trails. Redpanda ADP includes the following key components: - **AI agents**: Declare the behavior you want instead of writing code. Redpanda powers declarative definitions with 300+ connectors. - **MCP servers**: Translate agent intent into connections to your business systems using proven connectors, no glue code required. - **Transcripts**: End-to-end execution records built on an immutable log with formal correctness guarantees. Transcripts are the keystone of agent governance. - **AI Gateway**: High-availability model routing with fiscal controls and per-tenant cost attribution across LLM providers. For more information, see [Redpanda Agentic Data Plane Overview](../../ai-agents/adp-overview/). > ❗ **IMPORTANT** > > Redpanda Agentic Data Plane is supported only on BYOC clusters running with AWS and Redpanda version 25.3+. It is currently in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability). 
## [](#redpanda-cloud-deployment-options)Redpanda Cloud deployment options

Redpanda Cloud applications are supported by three fully-managed deployment options:

- **[Serverless](#serverless)**: Fastest way to get started with automatic scaling
- **[Dedicated](#dedicated)**: Production clusters in Redpanda’s cloud with enhanced isolation
- **[Bring Your Own Cloud (BYOC)](#bring-your-own-cloud-byoc)**: Maximum control and security by deploying in your own cloud environment

### [](#quick-comparison)Quick comparison

| | Serverless | Dedicated | BYOC |
| --- | --- | --- | --- |
| Best for | Starter projects and applications with low or variable traffic | Production clusters requiring cloud hosting, higher throughput, and extra isolation | Production clusters requiring data sovereignty, the highest throughput, and added security |
| Deployment | Redpanda’s cloud (AWS/GCP) | Redpanda’s cloud (AWS/Azure/GCP) | Your cloud account (AWS/Azure/GCP) |
| Redpanda ADP | ✗ | ✗ | ✓ |
| Tenancy | Multi-tenant | Single-tenant | Single-tenant |
| Cloud SLA | 99.9% | 99.99%, multi-AZ | 99.99%, multi-AZ |
| Max throughput (write, read) | Up to 100 MB/s, 300 MB/s | Up to 400 MB/s, 800 MB/s | Up to 2 GB/s, 4 GB/s |
| Partitions, pre-replication | Up to 5,000 | Up to 45,600 | Up to 112,500 |
| Max message size (MiB) | 8 (default), 20 (max) | 20 (default), 32 (max) | 20 (default), 32 (max) |
| Private networking | ✓ | ✓ | ✓ |
| SSO authentication | ✓ (GitHub, Google) | ✓ (GitHub, Google, OIDC) | ✓ (GitHub, Google, OIDC) |
| Redpanda Connect | ✓ | ✓ | ✓ |
| Role-based access control (RBAC) & audit logs | ✗ | ✓ | ✓ |
| Group-based access control (GBAC) | ✗ | ✓ | ✓ |
| Prometheus/OpenMetrics endpoint for cluster metrics | ✓ | ✓ | ✓ |
| Multiple availability zones (AZs) | ✗ | ✓ | ✓ |
| Cluster properties editing | ✗ | ✓ (AWS/GCP) | ✓ (AWS/GCP) |
| Kafka Connect | ✗ | ✓ (disabled by default) | ✓ (disabled by default) |
| Redpanda Support | Enterprise support with annual contracts | Enterprise support | Enterprise support for BYOC; Premium support required for BYOVPC/BYOVNet |

> 📝 **NOTE**
>
> - The partition limit is the number of logical partitions before replication occurs. Redpanda Cloud uses a replication factor of three.
>
> - Enterprise support provides access to streaming experts 24/5, with 24/7 priority escalation for production outages. Premium support provides an enhanced Support SLA.
>
> - See also: [Serverless vs BYOC/Dedicated](#serverless-vs-byocdedicated)

### [](#serverless)Serverless

Serverless is the fastest and easiest way to start data streaming. With Serverless clusters, you host your data in Redpanda’s VPC, and Redpanda handles automatic scaling, provisioning, operations, and maintenance. This is a production-ready deployment option with a cluster available instantly, and you only pay for what you consume.

> 📝 **NOTE**
>
> - Serverless on GCP is currently in a [beta](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#beta) release.

#### [](#sign-up-for-serverless)Sign up for Serverless

##### Free trial

A [free trial on AWS](https://www.redpanda.com/try-redpanda) is the fastest way to get started with Serverless. Each free-trial customer qualifies for $100 (USD) in credits to spend in the first 14 days. This should be enough to run Redpanda with reasonable throughput. No credit card is required.

To continue using Serverless after your trial expires, you can enter a credit card and pay as you go. Any remaining credit balance is used before you are charged.

When either the credits expire or the days in the trial expire, the clusters move into a suspended state, and you won’t be able to access your data in either the Redpanda Cloud Console or with the Kafka API. There is a seven-day grace period following the end of the trial when you can add your credit card and restore service. After that, the data is permanently deleted.
For questions about the trial, use the **#serverless** [Community Slack](https://redpandacommunity.slack.com/) channel. After you start a trial, Redpanda instantly prepares an account for you. Your account includes a `welcome` cluster with a `hello-world` demo topic you can explore. It includes sample data so you can see how real-time messaging works before sending your own data. [Get started](../cluster-types/serverless/#interact-with-your-cluster) by creating a Redpanda Connect [pipeline](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#pipeline), or by following the steps in the Console to use `rpk` to interact with your cluster from the command line: 1. Log in with `rpk cloud login`. 2. Consume from the `hello-world` topic with `rpk topic consume hello-world`. 3. In the [Redpanda Cloud Console](https://cloud.redpanda.com), navigate to the **Topics** page and open the `hello-world` topic to see the included messages. ##### Redpanda Sales To request a private offer with possible discounts for annual committed use, contact [Redpanda Sales](https://www.redpanda.com/price-estimator). When you subscribe to Serverless through Redpanda Sales, you gain immediate access to Enterprise support. Redpanda creates a cloud organization for you and sends you a welcome email. ##### AWS Marketplace New subscriptions to Redpanda Cloud through [AWS Marketplace](../../billing/aws-pay-as-you-go/) receive $300 (USD) in free credits to spend in the first 30 days. AWS Marketplace charges for anything beyond $300, unless you cancel the subscription. After your free credits have been used, you can continue using your cluster without any commitment, only paying for what you consume and canceling anytime. > 📝 **NOTE** > > When you subscribe to Redpanda through AWS Marketplace, you do not have immediate access to Enterprise support, only the [Community Slack](https://redpandacommunity.slack.com/) channel. 
For Enterprise support, contact [Redpanda Sales](https://www.redpanda.com/price-estimator).

Redpanda creates a cloud organization for you and sends you a welcome email.

### [](#dedicated)Dedicated

With Dedicated clusters, you host your data on Redpanda Cloud resources (AWS, GCP, or Azure), and Redpanda handles provisioning, operations, and maintenance. When you create a Dedicated cluster, you select the supported [tier](../../reference/tiers/dedicated-tiers/) that meets your compute and storage needs.

#### [](#sign-up-for-dedicated)Sign up for Dedicated

##### Redpanda Sales

To request a private offer with possible discounts for monthly or annual committed use, contact [Redpanda Sales](https://www.redpanda.com/price-estimator). With a usage-based billing commitment, you sign up for a minimum spend amount through [AWS Marketplace](../../billing/aws-commit/), [Azure Marketplace](../../billing/azure-commit/), or [Google Cloud Marketplace](../../billing/gcp-commit/).

Redpanda creates a cloud organization for you and sends you a welcome email. You can then provision Dedicated clusters in Redpanda Cloud, and you can view invoices and manage your subscription in the marketplace.

##### AWS Marketplace

New subscriptions to Redpanda Cloud through [AWS Marketplace](../../billing/aws-pay-as-you-go/) receive $300 (USD) in free credits to spend in the first 30 days. AWS Marketplace charges for anything beyond $300, unless you cancel the subscription. After your free credits have been used, you can continue using your cluster without any commitment, only paying for what you consume and canceling anytime.

Redpanda creates a cloud organization for you and sends you a welcome email.

### [](#bring-your-own-cloud-byoc)Bring Your Own Cloud (BYOC)

With BYOC clusters, the Redpanda data plane (including Redpanda ADP components and Redpanda brokers) deploys into your existing VPC or VNet, ensuring all data remains in your environment.
With BYOC clusters, you deploy the Redpanda [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) into your existing VPC (for AWS and GCP) or VNet (for Azure), and all data is contained in your own environment. This provides an additional layer of security and isolation. (See [BYOC Architecture](../byoc-arch/).) Redpanda manages provisioning, monitoring, upgrades, and security policies, including the underlying infrastructure and Kubernetes used to run the cluster. Redpanda also manages required resources in your VPC or VNet, including subnets (subnetworks in GCP), IAM roles, and object storage resources (for example, S3 buckets or Azure Storage accounts). For full details, see [Upgrades and Maintenance](../../manage/maintenance/). #### [](#bring-your-own-vpcvnet-byovpcbyovnet)Bring Your Own VPC/VNet (BYOVPC/BYOVNet) With BYOVPC/BYOVNet clusters, you take full control of the networking lifecycle. Compared to standard BYOC, BYOVPC/BYOVNet provides more security, but the configuration is more complex. See the [shared responsibility model](#shared-responsibility-model) to understand what you manage versus what Redpanda manages. The BYOC infrastructure that Redpanda manages should not be used to deploy any other workloads. For details about the control plane - data plane framework in BYOC, see [BYOC architecture](../byoc-arch/). #### [](#sign-up-for-byoc)Sign up for BYOC To start using BYOC, contact [Redpanda sales](https://redpanda.com/try-redpanda?section=enterprise-trial) to request a private offer with possible discounts. You are billed directly or through Google Cloud Marketplace or AWS Marketplace. 
### [](#serverless-vs-byocdedicated)Serverless vs BYOC/Dedicated Serverless clusters are a good fit for the following use cases: - Quick setup for development or testing - Variable or unpredictable traffic patterns - No upfront cost commitment - Isolated environments for different applications Consider BYOC or Dedicated if you need more control over the deployment or if you have workloads with consistently-high throughput. BYOC and Dedicated clusters offer the following features: - Redpanda Agentic Data Plane (ADP): BYOC only - Multiple availability zones (AZs). A multi-AZ cluster provides higher resiliency in the event of a failure in one of the zones. - Role-based access control (RBAC) in the data plane - Group-based access control (GBAC) - Kafka Connect - Higher limits and quotas. See [BYOC usage tiers](../../reference/tiers/byoc-tiers/) and [Dedicated usage tiers](../../reference/tiers/dedicated-tiers/) compared to [Serverless limits](../cluster-types/serverless/#serverless-usage-limits). ## [](#redpanda-cloud-architecture)Redpanda Cloud architecture When you sign up for a Redpanda account, Redpanda creates an organization for you. Your organization contains all your Redpanda resources, including your clusters and networks. Within your organization, Redpanda creates a default resource group to contain your resources. You can rename this resource group, and you can create more resource groups. For example, you may want different resource groups for production and testing. > 💡 **TIP** > > For more detailed information about the Redpanda platform, see [Introduction to Redpanda](../intro-to-events/) and [How Redpanda Works](../architecture/). ## [](#shared-responsibility-model)Shared responsibility model The Redpanda Cloud shared responsibility model lists the security areas owned by Redpanda and the security areas owned by customers. Responsibilities depend on the type of deployment. 
### BYOC

| Resource | Redpanda responsibility | Customer responsibility |
| --- | --- | --- |
| Redpanda upgrades and hotfixes | ✓ | |
| Cost management and attribution | ✓ | ✓ |
| Software vulnerability remediation | ✓ | |
| Infrastructure vulnerability remediation | ✓ | |
| IAM (roles, service accounts, access segmentation) | ✓ | ✓ |
| Compute | ✓ | |
| Redpanda agent VM maintenance | ✓ | |
| VPC (subnets, routing, firewall) | ✓ | ✓ |
| VPC peering | | ✓ |
| VPC private links (service endpoint) | ✓ | |
| VPC private links (consumer endpoint) | | ✓ |
| Local storage | ✓ | |
| Tiered Storage | ✓ | |
| Control plane | ✓ | |
| Access controls and audit | ✓ | ✓ |
| Managed disaster recovery | | ✓ |
| Observability and monitoring (SLOs, SLIs, tracing, alerting, runbooks) | ✓ | |
| Availability service-level agreement (SLA) | ✓ (subject to required access to customer resources) | |
| Proactive threat detection | ✓ | ✓ |
| Static secret rotation | ✓ | |
| Incident response | ✓ | |
| Resilience verification | ✓ | |
| Kafka Connect infrastructure | ✓ | ✓ |
| Kafka Connect tasks state | | ✓ |

### BYOVPC/BYOVNet

| Resource | Redpanda responsibility | Customer responsibility |
| --- | --- | --- |
| Redpanda upgrades and hotfixes | ✓ | |
| Cost management and attribution | ✓ | ✓ |
| Software vulnerability remediation | ✓ | |
| Infrastructure vulnerability remediation | ✓ | ✓ |
| IAM (roles, service accounts, access segmentation) | | ✓ |
| Compute | ✓ | |
| Redpanda agent VM maintenance | ✓ | |
| VPC (subnets, routing, firewall) | | ✓ |
| VPC peering | | ✓ |
| VPC private links (service endpoint) | ✓ | |
| VPC private links (consumer endpoint) | | ✓ |
| Local storage | ✓ | |
| Tiered Storage | | ✓ |
| Control plane | ✓ | |
| Access controls and audit | ✓ | ✓ |
| Managed disaster recovery | | ✓ |
| Observability and monitoring (SLOs, SLIs, tracing, alerting, runbooks) | ✓ | ✓ (for VPC components and cloud storage buckets/containers managed by customer) |
| Availability SLA | ✓ (subject to required access to customer resources) | ✓ |
| Proactive threat detection | ✓ | ✓ |
| Static secret rotation | ✓ | ✓ |
| Incident response | ✓ | |
| Resilience verification | ✓ | |
| Kafka Connect infrastructure | ✓ | ✓ |
| Kafka Connect tasks state | | ✓ |

### Dedicated

| Resource | Redpanda responsibility | Customer responsibility |
| --- | --- | --- |
| Redpanda upgrades and hotfixes | ✓ | |
| Cost management and attribution | ✓ | |
| Software vulnerability remediation | ✓ | |
| Infrastructure vulnerability remediation | ✓ | |
| IAM (roles, service accounts, access segmentation) | ✓ | |
| Compute | ✓ | |
| Redpanda agent VM maintenance | ✓ | |
| VPC (subnets, routing, firewall) | ✓ | |
| VPC peering | ✓ | |
| VPC private links (service endpoint) | ✓ | |
| VPC private links (consumer endpoint) | | ✓ |
| Local storage | ✓ | |
| Tiered Storage | ✓ | |
| Control plane | ✓ | |
| Access controls and audit | ✓ | |
| Managed disaster recovery | | ✓ |
| Observability and monitoring (SLOs, SLIs, tracing, alerting, runbooks) | ✓ | |
| Availability SLA | ✓ | |
| Proactive threat detection | ✓ | |
| Static secret rotation | ✓ | |
| Incident response | ✓ | |
| Resilience verification | ✓ | |
| Kafka Connect infrastructure | ✓ | |
| Kafka Connect tasks state | | ✓ |

## [](#redpanda-connect-and-kafka-connect)Redpanda Connect and Kafka Connect

[Redpanda Connect](../../develop/connect/about/) lets you compose pipelines from a rich library of inputs, processors, and outputs with strong metrics, logging, and per-pipeline scaling. To try it, see the [quickstart](../../develop/connect/connect-quickstart/).

[Kafka Connect](../../develop/managed-connectors/) is disabled by default on all new clusters. To unlock this feature for your BYOC or Dedicated cluster, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). When enabled, a Kafka Connect node runs even if no connectors are configured.
| | Data transforms | Redpanda Connect |
| --- | --- | --- |
| Best for | Simple, stateless, per-record normalization inside Redpanda | Enrichment/lookup with external services; multi-stage flows |
| External I/O | Not permitted (sandboxed) | Native (HTTP/database/object storage) |
| Topology | 1:1 or 1:N (no cross-topic fan-in) | Fan-in and fan-out; multi-step pipelines |
| Ordering | Preserves per-partition order | Per-partition order can be preserved; configure parallelism and batching accordingly |
| Scale & isolation | Shares broker CPU/memory; best for lightweight operations | Scales independently; isolates heavy work from brokers |
| Failure handling | You code routing/error behavior | Built-in retries/backoff and DLQ patterns |

> 💡 **TIP**
>
> - Use data transforms for simple, in-broker, per-record changes with minimal latency.
>
> - Use Redpanda Connect if your pipeline must talk to external systems (HTTP services, databases, cloud storage), or when you need advanced flow control, such as batching and windowed processing.

### [](#redpanda-connect-vs-data-transforms)Redpanda Connect vs data transforms

[Data transforms](../../develop/data-transforms/how-transforms-work/) (Wasm) provide lightweight, per-record changes between Redpanda topics with minimal latency. Transforms run inside the broker, map one input topic to one or more output topics, and are intentionally sandboxed (no external network or disk access). They’re ideal for validation, redaction, format/schema conversion, and simple routing.

## [](#redpanda-cloud-vs-self-managed-feature-compatibility)Redpanda Cloud vs Self-Managed feature compatibility

Because Redpanda Cloud is a fully-managed service that provides maintenance, data and partition balancing, upgrades, and recovery, much of the cluster maintenance required for Self-Managed users is not necessary for Redpanda Cloud users. Also, Redpanda Cloud is opinionated about Kafka configurations. For example, automatic topic creation is disabled.
Some systems expect the Kafka service to automatically create topics when a message is produced to a topic that doesn’t exist. (You can enable this for BYOC and Dedicated clusters with the `auto_create_topics_enabled` cluster property.)

New clusters in Redpanda Cloud generally include functionality added in Self-Managed versions immediately. Existing clusters include new functionality when they get upgraded to the latest version.

Redpanda Cloud deployments do not support the following functionality available in Redpanda Self-Managed deployments:

- Kafka API OIDC authentication. However, Redpanda Cloud does support [SSO to the Redpanda Cloud UI](../../security/cloud-authentication/#single-sign-on).
- Admin API.
- FIPS-compliance mode.
- Kerberos authentication.
- Redpanda debug bundles.
- Redpanda Console topic documentation.
- Manual deserialization of Schema Registry.
- Configuring access to object storage with a customer-managed encryption key.
- Kubernetes Helm chart and Redpanda Operator functionality.
- The following `rpk` commands:
  - `rpk cluster health`
  - `rpk cluster license`
  - `rpk cluster maintenance`
  - `rpk cluster partitions`
  - `rpk cluster self-test`
  - `rpk cluster storage restore` (However, `rpk cluster storage` and its subcommands for mountable topics are supported in BYOC and Dedicated clusters.)
  - `rpk connect`
  - `rpk container`
  - `rpk debug`
  - `rpk generate app` (This is supported in Serverless clusters only.)
  - `rpk iotune`
  - `rpk redpanda`
  - `rpk topic describe-storage` (All other `rpk topic` commands are supported on both Redpanda Cloud and Self-Managed.)

> 📝 **NOTE**
>
> The `rpk cloud` commands are not supported in Self-Managed deployments.

## [](#features-in-limited-availability)Features in limited availability

Features in limited availability are production-ready and are covered by Redpanda Support for early adopters.
The following features are currently in limited availability in Redpanda Cloud: - [Redpanda ADP](../../ai-agents/adp-overview/) including AI agents, AI Gateway, and transcripts - Dedicated for Azure ## [](#features-in-beta)Features in beta Features in beta are available for testing and feedback. They are not covered by Redpanda Support and should not be used in production environments. The following features are currently in beta in Redpanda Cloud: - BYOVNet for Azure - Secrets management for BYOVPC on GCP - Several Redpanda Connect components ## [](#suggested-videos)Suggested videos - [YouTube - What is Redpanda BYOC? (3 mins)](https://www.youtube.com/watch?v=gVlzsJAYT64&ab_channel=RedpandaData) ## [](#next-steps)Next steps - [Build AI agents with Redpanda ADP](../../ai-agents/) - [Learn about upgrades and maintenance](../../manage/maintenance/) - [Create a Serverless cluster](../cluster-types/serverless/) - [Create a BYOC cluster](../cluster-types/byoc/) --- # Page 388: Redpanda Cloud Deployment **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types.md --- # Redpanda Cloud Deployment --- title: Redpanda Cloud Deployment latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/index.adoc description: Learn about Redpanda Cloud deployments. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-08-01" --- - [Serverless](serverless/) Learn how to create a Serverless cluster and start streaming. - [BYOC](byoc/) Learn how to create a Bring Your Own Cloud (BYOC), Bring Your Own Virtual Private Cloud (BYOVPC), or Bring Your Own Virtual Network (BYOVNet) cluster. 
- [Dedicated](create-dedicated-cloud-cluster/) Learn how to create a Dedicated cluster and start streaming. --- # Page 389: BYOC **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc.md --- # BYOC --- title: BYOC latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/index.adoc description: Learn how to create a Bring Your Own Cloud (BYOC), Bring Your Own Virtual Private Cloud (BYOVPC), or Bring Your Own Virtual Network (BYOVNet) cluster. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-16" --- Bring Your Own Cloud (BYOC) lets you run Redpanda in your own cloud environment while using managed services provided by Redpanda. With BYOC clusters, Redpanda deploys into your existing cloud network: - AWS and GCP: Virtual Private Cloud (VPC) - Azure: Virtual Network (VNet) Your data never leaves your environment, giving you extra security and control. See [BYOC architecture](../../byoc-arch/) for details. Redpanda manages provisioning, monitoring, upgrades, and security policies, and it manages required resources in your VPC or VNet, including subnets (subnetworks in GCP), IAM roles, and object storage resources (for example, S3 buckets or Azure Storage accounts). You get hands-off operations with a 99.99% uptime guarantee while keeping full control of your data. If you want to manage the networking infrastructure yourself, create a Bring Your Own Virtual Private Cloud (BYOVPC) or Bring Your Own Virtual Network (BYOVNet) cluster. With BYOVPC/BYOVNet, the Redpanda agent does not create or change resources in your account. 
This is ideal for organizations with stringent compliance requirements or existing network configurations, when you need full control over the network lifecycle. Compared to standard BYOC, BYOVPC/BYOVNet provides more security, but the configuration is more complex. See the [shared responsibility model](../../cloud-overview/#shared-responsibility-model) to understand what you manage versus what Redpanda manages. > ❗ **IMPORTANT** > > Don’t deploy other workloads on the BYOC infrastructure that Redpanda manages. - [BYOC: AWS](aws/) Learn how to create a BYOC or BYOVPC cluster on AWS. - [BYOC: Azure](azure/) Learn how to create a BYOC or BYOVNet cluster on Azure. - [BYOC: GCP](gcp/) Learn how to create a BYOC or BYOVPC cluster on GCP. - [Create Remote Read Replicas](remote-read-replicas/) Learn how to create a remote read replica topic with BYOC, which is a read-only topic that mirrors a topic on a different cluster. --- # Page 390: BYOC: AWS **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/aws.md --- # BYOC: AWS --- title: "BYOC: AWS" latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/aws/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/aws/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/aws/index.adoc description: Learn how to create a BYOC or BYOVPC cluster on AWS. page-git-created-date: "2024-10-24" page-git-modified-date: "2025-05-07" --- - [Create a BYOC Cluster on AWS](create-byoc-cluster-aws/) Use the Redpanda Cloud UI to create a BYOC cluster on AWS. - [Create a BYOVPC Cluster on AWS](vpc-byo-aws/) Use the Redpanda BYOVPC Terraform module to deploy a BYOVPC cluster on AWS. 
--- # Page 391: Create a BYOC Cluster on AWS **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/aws/create-byoc-cluster-aws.md --- # Create a BYOC Cluster on AWS --- title: Create a BYOC Cluster on AWS latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/aws/create-byoc-cluster-aws page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/aws/create-byoc-cluster-aws.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/aws/create-byoc-cluster-aws.adoc description: Use the Redpanda Cloud UI to create a BYOC cluster on AWS. page-git-created-date: "2024-10-24" page-git-modified-date: "2026-02-02" --- To create a Redpanda cluster in your virtual private cloud (VPC), follow the instructions in the Redpanda Cloud UI. The UI contains the parameters necessary to successfully run `rpk cloud byoc apply`. See also: [BYOC architecture](../../../../byoc-arch/). > 📝 **NOTE** > > With standard BYOC clusters, Redpanda manages security policies and resources for your VPC, including subnetworks, service accounts, IAM roles, firewall rules, and storage buckets. For the highest level of security, you can manage these resources yourself with a [BYOVPC cluster on AWS](../vpc-byo-aws/). ## [](#prerequisites)Prerequisites Before you deploy a BYOC cluster on AWS, check that the user creating the cluster has the following prerequisites: - A minimum version of Redpanda `rpk` v24.1. See [Install or Update rpk](../../../../../manage/rpk/rpk-install/). - The user authenticating to AWS has `AWSAdministratorAccess` access to create the IAM policies specified in [AWS IAM policies](../../../../../security/authorization/cloud-iam-policies/). - The user has the AWS variables necessary to authenticate. 
Use either: - `AWS_PROFILE` or - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` To verify access, you should be able to successfully run `aws sts get-caller-identity` for your region. For more information, see the [AWS CLI reference](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sts/get-caller-identity.html). ## [](#create-a-byoc-cluster)Create a BYOC cluster 1. Log in to [Redpanda Cloud](https://cloud.redpanda.com). 2. On the Clusters page, click **Create cluster**, then click **Create** for BYOC. 3. Enter a cluster name, then select the resource group, provider (AWS), [region, tier](../../../../../reference/tiers/byoc-tiers/), availability, and Redpanda version. > 📝 **NOTE** > > - If you plan to create a private network in your own VPC, select the region where your VPC is located. > > - Three availability zones provide two backups in case one availability zone goes down. Optionally, click **Advanced settings** to specify up to five key-value custom tags. After the cluster is created, the tags are applied to all AWS resources associated with this cluster. For more information, see the [AWS documentation](https://docs.aws.amazon.com/mediaconnect/latest/ug/tagging-restrictions.html). After the cluster is created, you can [specify more tags with the Cloud API](#manage-custom-tags). 4. Click **Next**. 5. On the Network page, select the connection type: either public or private. For BYOC clusters, private is best-practice. - Your network name is used to identify this network. - For a [CIDR range](../../../../../networking/cidr-ranges/), choose one that does not overlap with your existing VPCs or your Redpanda network. - Clusters with private networking include a setting for API Gateway network access. Public access exposes endpoints for Redpanda Console, the Data Plane API, and the MCP Server API, but they remain protected by your authentication and authorization controls. Private access restricts endpoint access to your VPC only. 
> 📝 **NOTE** > > After the cluster is created, you can change the API Gateway access on the cluster settings page. If you change from public to private access, users without VPN access to the Redpanda VPC will lose access to these services. 6. Click **Next**. 7. On the Deploy page, follow the steps to log in to Redpanda Cloud and deploy the agent. As part of agent deployment: - Redpanda assigns the permissions required to run the agent. For details about these permissions, see [AWS IAM policies](../../../../../security/authorization/cloud-iam-policies/). - Redpanda allocates one Elastic IP (EIP) address in AWS for each BYOC cluster. > 📝 **NOTE** > > Redpanda Cloud does not support customer access or modifications to any of the internal data plane resources. This restriction allows Redpanda Data to manage all configuration changes internally to ensure a 99.99% service level agreement (SLA) for BYOC clusters. ## [](#manage-custom-tags)Manage custom tags Your organization might require custom tags for cost allocation, audit compliance, or governance policies. After cluster creation, you can manage tags with the [Cloud Control Plane API](../../../../../manage/api/cloud-byoc-controlplane-api/). The Control Plane API allows up to 16 custom tags in AWS. Make sure you have: - The cluster ID. You can find this in the Redpanda Cloud UI, in the **Details** section of the cluster overview. - A valid bearer token for the Cloud Control Plane API. For details, see [Authenticate to the API](/api/doc/cloud-controlplane/authentication). > ❗ **IMPORTANT** > > To unlock this feature for your account, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). 1.
To refresh agent permissions so the Redpanda agent can update tags, run: ```bash export CLUSTER_ID="" rpk cloud byoc aws apply --redpanda-id="$CLUSTER_ID" ``` This step is required because tag management requires additional IAM permissions that may not have been granted during initial cluster creation: - `ec2:DescribeTags` - `ec2:DescribeVolumes` - `ec2:DescribeNetworkInterfaces` - `ec2:CreateTags` - `ec2:DeleteTags` - `iam:TagPolicy` - `iam:UntagPolicy` - `iam:TagInstanceProfile` - `iam:UntagInstanceProfile` 2. To update tags, invoke the Cloud API. First, set your authentication token: ```bash export AUTH_TOKEN="" ``` The `PATCH` call sets the tags specified under `"cloud_provider_tags"`. It replaces the existing tags with the specified tags. Include all desired tags in the request. To remove a single entry, omit it from the map you send. ```bash cluster_patch_body=$(cat <<'JSON' { "cloud_provider_tags": { "Environment": "production", "CostCenter": "engineering" } } JSON ) curl -X PATCH "https://api.redpanda.com/v1/clusters/$CLUSTER_ID" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$cluster_patch_body" ``` To remove all tags, send an empty `cloud_provider_tags` object: ```bash cluster_patch_body='{"cloud_provider_tags": {}}' curl -X PATCH "https://api.redpanda.com/v1/clusters/$CLUSTER_ID" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$cluster_patch_body" ``` ## [](#next-steps)Next steps [Configure private networking](../../../../../networking/byoc/aws/) --- # Page 392: Create a BYOVPC Cluster on AWS **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/aws/vpc-byo-aws.md --- # Create a BYOVPC Cluster on AWS --- title: Create a BYOVPC Cluster on AWS latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/aws/vpc-byo-aws page-component-name: redpanda-cloud page-version: 
master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/aws/vpc-byo-aws.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/aws/vpc-byo-aws.adoc description: Use the Redpanda BYOVPC Terraform module to deploy a BYOVPC cluster on AWS. page-topic-type: how-to personas: platform_admin learning-objective-1: Deploy a BYOVPC cluster on AWS using the Redpanda Terraform module learning-objective-2: Configure the Redpanda network and cluster resources using module outputs learning-objective-3: Enable PrivateLink on a BYOVPC cluster page-git-created-date: "2024-12-02" page-git-modified-date: "2026-03-09" --- > ❗ **IMPORTANT** > > BYOVPC/BYOVNet is an add-on feature that requires Premium support. To unlock this feature for your account, contact your Redpanda account team or [Redpanda Sales](https://www.redpanda.com/price-estimator). A Bring Your Own Virtual Private Cloud (BYOVPC) cluster allows you to deploy the Redpanda [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) into your existing VPC and manage the networking lifecycle yourself. Compared to a standard Bring Your Own Cloud (BYOC) setup, where Redpanda manages the networking lifecycle for you, BYOVPC provides more control. For background on the architecture, see [BYOC architecture](../../../../byoc-arch/). When you create a BYOVPC cluster, you specify your VPC and the IAM role (instance profile) that the Redpanda agent will assume. The Redpanda Cloud agent doesn’t create any new resources or alter any settings in your account. With BYOVPC: - You provide your own VPC in your AWS account. - You maintain more control over your account, because Redpanda requires fewer permissions than standard BYOC clusters. - You control your security resources and policies, including subnets, service accounts, IAM roles, firewall rules, and storage buckets. 
The [Redpanda BYOVPC Terraform Module](https://registry.terraform.io/modules/redpanda-data/redpanda-byovpc/aws/latest) contains [Terraform](https://developer.hashicorp.com/terraform) code that deploys the resources required for a BYOVPC cluster on AWS. You need to create these resources in advance and provide them to Redpanda during cluster creation. Variables are provided in the code so you can exclude resources that already exist in your environment, such as the VPC. > 📝 **NOTE** > > Secrets management is enabled by default with the Terraform module. It allows you to store and read secrets in your cluster, for example to integrate a REST catalog with Iceberg-enabled topics. > > For existing BYOVPC clusters, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new) to enable secrets management. ## [](#prerequisites)Prerequisites - Access to an AWS account in which you create your cluster. - Minimum permissions in that AWS account. For the actions required by the user who will create the cluster with `terraform apply`, see [`iam_rpk_user.tf`](https://github.com/redpanda-data/terraform-aws-redpanda-byovpc/blob/main/iam_rpk_user.tf). - Each BYOVPC cluster requires one allocated Elastic IP (EIP) address in AWS. - [Terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli) version 1.8.5 or later. - The [Redpanda Terraform provider](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs) configured with valid credentials. For setup details, see the provider documentation. ## [](#limitations)Limitations - Existing clusters cannot be converted to BYOVPC clusters. - After creating a BYOVPC cluster, you cannot change to a different VPC. - Only primary CIDR ranges are supported for the VPC. > 📝 **NOTE** > > For simplicity, the instructions are based on the assumption that Terraform is configured to use local state. 
You may want to configure [remote state](https://developer.hashicorp.com/terraform/language/state/remote). ## [](#configure-the-redpanda-byovpc-terraform-module)Configure the Redpanda BYOVPC Terraform module The following example uses the Redpanda BYOVPC Terraform Module to create the resources required to create a BYOVPC cluster. > 📝 **NOTE** > > Redpanda recommends using a VPC in AWS with a CIDR block (10.0.0.0/16) to allow for enough address space. The subnets must be set to /24. ```hcl locals { common_prefix = "abc-stg" region = "us-east-2" zones = ["use2-az1", "use2-az2", "use2-az3"] enable_private_link = false # see example below for enabling private link force_destroy_cloud_storage = false # see example below if using pre-existing VPC and subnets, # otherwise when provided with these cidrs the module will # attempt to create the VPC and subnets vpc_cidr_block = "10.0.0.0/16" public_subnet_cidrs = [ "10.0.1.0/24", "10.0.3.0/24", "10.0.5.0/24", "10.0.7.0/24", "10.0.9.0/24", "10.0.11.0/24" ] private_subnet_cidrs = [ "10.0.0.0/24", "10.0.2.0/24", "10.0.4.0/24", "10.0.6.0/24", "10.0.8.0/24", "10.0.10.0/24" ] # condition_tags restrict the IAM permissions granted by the # module to only those resources with these tags, when using # condition_tags these tags must also be provided to the # redpanda_cluster so that all resources created are given # these tags condition_tags = { "redpanda-managed" : "true" } # default_tags are applied to all resources created by the # module or redpanda_cluster resource default_tags = { "env" : "staging" } # when using a brand new AWS account that has never hosted an # EKS cluster before the EKS node group service linked role # must be created, if it already exists this may be set to false create_eks_nodegroup_service_linked_role = true } module "redpanda_byovpc" { source = "redpanda-data/redpanda-byovpc/aws" common_prefix = local.common_prefix region = local.region zones = local.zones create_rpk_user = false enable_private_link = 
local.enable_private_link force_destroy_cloud_storage = local.force_destroy_cloud_storage enable_redpanda_connect = true vpc_cidr_block = local.vpc_cidr_block private_subnet_cidrs = local.private_subnet_cidrs public_subnet_cidrs = local.public_subnet_cidrs condition_tags = local.condition_tags default_tags = local.default_tags create_eks_nodegroup_service_linked_role = local.create_eks_nodegroup_service_linked_role } ``` > 📝 **NOTE** > > - To send telemetry back to the Redpanda control plane, the cluster needs outbound internet access. You can provide this through at least one public subnet, or through network peering or a transit gateway to another VPC that routes traffic through a public subnet. The example configuration includes multiple public subnets to allow for future scaling. > > - The example creates an Internet Gateway and an associated Route Table rule that routes traffic into the VPC, which allows the Redpanda control plane to access the cluster. To disable creation of the Internet Gateway, either remove the configuration and value for `create_internet_gateway` or set `"create_internet_gateway": false`. > > - When using a pre-existing VPC, at least one public subnet must already exist in that VPC. Setting `public_subnet_cidrs = []` only prevents the module from creating new ones. > 💡 **TIP** > > See the full list of zones and tiers available with each provider in the [Control Plane API reference](/api/doc/cloud-controlplane/topic/topic-regions-and-usage-tiers). ## [](#configure-the-redpanda-network-and-cluster)Configure the Redpanda network and cluster After provisioning the AWS infrastructure, configure the Redpanda network and cluster resources using the module outputs. 
```hcl locals { resource_group_name = "staging" throughput_tier = "tier-1-aws-v3-arm" } data "redpanda_resource_group" "staging" { name = local.resource_group_name } resource "redpanda_network" "network" { name = "${local.common_prefix}-network" resource_group_id = data.redpanda_resource_group.staging.id cloud_provider = "aws" region = local.region cluster_type = "byoc" customer_managed_resources = { aws = { management_bucket = { arn = module.redpanda_byovpc.management_bucket_arn } dynamodb_table = { arn = module.redpanda_byovpc.dynamodb_table_arn } vpc = { arn = module.redpanda_byovpc.vpc_arn } private_subnets = { arns = module.redpanda_byovpc.private_subnet_arns } } } depends_on = [module.redpanda_byovpc] } resource "redpanda_cluster" "cluster" { name = "${local.common_prefix}-cluster" resource_group_id = data.redpanda_resource_group.staging.id cloud_provider = "aws" region = redpanda_network.network.region zones = local.zones network_id = redpanda_network.network.id cluster_type = "byoc" connection_type = "private" throughput_tier = local.throughput_tier allow_deletion = false tags = merge(local.condition_tags, local.default_tags) customer_managed_resources = { aws = { agent_instance_profile = { arn = module.redpanda_byovpc.agent_instance_profile_arn } cloud_storage_bucket = { arn = module.redpanda_byovpc.cloud_storage_bucket_arn } cluster_security_group = { arn = module.redpanda_byovpc.cluster_security_group_arn } connectors_node_group_instance_profile = { arn = module.redpanda_byovpc.connectors_node_group_instance_profile_arn } connectors_security_group = { arn = module.redpanda_byovpc.connectors_security_group_arn } k8s_cluster_role = { arn = module.redpanda_byovpc.k8s_cluster_role_arn } node_security_group = { arn = module.redpanda_byovpc.node_security_group_arn } permissions_boundary_policy = { arn = module.redpanda_byovpc.permissions_boundary_policy_arn } redpanda_agent_security_group = { arn = module.redpanda_byovpc.redpanda_agent_security_group_arn } 
redpanda_node_group_instance_profile = { arn = module.redpanda_byovpc.redpanda_node_group_instance_profile_arn } redpanda_node_group_security_group = { arn = module.redpanda_byovpc.redpanda_node_group_security_group_arn } utility_node_group_instance_profile = { arn = module.redpanda_byovpc.utility_node_group_instance_profile_arn } utility_security_group = { arn = module.redpanda_byovpc.utility_security_group_arn } redpanda_connect_node_group_instance_profile = { arn = module.redpanda_byovpc.redpanda_connect_node_group_instance_profile_arn } redpanda_connect_security_group = { arn = module.redpanda_byovpc.redpanda_connect_security_group_arn } } } depends_on = [redpanda_network.network] } ``` ## [](#apply-the-terraform-configuration)Apply the Terraform configuration Initialize, plan, and apply Terraform to set up the AWS infrastructure: ```bash terraform init && terraform plan && terraform apply ``` Cluster provisioning can take up to 45 minutes. When provisioning completes, the cluster status updates to `Running`. If the cluster stays in `Creating` status, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). ### [](#validation-checks)Validation checks The `redpanda_cluster` resource performs validation checks before proceeding with provisioning: - RPK user: Checks if the user running the command has sufficient privileges to provision the agent. Any missing permissions are displayed in the output. - IAM instance profile: Checks that `agent_instance_profile`, `connectors_node_group_instance_profile`, `redpanda_node_group_instance_profile`, `redpanda_connect_node_group_instance_profile`, `utility_node_group_instance_profile`, and `k8s_cluster_role` have the minimum required permissions. Any missing permissions are displayed in the output. - Storage: Checks that the `management_bucket` exists and is versioned, checks that the `cloud_storage_bucket` exists and is not versioned, and checks that the `dynamodb_table` exists. 
- Network: Checks that the VPC exists, checks that the subnets exist and have the expected tags, and checks that the security groups exist and have the desired ingress and egress rules. ## [](#delete-the-cluster)Delete the cluster To delete the cluster and all associated resources, run `terraform destroy`. > ⚠️ **WARNING** > > This also deletes the customer-managed resources created by the module. ```bash terraform destroy ``` ## [](#enable-privatelink)Enable PrivateLink PrivateLink can be enabled during cluster creation or on an already existing cluster. Start by enabling PrivateLink in the Redpanda BYOVPC Terraform module. This adds the permissions required for PrivateLink. ```hcl module "redpanda_byovpc" { # ... enable_private_link = true # ... } ``` Enable PrivateLink on the `redpanda_cluster` resource: ```hcl resource "redpanda_cluster" "cluster" { # ... aws_private_link = { allowed_principals = ["arn:aws:iam::${var.aws_account_id}:root"] enabled = true connect_console = false } # ... } ``` ## [](#deploy-with-pre-existing-vpc-and-subnets)Deploy with pre-existing VPC and subnets If you already have a VPC and subnets in your AWS account, provide their IDs to the module instead of CIDR blocks. ```hcl module "redpanda_byovpc" { # ... # vpc_cidr_block = local.vpc_cidr_block # private_subnet_cidrs = local.private_subnet_cidrs # public_subnet_cidrs = local.public_subnet_cidrs vpc_id = "vpc-0c79b236047faa1ab" private_subnet_ids = [ "subnet-0e58df59b5eb037c3", "subnet-0c74559ab372f5123", "subnet-0525df35c467cad1c", "subnet-09c301e004e96c803", "subnet-0f67e76738572cb8e", "subnet-0cca6892cf789f6ec", ] public_subnet_cidrs = [] # when empty the module will not create any public subnets # ... 
} ``` ## [](#next-steps)Next steps - [Configure AWS PrivateLink](../../../../../networking/aws-privatelink/) - [Review AWS IAM policies](../../../../../security/authorization/cloud-iam-policies/) - [Learn about `rpk` commands](../../../../../reference/rpk/) --- # Page 393: BYOC: Azure **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/azure.md --- # BYOC: Azure --- title: "BYOC: Azure" latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/azure/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/azure/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/azure/index.adoc description: Learn how to create a BYOC or BYOVNet cluster on Azure. page-git-created-date: "2024-10-24" page-git-modified-date: "2025-07-30" --- - [Create a BYOC Cluster on Azure](create-byoc-cluster-azure/) Use the Redpanda Cloud UI to create a BYOC cluster on Azure. - [Create a BYOVNet Cluster on Azure](vnet-azure/) Use Terraform to deploy a BYOVNet cluster on Azure. 
--- # Page 394: Create a BYOC Cluster on Azure **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/azure/create-byoc-cluster-azure.md --- # Create a BYOC Cluster on Azure --- title: Create a BYOC Cluster on Azure latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/azure/create-byoc-cluster-azure page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/azure/create-byoc-cluster-azure.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/azure/create-byoc-cluster-azure.adoc description: Use the Redpanda Cloud UI to create a BYOC cluster on Azure. page-git-created-date: "2024-10-24" page-git-modified-date: "2026-02-02" --- To create a Redpanda cluster in your virtual network (VNet), follow the instructions in the Redpanda Cloud UI. The UI contains the parameters necessary to successfully run `rpk cloud byoc apply`. See also: [BYOC architecture](../../../../byoc-arch/). > 📝 **NOTE** > > With standard BYOC clusters, Redpanda manages security policies and resources for your virtual network (VNet), including subnetworks, managed identities, IAM roles, security groups, and storage accounts. For the highest level of security, you can manage these resources yourself with a [BYOVNet cluster on Azure](../vnet-azure/). ## [](#prerequisites)Prerequisites Before you deploy a BYOC cluster on Azure, check all prerequisites to ensure that your Azure subscription meets the requirements. ### [](#configure-azure-cli)Configure Azure CLI - [Install the Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli).
- [Sign in](https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli) with the Azure CLI: ```none az login ``` - Set the desired subscription for the Azure CLI: ```none az account set --subscription ``` ### [](#verify-rpk-version)Verify rpk version Confirm that you have `rpk` v24.1 or later. See [Install or Update rpk](../../../../../manage/rpk/rpk-install/). ### [](#prepare-your-azure-subscription)Prepare your Azure subscription In the [Azure Portal](https://login.microsoftonline.com/), confirm that the dedicated subscription you intend to use with Redpanda includes the following: - **Role**: The Azure user must have the _Owner_ role in the subscription. - **Resources**: The subscription must be registered for the following resource providers (AKS + common dependencies). See the [Microsoft documentation](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/resource-providers-and-types). - Microsoft.Compute - Microsoft.ManagedIdentity - Microsoft.Storage - Microsoft.KeyVault - Microsoft.Network - Microsoft.ContainerService To check whether each resource provider is registered, run the following commands using the Azure CLI or in the Azure Cloud Shell: ```none az provider show -n Microsoft.Compute --query registrationState -o tsv az provider show -n Microsoft.ManagedIdentity --query registrationState -o tsv az provider show -n Microsoft.Storage --query registrationState -o tsv az provider show -n Microsoft.KeyVault --query registrationState -o tsv az provider show -n Microsoft.Network --query registrationState -o tsv az provider show -n Microsoft.ContainerService --query registrationState -o tsv ``` If a resource provider is not registered, run the matching command: ```none az provider register --namespace Microsoft.Compute az provider register --namespace Microsoft.ManagedIdentity az provider register --namespace Microsoft.Storage az provider register --namespace Microsoft.KeyVault az provider register --namespace Microsoft.Network az provider
register --namespace Microsoft.ContainerService ``` - **Feature**: The subscription must be registered for Microsoft.Compute/EncryptionAtHost. See the [Microsoft documentation](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/disks-enable-host-based-encryption-cli#prerequisites). To register it, run: ```none az feature register --namespace Microsoft.Compute --name EncryptionAtHost # (optional) Wait and verify it shows as Registered az feature show --namespace Microsoft.Compute --name EncryptionAtHost --query properties.state -o tsv # Refresh the provider after enabling a feature az provider register --namespace Microsoft.Compute ``` - **Monitoring**: The subscription must have Azure Network Watcher enabled in the NetworkWatcherRG resource group and the region where you will use Redpanda. Network Watcher lets you monitor and diagnose conditions at a network level. See the [Microsoft documentation](https://learn.microsoft.com/en-us/azure/network-watcher/network-watcher-create?tabs=portaly). To enable it, run: ```none # Create the NetworkWatcherRG resource group az group create --name 'NetworkWatcherRG' --location '' # Enable Network Watcher in az network watcher configure --resource-group 'NetworkWatcherRG' --locations '' --enabled ``` ### [](#check-azure-quota)Check Azure quota Confirm that the Azure subscription has enough virtual CPUs (vCPUs) per instance family and total regional vCPUs in the region where you will use Redpanda: - Standard Ddv5-series vCPUs: 12 (3 Redpanda broker nodes + extra capacity for 3 more nodes that could be utilized temporarily during tier 1 maintenance) - Standard Dadsv5-series vCPUs: 8 (2 Redpanda utility nodes) - Standard Dv3-series vCPUs: 2 (1 Redpanda agent node) See the [Microsoft documentation](https://learn.microsoft.com/en-us/azure/quotas/view-quotas). 
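The quota requirements above can be turned into a quick arithmetic check. The following is a minimal sketch, not an official Redpanda tool: the variable names and the `has_headroom` and `show_usage` helpers are hypothetical, and the only external call is `az vm list-usage`, which lists current usage and limits for a region. The quota names that the CLI prints for each VM family may differ from the series names used in this list, so match them by eye.

```bash
#!/usr/bin/env bash
# Sketch only: the names below are illustrative and not part of rpk or the Azure CLI.

# vCPU requirements per VM family, taken from the list above.
ddv5_required=12    # Standard Ddv5-series (broker nodes plus maintenance headroom)
dadsv5_required=8   # Standard Dadsv5-series (utility nodes)
dv3_required=2      # Standard Dv3-series (agent node)

# The total regional vCPU quota must cover all families combined.
total_required=$(( ddv5_required + dadsv5_required + dv3_required ))

# has_headroom LIMIT CURRENT REQUIRED
# Succeeds (exit 0) when the unused quota (LIMIT - CURRENT) covers REQUIRED.
has_headroom() {
  local limit=$1 current=$2 required=$3
  [ $(( limit - current )) -ge "$required" ]
}

# show_usage REGION
# Prints current usage and limits for the region. Requires a signed-in Azure CLI.
show_usage() {
  az vm list-usage --location "$1" --output table
}

echo "Total regional vCPUs required: ${total_required}"
```

For example, if `show_usage eastus2` reports a limit of 100 and a current value of 90 for a family that needs 12 vCPUs, `has_headroom 100 90 12` fails, which suggests requesting a quota increase before you create the cluster.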
### [](#check-azure-sku-restrictions)Check Azure SKU restrictions Ensure your subscription has access to the required VM sizes in the region where you will use Redpanda. For example, using the Azure CLI or in the Azure Cloud Shell, run: ```bash # Replace eastus2 with your target region az vm list-skus -l eastus2 --zone --size Standard_D2d_v5 --output table ``` Example output (no restrictions: good) ```bash ResourceType Locations Name Zones Restrictions --------------- ----------- --------------- ------- ------------ virtualMachines eastus2 Standard_D2d_v5 1,2,3 None ``` Example output (with restrictions: needs attention) ```bash ResourceType Locations Name Zones Restrictions --------------- ----------- --------------- ------- ------------ virtualMachines eastus2 Standard_D2d_v5 1,2,3 NotAvailableForSubscription ``` If you see restrictions, [open a Microsoft support request](https://learn.microsoft.com/en-us/troubleshoot/azure/general/region-access-request-process) to remove them. ### [](#prerequisite-checklist)Prerequisite checklist - Verified `rpk` version - Verified Azure user has Owner role - Registered all required resource providers - Registered EncryptionAtHost feature - Enabled Network Watcher - Verified vCPU quota - Verified no SKU restrictions ## [](#create-a-byoc-cluster)Create a BYOC cluster To create a Redpanda cluster in your Azure VNet, complete the [prerequisites](#prerequisites), then follow the instructions in the Redpanda Cloud UI. The UI contains the parameters necessary to successfully run `rpk cloud byoc apply`. 1. Log in to [Redpanda Cloud](https://cloud.redpanda.com). 2. On the Clusters page, click **Create cluster**, then click **Create** for BYOC. 3. Enter a cluster name, then select the resource group, provider (Azure), [region, tier](../../../../../reference/tiers/byoc-tiers/), availability, and Redpanda version. > 📝 **NOTE** > > - If you plan to create a private network in your own VNet, select the region where your VNet is located.
> > - Multi-AZ is the default configuration. Three AZs provide two backups in case one availability zone goes down. Optionally, click **Advanced settings** to specify up to five key-value custom tags. After the cluster is created, the tags are applied to all Azure resources associated with this cluster. For details, see the [Microsoft documentation](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/tag-resources). After the cluster is created, you can [specify more tags with the Cloud API](#manage-custom-tags). 4. Click **Next**. 5. On the Network page, select the connection type: either public or private. For BYOC clusters, private networking using Azure Private Link is the best practice. - Your network name is used to identify this network. - For a [CIDR range](../../../../../networking/cidr-ranges/), choose one that does not overlap with your existing VNets or your Redpanda network. - Clusters with private networking include a setting for API Gateway network access. Public access exposes endpoints for Redpanda Console, the Data Plane API, and the MCP Server API, but they remain protected by your authentication and authorization controls. Private access restricts endpoint access to your VNet only. Private access incurs an additional cost because it deploys two network load balancers instead of one. > 📝 **NOTE** > > After the cluster is created, you can change the API Gateway access on the cluster settings page. If you change from public to private access, users without VPN access to the Redpanda VNet will lose access to these services. 6. Click **Next**. 7. On the Deploy page, follow the steps to log in to Redpanda Cloud and deploy the agent. As part of agent deployment, Redpanda assigns the permissions required to run the agent. For details about these permissions, see [Azure IAM policies](../../../../../security/authorization/cloud-iam-policies-azure/).
## [](#manage-custom-tags)Manage custom tags Your organization might require custom tags for cost allocation, audit compliance, or governance policies. After cluster creation, you can manage tags with the [Cloud Control Plane API](../../../../../manage/api/cloud-byoc-controlplane-api/). The Control Plane API allows up to 16 custom tags in Azure. Make sure you have: - The cluster ID. You can find this in the Redpanda Cloud UI, in the **Details** section of the cluster overview. - A valid bearer token for the Cloud Control Plane API. For details, see [Authenticate to the API](/api/doc/cloud-controlplane/authentication). > ❗ **IMPORTANT** > > To unlock this feature for your account, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). 1. To refresh Redpanda agent permissions in the target subscription, run: ```bash export CLUSTER_ID="" export SUBSCRIPTION_ID="" rpk cloud byoc azure apply --redpanda-id="$CLUSTER_ID" --subscription-id="$SUBSCRIPTION_ID" ``` 2. To update tags, invoke the Cloud API. First, set your authentication token: ```bash export AUTH_TOKEN="" ``` The `PATCH` call sets the tags specified under `"cloud_provider_tags"`. It replaces the existing tags with the specified tags. Include all desired tags in the request. To remove a single entry, omit it from the map you send. 
```bash cluster_patch_body=$(cat <<'JSON' { "cloud_provider_tags": { "Environment": "production", "CostCenter": "engineering" } } JSON ) curl -X PATCH "https://api.redpanda.com/v1/clusters/$CLUSTER_ID" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$cluster_patch_body" ``` To remove all tags, send an empty `cloud_provider_tags` object: ```bash cluster_patch_body='{"cloud_provider_tags": {}}' curl -X PATCH "https://api.redpanda.com/v1/clusters/$CLUSTER_ID" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$cluster_patch_body" ``` ### [](#limitations)Limitations - Nodepool Application Security Groups (ASG): Custom tags are set only when the cluster is created. Tags cannot be updated on these resources after cluster creation. - Private Link network interfaces (Kubernetes API server, Tiered Storage, and Private Link service): Custom tags are set only during cluster creation and cannot be changed later. --- # Page 395: Create a BYOVNet Cluster on Azure **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/azure/vnet-azure.md --- # Create a BYOVNet Cluster on Azure --- title: Create a BYOVNet Cluster on Azure page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/azure/vnet-azure page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/azure/vnet-azure.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/azure/vnet-azure.adoc description: Use Terraform to deploy a BYOVNet cluster on Azure. 
# Beta release status page-beta: "true" page-topic-type: how-to personas: platform_admin learning-objective-1: Deploy a BYOVNet cluster on Azure using Terraform learning-objective-2: Configure the Redpanda network and cluster resources using the Cloud API learning-objective-3: Manage the lifecycle of a BYOVNet cluster, including creation and deletion page-git-created-date: "2024-11-15" page-git-modified-date: "2026-03-09" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta > ❗ **IMPORTANT** > > BYOVPC/BYOVNet is an add-on feature that requires Premium support. To unlock this feature for your account, contact your Redpanda account team or [Redpanda Sales](https://www.redpanda.com/price-estimator). A Bring Your Own Virtual Network (BYOVNet) cluster allows you to deploy the Redpanda [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) into your existing VNet and manage the networking lifecycle. Compared to a standard Bring Your Own Cloud (BYOC) setup, where Redpanda manages the networking lifecycle for you, BYOVNet provides more control. For background on the architecture, see [BYOC architecture](../../../../byoc-arch/). When you create a BYOVNet cluster, you specify your VNet and managed identities. The Redpanda Cloud agent doesn’t create any new resources or alter any settings in your account. With a customer-managed VNet: - You provide your own VNet in your Azure account. - You maintain more control over your account, because Redpanda requires fewer permissions than standard BYOC clusters. - You control your security resources and policies, including subnets, user-assigned identities, IAM roles and assignments, security groups, storage accounts, and key vaults. 
The [Redpanda Cloud Examples repository](https://github.com/redpanda-data/cloud-examples/tree/main/customer-managed/azure/README.md) contains [Terraform](https://developer.hashicorp.com/terraform) code that deploys the resources required for a BYOVNet cluster on Azure. You need to create these resources in advance and provide them to Redpanda during cluster creation. Variables are provided in the code so you can exclude resources that already exist in your environment, such as the VNet. See the code for the complete list of resources required to create and deploy a Redpanda cluster. Customer-managed resources can be broken down into the following groups: - Resource group resources - User-assigned identities - IAM roles and assignments - Network - Storage - Key vaults ## [](#prerequisites)Prerequisites - Access to an Azure subscription where you want to create your cluster - Knowledge of your internal VNet and subnet configuration - Permission to call the [Redpanda Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) - Permission to create, modify, and delete the resources described by Terraform - [Terraform](https://developer.hashicorp.com/terraform/install) version 1.8.5 or later - [jq](https://jqlang.org/download/), which is used to parse JSON values from API responses ## [](#limitations)Limitations - Existing clusters cannot be moved to a BYOVNet cluster. - After creating a BYOVNet cluster, you cannot change to a different VNet. - Only primary CIDR ranges are supported for the VNet. ## [](#set-environment-variables)Set environment variables Set environment variables for the resource group, VNet name, and Azure region. For example: ```bash export AZURE_RESOURCE_GROUP_NAME=sample-redpanda-rg export AZURE_VNET_NAME="sample-vnet" export AZURE_REGION=centralus ``` ## [](#create-azure-resource-group-and-vnet)Create Azure resource group and VNet 1. 
Create a resource group to contain all resources, and then create a VNet with your address and subnet prefixes. The following example uses the environment variables to create the `sample-redpanda-rg` resource group and the `sample-vnet` virtual network with an address space of `10.0.0.0/16`. ```bash az group create --name ${AZURE_RESOURCE_GROUP_NAME} --location ${AZURE_REGION} az network vnet create \ --name ${AZURE_VNET_NAME} \ --resource-group ${AZURE_RESOURCE_GROUP_NAME} \ --location ${AZURE_REGION} \ --address-prefix 10.0.0.0/16 ``` 2. Set additional environment variables for Azure resources. For example: ```bash export AZURE_SUBSCRIPTION_ID= export AZURE_TENANT_ID= export AZURE_ZONES='["centralus-az1", "centralus-az2", "centralus-az3"]' export AZURE_RESOURCE_PREFIX=sample- export REDPANDA_CLUSTER_NAME= export REDPANDA_RG_ID= export REDPANDA_THROUGHPUT_TIER=tier-1-azure-v3-x86 export REDPANDA_VERSION=25.2 export REDPANDA_MANAGEMENT_STORAGE_ACCOUNT_NAME=rpmgmtsa export REDPANDA_MANAGEMENT_STORAGE_CONTAINER_NAME=rpmgmtsc export REDPANDA_0_PODS_SUBNET_NAME=snet-rp-0-pods export REDPANDA_0_VNET_SUBNET_NAME=snet-rp-0-vnet export REDPANDA_1_PODS_SUBNET_NAME=snet-rp-1-pods export REDPANDA_1_VNET_SUBNET_NAME=snet-rp-1-vnet export REDPANDA_2_PODS_SUBNET_NAME=snet-rp-2-pods export REDPANDA_2_VNET_SUBNET_NAME=snet-rp-2-vnet export REDPANDA_CONNECT_PODS_SUBNET_NAME=snet-connect-pods export REDPANDA_CONNECT_VNET_SUBNET_NAME=snet-connect-vnet export KAFKA_CONNECT_PODS_SUBNET_NAME=snet-kafka-connect-pods export KAFKA_CONNECT_VNET_SUBNET_NAME=snet-kafka-connect-vnet export SYSTEM_PODS_SUBNET_NAME=snet-system-pods export SYSTEM_VNET_SUBNET_NAME=snet-system-vnet export REDPANDA_AGENT_SUBNET_NAME=snet-agent-private export REDPANDA_EGRESS_SUBNET_NAME=snet-agent-public export REDPANDA_MANAGEMENT_KEY_VAULT_NAME=redpanda-vault export REDPANDA_CONSOLE_KEY_VAULT_NAME=rp-console-vault export REDPANDA_AKS_SUBNET_CIDR="10.0.15.0/24" export 
REDPANDA_IAM_RESOURCE_GROUP_NAME=sample-redpanda-rg export REDPANDA_NETWORK_RESOURCE_GROUP_NAME=sample-redpanda-rg export REDPANDA_RESOURCE_GROUP_NAME=sample-redpanda-rg export REDPANDA_STORAGE_RESOURCE_GROUP_NAME=sample-redpanda-rg export REDPANDA_SECURITY_GROUP_NAME=redpanda-nsg export REDPANDA_TIERED_STORAGE_ACCOUNT_NAME=tieredsa export REDPANDA_TIERED_STORAGE_CONTAINER_NAME=tieredsc export REDPANDA_AGENT_USER_ASSIGNED_IDENTITY_NAME=agent-uai export REDPANDA_AKS_USER_ASSIGNED_IDENTITY_NAME=aks-uai export REDPANDA_CERT_MANAGER_USER_ASSIGNED_IDENTITY_NAME=cert-manager-uai export REDPANDA_EXTERNAL_DNS_USER_ASSIGNED_IDENTITY_NAME=external-dns-uai export REDPANDA_CLUSTER_USER_ASSIGNED_IDENTITY_NAME=cluster-uai export REDPANDA_CONSOLE_USER_ASSIGNED_IDENTITY_NAME=console-uai export KAFKA_CONNECT_USER_ASSIGNED_IDENTITY_NAME=kafka-connect-uai export REDPANDA_CONNECT_USER_ASSIGNED_IDENTITY_NAME=redpanda-connect-uai export REDPANDA_CONNECT_API_USER_ASSIGNED_IDENTITY_NAME=redpanda-connect-api-uai export REDPANDA_OPERATOR_USER_ASSIGNED_IDENTITY_NAME=redpanda-operator-uai ``` ## [](#configure-terraform)Configure Terraform > 📝 **NOTE** > > For simplicity, these instructions assume that Terraform is configured to use local state. You may want to configure [remote state](https://developer.hashicorp.com/terraform/language/state/remote). Create a JSON file called `byovnet.auto.tfvars.json` inside the Terraform directory to configure variables for your specific needs: Show script ```bash cat > byovnet.auto.tfvars.json < 💡 **TIP** > > To get the Redpanda authentication credentials, follow the [authentication guide](/api/doc/cloud-controlplane/topic/authentication). ## [](#create-the-network)Create the network To create the Redpanda network: 1. Define a JSON file called `redpanda-network.json` to configure the network for Redpanda with details about VNet, subnets, and storage. 
```bash cat > redpanda-network.json < redpanda-cluster.json < 💡 **TIP** > > See the full list of zones and tiers available with each provider in the [Control Plane API reference](/api/doc/cloud-controlplane/topic/topic-regions-and-usage-tiers). 2. Make a Cloud API call to create a Redpanda cluster and get the cluster ID from the response in JSON `.operation.resource_id`. ```bash export REDPANDA_ID=$(curl -X POST "https://api.redpanda.com/v1/clusters" \ -H "accept: application/json" \ -H "content-type: application/json" \ -H "authorization: Bearer ${BEARER_TOKEN}" \ --data-binary @redpanda-cluster.json | jq -r '.operation.resource_id') ``` ## [](#create-the-cluster-resources)Create the cluster resources To create the initial cluster resources, first log in to Redpanda Cloud, then run `rpk cloud byoc azure apply`: ```bash rpk cloud login \ --save \ --client-id=${REDPANDA_CLIENT_ID} \ --client-secret=${REDPANDA_CLIENT_SECRET} \ --no-profile ``` ```bash rpk cloud byoc azure apply --redpanda-id="${REDPANDA_ID}" --subscription-id="${AZURE_SUBSCRIPTION_ID}" ``` The Redpanda Cloud agent is now running and handles the remaining steps. This can take up to 45 minutes. When provisioning completes, the cluster status updates to `Running`. If the cluster remains in `Creating` status after 45 minutes, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). ## [](#check-the-cluster-status)Check the cluster status Cluster creation is a long-running operation. You can check the operation state with the Cloud API, or check the Redpanda Cloud UI for cluster status.
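For scripted deployments, the status check can be wrapped in a polling loop. The following is a minimal sketch, not an official tool: `poll_until` and `operation_state` are hypothetical helper names, `OPERATION_ID` is an assumed variable holding the operation ID returned by the create call, and the `.operation.state` field and the `STATE_COMPLETED` and `STATE_FAILED` values are assumptions to verify against the Cloud API reference.

```bash
#!/usr/bin/env bash
# poll_until WANTED FAILED MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it prints WANTED (return 0), prints FAILED (return 1),
# or MAX_ATTEMPTS is exhausted (return 2).
poll_until() {
  local wanted="$1" failed="$2" max_attempts="$3" delay="$4"
  shift 4
  local attempt=1 state=""
  while [ "$attempt" -le "$max_attempts" ]; do
    # A failing command (for example, a network error) counts as "no state yet".
    state="$("$@")" || state=""
    if [ "$state" = "$wanted" ]; then
      echo "reached ${wanted} after ${attempt} attempt(s)"
      return 0
    fi
    if [ -n "$failed" ] && [ "$state" = "$failed" ]; then
      echo "stopped: state is ${failed}" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "timed out after ${max_attempts} attempt(s); last state: ${state:-unknown}" >&2
  return 2
}

# Hypothetical state fetcher. Assumes OPERATION_ID and BEARER_TOKEN are set
# and that the response exposes the state at .operation.state.
operation_state() {
  curl -sS --fail "https://api.redpanda.com/v1/operations/${OPERATION_ID}" \
    -H "accept: application/json" \
    -H "authorization: Bearer ${BEARER_TOKEN}" | jq -r '.operation.state'
}
```

For example, `poll_until STATE_COMPLETED STATE_FAILED 90 30 operation_state` checks every 30 seconds for up to 45 minutes, which matches the provisioning window described in the previous section. Passing the fetcher as a command keeps the loop independent of the API, so the same function can watch the cluster status instead.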
Example using the returned `operation_id`: ```bash curl -X GET "https://api.redpanda.com/v1/operations/" \ -H "accept: application/json"\ -H "content-type: application/json" \ -H "authorization: Bearer ${BEARER_TOKEN}" ``` Example retrieving cluster: ```bash curl -X GET "https://api.redpanda.com/v1/clusters/" \ -H "accept: application/json"\ -H "content-type: application/json" \ -H "authorization: Bearer ${BEARER_TOKEN}" ``` ## [](#delete-the-cluster)Delete the cluster To delete the cluster, first send a DELETE request to the Cloud API, and retrieve the `resource_id` of the DELETE operation. Then run the `rpk` command to destroy the cluster identified by the `resource_id`. ```bash export REDPANDA_ID=$(curl -X DELETE "https://api.redpanda.com/v1/clusters/${REDPANDA_ID}" \ -H "accept: application/json"\ -H "content-type: application/json" \ -H "authorization: Bearer ${BEARER_TOKEN}" | jq -r '.operation.resource_id') ``` After that completes, run: ```bash rpk cloud byoc azure destroy --redpanda-id ${REDPANDA_ID} ``` > 📝 **NOTE** > > Redpanda Cloud does not support customer access or modifications to any of the internal data plane resources. This restriction allows Redpanda Data to manage all configuration changes internally to ensure a 99.99% service level agreement (SLA) for BYOC clusters. ## [](#manage-custom-tags)Manage custom tags Your organization might require custom tags for cost allocation, audit compliance, or governance policies. After cluster creation, you can manage tags with the [Cloud Control Plane API](../../../../../manage/api/cloud-byoc-controlplane-api/). The Control Plane API allows up to 16 custom tags in Azure. Make sure you have: - The cluster ID. You can find this in the Redpanda Cloud UI, in the **Details** section of the cluster overview. - A valid bearer token for the Cloud Control Plane API. For details, see [Authenticate to the API](/api/doc/cloud-controlplane/authentication). 
> ❗ **IMPORTANT** > > To unlock this feature for your account, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). 1. To refresh Redpanda agent permissions in the target subscription, run: ```bash export CLUSTER_ID="" export SUBSCRIPTION_ID="" rpk cloud byoc azure apply --redpanda-id="$CLUSTER_ID" --subscription-id="$SUBSCRIPTION_ID" ``` 2. To update tags, invoke the Cloud API. First, set your authentication token: ```bash export AUTH_TOKEN="" ``` The `PATCH` call sets the tags specified under `"cloud_provider_tags"`. It replaces the existing tags with the specified tags. Include all desired tags in the request. To remove a single entry, omit it from the map you send. ```bash cluster_patch_body=$(cat <<'JSON' { "cloud_provider_tags": { "Environment": "production", "CostCenter": "engineering" } } JSON ) curl -X PATCH "https://api.redpanda.com/v1/clusters/$CLUSTER_ID" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$cluster_patch_body" ``` To remove all tags, send an empty `cloud_provider_tags` object: ```bash cluster_patch_body='{"cloud_provider_tags": {}}' curl -X PATCH "https://api.redpanda.com/v1/clusters/$CLUSTER_ID" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$cluster_patch_body" ``` ### [](#limitations-2)Limitations - Nodepool Application Security Groups (ASG): Custom tags are set only when the cluster is created. Tags cannot be updated on these resources after cluster creation. - Private Link network interfaces (Kubernetes API server, Tiered Storage, and Private Link service): Custom tags are set only during cluster creation and cannot be changed later. > 📝 **NOTE** > > For BYOVNet clusters, custom tags are not applied to the customer-managed resources that are deployed by the customer. 
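Because the `PATCH` call replaces the whole tag map, it can help to build the request body from the complete desired set in one place and to reject a set that exceeds the 16-tag limit before calling the API. The following is a minimal sketch under those assumptions: `build_tags_body`, `json_escape`, and `MAX_TAGS` are hypothetical names, and the escaping covers only backslashes and double quotes.

```bash
#!/usr/bin/env bash
# Limit taken from the text above (16 custom tags in Azure).
MAX_TAGS=16

# json_escape STRING: escape backslashes and double quotes for a JSON string.
# Control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_tags_body [Key=Value ...]: print a cloud_provider_tags request body.
# With no arguments it prints the empty map that removes all tags.
build_tags_body() {
  local body sep pair key value
  if [ "$#" -gt "$MAX_TAGS" ]; then
    echo "too many tags: $# (limit ${MAX_TAGS})" >&2
    return 1
  fi
  body='{"cloud_provider_tags": {'
  sep=''
  for pair in "$@"; do
    case "$pair" in
      ?*=*) ;;  # needs a non-empty key followed by "="
      *) echo "expected Key=Value, got: ${pair}" >&2; return 1 ;;
    esac
    key="${pair%%=*}"    # text before the first "="
    value="${pair#*=}"   # text after the first "="
    body="${body}${sep}\"$(json_escape "$key")\": \"$(json_escape "$value")\""
    sep=', '
  done
  printf '%s}}\n' "$body"
}
```

For example, `cluster_patch_body="$(build_tags_body Environment=production CostCenter=engineering)"` produces the same body as the first `PATCH` example in this section, and the result can be passed to `curl` with `-d "$cluster_patch_body"`.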
## [](#next-steps)Next steps - [Configure Azure Private Link](../../../../../networking/azure-private-link/) - [Review Azure IAM policies](../../../../../security/authorization/cloud-iam-policies-azure/) - [Learn about `rpk` commands](../../../../../reference/rpk/) --- # Page 396: BYOC: GCP **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/gcp.md --- # BYOC: GCP --- title: "BYOC: GCP" latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/gcp/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/gcp/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/gcp/index.adoc description: Learn how to create a BYOC or BYOVPC cluster on GCP. page-git-created-date: "2024-10-24" page-git-modified-date: "2025-05-07" --- - [Create a BYOC Cluster on GCP](create-byoc-cluster-gcp/) Use the Redpanda Cloud UI to create a BYOC cluster on GCP. - [Create a BYOVPC Cluster on GCP](vpc-byo-gcp/) Connect Redpanda Cloud to your existing VPC for additional security. - [Enable Redpanda Connect on an Existing BYOVPC Cluster on GCP](enable-rpcn-byovpc-gcp/) Add Redpanda Connect to your existing BYOVPC cluster. - [Enable Secrets Management on an Existing BYOVPC Cluster on GCP](enable-secrets-byovpc-gcp/) Store and read secrets in your existing BYOVPC cluster. 
--- # Page 397: Create a BYOC Cluster on GCP **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/gcp/create-byoc-cluster-gcp.md --- # Create a BYOC Cluster on GCP --- title: Create a BYOC Cluster on GCP latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/gcp/create-byoc-cluster-gcp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/gcp/create-byoc-cluster-gcp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/gcp/create-byoc-cluster-gcp.adoc description: Use the Redpanda Cloud UI to create a BYOC cluster on GCP. page-git-created-date: "2024-10-24" page-git-modified-date: "2026-04-08" --- To create a Redpanda cluster in your virtual private cloud (VPC), follow the instructions in the Redpanda Cloud UI. The UI contains the parameters necessary to successfully run `rpk cloud byoc apply`. See also: [BYOC architecture](../../../../byoc-arch/). > 📝 **NOTE** > > With standard BYOC clusters, Redpanda manages security policies and resources for your VPC, including subnetworks, service accounts, IAM roles, firewall rules, and storage buckets. For the highest level of security, you can manage these resources yourself with a [BYOVPC cluster on GCP](../vpc-byo-gcp/). If your clients need to connect from different GCP regions than where your cluster will be deployed, you must enable global access during cluster creation using the Cloud API. To create a BYOC cluster with global access enabled, see [Enable Global Access](../../../../../networking/byoc/gcp/enable-global-access/). ## [](#prerequisites)Prerequisites Before you deploy a BYOC cluster on GCP, verify the following prerequisites: - A minimum version of Redpanda `rpk` v24.1. 
See [Install or Update rpk](../../../../../manage/rpk/rpk-install/). - Assign the `roles/editor` role (or higher, such as `roles/owner`) to the GCP user or service account that runs the bootstrap on the target GCP project. This grants the permissions needed to create VPC networks, GKE clusters, service accounts, and other infrastructure during the initial bootstrap. These bootstrap permissions are separate from the [agent permissions](../../../../../security/authorization/cloud-iam-policies-gcp/) that Redpanda assigns after bootstrap. - The user has the [Google Cloud CLI](https://cloud.google.com/sdk/docs/install) installed and authenticated, with the target project selected. To verify, run: ```bash gcloud auth list gcloud config get-value project ``` ### [](#gcp-quotas)GCP quotas Ensure at least three nodes of headroom in the relevant GCP quotas in the same region as your cluster. During maintenance, Redpanda may temporarily create extra nodes. Quotas such as vCPUs per VM family (for example, N2D) and Local SSD total per VM family (quota key: `LOCAL_SSD_TOTAL_GB_PER_VM_FAMILY`) are listed for each tier on the **Create BYOC cluster** page in the Redpanda Cloud UI. Headroom formulas: - vCPU spare = `3 x (vCPUs per node)` - Local SSD spare (GB) = `3 x (Storage size per node in GB)` For example, with per-node storage **1500 GB** (4 × 375 GB Local SSD) and machine type **n2d-standard-4** (4 vCPUs), keep **4500 GB** Local SSD and **12 vCPUs** of spare quota. ## [](#create-a-byoc-cluster)Create a BYOC cluster 1. Log in to [Redpanda Cloud](https://cloud.redpanda.com). 2. On the Clusters page, click **Create cluster**, then click **Create** for BYOC. Enter a cluster name, then select the resource group, provider (GCP), [region, tier](../../../../../reference/tiers/byoc-tiers/), availability, and Redpanda version. > 📝 **NOTE** > > - If you plan to create a private network in your own VPC, select the region where your VPC is located. 
> > - Three availability zones provide two backups in case one availability zone goes down. Optionally, click **Advanced settings** to specify up to five key-value custom GCP labels. If a label key starts with `gcp.network-tag.`, the agent treats the remainder of the key as the name of a [network tag](https://cloud.google.com/vpc/docs/add-remove-network-tags) to apply to GCE instances in the cluster. Use labels for organization and metadata; use network tags to target firewall rules and routes. After the cluster is created, labels are applied to applicable GCP resources (for example, instances and disks), and network tags are applied to instances. For more information, see the [GCP documentation](https://cloud.google.com/compute/docs/labeling-resources). After the cluster is created, you can [specify more labels with the Cloud API](#manage-custom-resource-labels-and-network-tags). 3. Click **Next**. 4. On the Network page, select the connection type: either public or private. For BYOC clusters, private is the best practice. - Your network name is used to identify this network. - For a [CIDR range](../../../../../networking/cidr-ranges/), choose one that does not overlap with your existing VPCs or your Redpanda network. - Clusters with private networking include a setting for API Gateway network access. Public access exposes endpoints for Redpanda Console and the Data Plane API, but they remain protected by your authentication and authorization controls. Private access restricts endpoint access to your VPC only. > 📝 **NOTE** > > After the cluster is created, you can change the API Gateway access on the cluster settings page. If you change from public to private access, users without VPN access to the Redpanda VPC will lose access to these services. 5. Click **Next**. 6. On the Deploy page, follow the steps to log in to Redpanda Cloud and deploy the agent. As part of agent deployment, Redpanda assigns the permissions required to run the agent.
For details about these permissions, see [GCP IAM permissions](../../../../../security/authorization/cloud-iam-policies-gcp/). > 📝 **NOTE** > > Redpanda Cloud does not support customer access or modifications to any of the internal data plane resources. This restriction allows Redpanda Data to manage all configuration changes internally to ensure a 99.99% service level agreement (SLA) for BYOC clusters. ## [](#manage-custom-resource-labels-and-network-tags)Manage custom resource labels and network tags Your organization might require custom resource labels and network tags for cost allocation, audit compliance, or governance policies. After cluster creation, you can manage them with the [Cloud Control Plane API](../../../../../manage/api/cloud-byoc-controlplane-api/). The Control Plane API allows up to 16 custom resource labels and network tags in GCP. Make sure you have: - The cluster ID. You can find this in the Redpanda Cloud UI, in the **Details** section of the cluster overview. - A valid bearer token for the Cloud Control Plane API. For details, see [Authenticate to the API](/api/doc/cloud-controlplane/authentication). > ❗ **IMPORTANT** > > To unlock this feature for your account, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). 1. To refresh agent permissions so the Redpanda agent can update labels and network tags, run: ```bash export CLUSTER_ID="" export PROJECT_ID="" rpk cloud byoc gcp apply --redpanda-id="$CLUSTER_ID" --project-id="$PROJECT_ID" ``` This step is needed because managing labels and network tags requires additional IAM permissions that may not have been granted during initial cluster creation: - `compute.disks.get` - `compute.disks.list` - `compute.disks.setLabels` - `compute.instances.setLabels` 2. To update labels and network tags, invoke the Cloud API. First, set your authentication token: ```bash export AUTH_TOKEN="" ``` The `PATCH` call sets the labels and network tags specified under `"cloud_provider_tags"`.
It replaces the existing labels and tags with the specified labels and tags. Include all desired labels and tags in the request. To remove a single entry, omit it from the map you send. ```bash cluster_patch_body=$(cat <<'JSON' { "cloud_provider_tags": { "environment": "production", "cost-center": "engineering", "gcp.network-tag.web-servers": "true", "gcp.network-tag.database-access": "true" } } JSON ) curl -X PATCH "https://api.redpanda.com/v1/clusters/$CLUSTER_ID" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$cluster_patch_body" ``` To remove all labels and network tags, send an empty `cloud_provider_tags` object: ```bash cluster_patch_body='{"cloud_provider_tags": {}}' curl -X PATCH "https://api.redpanda.com/v1/clusters/$CLUSTER_ID" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$cluster_patch_body" ``` ## [](#next-steps)Next steps [Configure private networking](../../../../../networking/byoc/gcp/) --- # Page 398: Enable Redpanda Connect on an Existing BYOVPC Cluster on GCP **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/gcp/enable-rpcn-byovpc-gcp.md --- # Enable Redpanda Connect on an Existing BYOVPC Cluster on GCP --- title: Enable Redpanda Connect on an Existing BYOVPC Cluster on GCP latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/gcp/enable-rpcn-byovpc-gcp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/gcp/enable-rpcn-byovpc-gcp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/gcp/enable-rpcn-byovpc-gcp.adoc description: Add Redpanda Connect to your existing BYOVPC cluster. 
page-git-created-date: "2025-04-04" page-git-modified-date: "2025-08-20" --- > ❗ **IMPORTANT** > > BYOVPC is an add-on feature that may require an additional purchase. To unlock this feature for your account, contact your Redpanda account team or [Redpanda Sales](https://www.redpanda.com/price-estimator). To enable Redpanda Connect on an existing BYOVPC cluster, you must update your configuration. You can also create [a new BYOVPC cluster](../vpc-byo-gcp/) with Redpanda Connect already enabled. Replace all `` with your own values. 1. Create two new service accounts with the necessary permissions and roles. Show commands ```bash # Account used to check for and read secrets, which are required to create Redpanda Connect pipelines. gcloud iam service-accounts create redpanda-connect-api \ --display-name="Redpanda Connect API Service Account" cat << EOT > redpanda-connect-api.role { "name": "redpanda_connect_api_role", "title": "Redpanda Connect API Role", "description": "Redpanda Connect API Role", "includedPermissions": [ "resourcemanager.projects.get", "secretmanager.secrets.get", "secretmanager.versions.access" ] } EOT gcloud iam roles create redpanda_connect_api_role --project= --file redpanda-connect-api.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-connect-api@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_connect_api_role" ``` ```bash # Account used to retrieve secrets and create Redpanda Connect pipelines. 
gcloud iam service-accounts create redpanda-connect \ --display-name="Redpanda Connect Service Account" cat << EOT > redpanda-connect.role { "name": "redpanda_connect_role", "title": "Redpanda Connect Role", "description": "Redpanda Connect Role", "includedPermissions": [ "resourcemanager.projects.get", "secretmanager.versions.access" ] } EOT gcloud iam roles create redpanda_connect_role --project= --file redpanda-connect.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-connect@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_connect_role" ``` 2. Bind the service accounts. The account ID of the GCP service account is used to configure service account bindings. This account ID is the local part of the email address for the GCP service account. For example, if the GCP service account is `my-gcp-sa@my-project.iam.gserviceaccount.com`, then the account ID is `my-gcp-sa`. Show commands ```none gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \ --role roles/iam.workloadIdentityUser \ --member "serviceAccount:.svc.id.goog[redpanda-connect/]" ``` ```none gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \ --role roles/iam.workloadIdentityUser \ --member "serviceAccount:.svc.id.goog[redpanda-connect/]" ``` 3. Make a [`PATCH /v1/clusters/{cluster-id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request to update the cluster configuration. Show request ```bash export CLUSTER_PATCH_BODY=`cat << EOF { "customer_managed_resources": { "gcp": { "redpanda_connect_api_service_account": { "email": "@.iam.gserviceaccount.com" }, "redpanda_connect_service_account": { "email": "@.iam.gserviceaccount.com" } } } } EOF` curl -v -X PATCH \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/clusters/ ``` 4. Check Redpanda Connect is available in the Cloud UI. 1. 
Log in to [Redpanda Cloud](https://cloud.redpanda.com). 2. Go to the **Connect** page and you should see Redpanda Connect. ## [](#next-steps)Next steps - Choose [connectors for your use case](../../../../../develop/connect/components/about/). - Learn how to [configure, test, and run a data pipeline locally](../../../../../../redpanda-connect/get-started/quickstarts/rpk/). - Try the [Redpanda Connect quickstart](../../../../../develop/connect/connect-quickstart/). - Try one of our [Redpanda Connect cookbooks](../../../../../develop/connect/cookbooks/). - Learn how to [add secrets to your pipeline](../../../../../develop/connect/configuration/secret-management/). --- # Page 399: Enable Secrets Management on an Existing BYOVPC Cluster on GCP **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/gcp/enable-secrets-byovpc-gcp.md --- # Enable Secrets Management on an Existing BYOVPC Cluster on GCP --- title: Enable Secrets Management on an Existing BYOVPC Cluster on GCP page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/gcp/enable-secrets-byovpc-gcp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/gcp/enable-secrets-byovpc-gcp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/gcp/enable-secrets-byovpc-gcp.adoc description: Store and read secrets in your existing BYOVPC cluster. # Beta release status page-beta: "true" page-git-created-date: "2025-06-06" page-git-modified-date: "2025-08-20" release-status: beta - This is a beta feature. Beta features are available for testing and feedback. 
They are not supported by Redpanda and should not be used in production environments. --- beta > ❗ **IMPORTANT** > > BYOVPC is an add-on feature that may require an additional purchase. To unlock this feature for your account, contact your Redpanda account team or [Redpanda Sales](https://www.redpanda.com/price-estimator). Storing secrets in your cluster allows you to keep your cloud infrastructure secure as you integrate your data across different systems, for example, REST catalogs with your Iceberg-enabled topics. If you do not have secrets management enabled on an existing BYOVPC cluster, you can do so by following the steps on this page to update your cluster configuration. You can also create [a new BYOVPC cluster](../vpc-byo-gcp/) with secrets management already enabled. Replace all `` with your own values. 1. Create one new service account with the necessary permissions and roles. Show commands ```bash # Account used to check for and read secrets gcloud iam service-accounts create redpanda-operator \ --display-name="Redpanda Operator Service Account" cat << EOT > redpanda-operator.role { "name": "redpanda_operator_role", "title": "Redpanda Operator Role", "description": "Redpanda Operator Role", "includedPermissions": [ "resourcemanager.projects.get", "secretmanager.secrets.get", "secretmanager.versions.access" ] } EOT gcloud iam roles create redpanda_operator_role --project= --file redpanda-operator.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-operator@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_operator_role" ``` 2. Update the existing Redpanda cluster service account with the necessary permissions to read secrets. 
Show commands ```bash cat << EOT > redpanda-cluster.role { "name": "redpanda_cluster_role", "title": "Redpanda Cluster Role", "description": "Redpanda Cluster Role", "includedPermissions": [ "resourcemanager.projects.get", "secretmanager.secrets.get", "secretmanager.versions.access" ] } EOT gcloud iam roles create redpanda_cluster_role --project= --file redpanda-cluster.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-cluster@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_cluster_role" ``` 3. Bind the new service account. The account ID of the GCP service account is used to configure service account bindings. This account ID is the local part of the email address for the GCP service account. For example, if the GCP service account is `my-gcp-sa@my-project.iam.gserviceaccount.com`, then the account ID is `my-gcp-sa`. Show commands ```none gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \ --role roles/iam.workloadIdentityUser \ --member "serviceAccount:.svc.id.goog[redpanda-system/]" ``` 4. Make a [`PATCH /v1/clusters/{cluster-id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request to update the cluster configuration. Show request ```bash export CLUSTER_PATCH_BODY=`cat << EOF { "customer_managed_resources": { "gcp": { "redpanda_operator_service_account": { "email": "@.iam.gserviceaccount.com" } } } } EOF` curl -v -X PATCH \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/clusters/ ``` 5. Check secrets management is available in the Cloud UI. 1. Log in to [Redpanda Cloud](https://cloud.redpanda.com). 2. Go to the **Secrets Store** page of your cluster. You should be able to create a new secret. ## [](#next-steps)Next steps - [Reference a secret in a cluster property](../../../../../manage/cluster-maintenance/config-cluster/#set-cluster-configuration-properties). 
- [Integrate a catalog](../../../../../manage/iceberg/use-iceberg-catalogs/) for querying Iceberg topics in your cluster. --- # Page 400: Create a BYOVPC Cluster on GCP **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/gcp/vpc-byo-gcp.md --- # Create a BYOVPC Cluster on GCP --- title: Create a BYOVPC Cluster on GCP latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/gcp/vpc-byo-gcp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/gcp/vpc-byo-gcp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/gcp/vpc-byo-gcp.adoc description: Connect Redpanda Cloud to your existing VPC for additional security. page-git-created-date: "2024-10-24" page-git-modified-date: "2025-09-26" --- > ❗ **IMPORTANT** > > BYOVPC/BYOVNet is an add-on feature that requires Premium support. To unlock this feature for your account, contact your Redpanda account team or [Redpanda Sales](https://www.redpanda.com/price-estimator). A Bring Your Own Virtual Private Cloud (BYOVPC) cluster allows you to deploy the Redpanda [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) into your existing VPC and manage the networking lifecycle. Compared to a standard Bring Your Own Cloud (BYOC) setup, where Redpanda manages the networking lifecycle for you, BYOVPC provides more control. See also: [BYOC architecture](../../../../byoc-arch/). When you create a BYOVPC cluster, you specify your VPC and service account. The Redpanda Cloud agent doesn’t create any new resources or alter any settings in your account. With BYOVPC: - You provide your own VPC in your Google Cloud account. 
- You maintain more control of your Google Cloud account, because Redpanda requires fewer permissions than standard BYOC clusters.
- You control your security resources and policies, including subnets, service accounts, IAM roles, firewall rules, and storage buckets.

If your clients need to connect from GCP regions other than the one where your cluster is deployed, you must enable global access during cluster creation. To create a BYOVPC cluster with global access enabled, see [Enable Global Access](../../../../../networking/byoc/gcp/enable-global-access/).

## [](#prerequisites)Prerequisites

- A standalone GCP project is recommended. If your host project (where your VPC is created) and your service project (where your Redpanda cluster is created) are different projects, you must first provision a shared VPC in Google Cloud. For more information, see the [Google shared VPC documentation](https://cloud.google.com/vpc/docs/provisioning-shared-vpc).
- Redpanda creates a private Google Kubernetes Engine (GKE) cluster in your VPC. The subnet and secondary IP ranges you provide must allow public internet access. The configuration requires you to provide reserved CIDR ranges for the subnet and GKE Pods, Services, and master IP addresses. See the [GKE service account documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/service-accounts) and [Configure your VPC](#configure-your-vpc).
- Only primary CIDR ranges are supported for the VPC.
- Redpanda requires access to certain Google APIs, storage buckets, and service accounts. See [Configure the service project](#configure-the-service-project).

### [](#gcp-quotas)GCP quotas

Keep at least three nodes of headroom in the relevant GCP quotas for the region of your cluster, because Redpanda may temporarily create extra nodes during maintenance.
Quotas such as vCPUs per VM family (for example, N2D) and Local SSD total per VM family (quota key: `LOCAL_SSD_TOTAL_GB_PER_VM_FAMILY`) are listed for each tier on the **Create BYOC cluster** page in the Redpanda Cloud UI.

Headroom formulas:

- vCPU spare = `3 x (vCPUs per node)`
- Local SSD spare (GB) = `3 x (Storage size per node in GB)`

For example, with per-node storage **1500 GB** (4 × 375 GB Local SSD) and machine type **n2d-standard-4** (4 vCPUs), keep **4500 GB** Local SSD and **12 vCPUs** of spare quota.

## [](#limitations)Limitations

- Existing clusters cannot be moved to a BYOVPC cluster.
- After creating a BYOVPC cluster, you cannot change to a different VPC.

## [](#configure-your-vpc)Configure your VPC

1. Create the primary and secondary subnets in your VPC using CIDR notation. Redpanda clusters require one subnet, and that subnet must have two secondary IP ranges:

   - The subnet IP range should be at least a /24 CIDR, such as 10.0.0.0/24.
   - The secondary IP range for GKE Pods is a /21 CIDR, such as 10.0.8.0/21.
   - The secondary IP range for GKE Services is a /24 CIDR, such as 10.0.1.0/24.

   Replace all placeholder values with your own.

   ```bash
   gcloud compute networks subnets create \
     --project \
     --network \
     --range 10.0.0.0/24 \
     --region \
     --secondary-range =10.0.8.0/21,=10.0.1.0/24
   ```

   Additionally, a /28 CIDR is required for the GKE master IP addresses. This CIDR is not used in the GCP networking configuration, but you enter it in the Redpanda UI; for example, 10.0.7.240/28.

2. To enable egress, create a cloud router and NAT in the host project:

   ```bash
   gcloud compute routers create \
     --project \
     --region \
     --network

   gcloud compute addresses create --region

   gcloud compute routers nats create \
     --project \
     --router \
     --region \
     --nat-all-subnet-ip-ranges \
     --nat-external-ip-pool \
     --enable-endpoint-independent-mapping
   ```

3. Create VPC firewall rules.
   - Redpanda ingress:

     ```bash
     gcloud compute firewall-rules create redpanda-ingress \
       --description="Allow access to Redpanda cluster" \
       --network="" \
       --project="" \
       --direction="INGRESS" \
       --target-tags="redpanda-node" \
       --source-ranges="10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,100.64.0.0/10" \
       --allow="tcp:9092-9094,tcp:30081,tcp:30082,tcp:30092"
     ```

   - Master webhooks:

     ```bash
     gcloud compute firewall-rules create gke-redpanda-cluster-webhooks \
       --description="Allow master to hit pods for admission controllers/webhooks" \
       --network="" \
       --project="" \
       --direction="INGRESS" \
       --source-ranges="" \
       --allow="tcp:9443,tcp:8443,tcp:6443"
     ```

     Set `--source-ranges` to the /28 CIDR reserved for the GKE master IP addresses. For example: 172.16.0.32/28.

     For information about the master CIDR, and how to set it using `--master-ipv4-cidr`, see the **gcloud** tab in [Creating a private cluster with no client access to the public endpoint](https://cloud.google.com/kubernetes-engine/docs/how-to/legacy/network-isolation#private_cp).

4. Grant permission to read the VPC and related resources.

   If the host project and service project are different, it’s helpful for the Redpanda team to have read access to the VPC and related resources in the host project. If your host project and service project are the same, you can skip this step.

   - Redpanda Agent custom role:

     ```bash
     cat << EOT > redpanda-agent.role
     {
       "name": "redpanda_agent_role",
       "title": "Redpanda Agent Role",
       "description": "A role granting the redpanda agent permissions to view network resources in the project of the vpc.",
       "includedPermissions": [
         "compute.firewalls.get",
         "compute.subnetworks.get",
         "resourcemanager.projects.get",
         "compute.networks.getRegionEffectiveFirewalls",
         "compute.networks.getEffectiveFirewalls"
       ]
     }
     EOT

     gcloud iam roles create redpanda_agent_role --project= --file redpanda-agent.role
     ```

## [](#configure-the-service-project)Configure the service project

1.
Enable Google APIs in the service project:

   ```bash
   gcloud services enable cloudresourcemanager.googleapis.com --project
   gcloud services enable dns.googleapis.com --project
   gcloud services enable secretmanager.googleapis.com --project
   gcloud services enable compute.googleapis.com --project
   gcloud services enable iam.googleapis.com --project
   gcloud services enable storage-api.googleapis.com --project
   gcloud services enable container.googleapis.com --project
   gcloud services enable serviceusage.googleapis.com --project
   ```

2. Create storage buckets in the service project, in the same region as the cluster:

   ```bash
   gcloud storage buckets create gs:// \
     --location="" \
     --uniform-bucket-level-access

   gcloud storage buckets create gs:// \
     --location="" \
     --uniform-bucket-level-access

   gcloud storage buckets update gs:// --versioning
   ```

   - Redpanda uses the tiered storage bucket for writing log segments. This bucket should not be versioned.
   - Redpanda uses the management storage bucket to store cluster metadata. This bucket can have versioning enabled.

3. Create service accounts with necessary permissions and roles.
- Redpanda Cloud agent service account Show commands ```bash gcloud iam service-accounts create redpanda-agent \ --display-name="Redpanda Agent Service Account" cat << EOT > redpanda-agent.role { "name": "redpanda_agent_role", "title": "Redpanda Agent Role", "description": "A role comprising general permissions allowing the agent to manage Redpanda cluster resources.", "includedPermissions": [ "compute.firewalls.get", "compute.disks.get", "compute.globalOperations.get", "compute.instanceGroupManagers.get", "compute.instanceGroupManagers.delete", "compute.instanceGroups.delete", "compute.instances.list", "compute.instanceTemplates.delete", "compute.networks.getRegionEffectiveFirewalls", "compute.networks.getEffectiveFirewalls", "compute.projects.get", "compute.subnetworks.get", "compute.zoneOperations.get", "compute.zoneOperations.list", "compute.zones.get", "compute.zones.list", "dns.changes.create", "dns.changes.get", "dns.changes.list", "dns.managedZones.create", "dns.managedZones.delete", "dns.managedZones.get", "dns.managedZones.list", "dns.managedZones.update", "dns.projects.get", "dns.resourceRecordSets.create", "dns.resourceRecordSets.delete", "dns.resourceRecordSets.get", "dns.resourceRecordSets.list", "dns.resourceRecordSets.update", "iam.roles.get", "iam.roles.list", "iam.serviceAccounts.actAs", "iam.serviceAccounts.get", "iam.serviceAccounts.getIamPolicy", "resourcemanager.projects.get", "resourcemanager.projects.getIamPolicy", "serviceusage.services.list", "storage.buckets.get", "storage.buckets.getIamPolicy", "compute.subnetworks.use", "compute.instances.use", "compute.networks.use", "compute.regionOperations.get", "compute.serviceAttachments.create", "compute.serviceAttachments.delete", "compute.serviceAttachments.get", "compute.serviceAttachments.list", "compute.serviceAttachments.update", "compute.forwardingRules.use", "compute.forwardingRules.create", "compute.forwardingRules.delete", "compute.forwardingRules.get", 
"compute.forwardingRules.setLabels", "compute.forwardingRules.setTarget", "compute.forwardingRules.pscCreate", "compute.forwardingRules.pscDelete", "compute.forwardingRules.pscSetLabels", "compute.forwardingRules.pscSetTarget", "compute.forwardingRules.pscUpdate", "compute.regionBackendServices.create", "compute.regionBackendServices.delete", "compute.regionBackendServices.get", "compute.regionBackendServices.use", "compute.regionNetworkEndpointGroups.create", "compute.regionNetworkEndpointGroups.delete", "compute.regionNetworkEndpointGroups.get", "compute.regionNetworkEndpointGroups.use", "compute.regionNetworkEndpointGroups.attachNetworkEndpoints", "compute.regionNetworkEndpointGroups.detachNetworkEndpoints", "compute.disks.list", "compute.disks.setLabels", "compute.instanceGroupManagers.update", "compute.instances.delete", "compute.instances.get", "compute.instances.setLabels" ] } EOT gcloud iam roles create redpanda_agent_role --project= --file redpanda-agent.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-agent@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_agent_role" gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-agent@.iam.gserviceaccount.com" \ --role="roles/container.admin" gcloud storage buckets add-iam-policy-binding gs:// \ --member="serviceAccount:redpanda-agent@.iam.gserviceaccount.com" \ --role="roles/storage.objectAdmin" # skip this step if host project and service project are the same gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-agent@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_agent_role" ``` - Redpanda cluster service account Show commands ```bash cat << EOT > redpanda-cluster.role { "name": "redpanda_cluster_role", "title": "Redpanda Cluster Role", "description": "Redpanda Cluster role", "includedPermissions": [ "resourcemanager.projects.get", "secretmanager.secrets.get", "secretmanager.versions.access" ] } EOT gcloud iam 
service-accounts create redpanda-cluster \ --display-name="Redpanda Cluster Service Account" gcloud storage buckets add-iam-policy-binding gs:// \ --member="serviceAccount:redpanda-cluster@.iam.gserviceaccount.com" \ --role="roles/storage.objectAdmin" gcloud iam roles create redpanda_cluster_role --project= --file redpanda-cluster.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-cluster@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_cluster_role" ``` - Redpanda operator service account Show commands ```bash gcloud iam service-accounts create redpanda-operator \ --display-name="Redpanda Operator Service Account" cat << EOT > redpanda-operator.role { "name": "redpanda_operator_role", "title": "Redpanda Operator Role", "description": "Redpanda Operator role", "includedPermissions": [ "resourcemanager.projects.get", "secretmanager.secrets.get", "secretmanager.versions.access" ] } EOT gcloud iam roles create redpanda_operator_role --project= --file redpanda-operator.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-operator@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_operator_role" ``` - Redpanda Connect service accounts Show commands ```bash # Account used to check for and read secrets, which are required to create Redpanda Connect pipelines. 
gcloud iam service-accounts create redpanda-connect-api \ --display-name="Redpanda Connect API Service Account" cat << EOT > redpanda-connect-api.role { "name": "redpanda_connect_api_role", "title": "Redpanda Connect API Role", "description": "Redpanda Connect API role", "includedPermissions": [ "resourcemanager.projects.get", "secretmanager.secrets.get", "secretmanager.versions.access" ] } EOT gcloud iam roles create redpanda_connect_api_role --project= --file redpanda-connect-api.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-connect-api@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_connect_api_role" ``` ```bash # Account used to retrieve secrets and create Redpanda Connect pipelines. gcloud iam service-accounts create redpanda-connect \ --display-name="Redpanda Connect Service Account" cat << EOT > redpanda-connect.role { "name": "redpanda_connect_role", "title": "Redpanda Connect Role", "description": "Redpanda Connect role", "includedPermissions": [ "resourcemanager.projects.get", "secretmanager.versions.access" ] } EOT gcloud iam roles create redpanda_connect_role --project= --file redpanda-connect.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-connect@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_connect_role" ``` - Redpanda Cloud secret manager Show commands ```bash gcloud iam service-accounts create redpanda-console \ --display-name="Redpanda Cloud Secret Manager" cat << EOT > redpanda-console.role { "name": "redpanda_console_secret_manager_role", "title": "Redpanda Cloud Secret Manager Writer", "description": "Redpanda Cloud Secret Manager Writer", "includedPermissions": [ "secretmanager.secrets.get", "secretmanager.secrets.create", "secretmanager.secrets.delete", "secretmanager.secrets.list", "secretmanager.secrets.update", "secretmanager.versions.add", "secretmanager.versions.destroy", "secretmanager.versions.disable", "secretmanager.versions.enable", 
"secretmanager.versions.list", "iam.serviceAccounts.getAccessToken" ] } EOT gcloud iam roles create redpanda_console_secret_manager_role --project= --file redpanda-console.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-console@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_console_secret_manager_role" ``` - Kafka Connect service account Show commands ```bash gcloud iam service-accounts create redpanda-connectors \ --display-name="Kafka Connect Service Account" cat << EOT > redpanda-connectors.role { "name": "redpanda_connectors_role", "title": "Kafka Connect Custom Role", "description": "Kafka Connect custom role", "includedPermissions": [ "resourcemanager.projects.get", "secretmanager.versions.access" ] } EOT gcloud iam roles create redpanda_connectors_role --project= --file redpanda-connectors.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-connectors@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_connectors_role" ``` - Redpanda GKE cluster service account Show commands ```bash gcloud iam service-accounts create redpanda-gke \ --display-name="Redpanda GKE cluster default node service account" cat << EOT > redpanda-gke.role { "name": "redpanda_gke_utility_role", "title": "Redpanda cluster utility node role", "description": "Redpanda cluster utility node role", "includedPermissions": [ "artifactregistry.dockerimages.get", "artifactregistry.dockerimages.list", "artifactregistry.files.get", "artifactregistry.files.list", "artifactregistry.locations.get", "artifactregistry.locations.list", "artifactregistry.mavenartifacts.get", "artifactregistry.mavenartifacts.list", "artifactregistry.npmpackages.get", "artifactregistry.npmpackages.list", "artifactregistry.packages.get", "artifactregistry.packages.list", "artifactregistry.projectsettings.get", "artifactregistry.pythonpackages.get", "artifactregistry.pythonpackages.list", "artifactregistry.repositories.downloadArtifacts", 
"artifactregistry.repositories.get", "artifactregistry.repositories.list", "artifactregistry.repositories.listEffectiveTags", "artifactregistry.repositories.listTagBindings", "artifactregistry.repositories.readViaVirtualRepository", "artifactregistry.tags.get", "artifactregistry.tags.list", "artifactregistry.versions.get", "artifactregistry.versions.list", "logging.logEntries.create", "logging.logEntries.route", "monitoring.metricDescriptors.create", "monitoring.metricDescriptors.get", "monitoring.metricDescriptors.list", "monitoring.monitoredResourceDescriptors.get", "monitoring.monitoredResourceDescriptors.list", "monitoring.timeSeries.create", "cloudnotifications.activities.list", "monitoring.alertPolicies.get", "monitoring.alertPolicies.list", "monitoring.dashboards.get", "monitoring.dashboards.list", "monitoring.groups.get", "monitoring.groups.list", "monitoring.notificationChannelDescriptors.get", "monitoring.notificationChannelDescriptors.list", "monitoring.notificationChannels.get", "monitoring.notificationChannels.list", "monitoring.publicWidgets.get", "monitoring.publicWidgets.list", "monitoring.services.get", "monitoring.services.list", "monitoring.slos.get", "monitoring.slos.list", "monitoring.snoozes.get", "monitoring.snoozes.list", "monitoring.timeSeries.list", "monitoring.uptimeCheckConfigs.get", "monitoring.uptimeCheckConfigs.list", "opsconfigmonitoring.resourceMetadata.list", "resourcemanager.projects.get", "stackdriver.projects.get", "stackdriver.resourceMetadata.list", "dns.changes.create", "dns.changes.get", "dns.changes.list", "dns.managedZones.list", "dns.resourceRecordSets.create", "dns.resourceRecordSets.delete", "dns.resourceRecordSets.get", "dns.resourceRecordSets.list", "dns.resourceRecordSets.update", "secretmanager.versions.access", "stackdriver.resourceMetadata.write", "storage.objects.get", "storage.objects.list", "compute.instances.use", "iam.serviceAccounts.getAccessToken", "compute.regionNetworkEndpointGroups.create", 
"compute.regionNetworkEndpointGroups.delete", "compute.regionNetworkEndpointGroups.get", "compute.regionNetworkEndpointGroups.use", "compute.regionNetworkEndpointGroups.attachNetworkEndpoints", "compute.regionNetworkEndpointGroups.detachNetworkEndpoints" ] } EOT gcloud iam roles create redpanda_gke_utility_role --project= --file redpanda-gke.role gcloud projects add-iam-policy-binding \ --member="serviceAccount:redpanda-gke@.iam.gserviceaccount.com" \ --role="projects//roles/redpanda_gke_utility_role" ``` 4. Bind the service accounts. The account ID of the GCP service account is used to configure service account bindings. This account ID is the local part of the email address for the GCP service account. For example, if the GCP service account is `my-gcp-sa@my-project.iam.gserviceaccount.com`, then the account ID is `my-gcp-sa`. - Redpanda cluster service account Show command ```bash gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \ --role roles/iam.workloadIdentityUser \ --member "serviceAccount:.svc.id.goog[redpanda/rp-]" ``` - Redpanda operator service account Show command ```bash gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \ --role roles/iam.workloadIdentityUser \ --member "serviceAccount:.svc.id.goog[redpanda-system/]" ``` - Redpanda Console service account Show command ```bash gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \ --role roles/iam.workloadIdentityUser \ --member "serviceAccount:.svc.id.goog[redpanda/console-]" ``` - Redpanda Connect service accounts Show command ```bash gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \ --role roles/iam.workloadIdentityUser \ --member "serviceAccount:.svc.id.goog[redpanda-connect/]" ``` ```bash gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \ --role roles/iam.workloadIdentityUser \ --member "serviceAccount:.svc.id.goog[redpanda-connect/]" ``` - Kafka Connect service 
account

     ```bash
     gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \
       --role roles/iam.workloadIdentityUser \
       --member "serviceAccount:.svc.id.goog[redpanda-connectors/connectors-]"
     ```

   - Cert-manager and external-DNS service accounts

     ```bash
     gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \
       --role roles/iam.workloadIdentityUser \
       --member "serviceAccount:.svc.id.goog[cert-manager/cert-manager]"

     gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \
       --role roles/iam.workloadIdentityUser \
       --member "serviceAccount:.svc.id.goog[external-dns/external-dns]"
     ```

   - Private Service Connect Controller service account

     ```bash
     gcloud iam service-accounts add-iam-policy-binding @.iam.gserviceaccount.com \
       --role roles/iam.workloadIdentityUser \
       --member "serviceAccount:.svc.id.goog[redpanda-psc/psc-controller]"
     ```

## [](#create-cluster)Create cluster

Log in to the [Redpanda Cloud UI](https://cloud.redpanda.com), and follow the steps to [create a BYOC cluster](../create-byoc-cluster-gcp/), with the following exceptions:

1. On the **Network** page, select the **BYOVPC** connection type, and enter the network, service account, storage bucket information, and GKE master CIDR range you created.

2.
With customer-managed networks, you must grant yourself (the user deploying the cluster with `rpk`) the following permissions:

   - `compute.disks.create`
   - `compute.disks.setLabels`
   - `compute.instanceGroupManagers.create`
   - `compute.instanceGroupManagers.delete`
   - `compute.instanceGroupManagers.get`
   - `compute.instanceGroups.create`
   - `compute.instanceGroups.delete`
   - `compute.instanceTemplates.create`
   - `compute.instanceTemplates.delete`
   - `compute.instanceTemplates.get`
   - `compute.instanceTemplates.useReadOnly`
   - `compute.instances.create`
   - `compute.instances.setLabels`
   - `compute.instances.setMetadata`
   - `compute.instances.setTags`
   - `compute.subnetworks.get`
   - `compute.subnetworks.use`
   - `compute.zones.list`
   - `iam.roles.get`
   - `iam.serviceAccounts.actAs`
   - `iam.serviceAccounts.get`
   - `resourcemanager.projects.get`
   - `resourcemanager.projects.getIamPolicy`
   - `serviceusage.services.list`
   - `storage.buckets.get`
   - `storage.buckets.getIamPolicy`
   - `storage.objects.create`
   - `storage.objects.delete`
   - `storage.objects.get`
   - `storage.objects.list`

   You can run `rpk` as a Google account, a service account, or any other principal identity supported by GCP.

   - If running `rpk` from a Google account, the user must acquire new user credentials to use for [Application Default Credentials](https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login).
   - If running `rpk` from a service account, the user must create a [service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating), then [export GOOGLE\_APPLICATION\_CREDENTIALS](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) and [set the account as the default in gcloud](https://cloud.google.com/sdk/gcloud/reference/config/set):

     ```bash
     export GOOGLE_APPLICATION_CREDENTIALS=
     gcloud config set account "$SERVICE_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com"
     ```

3.
To validate your configuration, run:

   ```bash
   rpk cloud byoc gcp apply --redpanda-id='' --project-id='' --validate-only
   ```

4. Click **Next**.

5. On the **Deploy** page, similar to standard BYOC clusters, log in to Redpanda Cloud and deploy the agent.

> 📝 **NOTE**
>
> Redpanda Cloud does not support customer access or modifications to any of the internal data plane resources. This restriction allows Redpanda Data to manage all configuration changes internally to meet a 99.99% service level agreement (SLA) for BYOC clusters.

## [](#delete-cluster)Delete cluster

You can delete the cluster in the Cloud UI.

1. Log in to [Redpanda Cloud](https://cloud.redpanda.com).
2. Select your cluster.
3. Go to the **Cluster settings** page and click **Delete**, then confirm your deletion.

## [](#manage-custom-resource-labels-and-network-tags)Manage custom resource labels and network tags

Your organization might require custom resource labels and network tags for cost allocation, audit compliance, or governance policies. After cluster creation, you can manage them with the [Cloud Control Plane API](../../../../../manage/api/cloud-byoc-controlplane-api/).

The Control Plane API allows up to 16 custom resource labels and network tags in GCP.

Make sure you have:

- The cluster ID. You can find this in the Redpanda Cloud UI, in the **Details** section of the cluster overview.
- A valid bearer token for the Cloud Control Plane API. For details, see [Authenticate to the API](/api/doc/cloud-controlplane/authentication).

> ❗ **IMPORTANT**
>
> To unlock this feature for your account, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new).

1.
To refresh agent permissions so the Redpanda agent can update labels and network tags, run:

   ```bash
   export CLUSTER_ID=""
   export PROJECT_ID=""
   rpk cloud byoc gcp apply --redpanda-id="$CLUSTER_ID" --project-id="$PROJECT_ID"
   ```

   This step is required because label/tag management requires additional IAM permissions that may not have been granted during initial cluster creation:

   - `compute.disks.get`
   - `compute.disks.list`
   - `compute.disks.setLabels`
   - `compute.instances.setLabels`

2. To update labels and network tags, invoke the Cloud API.

   First, set your authentication token:

   ```bash
   export AUTH_TOKEN=""
   ```

   The `PATCH` call sets the labels and network tags specified under `"cloud_provider_tags"`. It replaces the existing labels and tags with the specified labels and tags. Include all desired labels and tags in the request. To remove a single entry, omit it from the map you send.

   ```bash
   cluster_patch_body=$(cat <<'JSON'
   {
     "cloud_provider_tags": {
       "environment": "production",
       "cost-center": "engineering",
       "gcp.network-tag.web-servers": "true",
       "gcp.network-tag.database-access": "true"
     }
   }
   JSON
   )

   curl -X PATCH "https://api.redpanda.com/v1/clusters/$CLUSTER_ID" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $AUTH_TOKEN" \
     -d "$cluster_patch_body"
   ```

   To remove all labels and network tags, send an empty `cloud_provider_tags` object:

   ```bash
   cluster_patch_body='{"cloud_provider_tags": {}}'

   curl -X PATCH "https://api.redpanda.com/v1/clusters/$CLUSTER_ID" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $AUTH_TOKEN" \
     -d "$cluster_patch_body"
   ```

> 📝 **NOTE**
>
> For BYOVPC clusters, custom labels are not applied to the customer-managed resources that are deployed by the customer.
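
Because the `PATCH` call replaces the whole set of labels and tags, it can help to build the request body from one list and check its size before sending it. The following is a minimal sketch, not part of the Redpanda tooling: `build_tags_body` and `MAX_TAGS` are hypothetical names, the sketch assumes the 16-entry limit counts labels and network tags together, and it does not escape special characters, so keys and values must not contain quotes or backslashes.

```bash
# Hypothetical helper: build a cloud_provider_tags PATCH body from key=value pairs.
# Assumption: the limit of 16 counts labels and network tags together.
MAX_TAGS=16

build_tags_body() (
  # Refuse to build a body that would exceed the limit.
  if [ "$#" -gt "$MAX_TAGS" ]; then
    echo "error: $# entries exceed the limit of $MAX_TAGS" >&2
    exit 1
  fi
  body='' sep=''
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;
      *) echo "error: expected key=value, got '$pair'" >&2; exit 1 ;;
    esac
    # Split on the first '=' only, so values may contain '='.
    body="${body}${sep}\"${pair%%=*}\": \"${pair#*=}\""
    sep=', '
  done
  # With no arguments this prints the empty object that removes all entries.
  printf '{"cloud_provider_tags": {%s}}\n' "$body"
)

# Example (not run here):
#   cluster_patch_body=$(build_tags_body environment=production cost-center=engineering)
```

The printed body can be passed to `curl` with `-d`, in the same way as `cluster_patch_body` in the steps above.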
## [](#next-steps)Next steps [Configure private networking](../../../../../networking/byoc/gcp/) --- # Page 401: Create Remote Read Replicas **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/byoc/remote-read-replicas.md --- # Create Remote Read Replicas --- title: Create Remote Read Replicas latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/byoc/remote-read-replicas page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/byoc/remote-read-replicas.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/byoc/remote-read-replicas.adoc description: Learn how to create a remote read replica topic with BYOC, which is a read-only topic that mirrors a topic on a different cluster. page-git-created-date: "2024-08-01" page-git-modified-date: "2026-03-31" --- A remote read replica topic is a read-only topic that mirrors a topic on a different cluster. You can create a separate remote cluster just for consumers of this topic and populate its topics from object storage. A read-only topic on a remote cluster can serve any consumer, without increasing the load on the source cluster. Because these read-only topics access data directly from object storage, there’s no impact to the performance of the cluster. Remote read replica topics do not store any data. When a cluster running a remote read replica is terminated, the topic data only exists on the origin cluster. Redpanda Cloud supports remote read replica topics in BYOC clusters on AWS or GCP. These clusters can be ephemeral; that is, created temporarily to handle specific or transient workloads, but they don’t have to be. 
The ability to make them ephemeral provides flexibility and cost efficiency: you can scale resources up or down as needed and pay only for what you use. ## [](#prerequisites)Prerequisites To use remote read replicas, you need: - A BYOC reader cluster in Ready state. This separate reader cluster must exist in the same Redpanda organization as the source cluster. - AWS: The reader cluster can be in the same or a different region as the origin cluster’s S3 bucket. For cross-region remote read replica topics, see [Create a cross-region remote read replica topic on AWS](#create-cross-region-rrr-topic). - GCP: The reader cluster can be in the same or a different region as the source cluster. The reader cluster must be in the same project as the source cluster. - Azure: Remote read replicas are not supported. ### [](#byovpc-grant-storage-permissions)BYOVPC: Grant storage permissions > 📝 **NOTE** > > This prerequisite only applies to BYOVPC deployments. Skip this step if you’re enabling remote read replicas on standard BYOC clusters. #### GCP To grant additional permissions to the cloud storage manager of the reader cluster, run: ```bash gcloud storage buckets add-iam-policy-binding \ gs:// \ --member="serviceAccount:" \ --role="roles/storage.objectViewer" ``` #### AWS To grant additional permissions to the cloud storage manager of the reader cluster, set the `source_cluster_bucket_names` and `reader_cluster_id` variables in [cloud-examples](https://github.com/redpanda-data/cloud-examples/blob/main/customer-managed/aws/terraform/variables.tf). This should be done in the Terraform of the reader cluster. ## [](#configure-remote-read-replica)Configure remote read replica Add or remove reader clusters to a source cluster in Redpanda Cloud with the [Cloud Control Plane API](../../../../manage/api/controlplane/). For information on accessing the Cloud API, see the [authentication guide](/api/doc/cloud-controlplane/authentication). 1. 
To update your source cluster to add one or more reader cluster IDs, make a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request. Each request must include the full list of reader cluster IDs. If an ID is omitted from the list, that cluster is removed as a reader cluster.

   ```bash
   export SOURCE_CLUSTER_ID=.......
   export READER_CLUSTER_ID=.......

   curl -X PATCH "$API_HOST/v1/clusters/$SOURCE_CLUSTER_ID" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $API_TOKEN" \
     -d @- << EOF
   {
     "read_replica_cluster_ids": ["$READER_CLUSTER_ID"]
   }
   EOF
   ```

2. Optional: To see the list of reader clusters on a given source cluster, make a [`GET /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_getcluster) request:

   ```bash
   export SOURCE_CLUSTER_ID=.......

   curl -X GET "$API_HOST/v1/clusters/$SOURCE_CLUSTER_ID" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $API_TOKEN"
   ```

> 📝 **NOTE**
>
> A source cluster cannot be deleted if it has remote read replica topics. When you delete a reader cluster, that cluster’s ID is removed from any existing source cluster `read_replica_cluster_ids` lists.

## [](#create-remote-read-replica-topic)Create remote read replica topic

To create a remote read replica topic, run:

```bash
rpk topic create -c redpanda.remote.readreplica= --tls-enabled
```

- For the topic name, use the same name as the original topic.
- For the `redpanda.remote.readreplica` value, use the bucket specified in the `cloud_storage_bucket` property of the origin cluster. For standard BYOC clusters, the source cluster bucket name follows the pattern: `redpanda-cloud-storage-${SOURCE_CLUSTER_ID}`

### [](#create-cross-region-rrr-topic)Create a cross-region remote read replica topic on AWS

Use this configuration only when the remote cluster is in a **different AWS region** than the origin cluster’s S3 bucket. For same-region AWS or GCP deployments, use the standard [topic creation command](#create-remote-read-replica-topic).
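
The difference between the same-region and cross-region forms can be sketched as a small shell helper. This is an illustration only, not part of `rpk`: `readreplica_value` is a hypothetical function, and the `s3.<region>.amazonaws.com` endpoint form is taken from the cross-region example in this section. It prints the bare bucket name when the regions match, and the bucket name with `region` and `endpoint` query-string parameters otherwise.

```bash
# Hypothetical helper: print the value for redpanda.remote.readreplica.
# Arguments: bucket name, AWS region of the bucket, AWS region of the reader cluster.
readreplica_value() (
  bucket=$1 bucket_region=$2 reader_region=$3
  if [ "$bucket_region" = "$reader_region" ]; then
    # Same region: the bucket name alone is enough.
    printf '%s\n' "$bucket"
  else
    # Cross-region: use the bucket's region, not the reader cluster's region.
    printf '%s?region=%s&endpoint=s3.%s.amazonaws.com\n' \
      "$bucket" "$bucket_region" "$bucket_region"
  fi
)

# Example (not run here). The -c argument is double-quoted so that the shell
# does not treat '?' as a glob character or '&' as a control operator:
#   rpk topic create my-topic \
#     -c "redpanda.remote.readreplica=$(readreplica_value my-bucket us-east-1 eu-west-1)" \
#     --tls-enabled
```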
#### [](#create-the-topic)Create the topic

To create a cross-region remote read replica topic, append `region` and `endpoint` query-string parameters to the bucket name.

In the following example, replace the placeholders:

- `<topic_name>`: The name of the topic in the cluster hosting the remote read replica.
- `<bucket_name>`: The S3 bucket configured on the origin cluster (`cloud_storage_bucket`).
- `<region>`: The AWS region of the origin cluster’s S3 bucket (not the remote cluster’s region).

```bash
rpk topic create <topic_name> \
  -c "redpanda.remote.readreplica=<bucket_name>?region=<region>&endpoint=s3.<region>.amazonaws.com" --tls-enabled
```

Quote the property value so that the shell does not interpret the `?` and `&` characters.

For example, if the origin cluster stores data in a bucket called `my-bucket` in `us-east-1`:

```bash
rpk topic create my-topic \
  -c "redpanda.remote.readreplica=my-bucket?region=us-east-1&endpoint=s3.us-east-1.amazonaws.com" --tls-enabled
```

> 📝 **NOTE**
>
> The `endpoint` value must not include the bucket name. When using `virtual_host` URL style, Redpanda automatically prepends the bucket name to the endpoint. When using `path` URL style, Redpanda appends the bucket name as a path segment.

#### [](#limits)Limits

Each unique combination of region and endpoint creates a separate object storage target on the remote cluster. A cluster supports a maximum of 10 targets. How targets are counted depends on `cloud_storage_url_style`:

- `virtual_host`: Each unique combination of bucket, region, and endpoint counts as one target. You can create up to 10 distinct cross-region remote read replica topics for each cluster.
- `path`: Each unique combination of region and endpoint counts as one target (the bucket name is not part of the key). You can create cross-region remote read replica topics for multiple buckets using the same region/endpoint combination, with a maximum of 10 distinct region/endpoint combinations for each cluster.
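
As a rough illustration of the counting rules above, the following shell sketch estimates how many targets a set of cross-region topics would consume under each URL style. The `count_targets` function and the `bucket|region|endpoint` input format are hypothetical helpers invented for this example; they are not part of `rpk` or Redpanda, and the sketch only mirrors the rules as described in this section.

```shell
# Hypothetical helper (not part of rpk): count distinct object storage targets.
# Usage: count_targets <virtual_host|path> "<bucket>|<region>|<endpoint>" ...
count_targets() {
  style=$1
  shift
  case "$style" in
    virtual_host|path) ;;
    *) echo "unknown url style: $style" >&2; return 1 ;;
  esac
  for entry in "$@"; do
    if [ "$style" = "path" ]; then
      # path style: the bucket is not part of the key, so drop "<bucket>|".
      printf '%s\n' "${entry#*|}"
    else
      # virtual_host style: bucket, region, and endpoint all form the key.
      printf '%s\n' "$entry"
    fi
  done | sort -u | wc -l | tr -d ' '
}

# Two buckets that share one region/endpoint combination.
count_targets virtual_host \
  "bucket-a|us-east-1|s3.us-east-1.amazonaws.com" \
  "bucket-b|us-east-1|s3.us-east-1.amazonaws.com"   # prints 2
count_targets path \
  "bucket-a|us-east-1|s3.us-east-1.amazonaws.com" \
  "bucket-b|us-east-1|s3.us-east-1.amazonaws.com"   # prints 1
```

Comparing the result against the limit of 10 before creating a new topic shows whether the cluster still has room for another target.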
## [](#optional-tune-for-live-topics)Optional: Tune for live topics For remote read replicas reading from a live topic (that is, a topic that’s being actively written to by a source cluster), it may be advantageous to control how often segments are flushed to object storage. By default, this is set to 60 minutes. To tune `cloud_storage_segment_max_upload_interval_sec` on the source cluster, contact [Redpanda support](https://support.redpanda.com/hc/en-us/requests/new). (For cold topics, where segments are closed and older than 60 minutes, this configuration is unnecessary: the data is already uploaded to object storage.) --- # Page 402: Dedicated **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/create-dedicated-cloud-cluster.md --- # Dedicated --- title: Dedicated latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/create-dedicated-cloud-cluster page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/create-dedicated-cloud-cluster.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/create-dedicated-cloud-cluster.adoc description: Learn how to create a Dedicated cluster and start streaming. page-git-created-date: "2025-04-01" page-git-modified-date: "2026-02-02" --- After you log in to [Redpanda Cloud](https://cloud.redpanda.com), you land on the **Clusters** page. This page lists all the clusters in your organization. ## [](#create-a-dedicated-cluster)Create a Dedicated cluster 1. On the Clusters page, click **Create cluster**, then click **Create** for Dedicated. Enter a cluster name, then select the resource group, cloud provider (AWS, GCP, or Azure), [region, tier](../../../reference/tiers/dedicated-tiers/), availability, and Redpanda version. 
> 📝 **NOTE** > > - If you plan to create a private network in your own VPC, select the region where your VPC is located. > > - Three availability zones provide two backups in case one availability zone goes down. 2. Click **Next**. 3. On the Network page, enter the connection type: public or private. For private networks: - Your network name is used to identify this network. - For a [CIDR range](../../../networking/cidr-ranges/), choose one that does not overlap with your existing VPCs or your Redpanda network. Private networks require either a VPC peering connection or a private connectivity service, such as [AWS PrivateLink](../../../networking/configure-privatelink-in-cloud-ui/), [GCP Private Service Connect](../../../networking/configure-private-service-connect-in-cloud-ui/), or [Azure Private Link](../../../networking/azure-private-link/). - Clusters with private networking include a setting for API Gateway network access. Public access exposes endpoints for Redpanda Console, the Data Plane API, and the MCP Server API, but they remain protected by your authentication and authorization controls. Private access restricts endpoint access to your VPC/VNet only. On Azure, private access incurs an additional cost, since it involves deploying two network load balancers, instead of one. > 📝 **NOTE** > > After the cluster is created, you can change the API Gateway access on the cluster settings page. If you change from public to private access, users without VPN access to the Redpanda VPC will lose access to these services. 4. Click **Create**. After the cluster is created, you can select the cluster on the **Clusters** page to see the overview for it. ## [](#start-streaming-example)Start streaming: example Use `rpk`, Redpanda’s CLI, to build a basic streaming application that creates a topic, produces messages to it, and consumes messages from it. To learn about `rpk`, see the [Introduction to rpk](../../../manage/rpk/intro-to-rpk/). 1. 
Log in to Redpanda Cloud, and select your resource group using the interactive prompt.

    ```bash
    rpk cloud login
    ```

2. On the **Overview** page, copy your bootstrap server address and set it as an environment variable on your local machine:

    ```bash
    export REDPANDA_BROKERS="<bootstrap-server-address>"
    ```

3. Go to the **Security** page, and create a user called **redpanda-chat-account** that uses the SCRAM-SHA-256 mechanism.

4. Copy the password, and set the following environment variables on your local machine:

    ```bash
    export REDPANDA_SASL_USERNAME="redpanda-chat-account"
    export REDPANDA_SASL_PASSWORD="<password>"
    export REDPANDA_SASL_MECHANISM="SCRAM-SHA-256"
    ```

5. Click the name of your user, and add the following permissions to the ACL (access control list):

    - **Host**: \*
    - **Topic name**: `chat-room`
    - **Operations**: All

6. Click **Create**.

7. Use `rpk` on your local machine to authenticate to Redpanda as the **redpanda-chat-account** user and get information about the cluster:

    ```bash
    rpk cluster info -X tls.enabled=true
    ```

8. Create a topic called `chat-room`. You granted permissions to the **redpanda-chat-account** user to access only this topic.

    ```bash
    rpk topic create chat-room -X tls.enabled=true
    ```

    Output:

        TOPIC      STATUS
        chat-room  OK

9. Produce a message to the topic:

    ```bash
    rpk topic produce chat-room -X tls.enabled=true
    ```

10. Enter a message, then press Enter:

    ```text
    Pandas are fabulous!
    ```

    Example output:

        Produced to partition 0 at offset 0 with timestamp 1663282629789.

11. Press Ctrl+C to finish producing messages to the topic.

12. Consume one message from the topic:

    ```bash
    rpk topic consume chat-room --num 1 -X tls.enabled=true
    ```

    Your message is displayed along with its metadata:

    ```json
    {
      "topic": "chat-room",
      "value": "Pandas are fabulous!",
      "timestamp": 1663282629789,
      "partition": 0,
      "offset": 0
    }
    ```

### [](#explore-your-topic)Explore your topic

In Redpanda Cloud, go to **Topics** > **chat-room**.
The message that you produced to the topic is displayed along with some other details about the topic. ### [](#clean-up)Clean up If you don’t want to continue experimenting with your cluster, you can delete it. Go to **Cluster settings** and click **Delete cluster**. ## [](#next-steps)Next steps - [Learn more about Redpanda Cloud](../../cloud-overview/) - [Learn about private networking](../../../networking/dedicated/) --- # Page 403: Serverless **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/cluster-types/serverless.md --- # Serverless --- title: Serverless latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-types/serverless page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-types/serverless.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/cluster-types/serverless.adoc description: Learn how to create a Serverless cluster and start streaming. page-git-created-date: "2024-06-06" page-git-modified-date: "2026-04-07" --- Serverless is the fastest and easiest way to start data streaming. With Serverless clusters, you host your data in Redpanda’s VPC, and Redpanda handles automatic scaling, provisioning, operations, and maintenance. This is a production-ready deployment option with a cluster available instantly, and you only pay for what you consume. You can view detailed billing activity for each cluster and edit payment methods on the **Billing** page. > 📝 **NOTE** > > Serverless on GCP is currently in a [beta](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#beta) release. 
## [](#serverless-usage-limits)Serverless usage limits Each Serverless cluster has the following maximum usage limits: - **Ingress**: 100 MB/s - **Egress**: 300 MB/s - **Partitions**: 5,000 - **Message size**: 20 MiB - **Retention**: unlimited - **Storage**: unlimited - **Users**: 30 - **ACLs**: 120 - **Consumer groups**: 200 - **Connections**: 10,000 - **Producer IDs**: 250 - **Schema Registry**: - **Max schemas**: 500 - **Max subjects**: 500 - **Rate limit**: 100 requests/s - **Redpanda Connect pipelines**: 100 - **MCP servers**: 100 - **AI agents**: 10 > 📝 **NOTE** > > The partition limit is the number of logical partitions before replication occurs. Redpanda Cloud uses a replication factor of 3. ## [](#prerequisites)Prerequisites Make sure you have the latest version of `rpk`, the Redpanda CLI. See [Install or Update rpk](../../../manage/rpk/rpk-install/). ## [](#get-started-with-serverless)Get started with Serverless Choose the option that fits how you want to subscribe: ### Free trial A [free trial on AWS](https://www.redpanda.com/try-redpanda) is the fastest way to get started with Serverless. Each free-trial customer qualifies for $100 (USD) in credits to spend in the first 14 days. This should be enough to run Redpanda with reasonable throughput. No credit card is required. To continue using Serverless after your trial expires, you can enter a credit card and pay as you go. Any remaining credit balance is used before you are charged. When either the credits expire or the days in the trial expire, the clusters move into a suspended state, and you won’t be able to access your data in either the Redpanda Cloud Console or with the Kafka API. There is a seven-day grace period following the end of the trial when you can add your credit card and restore service. After that, the data is permanently deleted. For questions about the trial, use the **#serverless** [Community Slack](https://redpandacommunity.slack.com/) channel. 
After you start a trial, Redpanda instantly prepares an account for you. Your account includes a `welcome` cluster with a `hello-world` demo topic you can explore. The topic includes sample data so you can see how real-time messaging works before sending your own data.

[Get started](#interact-with-your-cluster) by creating a Redpanda Connect [pipeline](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#pipeline), or by following the steps in the Console to use `rpk` to interact with your cluster from the command line:

1. Log in with `rpk cloud login`.
2. Consume from the `hello-world` topic with `rpk topic consume hello-world`.
3. In the [Redpanda Cloud Console](https://cloud.redpanda.com), navigate to the **Topics** page and open the `hello-world` topic to see the included messages.

### Redpanda Sales

To request a private offer with possible discounts for annual committed use, contact [Redpanda Sales](https://www.redpanda.com/price-estimator). When you subscribe to Serverless through Redpanda Sales, you gain immediate access to Enterprise support.

Redpanda creates a cloud organization for you and sends you a welcome email.

### AWS Marketplace

New subscriptions to Redpanda Cloud through [AWS Marketplace](../../../billing/aws-pay-as-you-go/) receive $300 (USD) in free credits to spend in the first 30 days. AWS Marketplace charges for anything beyond $300, unless you cancel the subscription. After your free credits have been used, you can continue using your cluster without any commitment, paying only for what you consume and canceling anytime.

> 📝 **NOTE**
>
> When you subscribe to Redpanda through AWS Marketplace, you do not have immediate access to Enterprise support, only the [Community Slack](https://redpandacommunity.slack.com/) channel. For Enterprise support, contact [Redpanda Sales](https://www.redpanda.com/price-estimator).

Redpanda creates a cloud organization for you and sends you a welcome email.
## [](#create-a-serverless-cluster)Create a Serverless cluster

To create a Serverless cluster:

1. In the [Redpanda Cloud Console](https://cloud.redpanda.com), on the **Clusters** page, click **Create cluster**, then click **Create** for Serverless.

2. Enter a cluster name, then select the resource group. If you don’t have an existing resource group, you can create one. Refresh the page to see newly created resource groups.

3. Select a cloud provider and [region](../../../reference/tiers/serverless-regions/). For best performance, select the region closest to your applications. Redpanda expects your applications to be deployed in the same cloud provider and region as your Serverless cluster.

    Clusters on AWS can enable private access between their VPC and Redpanda, so data does not traverse the public internet. Private connectivity is implemented using AWS PrivateLink.

    - When you enable both public access and private access on the cluster, you can choose between the public address and the private address. When the public address is used, the data flows over the public internet.
    - You can either create a new PrivateLink or use an existing one from the same resource group.
    - You can enable or disable private access at any time on the cluster’s **Settings** page.
    - Enabling private access incurs additional charges.

    > 📝 **NOTE**
    >
    > After private access is disabled, attempts to reach the private endpoints will fail. However, the PrivateLink endpoint in your AWS account and the PrivateLink resource in Redpanda Cloud both remain provisioned and continue to incur charges until you explicitly delete them.

4. Click **Create cluster**.

5. To start working with your cluster, go to the **Topics** page to create a topic and produce messages to it. Add team members and grant them access with ACLs on the **Security** page.
## [](#interact-with-your-cluster)Interact with your cluster > 💡 **TIP** > > The cluster’s **Overview** page includes a **Get Started** guide to help you start streaming data into and out of Redpanda. See also: [Redpanda Connect Quickstart](../../../develop/connect/connect-quickstart/) The **Overview** page lists your bootstrap server URL and security settings in the **How to connect - Kafka API** tab. Here you can add a Kafka client to interact with your cluster. Or, Redpanda can generate a sample application to interact with your cluster. Run [`rpk generate app`](../../../reference/rpk/rpk-generate/rpk-generate-app/), and select Go as the language. Follow the commands in the terminal to run the application, create a demo topic, produce to the topic, and consume the data back. Follow the steps in the Console to use `rpk` to interact with your cluster from the command line. Here are some helpful commands: - [`rpk cloud login`](../../../reference/rpk/rpk-cloud/rpk-cloud-login/): Use this to log in to Redpanda Cloud or to refresh the session. - [`rpk topic`](../../../reference/rpk/rpk-topic/rpk-topic/): Use this to manage topics, produce data, and consume data. - [`rpk profile print`](../../../reference/rpk/rpk-profile/rpk-profile-print/): Use this to view your `rpk` configuration and see the URL for your Serverless cluster. - [`rpk security user`](../../../reference/rpk/rpk-security/rpk-security-user/): Use this to manage users and permissions. > 📝 **NOTE** > > Redpanda Serverless is opinionated about Kafka configurations. For example, automatic topic creation is disabled. Some systems expect the Kafka service to automatically create topics when a message is produced to a topic that doesn’t exist. Create topics on the **Topics** page or with `rpk topic create`. ## [](#supported-features)Supported features - Redpanda Serverless supports the Kafka API. Serverless clusters work with all Kafka clients. See [Kafka Compatibility](../../../develop/kafka-clients/). 
- Serverless clusters support all major Apache Kafka messages for managing topics, producing/consuming data (including transactions), managing groups, managing offsets, and managing ACLs. (User management is available in the [Redpanda Cloud Console](https://cloud.redpanda.com) or with `rpk security acl`.) ### [](#unsupported-features)Unsupported features Not all features included in BYOC clusters are available in Serverless. For example, the following features are not supported: - HTTP Proxy API - Multiple availability zones (AZs) - Role-based access control (RBAC) in the data plane and mTLS authentication for Kafka API clients - Group-based access control (GBAC) - Kafka Connect ## [](#next-steps)Next steps - [Set up private access for Serverless clusters](../../../networking/serverless/aws/) - [Manage Redpanda Cloud with Terraform](../../../manage/terraform-provider/) - [Learn more about Redpanda Cloud](../../cloud-overview/) - [Manage topics](../../../develop/topics/config-topics/) - [Learn about billing](../../../billing/billing/) --- # Page 404: Introduction to Redpanda **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/intro-to-events.md --- # Introduction to Redpanda --- title: Introduction to Redpanda latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: intro-to-events page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: intro-to-events.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/intro-to-events.adoc description: Learn about Redpanda event streaming. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Distributed systems often require data and system updates to happen as quickly as possible. In software architecture, these updates can be handled with either messages or events. 
- With messages, updates are sent directly from one component to another to trigger an action. - With events, updates indicate that an action occurred at a specific time, and are not directed to a specific recipient. An event is simply a record of something changing state. For example, the event of a credit card transaction includes the product purchased, the payment, the delivery, and the time of the purchase. The event occurred in the purchasing component, but it also impacted the inventory, the payment processing, and the shipping components. In an event-driven architecture, all actions are defined and packaged as events to precisely identify individual actions and how they’re processed throughout the system. Instead of processing updates in consecutive order, event-driven architecture lets components process events at their own pace. This helps developers build fast and scalable systems. ## [](#what-is-redpanda)What is Redpanda? Redpanda is an event streaming platform: it provides the infrastructure for streaming real-time data. Producers are client applications that send data to Redpanda in the form of events. Redpanda safely stores these events in sequence and organizes them into topics, which represent a replayable log of changes in the system. Consumers are client applications that subscribe to Redpanda topics to asynchronously read events. Consumers can store, process, or react to the events. Redpanda decouples producers from consumers to allow for asynchronous event processing, event tracking, event manipulation, and event archiving. Producers and consumers interact with Redpanda using the Apache Kafka® API. ![Producers and consumers in a cluster](../../shared/_images/cluster.png) | Event-driven architecture (Redpanda) | Message-driven architecture | | --- | --- | | Producers send events to an event processing system (Redpanda) that acknowledges receipt of the write. 
This guarantees that the write is durable within the system and can be read by multiple consumers. | Producers send messages directly to each consumer. The producer must wait for acknowledgement that the consumer received the message before it can continue with its processes. | Event streaming lets you extract value out of each event by analyzing, mining, or transforming it for insights. You can: - Take one event and consume it in multiple ways. - Replay events from the past and route them to new processes in your application. - Run transformations on the data in real-time or historically. - Integrate with other event processing systems that use the Kafka API. ## [](#redpanda-differentiators)Redpanda differentiators Redpanda is less complex and less costly than any other commercial mission-critical event streaming platform. It’s fast, it’s easy, and it keeps your data safe. - Redpanda is designed for maximum performance on any data streaming workload. It can scale up to use all available resources on a single machine and scale out to distribute performance across multiple nodes. Built on C++, Redpanda delivers greater throughput and up to 10x lower p99 latencies than other platforms. This enables previously unimaginable use cases that require high throughput, low latency, and a minimal hardware footprint. - Redpanda is packaged as a single binary: it doesn’t rely on any external systems. It’s compatible with the Kafka API, so it works with the full ecosystem of tools and integrations built on Kafka. Redpanda can be deployed on bare metal, containers, or virtual machines in a data center or in the cloud. And Redpanda Console makes it easy to set up, manage, and monitor your clusters. Additionally, Tiered Storage lets you offload log segments to object storage in near real-time, providing long-term data retention and topic recovery. 
- Redpanda uses the [Raft consensus algorithm](https://raft.github.io/) throughout the platform to coordinate writing data to log files and replicating that data across multiple servers. Raft facilitates communication between the nodes in a Redpanda cluster to make sure that they agree on changes and remain in sync, even if a minority of them are in a failure state. This allows Redpanda to tolerate partial environmental failures and deliver predictable performance, even at high loads.
- Redpanda provides data sovereignty. With the Bring Your Own Cloud (BYOC) offering, you deploy Redpanda in your own virtual private cloud, and all data is contained in your environment. Redpanda handles provisioning, monitoring, and upgrades, but you manage your streaming data without Redpanda’s control plane ever seeing it.

## [](#redpanda-self-managed-versions)Redpanda Self-Managed versions

You can deploy Redpanda in a self-hosted environment (Redpanda Self-Managed) or as a fully managed cloud service (Redpanda Cloud).

Redpanda Self-Managed version numbers follow the convention AB.C.D, where AB is the two-digit year, C is the feature release, and D is the patch release. For example, version 22.3.1 indicates the first patch release of the third feature release of the year 2022. Patch releases include bug fixes and minor improvements, with no change to user-facing behavior. New and enhanced features are documented with each feature release.

Redpanda Cloud releases on a continuous basis and adopts Redpanda Self-Managed versions.
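
To make the AB.C.D convention concrete, here is a minimal shell sketch that splits a version string into its parts. The `parse_redpanda_version` function is a hypothetical helper written for this example, not a Redpanda tool. It assumes a well-formed three-part version, tolerates a leading `v` (as in tags such as `v26.1.3`), and assumes the two-digit year belongs to the 2000s.

```shell
# Hypothetical helper (not a Redpanda tool): split an AB.C.D version string.
parse_redpanda_version() {
  version=${1#v}        # drop an optional leading "v", as in "v26.1.3"
  year=${version%%.*}   # AB: two-digit year
  rest=${version#*.}    # C.D
  feature=${rest%%.*}   # C: feature release
  patch=${rest#*.}      # D: patch release
  # Assumption: the two-digit year is in the 2000s.
  printf 'year=20%s feature=%s patch=%s\n' "$year" "$feature" "$patch"
}

parse_redpanda_version 22.3.1   # prints: year=2022 feature=3 patch=1
```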
--- # Page 405: Partner Integrations **URL**: https://docs.redpanda.com/redpanda-cloud/get-started/partner-integration.md --- # Partner Integrations --- title: Partner Integrations latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: partner-integration page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: partner-integration.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/partner-integration.adoc description: Learn about Redpanda integrations built and supported by our partners. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Learn about Redpanda integrations built and supported by our partners. | Partner | Description | More information | | --- | --- | --- | | Superstream | Superstream optimizes and improves Redpanda (and other Kafka platforms) for cost reduction, increased reliability, and improved visibility. | Superstream for Redpanda | | Aklivity Zilla | Zilla is a multi-protocol proxy that abstracts Redpanda for non-native clients, such as browsers and IoT devices, by exposing Redpanda topics using user-defined REST, Server-Sent Events (SSE), MQTT, or gRPC API entry points. | Modern Eventing with CQRS, Redpanda and Zilla | | Bytewax | Bytewax is an open source framework and distributed stream processing engine in Python. | Enriching streaming data with Bytewax and Redpanda | | ClickHouse | ClickHouse is a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP). | Building an OLAP database with ClickHouse and Redpanda | | Conduktor | Conduktor provides simple, flexible, and powerful tooling for Kafka developers and infrastructure. 
| Conduktor & Redpanda: Best of breed Kafka experience | | Decodable | Decodable is a real-time data processing platform powered by Apache Flink and Debezium. | Decodable + Redpanda | | ElastiFlow | ElastiFlow captures and analyzes flow and SNMP data to provide detailed insights into network performance and security. | Leveraging Redpanda for Enhanced Network Observability: ElastiFlow Integration | | Materialize | Materialize is a data warehouse purpose-built for operational workloads where an analytical data warehouse would be too slow, and a stream processor would be too complicated. | Ingesting data from Redpanda with Materialize | | PeerDB | PeerDB provides a fast, simple, and cost-effective way to replicate data from Postgres to data warehouses, queues and storage. | Quickstart guide | | Pinecone | Pinecone is a vector database for building accurate and performant AI applications at scale. The Pinecone connector for Redpanda Connect provides a production-ready integration from many existing data sources through simple YAML configuration. | Redpanda Connect integration | | RisingWave | RisingWave is a distributed SQL streaming database that enables simple, efficient, and reliable processing of streaming data. | Ingesting data from Redpanda with Risingwave | | Timeplus | Timeplus is a stream processor that provides powerful end-to-end capabilities, leveraging the open source streaming engine Proton. | Realizing low latency streaming analytics with Timeplus and Redpanda | | Tinybird | Tinybird is a data platform for data and engineering teams to solve complex real-time, operational, and user-facing analytics use cases at any scale. | Building a complete IoT backend with Redpanda and Tinybird | | Quix | Quix is a complete platform for building, deploying, and monitoring stream processing pipelines in Python. 
| Integrating Redpanda with Quix |
| Yugabyte | YugabyteDB is an open-source, distributed SQL database that combines the capabilities of relational databases with the scalability of NoSQL systems. | How to Integrate Yugabyte CDC Connector with Redpanda |

## [](#how-to-contribute-to-this-page)How to contribute to this page

To request a partner integration with Redpanda Data, reach out to [partners@redpanda.com](mailto:partners@redpanda.com). Provide a link to your product documentation or a blog post explaining how your product integrates with Redpanda. After meeting these requirements, you can [contribute to this page](https://github.com/redpanda-data/docs/edit/main/modules/get-started/pages/partner-integration.adoc).

---

# Page 406: What’s New in Redpanda Cloud

**URL**: https://docs.redpanda.com/redpanda-cloud/get-started/whats-new-cloud.md

---

# What’s New in Redpanda Cloud

---
title: What’s New in Redpanda Cloud
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: whats-new-cloud
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: whats-new-cloud.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/get-started/pages/whats-new-cloud.adoc
description: Summary of new features in Redpanda Cloud.
page-git-created-date: "2024-06-06"
page-git-modified-date: "2026-04-09"
---

This page lists new features added to Redpanda Cloud.

## [](#april-2026)April 2026

### [](#group-based-access-control-gbac)Group-based access control (GBAC)

- With [GBAC in the control plane](../../security/authorization/gbac/gbac/), you can manage access to organization-level resources using OIDC groups from your identity provider. Assign OIDC groups to roles so that users inherit access based on their group membership.
- With [GBAC in the data plane](../../security/authorization/gbac/gbac_dp/), you can configure cluster-level permissions for provisioned users at scale using OIDC groups. Because group membership is managed by your identity provider, onboarding and offboarding require no changes in Redpanda. GBAC is available for BYOC and Dedicated clusters. In addition to the predefined roles (including Reader, Writer, and Admin) that you cannot modify or delete, you can now create custom roles. ### [](#increased-serverless-limits-for-redpanda-connect-pipelines-and-mcp-servers)Increased Serverless limits for Redpanda Connect pipelines and MCP servers Serverless clusters now support up to 100 Redpanda Connect pipelines and 100 MCP servers. See [Serverless usage limits](../cluster-types/serverless/#_serverless_usage_limits). ### [](#redpanda-connect-updates)Redpanda Connect updates - The Redpanda Connect pipeline creation and editing workflow has been simplified. The new UI replaces the previous multi-page wizard with a visual pipeline diagram, an IDE-like configuration editor, slash commands for inserting variables, and inline links to component documentation. See the [Redpanda Connect quickstart](../../develop/connect/connect-quickstart/) to try it out. - Processors: - [string\_split](../../develop/connect/components/processors/string_split/): Splits strings into multiple parts using a delimiter, creating new messages or fields for each part. ## [](#march-2026)March 2026 ### [](#redpanda-connect-updates-2)Redpanda Connect updates - Inputs: - [oracledb\_cdc](../../develop/connect/components/inputs/oracledb_cdc/): Stream changes from an Oracle database for Change Data Capture (CDC). - [aws\_cloudwatch\_logs](../../develop/connect/components/inputs/aws_cloudwatch_logs/): Consume log events from AWS CloudWatch Logs. Supports filtering by log streams, CloudWatch filter patterns, and configurable start times. 
- [aws\_dynamodb\_cdc](../../develop/connect/components/inputs/aws_dynamodb_cdc/): Consume item-level changes from DynamoDB Streams with automatic checkpointing and shard management. - Outputs: - [iceberg](../../develop/connect/components/outputs/iceberg/): Write data to Apache Iceberg tables using the REST catalog. - Bloblang methods: - [`escape_url_path`](../../develop/connect/guides/bloblang/methods/#escape_url_path): Escapes a string for safe use in URL path segments using percent-encoding. - [`parse_logfmt`](../../develop/connect/guides/bloblang/methods/#parse_logfmt): Parses a logfmt-encoded string into an object of key-value pairs. - [`unescape_url_path`](../../develop/connect/guides/bloblang/methods/#unescape_url_path): Unescapes a URL path segment, converting percent-encoded sequences back to their original characters. - Removed components: - `legacy_redpanda_migrator` input and output - `legacy_redpanda_migrator_offsets` input and output - `redpanda_migrator_bundle` input and output Use the unified [`redpanda_migrator`](../../develop/connect/components/inputs/redpanda_migrator/) input and [`redpanda_migrator`](../../develop/connect/components/outputs/redpanda_migrator/) output instead. ### [](#cloud-topics)Cloud Topics [Cloud Topics](../../develop/topics/cloud-topics/) are now available, making it possible to use durable cloud storage (S3, ADLS, GCS) as the primary backing store instead of local disk, eliminating over 90% of cross-AZ replication costs. This makes them ideal for latency-tolerant, high-throughput workloads such as observability streams, analytics pipelines, and AI/ML training data feeds, where cross-AZ networking charges are the dominant cost driver. You can use Cloud Topics exclusively or in combination with standard topics on a cluster supporting low-latency workloads. ### [](#user-based-throughput-quotas)User-based throughput quotas Redpanda now supports throughput quotas based on authenticated user principals. 
Unlike client-based quotas (which rely on self-declared `client-id` values), [user-based quotas](../../manage/cluster-maintenance/manage-throughput/#set-user-based-quotas) enforce limits using verified identities from SASL, mTLS, or OIDC authentication. You can set quotas for individual users, default users, or fine-grained user/client combinations. ### [](#iceberg-expanded-json-schema-support)Iceberg: Expanded JSON Schema support Redpanda now supports additional JSON Schema patterns when translating to Iceberg tables: - `$ref` support: Internal references using `$ref` (for example, `"$ref": "#/definitions/myType"`) are resolved from schema resources declared in the same document. External references are not yet supported. - Map type from `additionalProperties`: `additionalProperties` objects that contain subschemas now translate to Iceberg `map`. - `oneOf` nullable pattern: The `oneOf` keyword is now supported for the standard nullable pattern if exactly one branch is `{"type":"null"}` and the other is a non-null schema. See [Specify Iceberg Schema](../../manage/iceberg/specify-iceberg-schema/#how-iceberg-modes-translate-to-table-format) for JSON types mapping and updated requirements. ### [](#ordered-rack-preference-for-leader-pinning)Ordered rack preference for leader pinning [Leader pinning](../../develop/produce-data/leader-pinning/) now supports the `ordered_racks` configuration value, which lets you specify preferred racks in priority order. Unlike `racks`, which distributes leaders uniformly across all listed racks, `ordered_racks` places leaders in the highest-priority available rack and fails over to subsequent racks only when higher-priority racks become unavailable. ### [](#cross-region-remote-read-replicas-on-aws)Cross-region Remote Read Replicas on AWS [Remote read replica](../cluster-types/byoc/remote-read-replicas/) topics on AWS can now be deployed in a different region from the origin cluster’s S3 bucket. 
This enables cross-region disaster recovery and data locality scenarios while maintaining the read-only replication model. ### [](#byovpc-on-aws-ga)BYOVPC on AWS: GA [BYOVPC on AWS](../cluster-types/byoc/aws/vpc-byo-aws/) is now generally available (GA). With Bring Your Own VPC (BYOVPC), you deploy the Redpanda data plane into your own VPC and manage security policies and resources yourself, including subnets, IAM roles, firewall rules, and storage buckets. The Redpanda BYOVPC Terraform Module contains Terraform code that deploys the resources required for a BYOVPC cluster on AWS. Secrets management is enabled by default with the Terraform module. ### [](#iceberg-topics-with-snowflake-open-catalog-ga)Iceberg topics with Snowflake Open Catalog: GA The [Snowflake and Open Catalog integration](../../manage/iceberg/redpanda-topics-iceberg-snowflake-catalog/) for Iceberg topics is now generally available (GA). ### [](#billing-notifications)Billing notifications Redpanda Cloud now sends email notifications to organization admins when credit or commit balances reach spending thresholds (50%, 30%, 10%, and 0% remaining). You can manage your notification preferences or opt out at any time. See [Manage Billing Notifications](../../billing/billing-notifications/). ## [](#february-2026)February 2026 ### [](#agentic-data-plane-adp-la)Agentic Data Plane (ADP): LA Redpanda Agentic Data Plane (ADP) is now available in [limited availability](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#limited-availability) (LA). Redpanda ADP provides enterprise-grade infrastructure for building, deploying, and governing AI agents at scale. Key capabilities include declarative agents, MCP servers backed by 300+ connectors, an AI Gateway with model failover and fiscal controls, and compliance-grade transcripts built on Redpanda’s immutable log. See [Agentic Data Plane Overview](../../ai-agents/adp-overview/). 
### [](#serverless-on-aws-ga)Serverless on AWS: GA [Serverless](../cluster-types/serverless/) on AWS is now generally available (GA). This release includes private networking with AWS PrivateLink. You can use the Cloud Console, the Cloud API, or the Redpanda Terraform provider to create and manage Serverless private links. Serverless is the easiest and fastest way to begin streaming data with Redpanda. ### [](#enable-schema-id-validation)Enable schema ID validation You can now enable [schema ID validation](../../manage/schema-reg/schema-id-validation/) by [configuring the `enable_schema_id_validation` cluster property](../../manage/cluster-maintenance/config-cluster/). This controls whether or not Redpanda validates schema IDs in records and which topic properties are enforced. Use caution when enabling this property, because it could cause decompression across topics and increase CPU load. ### [](#cross-region-aws-privatelink)Cross-region AWS PrivateLink AWS PrivateLink now supports cross-region connectivity, allowing clients in different AWS regions to connect to your Redpanda cluster through PrivateLink. Configure supported regions in the [Cloud UI](../../networking/configure-privatelink-in-cloud-ui/#cross-region-privatelink) or using the [Cloud API](../../networking/aws-privatelink/#cross-region-privatelink) to specify which regions can establish PrivateLink connections. This feature requires multi-AZ cluster deployments. ## [](#january-2026)January 2026 ### [](#redpanda-connect-updates-3)Redpanda Connect updates - Inputs: - [otlp\_grpc](../../develop/connect/components/inputs/otlp_grpc/): Receive OpenTelemetry traces, logs, and metrics via OTLP/gRPC protocol. Exposes an OpenTelemetry Collector gRPC receiver that accepts traces, logs, and metrics, converting them to individual Redpanda OTEL v1 protobuf messages optimized for Kafka partitioning. 
- [otlp\_http](../../develop/connect/components/inputs/otlp_http/): Receive OpenTelemetry traces, logs, and metrics via OTLP/HTTP protocol. Supports both protobuf and JSON formats at standard OTLP endpoints, converting telemetry data to individual messages with embedded Resource and Scope metadata. - Outputs: - [otlp\_grpc](../../develop/connect/components/outputs/otlp_grpc/): Send OpenTelemetry traces, logs, and metrics via OTLP/gRPC protocol. Accepts batches of Redpanda OTEL v1 protobuf messages and converts them to OTLP format for transmission to OpenTelemetry collectors. - [otlp\_http](../../develop/connect/components/outputs/otlp_http/): Send OpenTelemetry traces, logs, and metrics via OTLP/HTTP protocol. Supports both protobuf and JSON content types for flexible integration with OpenTelemetry backends. ### [](#redpanda-connect-and-roles-in-terraform-provider)Redpanda Connect and Roles in Terraform provider The [Redpanda Terraform provider](../../manage/terraform-provider/) now supports managing roles and Redpanda Connect pipelines. Use the provider to create and manage role-based access control and data pipelines in Redpanda Cloud. ## [](#december-2025)December 2025 ### [](#remote-mcp-ga)Remote MCP: GA You can now deploy managed MCP servers directly inside your Redpanda Cloud cluster with [Remote MCP](../../ai-agents/mcp/remote/overview/). Remote MCP servers give AI assistants streaming data capabilities, enabling use cases like real-time data generation, stream processing, and event publishing. ### [](#shadowing)Shadowing Redpanda Cloud now supports [Shadowing](../../manage/disaster-recovery/shadowing/overview/), a disaster recovery solution that provides asynchronous, offset-preserving replication between distinct Redpanda clusters. Shadowing enables cross-region data protection by replicating topic data, configurations, consumer group offsets, ACLs, and Schema Registry data with byte-level fidelity. 
The shadow cluster operates in read-only mode while continuously receiving updates from the source cluster. During a disaster, you can fail over individual topics or an entire shadow link to make resources fully writable for production traffic. Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later. ### [](#metrics-for-serverless)Metrics for Serverless You can now view and export metrics from Serverless clusters to third-party monitoring systems like Prometheus and Grafana. See [Monitor Redpanda Cloud](../../manage/monitor-cloud/) for details on configuring monitoring for your Serverless cluster and [Metrics Reference](../../reference/public-metrics-reference/) for a list of metrics available in Serverless. ### [](#user-impersonation)User impersonation BYOC and Dedicated clusters now support unified authentication and authorization between the Redpanda Cloud UI and Redpanda with [user impersonation](../../security/cloud-authentication/#user-impersonation). This means the credentials you use to authenticate to Redpanda Cloud also determine your fine-grained access within Redpanda. With user impersonation, the topics users see in the UI are identical to what they can access with the Cloud API or `rpk`, ensuring consistent permissions across all interfaces and clear auditing of data plane user actions. ### [](#redpanda-connect-updates-4)Redpanda Connect updates - Tracers: - [Redpanda](../../develop/connect/components/tracers/redpanda/): The Redpanda tracer exports distributed tracing data to a Redpanda topic, enabling you to monitor and debug your Redpanda Connect pipelines. Traces are exported in OpenTelemetry format as JSON, allowing integration with observability platforms like Jaeger, Grafana Tempo, or custom trace consumers.
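As an illustration of a custom trace consumer, the following Python sketch summarizes the spans in one exported record. It assumes the record value is an OTLP/JSON document with the standard `resourceSpans`, `scopeSpans`, and `spans` nesting and string-encoded nanosecond timestamps. The exact payload that the Redpanda tracer writes may differ, and the `summarize_spans` helper and the sample payload are hypothetical.

```python
import json
from typing import Dict, List


def summarize_spans(record_value: str) -> List[Dict[str, object]]:
    """Return one summary per span found in an OTLP/JSON trace document.

    Assumption: the record value follows the OTLP/JSON trace layout
    (resourceSpans -> scopeSpans -> spans). Adjust if the payload differs.
    """
    doc = json.loads(record_value)
    summaries = []
    for resource_spans in doc.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            for span in scope_spans.get("spans", []):
                # OTLP/JSON encodes 64-bit timestamps as decimal strings.
                start_ns = int(span["startTimeUnixNano"])
                end_ns = int(span["endTimeUnixNano"])
                summaries.append({
                    "trace_id": span.get("traceId"),
                    "name": span.get("name"),
                    "duration_ms": (end_ns - start_ns) / 1_000_000,
                })
    return summaries


# Hypothetical sample payload, for illustration only.
sample = json.dumps({
    "resourceSpans": [{
        "scopeSpans": [{
            "spans": [{
                "traceId": "5b8efff798038103d269b633813fc60c",
                "spanId": "eee19b7ec3c1b174",
                "name": "example_processor",
                "startTimeUnixNano": "1700000000000000000",
                "endTimeUnixNano": "1700000000002500000",
            }]
        }]
    }]
})

for summary in summarize_spans(sample):
    print(summary)
```

In a real consumer, `record_value` would come from the topic that the tracer writes to, read with whichever Kafka client you already use.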
## [](#november-2025)November 2025 ### [](#serverless-on-gcp-beta)Serverless on GCP: beta You can now create [Serverless clusters](../cluster-types/serverless/) on Google Cloud Platform (GCP). Serverless on GCP is in a [beta](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#beta) release. ### [](#support-for-additional-regions)Support for additional regions [BYOC clusters](../../reference/tiers/byoc-tiers/#byoc-supported-regions) on Azure now support the Sweden Central and Germany West Central regions. ### [](#connected-client-monitoring)Connected client monitoring You can now view detailed information about active Kafka client connections on a cluster using `rpk` or the Data Plane API. This helps you identify and troubleshoot problematic clients. For more information, see the [connected client details](../../manage/cluster-maintenance/manage-throughput/#view-connected-client-details) example in the Manage Throughput guide. ### [](#increased-message-size-limit)Increased message size limit Redpanda Cloud increased the [message size limit](../../develop/topics/create-topic/) on newly created topics. BYOC and Dedicated clusters have a default message size limit of 20 MiB with a maximum of 32 MiB. Serverless clusters have a default message size limit of 8 MiB with a maximum of 20 MiB. Configure the message size limit with the `max_message_bytes` topic property. Existing topics keep their current message size setting. If you update the setting on an existing topic, the new maximum applies. ### [](#redpanda-connect-updates-5)Redpanda Connect updates Redpanda Connect provides a simplified [quickstart](../../develop/connect/connect-quickstart/) experience in the UI that helps you to start building data pipelines. The quickstart creates pipelines to stream data into and out of Redpanda using the pipeline editor.
### [](#get-started-with-serverless)Get Started with Serverless A Serverless cluster’s **Overview** page now provides a **Get Started** guide to help you start streaming your own data with a [Redpanda Connect](../../develop/connect/connect-quickstart/) pipeline. It lets you stream data into and out of Redpanda without writing producer/consumer code. ### [](#remote-read-replicas-ga)Remote read replicas: GA [Remote read replicas](../cluster-types/byoc/remote-read-replicas/) are now generally available (GA) for BYOC clusters on AWS and GCP. This feature allows you to create read-only topics that mirror a topic on a different cluster, providing greater flexibility and scalability for your data streaming needs. ### [](#schema-registry-and-acls-in-terraform-provider)Schema Registry and ACLs in Terraform provider The [Redpanda Terraform provider](../../manage/terraform-provider/) now supports managing schemas and Schema Registry ACLs. You can use the provider to register schemas in formats such as Avro, Protobuf, or JSON Schema, and control access to Schema Registry subjects and operations through ACLs. ## [](#october-2025)October 2025 ### [](#remote-mcp-beta)Remote MCP: beta Deploy managed MCP servers directly inside your Redpanda Cloud cluster with [Remote MCP](../../ai-agents/mcp/remote/overview/). Unlike the Redpanda Cloud Management MCP Server, Remote MCP servers run within your cluster and can process data streams, generate synthetic data, and publish directly to Redpanda topics. Create custom AI tools using templates or write your own Redpanda Connect configurations to build event-driven workflows. Remote MCP servers provide AI assistants with streaming data capabilities, enabling use cases like real-time data generation, stream processing, and event publishing. Get started with the [quickstart guide](../../ai-agents/mcp/remote/quickstart/) or learn [best practices](../../ai-agents/mcp/remote/best-practices/) for building robust tools. 
### [](#api-gateway-access)API Gateway access BYOC and Dedicated clusters with private networking now allow control of API Gateway network access, independent of the Redpanda cluster. When you create a cluster, you can choose either public or private access for the API Gateway: - Public access exposes Redpanda Console, Data Plane API, and MCP Server API endpoints over the internet, although they remain protected by your authentication and authorization controls. - Private access restricts endpoint access to your private network (VPC or VNet) only. After the cluster is created, you can change the API Gateway access on the cluster settings page. If you change from public to private access, users without VPN access to the Redpanda VPC will lose access to these services. ### [](#redpanda-connect-updates-6)Redpanda Connect updates - Inputs: - [Microsoft SQL Server CDC](../../develop/connect/components/inputs/microsoft_sql_server_cdc/): Streams change data from a Microsoft SQL Server database into Redpanda Connect using Change Data Capture (CDC). - Outputs: - [CyborgDB](../../develop/connect/components/outputs/cyborgdb/): Write vectors to a CyborgDB encrypted index. CyborgDB provides end-to-end encrypted vector storage with automatic dimension detection and index optimization. - Processors: - [`jira`](../../develop/connect/components/processors/jira/): Executes Jira API queries based on input messages and returns structured results. The processor handles pagination, retries, and field expansion automatically. - Deprecated components: - `redpanda_migrator` input and output (renamed to `legacy_redpanda_migrator`) - `redpanda_migrator_offsets` input and output (renamed to `legacy_redpanda_migrator_offsets`) Migrate from these deprecated components to the new unified `redpanda_migrator` input/output pair. For detailed migration instructions, see [Migrate to the Unified Redpanda Migrator](../../develop/connect/guides/migrate-unified-redpanda-migrator/). 
- `redpanda_migrator_bundle` input and output (these are part of the legacy migration architecture and internally depend on the deprecated `legacy_redpanda_migrator` and `legacy_redpanda_migrator_offsets` components) - `kafka`, `kafka_franz`, and `redpanda_common` inputs and outputs. These components have been consolidated into the unified `redpanda` input and output components. Migrate existing configurations to use the new `redpanda` components for continued support and access to the latest features. For detailed information about recent component updates, see [What’s New in Redpanda Connect](../../../redpanda-connect/get-started/whats-new/). ## [](#september-2025)September 2025 ### [](#multi-factor-authentication)Multi-factor authentication Enable multi-factor authentication (MFA) to add an extra layer of security to your Redpanda Cloud account. After you enable MFA, you’ll enter your credentials, then be prompted for a one-time code from your authenticator app when you log in. Administrators can also [enforce MFA](../../security/cloud-authentication/#multi-factor-authentication-mfa) for all members of an organization. ### [](#redpanda-cloud-management-mcp-server-beta)Redpanda Cloud Management MCP Server: beta Connect AI assistants like Claude directly to your Redpanda Cloud account with the new [Redpanda Cloud Management MCP Server](../../ai-agents/mcp/local/overview/). This server runs on your computer and provides AI tools for managing clusters, topics, and other cloud resources through natural language commands. Ask your AI assistant to "Create a new topic called user-events" or "List all clusters in my account" and it will handle the technical details automatically. Get started with the [quickstart guide](../../ai-agents/mcp/local/quickstart/). The Redpanda Cloud Management MCP Server uses the Model Context Protocol (MCP) to extend AI assistants with Redpanda-specific capabilities, making cloud operations more accessible through conversational interfaces. 
### [](#automatic-topic-creation-and-topic-limit)Automatic topic creation and topic limit For BYOC and Dedicated clusters, you can now configure the `auto_create_topics_enabled` cluster property to automatically create a topic if a client produces to a non-existent topic. For all clusters: each cluster now has a limit of 40,000 topics. ## [](#august-2025)August 2025 ### [](#manage-custom-resource-tags-in-byoc)Manage custom resource tags in BYOC After cluster creation, you can manage custom cloud provider tags and labels on BYOC and BYOVPC/BYOVNet clusters for [AWS](../cluster-types/byoc/aws/create-byoc-cluster-aws/#manage-custom-tags), [Azure](../cluster-types/byoc/azure/create-byoc-cluster-azure/#manage-custom-tags), and [GCP](../cluster-types/byoc/gcp/create-byoc-cluster-gcp/#manage-custom-resource-labels-and-network-tags) using the Cloud Control Plane API. This involves refreshing Redpanda agent permissions with `rpk cloud byoc` due to new IAM permissions. ### [](#iceberg-topics-with-aws-glue)Iceberg topics with AWS Glue A new [integration with AWS Glue Data Catalog](../../manage/iceberg/iceberg-topics-aws-glue/) allows you to add Redpanda topics as Iceberg tables in your data lakehouse. The AWS Glue catalog integration is available in BYOC clusters with Redpanda version 25.2 and later. See [Integrate with REST Catalogs](../../manage/iceberg/rest-catalog/) for supported Iceberg REST catalog integrations. ### [](#manage-throughput)Manage throughput Redpanda Cloud now lets you [manage throughput](../../manage/cluster-maintenance/manage-throughput/) configuration at the broker and client levels. You can manage client quotas with [`rpk cluster quotas`](../../reference/rpk/rpk-cluster/rpk-cluster-quotas/) or with the Kafka API. When no quotas apply, the client has unlimited throughput. 
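To make the last point concrete, here is a minimal Python sketch of how an effective client quota could be resolved. It assumes quotas are matched by exact client ID first, then by the longest matching client ID prefix, then by a default. That precedence, the `effective_byte_rate` helper, and the sample values are illustrative assumptions, not Redpanda's implementation. The only behavior taken from the text above is the fallback: when no quota applies, throughput is unlimited.

```python
import math
from typing import Dict, Optional


def effective_byte_rate(
    client_id: str,
    exact: Dict[str, float],
    prefixes: Dict[str, float],
    default: Optional[float] = None,
) -> float:
    """Resolve a client's byte-rate quota (hypothetical precedence rules)."""
    # 1. An exact client ID match wins.
    if client_id in exact:
        return exact[client_id]
    # 2. Otherwise use the longest matching client ID prefix, if any.
    matching = [p for p in prefixes if client_id.startswith(p)]
    if matching:
        return prefixes[max(matching, key=len)]
    # 3. Otherwise use the default quota, if one is set.
    if default is not None:
        return default
    # 4. No quota applies: the client has unlimited throughput.
    return math.inf


# Hypothetical quota values in bytes per second.
exact_quotas = {"billing-producer": 1_000_000}
prefix_quotas = {"analytics-": 5_000_000, "analytics-batch-": 2_000_000}

print(effective_byte_rate("billing-producer", exact_quotas, prefix_quotas))
print(effective_byte_rate("analytics-batch-7", exact_quotas, prefix_quotas))
print(effective_byte_rate("unlisted-client", exact_quotas, prefix_quotas))
```

The last call returns `math.inf` because no exact, prefix, or default quota matches the client.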
## [](#july-2025)July 2025 ### [](#iceberg-topics-in-redpanda-cloud-ga)Iceberg topics in Redpanda Cloud: GA [Iceberg topics](../../manage/iceberg/about-iceberg-topics/) are now generally available (GA) in Redpanda Cloud. ### [](#byoc-on-azure-ga)BYOC on Azure: GA [BYOC for Azure](../cluster-types/byoc/azure/create-byoc-cluster-azure/) is now generally available (GA). ### [](#schema-registry-authorization)Schema Registry Authorization You can now use [Schema Registry Authorization](../../manage/schema-reg/schema-reg-authorization/) to control access to Schema Registry subjects and operations. Schema Registry Authorization offers more granular control over who can do what with your Redpanda Schema Registry resources. ACLs used for Schema Registry access also support RBAC roles. ### [](#kafka-connect-disabled-on-new-clusters)Kafka Connect disabled on new clusters [Kafka Connect](../../develop/managed-connectors/) is now disabled by default on all new clusters. To unlock this feature for your account, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). If you previously enabled Kafka Connect on a cluster and want to [disable it](../../develop/managed-connectors/disable-kc/), you can use the Cloud API. ### [](#allowlist-nat-gateway-ip)Allowlist NAT gateway IP The [Redpanda NAT gateway IP address](../../networking/cloud-security-network/#nat-gateways) is now provided in the Cloud UI and the Cloud API for BYOC and Dedicated clusters. If necessary, you can use this IP address to allowlist egress traffic from your Redpanda Connect data sources. ### [](#mtls-and-sasl-authentication-for-kafka-api-on-aws)mTLS and SASL authentication for Kafka API on AWS You can now enable mTLS and SASL authentication simultaneously for the Kafka API on AWS clusters. If you enable both mTLS and SASL on AWS clusters, Redpanda creates two distinct listeners: an mTLS listener operating on one port and a SASL listener operating on a different port. 
See [Authentication](../../security/cloud-authentication/#service-authentication) for details on available authentication methods in Redpanda Cloud. ### [](#azure-private-link-in-the-ui-ga)Azure Private Link in the UI: GA You can now [configure Azure Private Link](../../networking/azure-private-link-in-ui/) for a new BYOC or Dedicated cluster using the Cloud UI. The Azure Private Link service is generally available (GA) in both the Cloud UI and the Cloud API. ### [](#redpanda-connect-in-redpanda-cloud-ga)Redpanda Connect in Redpanda Cloud: GA [Redpanda Connect](../../develop/connect/about/) is now generally available (GA) in all Redpanda Cloud clusters: BYOC (including BYOVPC/BYOVNet), Dedicated, and Serverless. ### [](#redpanda-connect-updates-7)Redpanda Connect updates Redpanda Connect includes the following updates: - The [GCP Spanner CDC](../../develop/connect/components/inputs/gcp_spanner_cdc/) component lets you capture changes from Google Cloud Spanner and stream them into Redpanda. You can use it to ingest data from GCP Spanner databases, enabling real-time data processing and analytics. - The [Slack Reaction](../../develop/connect/components/outputs/slack_reaction/) component lets you send messages to a Slack channel in response to events in Redpanda. You can use it to create alerts, notifications, or other automated responses based on data changes in Redpanda. - The [Redpanda Cache](../../develop/connect/components/caches/redpanda/) component lets you cache data in Redpanda, improving performance and reducing latency for data access. You can use it to store frequently accessed data, such as configuration settings or user profiles, in Redpanda. For more detailed information about recent component updates, see [What’s New in Redpanda Connect](../../../redpanda-connect/get-started/whats-new/). ### [](#serverless-client-connections)Serverless client connections [Serverless](../cluster-types/serverless/) clusters have a new usage limit of 10,000 connections. 
## [](#june-2025)June 2025 ### [](#schema-registry-ui-for-serverless)Schema Registry UI for Serverless The [Schema Registry UI](../../manage/schema-reg/schema-reg-ui/) is now available for Serverless clusters. ### [](#amazon-vpc-transit-gateway)Amazon VPC Transit Gateway For BYOC and BYOVPC clusters on AWS, you can set up an [Amazon VPC Transit Gateway](../../networking/byoc/aws/transit-gateway/) to connect VPCs to Redpanda services while maintaining control over network traffic. ### [](#support-for-additional-regions-2)Support for additional regions Serverless clusters now support the following new [regions on AWS](../../reference/tiers/serverless-regions/): ap-northeast-1 (Tokyo), ap-southeast-1 (Singapore), and eu-west-2 (London). ### [](#http-gateway)HTTP gateway The [`gateway`](../../develop/connect/components/inputs/gateway/) component is now available in Redpanda Connect for Redpanda Cloud. This component allows you to create an HTTP endpoint that can receive data from any HTTP client and stream it into Redpanda. You can use the gateway to ingest data from IoT devices, web applications, or any other HTTP-based source. See the [Ingest Real-Time Sensor Telemetry with the HTTP Gateway](../../develop/connect/guides/cloud/gateway/) guide for more information. ## [](#may-2025)May 2025 ### [](#redpanda-connect-for-byovnet-on-azure-beta)Redpanda Connect for BYOVNet on Azure: beta [Redpanda Connect](../../develop/connect/about/) is now enabled when you create a BYOVNet cluster on [Azure](../cluster-types/byoc/azure/vnet-azure/). ### [](#secrets-management-for-byovpc-clusters-on-aws-and-gcp)Secrets management for BYOVPC clusters on AWS and GCP You can now create new BYOVPC clusters with secrets management enabled by default on [AWS](../cluster-types/byoc/aws/vpc-byo-aws/) and [GCP](../cluster-types/byoc/gcp/vpc-byo-gcp/). You can also enable secrets management for existing BYOVPC clusters on AWS and GCP. 
For GCP, see [Enable Secrets Management for BYOVPC Clusters on GCP](../cluster-types/byoc/gcp/enable-secrets-byovpc-gcp/). For AWS, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). ### [](#serverless-standard-deprecated)Serverless Standard: deprecated Serverless Standard is deprecated. All existing clusters will be migrated to the new [Serverless](../cluster-types/serverless/) platform (with higher usage limits, 99.9% SLA, and additional regions) on August 31, 2025. - Retirement date: August 30, 2025 ### [](#cloud-api-beta-versions-deprecated)Cloud API beta versions: deprecated The Cloud Control Plane API versions v1beta1 and v1beta2, and Data Plane API versions v1alpha1 and v1alpha2 are deprecated. These Cloud API versions will be removed in a future release and are not recommended for use. The deprecation timeline is: - Announcement date: May 27, 2025 - End-of-support date: November 28, 2025 - Retirement date: May 28, 2026 See the [Cloud API Deprecation Policy](/api/doc/cloud-controlplane/topic/topic-deprecation-policy) for more information. ### [](#read-only-cluster-configuration-properties)Read-only cluster configuration properties You can now [view the value of read-only cluster configuration properties](../../manage/cluster-maintenance/config-cluster/#view-cluster-property-values) with `rpk cluster config` or with the Cloud API. Available properties are listed in [Cluster Properties](../../reference/properties/cluster-properties/) and [Object Storage Properties](../../reference/properties/object-storage-properties/). ### [](#iceberg-topics-in-azure-beta)Iceberg topics in Azure (beta) [Iceberg topics](../../manage/iceberg/about-iceberg-topics/) are now supported for BYOC clusters in Azure. ### [](#support-for-additional-region)Support for additional region [BYOC clusters](../../reference/tiers/byoc-tiers/#byoc-supported-regions) on GCP now support the us-west2 (Los Angeles) region. 
### [](#redpanda-terraform-provider-ga)Redpanda Terraform provider: GA The [Redpanda Terraform provider](../../manage/terraform-provider/) is now generally available (GA). The provider lets you create and manage resources in Redpanda Cloud, such as clusters, topics, users, ACLs, networks, and resource groups. ## [](#april-2025)April 2025 ### [](#mtls-and-sasl-authentication-for-kafka-api-on-gcp)mTLS and SASL authentication for Kafka API on GCP You can now enable mTLS and SASL authentication simultaneously for the Kafka API on GCP clusters. If you enable both mTLS and SASL on GCP clusters, Redpanda creates two distinct listeners: an mTLS listener operating on one port and a SASL listener operating on a different port. See [Authentication](../../security/cloud-authentication/#service-authentication) for details on available authentication methods in Redpanda Cloud. ### [](#increased-number-of-supported-partitions)Increased number of supported partitions The number of partitions (pre-replication) Redpanda Cloud supports for each [usage tier](../../reference/tiers/) has been doubled. For example, the number of supported partitions in tier 1 went from 1,000 to 2,000, and tier 5 went from 22,800 to 45,600. ### [](#iceberg-topics-beta)Iceberg topics: beta The [Iceberg integration for Redpanda](../../manage/iceberg/about-iceberg-topics/) allows you to store topic data in the cloud in the Iceberg open table format. This makes your streaming data immediately available in downstream analytical systems without setting up and maintaining additional ETL pipelines. You can also integrate your data directly into commonly-used big data processing frameworks, standardizing and simplifying the consumption of streams as tables in a wide variety of data analytics pipelines. Iceberg topics are supported for BYOC clusters in AWS and GCP. 
### [](#cluster-configuration)Cluster configuration You can now [configure certain cluster properties](../../manage/cluster-maintenance/config-cluster/) with `rpk cluster config` or with the Cloud API. For example, you can enable and manage [Iceberg topics](../../manage/iceberg/about-iceberg-topics/), [data transforms](../../develop/data-transforms/), and [audit logging](../../manage/audit-logging/). Available properties are listed in [Cluster Configuration Properties](../../reference/properties/cluster-properties/). Iceberg topics properties are available for clusters running Redpanda version 25.1 or later. ### [](#manage-secrets-for-cluster-configuration)Manage secrets for cluster configuration Redpanda Cloud now supports managing secrets that you can reference in cluster properties, for example, to configure Iceberg topics. You can create, update, and delete secrets and reference a secret in cluster properties using `rpk` or the Cloud API. See also: - Manage secrets using [`rpk security secret`](../../reference/rpk/rpk-security/rpk-security-secret/) - Manage secrets using the [Data Plane API](../../manage/api/cloud-dataplane-api/#manage-secrets) - Reference a secret in a cluster property using [`rpk cluster config set`](../../reference/rpk/rpk-cluster/rpk-cluster-config-set/) - Reference a secret in a cluster property using the [Control Plane API](../../manage/cluster-maintenance/config-cluster/) ### [](#data-transforms-ga)Data transforms: GA WebAssembly [data transforms](../../develop/data-transforms/) are now generally available in Redpanda Cloud. Data transforms let you run common data streaming tasks within Redpanda, like filtering, scrubbing, and transcoding. Data transforms are supported for BYOC and Dedicated clusters running Redpanda version 24.3 and later. ### [](#ai-agents-beta)AI agents: beta Redpanda Cloud is starting to introduce beta versions of [AI agents](../../ai-agents/) for enterprise agentic applications driven by a continuous data feed. 
### [](#redpanda-connect-for-byovpc-on-aws-and-gcp-beta)Redpanda Connect for BYOVPC on AWS and GCP: beta Redpanda Connect is now enabled when you create a BYOVPC cluster on [AWS](../cluster-types/byoc/aws/vpc-byo-aws/) or [GCP](../cluster-types/byoc/gcp/vpc-byo-gcp/). You can also add Redpanda Connect to an [existing BYOVPC GCP cluster](../cluster-types/byoc/gcp/enable-rpcn-byovpc-gcp/). ## [](#march-2025)March 2025 ### [](#serverless)Serverless For a better customer experience, the Serverless Standard and Serverless Pro products have merged into a single offering. [Serverless clusters](../cluster-types/serverless/) now include the higher usage limits, 99.9% SLA, additional AWS regions, and the free trial. ### [](#cloud-api-ga)Cloud API: GA The Cloud API is now generally available. It includes endpoints for [managing Serverless clusters](../../manage/api/cloud-serverless-controlplane-api/), configuring RBAC in [BYOC](../../manage/api/cloud-byoc-controlplane-api/#manage-rbac), [Serverless](../../manage/api/cloud-serverless-controlplane-api/#manage-rbac), and [Dedicated](../../manage/api/cloud-dedicated-controlplane-api/#manage-rbac) clusters, and [using Redpanda Connect](../../manage/api/cloud-dataplane-api/#use-redpanda-connect). To get started, see the [Redpanda Cloud API overview](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) or try the [Cloud API Quickstart](/api/doc/cloud-controlplane/topic/topic-quickstart). For full reference documentation, see [Control Plane API](/api/doc/cloud-controlplane/) and [Data Plane API](/api/doc/cloud-dataplane/). ### [](#support-for-additional-regions-3)Support for additional regions [BYOC clusters](../../reference/tiers/byoc-tiers/#byoc-supported-regions) on GCP now support the europe-southwest1 (Madrid) region. 
### [](#byovpc-support-in-the-redpanda-terraform-provider-0-14-0-beta)BYOVPC support in the Redpanda Terraform provider 0.14.0: Beta The [Redpanda Terraform provider](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/cluster#byovpc) now supports BYOVPC clusters on AWS and GCP. You can use the provider to create and manage BYOVPC clusters in Redpanda Cloud. ## [](#february-2025)February 2025 ### [](#role-based-access-control-rbac)Role-based access control (RBAC) With [RBAC in the control plane](../../security/authorization/rbac/rbac/), you can manage access to organization-level resources like clusters, resource groups, and networks. For example, you could grant everyone access to clusters in a development resource group while limiting access to clusters in a production resource group. Or, you could limit access to geographically-dispersed clusters in accordance with data residency laws. With [RBAC in the data plane](../../security/authorization/rbac/rbac_dp/), you can configure cluster-level permissions for provisioned users at scale. ### [](#improved-private-service-connect-support-with-az-affinity)Improved Private Service Connect support with AZ affinity The latest version of the Redpanda [GCP Private Service Connect](../../networking/gcp-private-service-connect/) service provides the ability to allow requests from Private Service Connect endpoints to stay within the same availability zone, avoiding additional networking costs. The service is now fully supported (GA). To upgrade, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). > ❗ **IMPORTANT** > > Deprecated: The original GCP Private Service Connect service is deprecated and will be removed in a future release. ### [](#serverless-pro-usage-limits-increased)Serverless Pro usage limits increased Usage limits for Serverless Pro clusters increased to: ingress = 100 MBps, egress = 300 MBps, partitions = 5000. 
### [](#cloud-api-reference)Cloud API reference The Cloud API reference is now provided as separate references for the [Control Plane API](/api/doc/cloud-controlplane/) and [Data Plane APIs](/api/doc/cloud-dataplane/). The Control Plane API and Data Plane APIs follow separate OpenAPI specifications, so the reference is updated to better reflect the structure of the Cloud APIs and to improve usability of the documentation. See also: [Cloud API Overview](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview). ## [](#january-2025)January 2025 ### [](#new-tiers-and-regions-on-azure)New tiers and regions on Azure [Tiers 1-5](../../reference/tiers/) are now supported for BYOC and Dedicated clusters running on Azure. Also, the following [regions](../../reference/tiers/dedicated-tiers/#dedicated-supported-regions) were added for Dedicated clusters: Central US, East US 2, Norway East. ### [](#serverless-pro-la)Serverless Pro: LA Serverless Pro is a new enterprise-level cluster option. It is similar to Serverless Standard, but with higher usage limits and Enterprise support. This is a limited availability (LA) release. To start using Serverless Pro, contact [Redpanda Sales](https://redpanda.com/try-redpanda?section=enterprise-trial). ### [](#aws-privatelink-ga)AWS PrivateLink: GA AWS PrivateLink is now generally available for private networking in the [Cloud UI](../../networking/configure-privatelink-in-cloud-ui/) and the [Cloud API](../../networking/aws-privatelink/). ## [](#december-2024)December 2024 ### [](#support-for-additional-regions-4)Support for additional regions For [BYOC clusters](../../reference/tiers/byoc-tiers/#byoc-supported-regions), Redpanda added support for the following regions: - GCP: europe-west9 (Paris), southamerica-west1 (Santiago) - AWS: ap-southeast-3 (Jakarta), eu-north-1 (Stockholm), eu-south-1 (Milan), eu-west-3 (Paris) ### [](#redpanda-connect-updates-8)Redpanda Connect updates Redpanda Connect is now available on Dedicated clusters. 
This is a limited availability (LA) release. [Secret management](../../develop/connect/configuration/secret-management/) is also available on BYOC, Dedicated, and Serverless clusters so that you can add secrets to your pipelines without exposing them. ### [](#leader-pinning)Leader pinning For a Redpanda cluster deployed across multiple availability zones (AZs), [leader pinning](../../develop/produce-data/leader-pinning/) ensures that a topic’s partition leaders are geographically closer to clients. Leader pinning can lower networking costs and help guarantee lower latency by routing produce and consume requests to brokers located in certain AZs. ## [](#november-2024)November 2024 ### [](#byovpc-on-aws-beta)BYOVPC on AWS: beta With standard BYOC clusters, Redpanda manages security policies and resources for your VPC, including subnetworks, service accounts, IAM roles, firewall rules, and storage buckets. For the highest level of security, you can manage these resources yourself with a [BYOVPC on AWS](../cluster-types/byoc/aws/vpc-byo-aws/), previously known as _customer-managed VPC_. ### [](#customer-managed-vnet-on-azure-la)Customer-managed VNet on Azure: LA With standard BYOC clusters, Redpanda manages security policies and resources for your virtual network (VNet), including subnetworks, managed identities, IAM roles, security groups, and storage accounts. For the highest level of security, you can manage these resources yourself with a [customer-managed VNet on Azure](../cluster-types/byoc/azure/vnet-azure/). Because Azure functionality is provided in limited availability, to unlock this feature, contact [Redpanda support](https://support.redpanda.com/hc/en-us/requests/new). ## [](#october-2024)October 2024 ### [](#byoc-support-in-the-terraform-provider-0-10)BYOC support in the Terraform provider 0.10 The [Terraform provider](../../manage/terraform-provider/) now supports BYOC clusters. 
You can use the provider to create and manage BYOC clusters in Redpanda Cloud.

### [](#azure-marketplace-for-dedicated-clusters)Azure Marketplace for Dedicated clusters

You can contact [Redpanda sales](https://redpanda.com/try-redpanda?section=enterprise-trial) to request a private offer for monthly or annual [committed use through the Azure Marketplace](../../billing/azure-commit/). You can then quickly provision Dedicated clusters in Redpanda Cloud, and you can view your bills and manage your subscription directly in Azure Marketplace.

### [](#support-for-aws-graviton3)Support for AWS Graviton3

Redpanda now supports compute-optimized tiers with AWS Graviton3 processors. This saves over 50% in instance costs in all [BYOC tiers](../../reference/tiers/byoc-tiers/).

### [](#redpanda-terraform-provider-for-redpanda-cloud-beta)Redpanda Terraform Provider for Redpanda Cloud: beta

The [Redpanda Terraform provider](../../manage/terraform-provider/) lets you create and manage resources in Redpanda Cloud, such as clusters, topics, users, ACLs, networks, and resource groups.

## [](#september-2024)September 2024

### [](#schedule-maintenance-windows)Schedule maintenance windows

Redpanda Cloud now offers greater flexibility to schedule upgrades to your cluster. By default, Redpanda Cloud may run maintenance operations on any day at any time. You can override this default and [schedule a maintenance window](../../manage/maintenance/#maintenance-windows), which requires Redpanda Cloud to run operations on your specified day and time.

### [](#redpanda-connect-la-for-byoc-beta-for-serverless)Redpanda Connect: LA for BYOC, beta for Serverless

[Redpanda Connect](../../develop/connect/about/) is now integrated into Redpanda Cloud and available as a fully-managed service. This is a limited availability (LA) release for BYOC and a beta release for Serverless.
[Choose from a range of connectors, processors, and other components](../../develop/connect/components/about/) to quickly build and deploy streaming data pipelines or AI applications from the [Cloud UI](../../develop/connect/connect-quickstart/) or using the [Data Plane API](/api/doc/cloud-dataplane/group/endpoint-redpanda-connect-pipeline). Comprehensive metrics, monitoring, and per pipeline scaling are also available. To start using Redpanda Connect, [try this quickstart](../../develop/connect/connect-quickstart/). For more detailed information about recent component updates, see [What’s New in Redpanda Connect](../../../redpanda-connect/get-started/whats-new/). ### [](#dedicated-on-azure-la)Dedicated on Azure: LA Redpanda now supports [Dedicated clusters on Azure](../cluster-types/create-dedicated-cloud-cluster/). This is a limited availability (LA) release for Dedicated clusters. ### [](#remote-read-replicas-on-customer-managed-vpc)Remote read replicas on customer-managed VPC The beta release of [remote read replicas](../cluster-types/byoc/remote-read-replicas/) has been extended to support customer-managed VPC deployments. ## [](#july-2024)July 2024 ### [](#redpanda-cloud-docs)Redpanda Cloud docs The [Redpanda Docs site](https://docs.redpanda.com/home/) has been redesigned for an easier experience navigating Redpanda Cloud docs. We hope that our docs help and inspire our users. Please share your feedback with the links at the bottom of any doc page. ### [](#byoc-on-azure-la)BYOC on Azure: LA Redpanda now supports [BYOC clusters on Azure](../cluster-types/byoc/azure/create-byoc-cluster-azure/). This is a limited availability (LA) release for BYOC clusters. ### [](#enhancements-to-serverless-la)Enhancements to Serverless: LA - The [Redpanda Cloud API](../../manage/api/cloud-serverless-controlplane-api/) now includes support for [Serverless](../cluster-types/serverless/). - The Redpanda Schema Registry API is now exposed for Serverless. 
- Serverless subscriptions can now see detailed billing activity on the **Billing** page. - Serverless added a 99.5% uptime [SLA](https://www.redpanda.com/legal/redpanda-cloud-service-level-agreement) (service level agreement). ### [](#self-service-sign-up-for-dedicated-on-aws-marketplace)Self service sign up for Dedicated on AWS Marketplace To start using Dedicated, sign up on the [AWS Marketplace](../../billing/aws-pay-as-you-go/). New subscriptions receive $300 (USD) in free credits to spend in the first 30 days. AWS Marketplace charges for anything beyond $300, unless you cancel the subscription. After your credits have been used, you can continue using your cluster without any commitment, only paying for what you consume. ### [](#support-for-additional-regions-5)Support for additional regions For [BYOC clusters](../../reference/tiers/byoc-tiers/#byoc-supported-regions) and [Dedicated clusters](../../reference/tiers/dedicated-tiers/#dedicated-supported-regions), Redpanda added support for the following regions: - GCP: asia-east1 (Taiwan), asia-northeast1 (Tokyo), southamerica-east1 (São Paulo) - AWS: ap-east-1 (Hong Kong), ap-northeast-1 (Tokyo), me-central-1 (UAE) ## [](#june-2024)June 2024 ### [](#remote-read-replica-topics-on-byoc-beta)Remote read replica topics on BYOC: beta You can now create [remote read replica topics](../cluster-types/byoc/remote-read-replicas/) on a BYOC cluster with the Cloud API. A remote read replica topic is a read-only topic that mirrors a topic on a different cluster. It can serve any consumer, without increasing the load on the source cluster. ### [](#higher-connection-limits-in-usage-tiers)Higher connection limits in usage tiers Redpanda has increased the number of client connections in all [tiers](../../reference/tiers/byoc-tiers/). For example, tier 1 now supports up to 9,000 maximum connections, and tier 9 supports up to 450,000 maximum connections. Connections are regulated per broker for best performance. 
## [](#may-2024)May 2024

### [](#cloud-api-beta)Cloud API: beta

The Cloud API allows you to programmatically manage clusters and resources in your Redpanda Cloud organization. For more information, see the [Cloud API Quickstart](/api/doc/cloud-controlplane/topic/topic-quickstart), the [Cloud API Overview](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview), and the full [Control Plane API](/api/doc/cloud-controlplane/) and [Data Plane API](/api/doc/cloud-dataplane/) reference documentation.

### [](#mtls-authentication-for-kafka-api-clients)mTLS authentication for Kafka API clients

mTLS authentication is now available for Kafka API clients. You can [enable mTLS](../../security/cloud-authentication/#mtls) for your cluster using the Cloud API.

### [](#manage-private-connectivity-in-the-ui)Manage private connectivity in the UI

You can now manage GCP Private Service Connect and AWS PrivateLink connections to your BYOC or Dedicated cluster on the **Cluster settings** page in Redpanda Cloud. See the steps for [PrivateLink](../../networking/configure-privatelink-in-cloud-ui/) and [Private Service Connect](../../networking/configure-private-service-connect-in-cloud-ui/).

### [](#single-message-transforms)Single message transforms

Redpanda now provides [single message transforms (SMTs)](../../develop/managed-connectors/transforms/) to help you modify data as it passes through a connector, without needing additional stream processors.

### [](#support-for-additional-regions-6)Support for additional regions

- For [BYOC clusters](../../reference/tiers/byoc-tiers/#byoc-supported-regions), Redpanda added support for the GCP us-west1 region (Oregon) and the AWS ap-south-1 region (Mumbai).
- For [Dedicated clusters](../../reference/tiers/dedicated-tiers/#dedicated-supported-regions), Redpanda added support for the AWS ap-south-1 region.
### [](#simplified-navigation-and-namespaces-renamed-resource-groups)Simplified navigation and namespaces renamed resource groups

Redpanda Cloud has simplified navigation, with clusters and networks available at the top level. It now has a global view of all resources in your organization. Namespaces are now called [resource groups](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#resource-group), although the functionality remains the same.

## [](#april-2024)April 2024

### [](#additional-cloud-tiers-for-byoc)Additional cloud tiers for BYOC

When you create a BYOC or Dedicated cluster, you select a [cloud tier](../../reference/tiers/byoc-tiers/) with the expected usage for your cluster, including the maximum ingress, egress, partitions (pre-replication), and connections. Redpanda has added tiers 8 and 9 for BYOC clusters, which provide higher supported configurations.

## [](#march-2024)March 2024

### [](#serverless-limited-availability)Serverless: limited availability

[Redpanda Serverless](../cluster-types/serverless/) moved out of beta and into limited availability (LA). This means that it has usage limits. During LA, existing clusters can scale to the usage limits, but new clusters may need to wait for availability. Serverless is the fastest and easiest way to start data streaming. It is a production-ready deployment option with automatically-scaling clusters available instantly. To start using Serverless, [sign up for a free trial](https://redpanda.com/try-redpanda/cloud-trial#serverless). There is no base cost, and with pay-as-you-go billing after the trial, you only pay for what you consume.

### [](#authentication-with-sso)Authentication with SSO

Redpanda Cloud now supports OpenID Connect (OIDC) integration, so administrators can leverage existing identity providers for user authentication to your Redpanda organization with [single sign-on](../../security/cloud-authentication/#single-sign-on) (SSO).
Redpanda uses OIDC to delegate the authentication process to an external IdP, such as Okta. To enable this for your account, contact [Redpanda support](https://support.redpanda.com/hc/en-us/requests/new). ## [](#february-2024)February 2024 ### [](#aws-privatelink)AWS PrivateLink [AWS PrivateLink](../../networking/aws-privatelink/) is now available as an easy and highly secure way to connect to Redpanda Cloud from your VPC. You can set up the PrivateLink endpoint service for a new cluster or an existing cluster. To enable AWS PrivateLink for your account, contact [Redpanda support](https://support.redpanda.com/hc/en-us/requests/new). ### [](#additional-cloud-tiers)Additional cloud tiers When you create a cluster, you select a [cloud tier](../../reference/tiers/byoc-tiers/) with the expected throughput for your cluster, including the maximum ingress, egress, partitions, and connections. On February 5, Redpanda added tiers 6 and 7 for BYOC clusters, which provide higher throughput limits. ## [](#january-2024)January 2024 ### [](#usage-based-billing-in-marketplace)Usage-based billing in marketplace Redpanda Cloud now supports [usage-based billing](../../billing/billing/) for Dedicated clusters. Contact [Redpanda sales](https://redpanda.com/try-redpanda?section=enterprise-trial) to request a private offer for monthly or annual committed use. You can then use existing Google Cloud Marketplace or AWS Marketplace credits to quickly provision Dedicated Cloud clusters, and you can view your bills and manage your subscription directly in the marketplace. ## [](#december-2023)December 2023 ### [](#serverless-clusters-beta)Serverless clusters: beta [Redpanda Serverless](../cluster-types/serverless/) is a managed streaming service (Kafka API) that completely abstracts users from scaling and operational concerns, and you only pay for what you consume. It’s the fastest and easiest way to start event streaming in the cloud. 
You can try the beta release of Redpanda Serverless with a free trial. ## [](#november-2023)November 2023 ### [](#aws-byoc-support-for-arm-based-graviton2)AWS BYOC support for ARM-based Graviton2 BYOC clusters on AWS now support ARM-based Graviton2 instances. This lowers VM costs and supports increased partition count. ### [](#iceberg-sink-connector)Iceberg Sink connector With the [managed connector for Apache Iceberg](../../develop/managed-connectors/create-iceberg-sink-connector/), you can write data into Iceberg tables. This enables integration with the data lake ecosystem and efficient data management for complex analytics. ### [](#schema-registry-management)Schema Registry management In the Redpanda Console UI, you can [perform Schema Registry operations](../../manage/schema-reg/schema-reg-ui/), such as registering a schema, creating a new version of it, and configuring compatibility. The **Schema Registry** page lists verified schemas, including their serialization format and versions. Select an individual schema to see which topics it applies to. ### [](#maintenance-windows)Maintenance windows With maintenance windows, you have greater flexibility to plan upgrades to your cluster. By default, Redpanda Cloud upgrades take place on Tuesdays. Optionally, on the **Cluster settings** page, you can select a window of specific off-hours for your business for Redpanda to apply updates. All times are in Coordinated Universal Time (UTC). Updates may start at any time during that window. 
--- # Page 407: Redpanda Cloud Documentation **URL**: https://docs.redpanda.com/redpanda-cloud/home.md --- # Redpanda Cloud Documentation --- title: Redpanda Cloud Documentation latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/home/pages/index.adoc description: Home page for the Redpanda Cloud docs. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-09-04" --- ## Overview Redpanda Cloud is a complete event streaming platform delivered as a fully-managed service. Select from different cluster options to meet your unique requirements for data sovereignty, infrastructure operations, and development teams. [Learn more](../get-started/cloud-overview/) ## Deploy[](#home-primary-title) [ ### Serverless Clusters hosted in Redpanda Cloud. This is the fastest and easiest way to start data streaming. Get started ](../get-started/cluster-types/serverless/) --- # Page 408: Manage **URL**: https://docs.redpanda.com/redpanda-cloud/manage.md --- # Manage --- title: Manage latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/index.adoc description: Manage Redpanda. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-05-07" --- - [Redpanda CLI](rpk/) The `rpk` tool is a single binary application that provides a way to interact with your Redpanda clusters from the command line. 
- [Cluster Maintenance](cluster-maintenance/) Learn about cluster maintenance and configuration properties. - [Mountable Topics](mountable-topics/) Safely attach and detach Tiered Storage topics to and from a cluster. - [Integrate Redpanda with Iceberg](iceberg/) Generate Iceberg tables for your Redpanda topics for data lakehouse access. - [Schema Registry](schema-reg/) Redpanda's Schema Registry provides the interface to store and manage event schemas. - [Disaster Recovery](disaster-recovery/) Learn about disaster recovery options for Redpanda Cloud. - [Redpanda Cloud API](api/) Use REST APIs to manage Redpanda Cloud resources. - [Redpanda Terraform Provider](terraform-provider/) Use the Redpanda Terraform provider to create and manage Redpanda Cloud resources. - [Monitor Redpanda Cloud](monitor-cloud/) Learn how to configure monitoring on your BYOC or Dedicated cluster to maintain system health and optimize performance. --- # Page 409: Redpanda Cloud API **URL**: https://docs.redpanda.com/redpanda-cloud/manage/api.md --- # Redpanda Cloud API --- title: Redpanda Cloud API latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: api/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: api/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/api/index.adoc description: Use REST APIs to manage Redpanda Cloud resources. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-03-20" --- - [Use the Control Plane API](controlplane/) Use the Control Plane API to manage resources in your Redpanda Cloud organization. - [Use the Data Plane APIs](cloud-dataplane-api/) Use the Data Plane APIs to manage your Redpanda Cloud clusters. 
--- # Page 410: Use the Control Plane API with BYOC **URL**: https://docs.redpanda.com/redpanda-cloud/manage/api/cloud-byoc-controlplane-api.md --- # Use the Control Plane API with BYOC --- title: Use the Control Plane API with BYOC latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: api/cloud-byoc-controlplane-api page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: api/cloud-byoc-controlplane-api.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/api/cloud-byoc-controlplane-api.adoc description: Use the Control Plane API to manage resources in your Redpanda Cloud BYOC environment. page-git-created-date: "2024-08-01" page-git-modified-date: "2025-03-20" --- The Redpanda Cloud API is a collection of REST APIs that allow you to interact with different parts of Redpanda Cloud. The Control Plane API enables you to programmatically manage your organization’s Redpanda infrastructure outside of the Cloud UI. You can call the API endpoints directly, or use tools like Terraform or Python scripts to automate cluster management. See [Control Plane API](/api/doc/cloud-controlplane/) for the full API reference documentation. ## [](#control-plane-api)Control Plane API The Control Plane API is one central API that allows you to provision clusters, networks, and resource groups. 
The Control Plane API consists of the following endpoint groups:

- [Clusters](/api/doc/cloud-controlplane/group/endpoint-clusters)
- [Networks](/api/doc/cloud-controlplane/group/endpoint-networks)
- [Operations](/api/doc/cloud-controlplane/group/endpoint-operations)
- [Resource Groups](/api/doc/cloud-controlplane/group/endpoint-resource-groups)
- [Control Plane Role Bindings](/api/doc/cloud-controlplane/group/endpoint-control-plane-role-bindings)
- [Control Plane Users](/api/doc/cloud-controlplane/group/endpoint-control-plane-users)
- [Control Plane Service Accounts](/api/doc/cloud-controlplane/group/endpoint-control-plane-service-accounts)

## [](#lro)Long-running operations

Some endpoints do not directly return the resource itself, but instead return an operation. The following is an example response of [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster):

```json
{
  "operation": {
    "id": "cqfc6vdmvio001r4vu4",
    "metadata": {
      "@type": "type.googleapis.com/redpanda.api.controlplane.v1.CreateClusterMetadata",
      "cluster_id": "cqg168balf4e4pm8ptu"
    },
    "state": "STATE_IN_PROGRESS",
    "started_at": "2024-07-23T20:31:29.948Z",
    "type": "TYPE_CREATE_CLUSTER",
    "resource_id": "cqg168balf4e4pm8ptu"
  }
}
```

The response object represents the long-running operation of creating a cluster. Cluster creation is an example of an operation that can take a long time to complete.

### [](#check-operation-state)Check operation state

To check the progress of an operation, make a request to the [`GET /operations/{id}`](/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation) endpoint using the operation ID as a parameter:

```bash
curl -H "Authorization: Bearer " https://api.redpanda.com/v1/operations/
```

> 💡 **TIP**
>
> When using a shell substitution variable for the token, use double quotes to wrap the header value.

The response contains the current state of the operation: `STATE_IN_PROGRESS`, `STATE_COMPLETED`, or `STATE_FAILED`.
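
Checking an operation usually means repeating the request until the state changes. The following is a minimal polling sketch, not an official client. It assumes a bearer token in a hypothetical `REDPANDA_TOKEN` environment variable and that the `state` field appears in the response as in the example above. The function names (`fetch_operation`, `extract_state`, `wait_for_operation`) are illustrative, and the `sed` extraction is a stand-in for a proper JSON parser such as `jq`.

```bash
#!/bin/sh
# Sketch: poll a long-running operation until it completes or fails.
# Assumption: REDPANDA_TOKEN (hypothetical variable name) holds a valid token.

API_URL="${API_URL:-https://api.redpanda.com}"

# Fetch the raw JSON for one operation ID.
fetch_operation() {
  curl -sS -H "Authorization: Bearer ${REDPANDA_TOKEN}" \
    "${API_URL}/v1/operations/$1"
}

# Read JSON on stdin and print the first "state" value found.
extract_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# wait_for_operation <operation-id> [max-attempts] [delay-seconds]
# Prints the final state. Returns 0 on STATE_COMPLETED, 1 on STATE_FAILED,
# and 2 if the operation is still running after max-attempts checks.
wait_for_operation() {
  op_id=$1
  max_attempts=${2:-60}
  delay=${3:-10}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state=$(fetch_operation "$op_id" | extract_state)
    case "$state" in
      STATE_COMPLETED) echo "$state"; return 0 ;;
      STATE_FAILED)    echo "$state"; return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "TIMEOUT"
  return 2
}
```

For example, `wait_for_operation cqfc6vdmvio001r4vu4 60 10` would check the operation from the sample response up to 60 times, 10 seconds apart. Bounding the number of attempts keeps a stuck operation from blocking a script indefinitely.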
## [](#cluster-tiers)Cluster tiers

When you create a BYOC or Dedicated cluster, you select a usage tier. Each tier provides tested and guaranteed workload configurations for throughput, partitions (pre-replication), and connections. Availability depends on the region and the cluster type. See the full list of regions, zones, and tiers available with each provider in the [Control Plane API reference](/api/doc/cloud-controlplane/topic/topic-regions-and-usage-tiers).

## [](#create-a-cluster)Create a cluster

To create a new cluster, first create a resource group and network, if you have not already done so.

### [](#create-a-resource-group)Create a resource group

Create a resource group by making a POST request to the [`/v1/resource-groups`](/api/doc/cloud-controlplane/operation/operation-resourcegroupservice_createresourcegroup) endpoint. Pass a name for your resource group in the request body.

```bash
curl -H 'Content-Type: application/json' \
-H "Authorization: Bearer " \
-d '{
  "resource_group": {
    "name": ""
  }
}' -X POST https://api.redpanda.com/v1/resource-groups
```

A resource group ID is returned. Pass this ID later when you call the Create Cluster endpoint.

### [](#create-a-network)Create a network

Create a network by making a request to [`POST /v1/networks`](/api/doc/cloud-controlplane/operation/operation-networkservice_createnetwork). Choose a [CIDR range](../../../networking/cidr-ranges/) that does not overlap with your existing VPCs or your Redpanda network.

```bash
curl -d \
'{
  "network": {
    "cidr_block": "10.0.0.0/20",
    "cloud_provider": "CLOUD_PROVIDER_GCP",
    "cluster_type": "TYPE_BYOC",
    "name": "",
    "resource_group_id": "",
    "region": "us-west1"
  }
}' -H "Content-Type: application/json" \
-H "Authorization: Bearer " -X POST https://api.redpanda.com/v1/networks
```

This endpoint returns a [long-running operation](#lro).
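
Before you send the Create Network request, it can help to confirm locally that the candidate `cidr_block` does not overlap a range you already use. The following sketch shows one way to do that check for IPv4 ranges in plain shell arithmetic. The helper names (`ip_to_int`, `cidr_bounds`, `cidrs_overlap`) are illustrative and are not part of `rpk` or the Cloud API, and the sketch assumes well-formed input such as `10.0.0.0/20`.

```bash
#!/bin/sh
# Sketch: check whether two IPv4 CIDR blocks overlap.
# Assumption: inputs are well-formed "a.b.c.d/prefix" strings.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address of a CIDR block as integers.
cidr_bounds() {
  base=$(ip_to_int "${1%/*}")
  prefix=${1#*/}
  size=$(( 1 << (32 - prefix) ))
  start=$(( base / size * size ))   # round down to the block boundary
  echo "$start $(( start + size - 1 ))"
}

# Return 0 (true) if the two CIDR blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  # $1..$2 is the first range, $3..$4 is the second.
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, `cidrs_overlap 10.0.0.0/20 10.0.8.0/24` succeeds because the `/24` sits inside the `/20`, while `cidrs_overlap 10.0.0.0/20 10.0.16.0/20` fails because the two blocks are adjacent but distinct.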
### [](#create-a-new-cluster)Create a new cluster

After the network is created, make a request to [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster) with the resource group ID and network ID in the request body.

```bash
curl -d \
'{
  "cluster": {
    "cloud_provider": "CLOUD_PROVIDER_GCP",
    "connection_type": "CONNECTION_TYPE_PUBLIC",
    "name": "my-new-cluster",
    "resource_group_id": "",
    "network_id": "",
    "region": "us-west1",
    "throughput_tier": "tier-1-gcp-um4g",
    "type": "TYPE_BYOC",
    "zones": [ "us-west1-a", "us-west1-b", "us-west1-c" ],
    "cluster_configuration": {
      "custom_properties": {
        "audit_enabled": true
      }
    }
  }
}' -H "Content-Type: application/json" \
-H "Authorization: Bearer " -X POST https://api.redpanda.com/v1/clusters
```

The Create Cluster endpoint returns a [long-running operation](#lro). When the operation completes, you can retrieve cluster details by calling [`GET /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_getcluster) and passing the cluster ID as a parameter.

#### [](#additional-steps-to-create-a-byoc-cluster)Additional steps to create a BYOC cluster

1. Ensure that you have installed `rpk`.
2. After making a Create Cluster request, run `rpk cloud byoc`. Pass `metadata.cluster_id` from the Create Cluster response:

   ##### AWS

   ```bash
   rpk cloud byoc aws apply --redpanda-id=
   ```

   ##### Azure

   ```bash
   rpk cloud byoc azure apply --redpanda-id= --subscription-id=
   ```

   ##### GCP

   ```bash
   rpk cloud byoc gcp apply --redpanda-id= --project-id=
   ```

## [](#update-cluster-configuration)Update cluster configuration

To update your cluster configuration properties, make a request to the [`PATCH /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) endpoint, passing the cluster ID as a parameter. Include the properties to update in the request body.
```bash
curl -H "Authorization: Bearer " \
-H 'accept: application/json' \
-H 'content-type: application/json' \
-d '{
  "cluster_configuration": {
    "custom_properties": {
      "iceberg_enabled": true,
      "iceberg_catalog_type": "rest"
    }
  }
}' -X PATCH "https://api.redpanda.com/v1/clusters/"
```

The Update Cluster endpoint returns a [long-running operation](#lro). [Check the operation state](#check-operation-state) to verify that the update is complete.

## [](#delete-a-cluster)Delete a cluster

To delete a cluster, make a request to the [`DELETE /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_deletecluster) endpoint, passing the cluster ID as a parameter. This is a [long-running operation](#lro).

```bash
curl -H "Authorization: Bearer " -X DELETE https://api.redpanda.com/v1/clusters/
```

### [](#additional-steps-to-delete-a-byoc-cluster)Additional steps to delete a BYOC cluster

1. Make a request to [`GET /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_getcluster) to check the state of the cluster. Wait until the state is `STATE_DELETING_AGENT`.
2. After the state changes to `STATE_DELETING_AGENT`, run `rpk cloud byoc` to destroy the agent.

   #### AWS

   ```bash
   rpk cloud byoc aws destroy --redpanda-id=
   ```

   #### Azure

   ```bash
   rpk cloud byoc azure destroy --redpanda-id=
   ```

   #### GCP

   ```bash
   rpk cloud byoc gcp destroy --redpanda-id= --project-id=
   ```

3. When the cluster is deleted, the delete operation’s state changes to `STATE_COMPLETED`. At this point, you may make a DELETE request to the [`/v1/networks/{id}`](/api/doc/cloud-controlplane/operation/operation-networkservice_deletenetwork) endpoint to delete the network. This is a [long-running operation](#lro).
4. Optional: After the network is deleted, make a request to [`DELETE /v1/resource-groups/{id}`](/api/doc/cloud-controlplane/operation/operation-resourcegroupservice_deleteresourcegroup) to delete the resource group.
## [](#manage-rbac)Manage RBAC

You can also use the Control Plane API to manage [RBAC configurations](../../../security/authorization/rbac/rbac/).

### [](#list-role-bindings)List role bindings

To see role assignments for IAM users and service accounts, make a GET request to the [`/v1/role-bindings`](/api/doc/cloud-controlplane/operation/operation-rolebindingservice_listrolebindings) endpoint.

```bash
curl "https://api.redpanda.com/v1/role-bindings?filter.role_name=&filter.scope.resource_type=SCOPE_RESOURCE_TYPE_CLUSTER" \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json"
```

### [](#get-role-binding)Get role binding

To see role assignments for a specific IAM account, make a GET request to the [`/v1/role-bindings/{id}`](/api/doc/cloud-controlplane/operation/operation-rolebindingservice_getrolebinding) endpoint, passing the role binding ID as a parameter.

```bash
curl "https://api.redpanda.com/v1/role-bindings/" \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json"
```

### [](#get-user)Get user

To see details of an IAM user account, make a GET request to the [`/v1/users/{id}`](/api/doc/cloud-controlplane/operation/operation-userservice_getuser) endpoint, passing the user account ID as a parameter.

```bash
curl "https://api.redpanda.com/v1/users/" \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json"
```

### [](#create-role-binding)Create role binding

To assign a role to an IAM user or service account, make a POST request to the [`/v1/role-bindings`](/api/doc/cloud-controlplane/operation/operation-rolebindingservice_createrolebinding) endpoint. Specify the role and scope, which includes the specific resource ID and an optional resource type, in the request body.
```bash curl -X POST "https://api.redpanda.com/v1/role-bindings" \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "role_name": "", "account_id": "", "scope": { "resource_type": "SCOPE_RESOURCE_TYPE_CLUSTER", "resource_id": "" } }' ``` For ``, use one of roles listed in [Predefined roles](../../../security/authorization/rbac/rbac/#predefined-roles) (`Reader`, `Writer`, `Admin`). ### [](#create-service-account)Create service account > 📝 **NOTE** > > Service accounts are assigned the Admin role for all resources in the organization. To create a new service account, make a POST request to the [`/v1/service-accounts`](/api/doc/cloud-controlplane/operation/operation-serviceaccountservice_createserviceaccount) endpoint, with a service account name and optional description in the request body. ```bash curl -X POST "https://api.redpanda.com/v1/service-accounts" \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "service_account": { "name": "", "description": "" } }' ``` ## [](#next-steps)Next steps - [Use the Data Plane APIs](../cloud-dataplane-api/) --- # Page 411: Use the Data Plane APIs **URL**: https://docs.redpanda.com/redpanda-cloud/manage/api/cloud-dataplane-api.md --- # Use the Data Plane APIs --- title: Use the Data Plane APIs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: api/cloud-dataplane-api page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: api/cloud-dataplane-api.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/api/cloud-dataplane-api.adoc description: Use the Data Plane APIs to manage your Redpanda Cloud clusters. 
page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-20" --- The Redpanda Cloud API is a collection of REST APIs that allow you to interact with different parts of Redpanda Cloud. The Data Plane APIs enable you to programmatically manage the resources within your clusters, including topics, users, access control lists (ACLs), and connectors. You can call the API endpoints directly, or use tools like Terraform or Python scripts to automate resource management. See [Data Plane API](/api/doc/cloud-dataplane/) for the full Data Plane API reference documentation. The [data plane](/api/doc/cloud-dataplane/topic/topic-cloud-api-overview#topic-cloud-api-architecture) contains the actual Redpanda clusters. Every cluster is its own data plane, and so it has its own distinct [Data Plane API URL](/api/doc/cloud-dataplane/topic/topic-cloud-api-overview#topic-data-plane-apis-url). ## [](#get-data-plane-api-url)Get Data Plane API URL ### BYOC or Dedicated To retrieve the Data Plane API URL of a cluster, make a request to the [`GET /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_getcluster) endpoint of the Control Plane API. ### Serverless To retrieve the Data Plane API URL of a cluster, make a request to the [`GET /v1/serverless/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-serverlessclusterservice_getserverlesscluster) endpoint of the Control Plane API. The response includes a `dataplane_api.url` value: ```bash "id": "....", "name": "my-cluster", .... "dataplane_api": { "url": "https://api-xyz.abc.fmc.ppd.cloud.redpanda.com" }, ... 
```

## [](#data-plane-apis)Data Plane APIs

### [](#create-a-user)Create a user

To create a new user in your Redpanda cluster, make a POST request to the [`/v1/users`](/api/doc/cloud-dataplane/operation/operation-userservice_createuser) endpoint, including the SASL mechanism, username, and password in the request body:

```bash
curl -X POST "https:///v1/users" \
  -H "Authorization: Bearer " \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -d '{"mechanism":"SASL_MECHANISM_SCRAM_SHA_256","name":"payment-service","password":"secure-password"}'
```

> 💡 **TIP**
>
> When using a shell substitution variable for the token, use double quotes to wrap the header value.

The success response returns the newly created username and SASL mechanism: `{ "user": { "name": "payment-service", "mechanism": "SASL_MECHANISM_SCRAM_SHA_256" } }`

### [](#create-an-acl)Create an ACL

To create a new ACL in your Redpanda cluster, make a [`POST /v1/acls`](/api/doc/cloud-dataplane/operation/operation-aclservice_createacl) request. The following example ACL allows all operations on any Redpanda topic for a user with the name `payment-service`.

```bash
curl -X POST "https:///v1/acls" \
  -H "Authorization: Bearer " \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -d '{"host":"*","operation":"OPERATION_ALL","permission_type":"PERMISSION_TYPE_ALLOW","principal":"User:payment-service","resource_name":"*","resource_pattern_type":"RESOURCE_PATTERN_TYPE_LITERAL","resource_type":"RESOURCE_TYPE_TOPIC"}'
```

The success response is empty, with a 201 status code.
{} ### [](#create-a-topic)Create a topic To create a new Redpanda topic without specifying any further parameters, such as the desired topic-level configuration or partition count, make a POST request to [`/v1/topics`](/api/doc/cloud-dataplane/operation/operation-topicservice_createtopic) endpoint: ```bash curl -X POST "/v1/topics" \ -H "Authorization: Bearer " \ -H "accept: application/json" \ -H "content-type: application/json" \ -d '{"name":""}' ``` ### [](#manage-secrets)Manage secrets Secrets are stored externally in your cloud provider’s secret management service. Redpanda fetches the secrets when you reference them in cluster properties. #### [](#create-a-secret)Create a secret Make a request to [`POST /v1/secrets`](/api/doc/cloud-dataplane/operation/operation-secretservice_createsecret). You must use a Base64-encoded secret. ```bash curl -X POST "https:///v1/secrets" \ -H "accept: application/json" \ -H "authorization: Bearer " \ -H "content-type: application/json" \ -d '{"id":"","scopes":["SCOPE_REDPANDA_CLUSTER"],"secret_data":""}' ``` You must include the following values: - ``: The base URL for the Data Plane API. - ``: The API key you generated during authentication. - ``: The name of the secret you want to add. Use only the following characters: `^[A-Z][A-Z0-9_]*$`. - ``: The Base64-encoded secret. - This scope: `"SCOPE_REDPANDA_CLUSTER"`. The response returns the name and scope of the secret. You can then use the Control Plane API or `rpk` to [set a cluster property value](../../cluster-maintenance/config-cluster/) to reference a secret, using the secret name. For the Control Plane API, you must use the following notation with the secret name in the request body to correctly reference the secret: ```bash "iceberg_rest_catalog_client_secret": "${secrets.}" ``` #### [](#update-a-secret)Update a secret Make a request to [`PUT /v1/secrets/{id}`](/api/doc/cloud-dataplane/operation/operation-secretservice_updatesecret). 
You can only update the secret value, not its name. You must use a Base64-encoded secret.

```bash
curl -X PUT "https:///v1/secrets/" \
  -H "accept: application/json" \
  -H "authorization: Bearer " \
  -H "content-type: application/json" \
  -d '{"scopes":["SCOPE_REDPANDA_CLUSTER"],"secret_data":""}'
```

You must include the following values:

- ``: The base URL for the Data Plane API.
- ``: The name of the secret you want to update. The secret’s name is also its ID.
- ``: The API key you generated during authentication.
- This scope: `"SCOPE_REDPANDA_CLUSTER"`.
- ``: Your new Base64-encoded secret.

The response returns the name and scope of the secret. It might take several minutes for the new secret value to propagate to any cluster properties that reference it.

#### [](#delete-a-secret)Delete a secret

Before you delete a secret, make sure that you remove references to it from your cluster configuration.

Make a request to [`DELETE /v1/secrets/{id}`](/api/doc/cloud-dataplane/operation/operation-secretservice_deletesecret).

```bash
curl -X DELETE "https:///v1/secrets/" \
  -H "accept: application/json" \
  -H "authorization: Bearer "
```

You must include the following values:

- ``: The base URL for the Data Plane API.
- ``: The name of the secret you want to delete.
- ``: The API key you generated during authentication.

### [](#use-redpanda-connect)Use Redpanda Connect

Use the API to manage [Redpanda Connect pipelines](../../../develop/connect/about/) in Redpanda Cloud.

> 📝 **NOTE**
>
> The Pipeline APIs for Redpanda Connect are supported in BYOC and Serverless clusters only.

#### [](#get-redpanda-connect-pipeline)Get Redpanda Connect pipeline

To get details of a specific pipeline, make a [`GET /v1/redpanda-connect/pipelines/{id}`](/api/doc/cloud-dataplane/operation/operation-redpandaconnectservice_getpipeline) request.
```bash
curl "https:///v1/redpanda-connect/pipelines/"
```

#### [](#stop-a-redpanda-connect-pipeline)Stop a Redpanda Connect pipeline

To stop a running pipeline, make a [`PUT /v1/redpanda-connect/pipelines/{id}/stop`](/api/doc/cloud-dataplane/operation/operation-redpandaconnectservice_stoppipeline) request.

```bash
curl -X PUT "https:///v1/redpanda-connect/pipelines//stop"
```

#### [](#start-a-redpanda-connect-pipeline)Start a Redpanda Connect pipeline

To start a previously stopped pipeline, make a [`PUT /v1/redpanda-connect/pipelines/{id}/start`](/api/doc/cloud-dataplane/operation/operation-redpandaconnectservice_startpipeline) request.

```bash
curl -X PUT "https:///v1/redpanda-connect/pipelines//start"
```

#### [](#update-a-redpanda-connect-pipeline)Update a Redpanda Connect pipeline

To update a pipeline, make a [`PUT /v1/redpanda-connect/pipelines/{id}`](/api/doc/cloud-dataplane/operation/operation-redpandaconnectservice_updatepipeline) request. You can update a pipeline configuration to scale resources, for example, the number of CPU cores and the amount of memory allocated. Like the other pipeline requests, this request goes to the Data Plane API URL of your cluster.

```bash
curl -X PUT "https:///v1/redpanda-connect/pipelines/" \
  -H 'accept: application/json' \
  -H 'content-type: application/json' \
  -d '{"resources":{"cpu_shares":"8","memory_shares":"8G"}}'
```

### [](#manage-kafka-connect)Manage Kafka Connect

Use the API to configure your [Kafka Connect](../../../develop/managed-connectors/) clusters.

> ❗ **IMPORTANT**
>
> - To enable this feature, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). To disable this feature, see [Disable Kafka Connect](../../../develop/managed-connectors/disable-kc/).
>
> - Redpanda Support does not manage or monitor Kafka Connect. For fully-supported connectors, consider [Redpanda Connect](../../../develop/connect/about/).
>
> - When Kafka Connect is enabled, there is a dedicated node running even when no connectors are deployed.
> 📝 **NOTE** > > Kafka Connect is supported in BYOC and Dedicated clusters only. #### [](#create-a-kafka-connect-cluster-secret)Create a Kafka Connect cluster secret Kafka Connect cluster secret data must first be in JSON format, and then Base64-encoded. 1. Prepare the secret data in JSON format: ```none {"secret.access.key": ""} ``` 2. Encode the secret data in Base64: ```none echo '{"secret.access.key": ""}' | base64 ``` 3. Use the [Secrets API](/api/doc/cloud-dataplane/operation/operation-kafkaconnectservice_createsecret) to create a secret that stores the Base64-encoded secret data: ```bash curl -X POST "https:///v1/kafka-connect/clusters/redpanda/secrets" \ -H 'accept: application/json'\ -H 'content-type: application/json' \ -d '{"name":"","secret_data":""}' ``` The response returns an `id` that you can use to [create the Kafka Connect connector](#create-a-kafka-connect-connector). #### [](#create-a-kafka-connect-connector)Create a Kafka Connect connector To create a connector, make a POST request to [`/v1/kafka-connect/clusters/{cluster_name}/connectors`](/api/doc/cloud-dataplane/operation/operation-kafkaconnectservice_createconnector). The following example shows how to create an S3 sink connector with the name `my-connector`: ```bash curl -X POST "/v1/kafka-connect/clusters/redpanda/connectors" \ -H "Authorization: Bearer " \ -H "accept: application/json" \ -H "content-type: application/json" \ -d '{"config":{"connector.class":"com.redpanda.kafka.connect.s3.S3SinkConnector","topics":"test-topic","aws.secret.access.key":"${secretsManager::secret.access.key}","aws.s3.bucket.name":"bucket-name","aws.access.key.id":"access-key","aws.s3.bucket.check":"false","region":"us-east-1"},"name":"my-connector"}' ``` > ⚠️ **CAUTION** > > The field `aws.secret.access.key` in this example contains sensitive information that usually shouldn’t be added to a configuration directly. 
Redpanda recommends that you first create a secret and then use the secret ID to inject the secret in your Create Connector request.
>
> If you created a secret by following the example in [Create a Kafka Connect cluster secret](#create-a-kafka-connect-cluster-secret), use the `id` returned in the Create Secret response to replace the placeholder `` in this Create Connector example. The syntax `${secretsManager::secret.access.key}` tells the Kafka Connect cluster to load ``, specifying the key `secret.access.key` from the secret JSON.

Example success response: `{ "name": "my-connector", "config": { "aws.access.key.id": "access-key", "aws.s3.bucket.check": "false", "aws.s3.bucket.name": "bucket-name", "aws.secret.access.key": "secret-key", "connector.class": "com.redpanda.kafka.connect.s3.S3SinkConnector", "name": "my-connector", "region": "us-east-1", "topics": "test-topic" }, "tasks": [], "type": "sink" }`

#### [](#restart-a-kafka-connect-connector)Restart a Kafka Connect connector

To restart a connector, make a POST request to the [`/v1/kafka-connect/clusters/{cluster_name}/connectors/{name}/restart`](/api/doc/cloud-dataplane/operation/operation-kafkaconnectservice_restartconnector) endpoint:

```bash
curl -X POST "/v1/kafka-connect/clusters/redpanda/connectors/my-connector/restart" \
  -H "Authorization: Bearer " \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -d '{"include_tasks":false,"only_failed":false}'
```

## [](#limitations)Limitations

- Client SDKs are not available.
--- # Page 412: Use the Control Plane API with Dedicated Cloud **URL**: https://docs.redpanda.com/redpanda-cloud/manage/api/cloud-dedicated-controlplane-api.md --- # Use the Control Plane API with Dedicated Cloud --- title: Use the Control Plane API with Dedicated Cloud latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: api/cloud-dedicated-controlplane-api page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: api/cloud-dedicated-controlplane-api.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/api/cloud-dedicated-controlplane-api.adoc description: Use the Control Plane API to manage resources in your Redpanda Cloud Dedicated environment. page-git-created-date: "2024-08-01" page-git-modified-date: "2025-03-20" --- The Redpanda Cloud API is a collection of REST APIs that allow you to interact with different parts of Redpanda Cloud. The Control Plane API enables you to programmatically manage your organization’s Redpanda infrastructure outside of the Cloud UI. You can call the API endpoints directly, or use tools like Terraform or Python scripts to automate cluster management. See [Control Plane API](/api/doc/cloud-controlplane/) for the full API reference documentation. ## [](#control-plane-api)Control Plane API The Control Plane API is one central API that allows you to provision clusters, networks, and resource groups. 
The Control Plane API consists of the following endpoint groups:

- [Clusters](/api/doc/cloud-controlplane/group/endpoint-clusters)
- [Networks](/api/doc/cloud-controlplane/group/endpoint-networks)
- [Operations](/api/doc/cloud-controlplane/group/endpoint-operations)
- [Resource Groups](/api/doc/cloud-controlplane/group/endpoint-resource-groups)
- [Control Plane Role Bindings](/api/doc/cloud-controlplane/group/endpoint-control-plane-role-bindings)
- [Control Plane Users](/api/doc/cloud-controlplane/group/endpoint-control-plane-users)
- [Control Plane Service Accounts](/api/doc/cloud-controlplane/group/endpoint-control-plane-service-accounts)

## [](#lro)Long-running operations

Some endpoints do not directly return the resource itself, but instead return an operation. The following is an example response of [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster):

```json
{
  "operation": {
    "id": "cqfc6vdmvio001r4vu4",
    "metadata": {
      "@type": "type.googleapis.com/redpanda.api.controlplane.v1.CreateClusterMetadata",
      "cluster_id": "cqg168balf4e4pm8ptu"
    },
    "state": "STATE_IN_PROGRESS",
    "started_at": "2024-07-23T20:31:29.948Z",
    "type": "TYPE_CREATE_CLUSTER",
    "resource_id": "cqg168balf4e4pm8ptu"
  }
}
```

The response object represents the long-running operation of creating a cluster. Cluster creation is an example of an operation that can take a long time to complete.

### [](#check-operation-state)Check operation state

To check the progress of an operation, make a request to the [`GET /v1/operations/{id}`](/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation) endpoint using the operation ID as a parameter:

```bash
curl -H "Authorization: Bearer " https://api.redpanda.com/v1/operations/
```

> 💡 **TIP**
>
> When using a shell substitution variable for the token, use double quotes to wrap the header value.

The response contains the current state of the operation in the `state` field: `STATE_IN_PROGRESS`, `STATE_COMPLETED`, or `STATE_FAILED`.
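Checking the state repeatedly can be automated with a small polling loop. The following is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, `TOKEN` is an assumed environment variable that holds your API token, and the `sed` expression assumes the response has the shape of the example operation shown earlier, with a single `"state"` field.

```bash
#!/usr/bin/env bash
# Sketch only: function names and the TOKEN variable are assumptions.

# Read an operation response on stdin and print its "state" value.
# Assumes the JSON contains exactly one "state" key; no jq required.
operation_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed when the state means that the operation has finished.
# Accepts the state with or without the STATE_ prefix.
is_terminal_state() {
  case "$1" in
    STATE_COMPLETED|COMPLETED|STATE_FAILED|FAILED) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll GET /v1/operations/{id} until the operation finishes.
# Exit status is 0 only if the operation completed successfully.
wait_for_operation() {
  local op_id="$1" interval="${2:-10}" state
  while true; do
    state="$(curl -s -H "Authorization: Bearer ${TOKEN}" \
      "https://api.redpanda.com/v1/operations/${op_id}" | operation_state)"
    echo "operation ${op_id}: ${state:-unknown}"
    if is_terminal_state "$state"; then
      [ "$state" = "STATE_COMPLETED" ] || [ "$state" = "COMPLETED" ]
      return
    fi
    sleep "$interval"
  done
}
```

Calling `wait_for_operation <operation-id> 30` polls every 30 seconds. The loop has no upper bound, so consider adding a maximum number of attempts before you use it in automation.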
## [](#cluster-tiers)Cluster tiers When you create a BYOC or Dedicated cluster, you select a usage tier. Each tier provides tested and guaranteed workload configurations for throughput, partitions (pre-replication), and connections. Availability depends on the region and the cluster type. See the full list of regions, zones, and tiers available with each provider in the [Control Plane API reference](/api/doc/cloud-controlplane/topic/topic-regions-and-usage-tiers). ## [](#create-a-cluster)Create a cluster To create a new cluster, first create a resource group and network, if you have not already done so. ### [](#create-a-resource-group)Create a resource group Create a resource group by making a POST request to the [`/v1/resource-groups`](/api/doc/cloud-controlplane/operation/operation-resourcegroupservice_createresourcegroup) endpoint. Pass a name for your resource group in the request body. ```bash curl -H 'Content-Type: application/json' \ -H "Authorization: Bearer " \ -d '{ "resource_group": { "name": "" } }' -X POST https://api.redpanda.com/v1/resource-groups ``` A resource group ID is returned. Pass this ID later when you call the Create Cluster endpoint. ### [](#create-a-network)Create a network Create a network by making a request to [`POST /v1/networks`](/api/doc/cloud-controlplane/operation/operation-networkservice_createnetwork). Choose a [CIDR range](../../../networking/cidr-ranges/) that does not overlap with your existing VPCs or your Redpanda network. ```bash curl -d \ '{ "network": { "cidr_block": "10.0.0.0/20", "cloud_provider": "CLOUD_PROVIDER_GCP", "cluster_type": "TYPE_DEDICATED", "name": "", "resource_group_id": "", "region": "us-west1" } }' -H "Content-Type: application/json" \ -H "Authorization: Bearer " -X POST https://api.redpanda.com/v1/networks ``` This endpoint returns a [long-running operation](#lro). 
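Before you send the Create Network request, you can check a candidate CIDR range against the ranges you already use. The following sketch is a local helper, not part of the Redpanda API: all function names are hypothetical, it handles IPv4 only, and it assumes octets are written without leading zeros.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for comparing IPv4 CIDR blocks locally.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local old_ifs="$IFS"
  IFS=.
  set -- $1          # split the address into four octets
  IFS="$old_ifs"
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the first address of a CIDR block as an integer.
cidr_start() {
  local base mask prefix="${1#*/}"
  base="$(ip_to_int "${1%/*}")"
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  echo $(( base & mask ))
}

# Print the last address of a CIDR block as an integer.
cidr_end() {
  local base mask prefix="${1#*/}"
  base="$(ip_to_int "${1%/*}")"
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  echo $(( (base & mask) | (~mask & 0xFFFFFFFF) ))
}

# Succeed (exit 0) when two CIDR blocks share at least one address.
cidrs_overlap() {
  local s1 e1 s2 e2
  s1="$(cidr_start "$1")"; e1="$(cidr_end "$1")"
  s2="$(cidr_start "$2")"; e2="$(cidr_end "$2")"
  [ "$s1" -le "$e2" ] && [ "$s2" -le "$e1" ]
}
```

For example, `cidrs_overlap 10.0.0.0/20 10.0.8.0/24` succeeds because the second block lies inside the first, so that pair would not be a safe choice. Two blocks overlap exactly when each one starts before the other ends, which is the comparison the last function makes.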
### [](#create-a-new-cluster)Create a new cluster After the network is created, make a request to the [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster) with the resource group ID and network ID in the request body. ```bash curl -d \ '{ "cluster": { "cloud_provider": "CLOUD_PROVIDER_GCP", "connection_type": "CONNECTION_TYPE_PUBLIC", "name": "my-new-cluster", "resource_group_id": "", "network_id": "", "region": "us-west1", "throughput_tier": "tier-1-gcp-um4g", "type": "TYPE_DEDICATED", "zones": [ "us-west1-a", "us-west1-b", "us-west1-c" ], "cluster_configuration": { "custom_properties": { "audit_enabled":true } } } }' -H "Content-Type: application/json" \ -H "Authorization: Bearer " -X POST https://api.redpanda.com/v1/clusters ``` The Create Cluster endpoint returns a [long-running operation](#lro). When the operation completes, you can retrieve cluster details by calling [`GET /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_getcluster), and passing the cluster ID as a parameter. ## [](#update-cluster-configuration)Update cluster configuration To update your cluster configuration properties, make a request to the [`PATCH /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) endpoint, passing the cluster ID as a parameter. Include the properties to update in the request body. ```bash curl -H "Authorization: Bearer " \ -H 'accept: application/json'\ -H 'content-type: application/json' \ -d '{ "cluster_configuration": { "custom_properties": { "audit_enabled":true } } }' -X PATCH "https://api.cloud.redpanda.com/v1/clusters/" ``` The Update Cluster endpoint returns a [long-running operation](#lro). [Check the operation state](#check-operation-state) to verify that the update is complete. 
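To check the operation state from a script, you first need the operation ID from the Update Cluster response. The following sketch assumes that the response has the same shape as the example operation in [Long-running operations](#lro); `operation_id` is a hypothetical helper name, and `TOKEN` and `CLUSTER_ID` in the usage note are assumed variables.

```bash
#!/usr/bin/env bash
# Sketch only: assumes the response wraps a single operation object
# whose "id" key appears before any other key named exactly "id".

# Read an operation response on stdin and print the operation ID.
# The quoted pattern "id" does not match keys such as "cluster_id".
operation_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' |
    head -n 1 |
    sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}
```

A usage example: pipe the PATCH response into the helper with `curl -s -X PATCH "https://api.redpanda.com/v1/clusters/${CLUSTER_ID}" -H "Authorization: Bearer ${TOKEN}" -H 'content-type: application/json' -d '{ "cluster_configuration": { "custom_properties": { "audit_enabled": true } } }' | operation_id`, and then pass the printed ID to the `/v1/operations/{id}` endpoint.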
## [](#delete-a-cluster)Delete a cluster

To delete a cluster, make a request to the [`DELETE /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_deletecluster) endpoint, passing the cluster ID as a parameter. This is a [long-running operation](#lro).

```bash
curl -H "Authorization: Bearer " -X DELETE https://api.redpanda.com/v1/clusters/
```

## [](#manage-rbac)Manage RBAC

You can also use the Control Plane API to manage [RBAC configurations](../../../security/authorization/rbac/rbac/).

### [](#list-role-bindings)List role bindings

To see role assignments for IAM users and service accounts, make a GET request to the [`/v1/role-bindings`](/api/doc/cloud-controlplane/operation/operation-rolebindingservice_listrolebindings) endpoint.

```bash
curl "https://api.redpanda.com/v1/role-bindings?filter.role_name=&filter.scope.resource_type=SCOPE_RESOURCE_TYPE_CLUSTER" \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json"
```

### [](#get-role-binding)Get role binding

To see role assignments for a specific IAM account, make a GET request to the [`/v1/role-bindings/{id}`](/api/doc/cloud-controlplane/operation/operation-rolebindingservice_getrolebinding) endpoint, passing the role binding ID as a parameter.

```bash
curl "https://api.redpanda.com/v1/role-bindings/" \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json"
```

### [](#get-user)Get user

To see details of an IAM user account, make a GET request to the [`/v1/users/{id}`](/api/doc/cloud-controlplane/operation/operation-userservice_getuser) endpoint, passing the user account ID as a parameter.

```bash
curl "https://api.redpanda.com/v1/users/" \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json"
```

### [](#create-role-binding)Create role binding

To assign a role to an IAM user or service account, make a POST request to the [`/v1/role-bindings`](/api/doc/cloud-controlplane/operation/operation-rolebindingservice_createrolebinding) endpoint.
Specify the role and scope, which includes the specific resource ID and an optional resource type, in the request body. ```bash curl -X POST "https://api.redpanda.com/v1/role-bindings" \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "role_name": "", "account_id": "", "scope": { "resource_type": "SCOPE_RESOURCE_TYPE_CLUSTER", "resource_id": "" } }' ``` For ``, use one of roles listed in [Predefined roles](../../../security/authorization/rbac/rbac/#predefined-roles) (`Reader`, `Writer`, `Admin`). ### [](#create-service-account)Create service account > 📝 **NOTE** > > Service accounts are assigned the Admin role for all resources in the organization. To create a new service account, make a POST request to the [`/v1/service-accounts`](/api/doc/cloud-controlplane/operation/operation-serviceaccountservice_createserviceaccount) endpoint, with a service account name and optional description in the request body. ```bash curl -X POST "https://api.redpanda.com/v1/service-accounts" \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "service_account": { "name": "", "description": "" } }' ``` ## [](#next-steps)Next steps - [Use the Data Plane APIs](../cloud-dataplane-api/) --- # Page 413: Use the Control Plane API with Serverless **URL**: https://docs.redpanda.com/redpanda-cloud/manage/api/cloud-serverless-controlplane-api.md --- # Use the Control Plane API with Serverless --- title: Use the Control Plane API with Serverless latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: api/cloud-serverless-controlplane-api page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: api/cloud-serverless-controlplane-api.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/api/cloud-serverless-controlplane-api.adoc description: Use the Control 
Plane API to manage resources in your Redpanda Serverless environment. page-git-created-date: "2024-08-01" page-git-modified-date: "2025-03-20" --- The Redpanda Cloud API is a collection of REST APIs that allow you to interact with different parts of Redpanda Cloud. The Control Plane API enables you to programmatically manage your organization’s Redpanda infrastructure outside of the Cloud UI. You can call the API endpoints directly, or use tools like Terraform or Python scripts to automate cluster management. See [Control Plane API](/api/doc/cloud-controlplane/) for the full API reference documentation. ## [](#control-plane-api)Control Plane API The Control Plane API is one central API that allows you to provision clusters, networks, and resource groups. The Control Plane API consists of the following endpoint groups: - [Operations](/api/doc/cloud-controlplane/group/endpoint-operations) - [Resource Groups](/api/doc/cloud-controlplane/group/endpoint-resource-groups) - [Serverless Clusters](/api/doc/cloud-controlplane/group/endpoint-serverless-clusters) - [Serverless Regions](/api/doc/cloud-controlplane/group/endpoint-serverless-regions) - [Control Plane Role Bindings](/api/doc/cloud-controlplane/group/endpoint-control-plane-role-bindings) - [Control Plane Users](/api/doc/cloud-controlplane/group/endpoint-control-plane-users) - [Control Plane Service Accounts](/api/doc/cloud-controlplane/group/endpoint-control-plane-service-accounts) ## [](#create-a-cluster)Create a cluster To create a new serverless cluster, you can use the default resource group, or create a new resource group if you like. You need to choose a region where your cluster is hosted. ### [](#create-a-resource-group)Create a resource group > 📝 **NOTE** > > This step is optional. Serverless includes a default resource group. 
To retrieve the default resource group ID, make a GET request to the [`/v1/resource-groups`](/api/doc/cloud-controlplane/operation/operation-resourcegroupservice_listresourcegroups) endpoint:
>
> ```bash
> curl -H "Authorization: Bearer " https://api.redpanda.com/v1/resource-groups
> ```

Create a resource group by making a POST request to the [`/v1/resource-groups`](/api/doc/cloud-controlplane/operation/operation-resourcegroupservice_createresourcegroup) endpoint. Pass a name for your resource group in the request body.

```bash
curl -H 'Content-Type: application/json' \
  -H "Authorization: Bearer " \
  -d '{
    "resource_group": {
      "name": ""
    }
  }' -X POST https://api.redpanda.com/v1/resource-groups
```

A resource group ID is returned. Pass this ID later when you call the Create Serverless Cluster endpoint.

### [](#choose-a-region)Choose a region

To see the available regions for Redpanda Serverless, make a GET request to the [`/v1/serverless/regions`](/api/doc/cloud-controlplane/operation/operation-serverlessregionservice_listserverlessregions) endpoint. You can specify a cloud provider in your request. Serverless currently only supports AWS.

```bash
curl -H "Authorization: Bearer " 'https://api.redpanda.com/v1/serverless/regions?cloud_provider=CLOUD_PROVIDER_AWS'
```

> 💡 **TIP**
>
> When using a shell substitution variable for the token, use double quotes to wrap the header value.

Example response:

```json
{
  "serverless_regions": [
    {
      "name": "eu-central-1",
      "display_name": "eu-central-1",
      "default_timezone": {
        "id": "Europe/Berlin",
        "version": ""
      },
      "cloud_provider": "CLOUD_PROVIDER_AWS",
      "available": true
    },
    ...
  ],
  "next_page_token": ""
}
```

You can also see a list of supported regions in [Serverless regions](../../../reference/tiers/serverless-regions/).
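To pick a value for `serverless_region` in a script, you can extract the region names from the response shown above. The following is a rough sketch that uses only `grep` and `sed`; `region_names` is a hypothetical helper name. It prints every `name` value and does not filter on the `available` field, so check availability separately.

```bash
#!/usr/bin/env bash
# Sketch only: prints the "name" of every region in a
# GET /v1/serverless/regions response read from stdin.
# The quoted pattern "name" does not match the "display_name" key.
region_names() {
  grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}
```

If `jq` is installed, a JSON-aware filter is more robust than text matching, and can also select only the regions that are marked as available.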
### [](#create-a-new-serverless-cluster)Create a new serverless cluster Create a Serverless cluster by making a request to [`POST /v1/serverless/clusters`](/api/doc/cloud-controlplane/operation/operation-serverlessclusterservice_createserverlesscluster) with the resource group ID and serverless region name in the request body. ```bash curl -H 'Content-Type: application/json' \ -H "Authorization: Bearer " \ -d '{ "serverless_cluster": { "name": "", "resource_group_id": "", "serverless_region": "us-east-1" } }' -X POST https://api.redpanda.com/v1/serverless/clusters ``` The Create Serverless Cluster endpoint returns a [long-running operation](#lro-serverless). When the operation completes, you can retrieve cluster details by calling [`GET /v1/serverless/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-serverlessclusterservice_getserverlesscluster), and passing the cluster ID as a parameter. ## [](#update-cluster-configuration)Update cluster configuration To update your cluster configuration properties, make a request to the [`PATCH /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) endpoint, passing the cluster ID as a parameter. Include the properties to update in the request body. ```bash curl -H "Authorization: Bearer " \ -H 'accept: application/json'\ -H 'content-type: application/json' \ -d '{ "cluster_configuration": { "custom_properties": { "audit_enabled":true } } }' -X PATCH "https://api.cloud.redpanda.com/v1/clusters/" ``` The Update Cluster endpoint returns a [long-running operation](#lro). [Check the operation state](#check-operation-state) to verify that the update is complete. ## [](#delete-a-cluster)Delete a cluster To delete a cluster, make a request to the [`DELETE /v1/serverless/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-serverlessclusterservice_getserverlesscluster) endpoint, passing the cluster ID as a parameter. This is a [long-running operation](#lro-serverless). 
```bash
curl -H "Authorization: Bearer <token>" \
  -X DELETE "https://api.redpanda.com/v1/serverless/clusters/<cluster-id>"
```

Optional: When the cluster is deleted, the delete operation’s state changes to `STATE_COMPLETED`. At this point, you may make a DELETE request to the [`/v1/resource-groups/{id}`](/api/doc/cloud-controlplane/operation/operation-resourcegroupservice_deleteresourcegroup) endpoint to delete the resource group.

## [](#lro-serverless)Long-running operations

Some endpoints do not directly return the resource itself, but instead return an operation. The following is an example response of [`POST /v1/serverless/clusters`](/api/doc/cloud-controlplane/operation/operation-serverlessclusterservice_createserverlesscluster):

```json
{
  "operation": {
    "id": "cqaramrndjr40k3qei50",
    "metadata": null,
    "state": "STATE_IN_PROGRESS",
    "started_at": {
      "seconds": "1721087323",
      "nanos": 888601218
    },
    "finished_at": null,
    "type": "TYPE_CREATE_SERVERLESS_CLUSTER"
  }
}
```

The response object represents the long-running operation of creating a cluster. Cluster creation is one example of an operation that can take a long time to complete.

### [](#check-operation-state)Check operation state

To check the progress of an operation, make a request to the [`GET /v1/operations/{id}`](/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation) endpoint using the operation ID as a parameter:

```bash
curl -H "Authorization: Bearer <token>" \
  "https://api.redpanda.com/v1/operations/<operation-id>"
```

The response contains the current state of the operation: `STATE_IN_PROGRESS`, `STATE_COMPLETED`, or `STATE_FAILED`.

## [](#manage-rbac)Manage RBAC

You can also use the Control Plane API to manage [RBAC configurations](../../../security/authorization/rbac/rbac/).

### [](#list-role-bindings)List role bindings

To see role assignments for IAM user and service accounts, make a GET request to the [`/v1/role-bindings`](/api/doc/cloud-controlplane/operation/operation-rolebindingservice_listrolebindings) endpoint.
```bash
# Quote the URL so the shell does not interpret "?" and "&"
curl "https://api.redpanda.com/v1/role-bindings?filter.role_name=<role-name>&filter.scope.resource_type=SCOPE_RESOURCE_TYPE_CLUSTER" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json"
```

### [](#get-role-binding)Get role binding

To see role assignments for a specific IAM account, make a GET request to the [`/v1/role-bindings/{id}`](/api/doc/cloud-controlplane/operation/operation-rolebindingservice_getrolebinding) endpoint, passing the role binding ID as a parameter.

```bash
curl "https://api.redpanda.com/v1/role-bindings/<role-binding-id>" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json"
```

### [](#get-user)Get user

To see details of an IAM user account, make a GET request to the [`/v1/users/{id}`](/api/doc/cloud-controlplane/operation/operation-userservice_getuser) endpoint, passing the user account ID as a parameter.

```bash
curl "https://api.redpanda.com/v1/users/<user-id>" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json"
```

### [](#create-role-binding)Create role binding

To assign a role to an IAM user or service account, make a POST request to the [`/v1/role-bindings`](/api/doc/cloud-controlplane/operation/operation-rolebindingservice_createrolebinding) endpoint. In the request body, specify the role and the scope. The scope includes the specific resource ID and an optional resource type.

```bash
curl -X POST "https://api.redpanda.com/v1/role-bindings" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "role_name": "<role-name>",
    "account_id": "<account-id>",
    "scope": {
      "resource_type": "SCOPE_RESOURCE_TYPE_CLUSTER",
      "resource_id": "<resource-id>"
    }
  }'
```

For `<role-name>`, use one of the roles listed in [Predefined roles](../../../security/authorization/rbac/rbac/#predefined-roles) (`Reader`, `Writer`, `Admin`).

### [](#create-service-account)Create service account

> 📝 **NOTE**
>
> Service accounts are assigned the Admin role for all resources in the organization.
To create a new service account, make a POST request to the [`/v1/service-accounts`](/api/doc/cloud-controlplane/operation/operation-serviceaccountservice_createserviceaccount) endpoint, with a service account name and optional description in the request body. ```bash curl -X POST "https://api.redpanda.com/v1/service-accounts" \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "service_account": { "name": "", "description": "" } }' ``` ## [](#next-steps)Next steps - [Use the Data Plane APIs](../cloud-dataplane-api/) --- # Page 414: Use the Control Plane API **URL**: https://docs.redpanda.com/redpanda-cloud/manage/api/controlplane.md --- # Use the Control Plane API --- title: Use the Control Plane API latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: api/controlplane/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: api/controlplane/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/api/controlplane/index.adoc description: Use the Control Plane API to manage resources in your Redpanda Cloud organization. page-git-created-date: "2024-08-01" page-git-modified-date: "2025-03-20" --- - [Use the Control Plane API with BYOC](../cloud-byoc-controlplane-api/) Use the Control Plane API to manage resources in your Redpanda Cloud BYOC environment. - [Use the Control Plane API with Dedicated Cloud](../cloud-dedicated-controlplane-api/) Use the Control Plane API to manage resources in your Redpanda Cloud Dedicated environment. - [Use the Control Plane API with Serverless](../cloud-serverless-controlplane-api/) Use the Control Plane API to manage resources in your Redpanda Serverless environment. 
--- # Page 415: Audit Logging **URL**: https://docs.redpanda.com/redpanda-cloud/manage/audit-logging.md --- # Audit Logging --- title: Audit Logging latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: audit-logging page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: audit-logging.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/audit-logging.adoc description: Learn how to use Redpanda's audit logging capabilities. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-05-07" --- > 📝 **NOTE** > > Audit logging is supported on BYOC and Dedicated clusters running Redpanda version 24.3 and later. To configure audit logging, see [Configure Cluster Properties](../cluster-maintenance/config-cluster/). Many scenarios for streaming data include the need for fine-grained auditing of user activity related to the system. This is especially true for regulated industries such as finance, healthcare, and the public sector. Complying with [PCI DSS v4](https://www.pcisecuritystandards.org/document_library/?document=pci_dss) standards, for example, requires verbose and detailed activity auditing, alerting, and analysis capabilities. Redpanda’s auditing capabilities support recording both administrative and operational interactions with topics and with users. Redpanda complies with the Open Cybersecurity Schema Framework (OCSF), providing a predictable and extensible solution that works seamlessly with industry standard tools. With audit logging enabled, there should be no noticeable changes in performance other than slightly elevated CPU usage. ## [](#audit-log-flow)Audit log flow The Redpanda audit log mechanism functions similar to the Kafka flow. When a user interacts with another user or with a topic, Redpanda writes an event to a specialized audit topic. 
The audit topic is immutable. Only Redpanda can write to it. Users are prevented from writing to the audit topic directly and the Kafka API cannot create or delete it. ![Audit log flow](../../shared/_images/audit-logging-flow.png) By default, any management and authentication actions performed on the cluster yield messages written to the audit log topic that are retained for seven days. Interactions with all topics by all principals are audited. Actions performed using the Kafka API and Admin API are all audited, as are actions performed directly through `rpk`. Messages recorded to the audit log topic comply with the [open cybersecurity schema framework](https://schema.ocsf.io/). Any number of analytics frameworks, such as Splunk or Sumo Logic, can receive and process these messages. Using an open standard ensures Redpanda’s audit logs coexist with those produced by other IT assets, powering holistic monitoring and analysis of your assets. ## [](#audit-log-configuration-options)Audit log configuration options Redpanda’s audit logging mechanism supports several options to control the volume and availability of audit records. Configuration is applied at the cluster level. To configure audit logging, see [Configure Cluster Properties](../cluster-maintenance/config-cluster/). - [`audit_enabled`](../../reference/properties/cluster-properties/#audit_enabled): Boolean value to enable audit logging. When you set this to `true`, Redpanda checks for an existing topic named `_redpanda.audit_log`. If none is found, Redpanda automatically creates one for you. Default: `true`. - [`audit_enabled_event_types`](../../reference/properties/cluster-properties/#audit_enabled_event_types): List of strings in JSON style identifying the event types to include in the audit log. This may include any of the following: `management, produce, consume, describe, heartbeat, authenticate, schema_registry, admin`. Default: `'["management","authenticate","admin"]'`. 
- [`audit_excluded_principals`](../../reference/properties/cluster-properties/#audit_excluded_principals): List of strings in JSON style identifying the principals the audit logging system should ignore. Principals can be listed as `User:name` or `name`, both are accepted. Default: `null`. ## [](#enable-audit-logging)Enable audit logging Audit logging is enabled by default. Cluster administrators can configure the audited topics and principals. However, only the Redpanda team can configure the type of audited events. For more information or support, contact your Redpanda account team. ## [](#configure-retention-for-audit-logs)Configure retention for audit logs You can export audit events to your SIEM for long-term retention to support audit and compliance needs. Redpanda Data recommends that you retain audit logs for at least one year in a separate system like your SIEM, so if there is an issue with the Redpanda cluster you have access to the audit logs. If you need to change the default seven-day retention period, update the retention settings using the `retention.ms` property for the `_redpanda.audit_log` topic: ```bash # Set 1-year retention (in milliseconds) on the audit log topic rpk topic alter-config _redpanda.audit_log --set retention.ms=31536000000 ``` > 📝 **NOTE** > > In Redpanda Cloud, both `retention.ms` (time-based) and `retention.bytes` (size-based) retention policies are applied simultaneously. Data becomes eligible for deletion when either limit is reached, depending on whichever occurs first. This means neither setting strictly takes precedence; the earliest limit (by time or size) triggers data cleanup. When updating audit log retention, check to make sure you do not already have a size-based retention policy that might remove logs before the period you specify. 
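The `retention.ms` value is the retention period expressed in milliseconds. The following is a minimal sketch of that arithmetic, assuming a POSIX shell with 64-bit arithmetic. The helper names `days_to_ms` and `audit_retention_cmd` are hypothetical, and the sketch only prints the `rpk topic alter-config` command shown above rather than running it.

```bash
# Convert a retention period in days to milliseconds (the unit of retention.ms).
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

# Print (do not run) the alter-config command for the audit log topic.
audit_retention_cmd() {
  printf 'rpk topic alter-config _redpanda.audit_log --set retention.ms=%s\n' \
    "$(days_to_ms "$1")"
}

audit_retention_cmd 365
# prints: rpk topic alter-config _redpanda.audit_log --set retention.ms=31536000000
```

Before applying the printed command, review the topic's current configuration (for example, with `rpk topic describe`) to confirm that no `retention.bytes` limit would remove records sooner than the period you set.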
## [](#next-steps)Next steps [See samples of audit log messages](audit-log-samples/) --- # Page 416: Sample Audit Log Messages **URL**: https://docs.redpanda.com/redpanda-cloud/manage/audit-logging/audit-log-samples.md --- # Sample Audit Log Messages --- title: Sample Audit Log Messages latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: audit-logging/audit-log-samples page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: audit-logging/audit-log-samples.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/audit-logging/audit-log-samples.adoc description: Sample Redpanda audit log messages. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-05-07" --- Redpanda’s audit logs comply with version 1.0.0 of the [Open Cybersecurity Schema Framework (OCSF)](https://github.com/ocsf). This provides a predictable and extensible solution that works seamlessly with industry standard tools. This page aggregates several sample log files covering a range of scenarios. ## [](#standard-ocsf-messages)Standard OCSF messages Redpanda produces the following standard OCSF class messages: - Authentication (3002) for all authentication events - Application Lifecycle (6002) for when the audit system is enabled or disabled or when Redpanda starts or stops (if auditing is enabled when Redpanda starts or stops) - API Activity (6003) for any access to the Kafka API, Admin API, or Schema Registry Refer to the [OCSF Schema Definition](https://schema.ocsf.io/) for the field definitions for each event class. ## [](#authentication-events)Authentication events These messages illustrate various scenarios around successful and unsuccessful authentication events. Authentication successful This scenario shows the message resulting from an admin using rpk with successful authentication. 
This is an authentication type event. ```json { "category_uid": 3, "class_uid": 3002, "metadata": { "product": { "name": "Redpanda", // This is the Node ID of the broker that produced this audit event "uid": "2", "vendor_name": "Redpanda Data, Inc.", "version": "v23.3.0-dev-2457-g76dc896f8c" }, "version": "1.0.0" }, "severity_id": 1, "time": 1700533469078, "type_uid": 300201, "activity_id": 1, "auth_protocol": "SASL-SCRAM", "auth_protocol_id": 99, // This is the IP address of the Kafka broker that received the authorization request "dst_endpoint": { "ip": "127.0.0.1", "port": 19092, // Name of the Redpanda kafka server "svc_name": "kafka rpc protocol" }, // Indicates that credentials were not encrypted using TLS "is_cleartext": true, "is_mfa": false, "service": { "name": "kafka rpc protocol" }, // This is the IP address of the client that generated the authorization request "src_endpoint": { "ip": "127.0.0.1", // This is the client ID of the kafka client "name": "rpk", "port": 42906 }, "status_id": 1, "user": { "name": "user", "type_id": 1 } } ``` Authentication successful (OIDC with group claims) This scenario shows a successful OIDC authentication event that includes the user’s IdP group memberships in the `user.groups` field. Group memberships are extracted from the OIDC token and included in all authentication events for OIDC users. 
```json { "category_uid": 3, "class_uid": 3002, "metadata": { "product": { "name": "Redpanda", "uid": "0", "vendor_name": "Redpanda Data, Inc.", "version": "v26.1.1" }, "version": "1.0.0" }, "severity_id": 1, "time": 1700533469078, "type_uid": 300201, "activity_id": 1, "auth_protocol": "SASL-OAUTHBEARER", "auth_protocol_id": 99, "dst_endpoint": { "ip": "127.0.0.1", "port": 9092, "svc_name": "kafka rpc protocol" }, "is_cleartext": false, "is_mfa": false, "service": { "name": "kafka rpc protocol" }, "src_endpoint": { "ip": "10.0.1.50", "name": "kafka-client", "port": 48210 }, "status_id": 1, // IdP group memberships extracted from the OIDC token "user": { "name": "alice@example.com", "type_id": 1, "groups": [ {"type": "idp_group", "name": "engineering"}, {"type": "idp_group", "name": "analytics"} ] } } ``` Authentication failed This scenario illustrates a common failure where a user entered the wrong credentials. This is an authentication type event. ```json { "category_uid": 3, "class_uid": 3002, "metadata": { "product": { "name": "Redpanda", "uid": "1", "vendor_name": "Redpanda Data, Inc.", "version": "v23.3.0-dev-2457-g76dc896f8c" }, "version": "1.0.0" }, "severity_id": 1, "time": 1700534756350, "type_uid": 300201, "activity_id": 1, "auth_protocol": "SASL-SCRAM", "auth_protocol_id": 99, "dst_endpoint": { "ip": "127.0.0.1", "port": 19092, "svc_name": "kafka rpc protocol" }, "is_cleartext": true, "is_mfa": false, "service": { "name": "kafka rpc protocol" }, "src_endpoint": { "ip": "127.0.0.1", "name": "rpk", "port": 45236 }, "status_id": 2, "status_detail": "SASL authentication failed: security: Invalid credentials", "user": { "name": "admin", "type_id": 1 } } ``` ## [](#kafka-api-events)Kafka API events The Redpanda Kafka API offers a wide array of options for interacting with your Redpanda clusters. Following are examples of messages from common interactions with the API. 
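When reading the samples on this page, it can help to know how the numeric identifiers relate. In OCSF, `type_uid` is derived from the event class and activity as `class_uid * 100 + activity_id`. The following sketch checks that relationship against the sample values; the helper name `ocsf_type_uid` is hypothetical.

```bash
# OCSF convention: type_uid = class_uid * 100 + activity_id
ocsf_type_uid() {
  echo $(( $1 * 100 + $2 ))
}

ocsf_type_uid 3002 1   # Authentication samples: prints 300201
ocsf_type_uid 6003 3   # API Activity samples: prints 600303
```

This relationship is useful when filtering exported audit events in a SIEM, because a single `type_uid` value identifies both the class and the activity.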
Create ACL entry This example illustrates an ACL update that also requires a superuser authentication. It lists the edited ACL and the updated permissions. This is a management type event. ```json { "category_uid": 6, "class_uid": 6003, "metadata": { "product": { "name": "Redpanda", "vendor_name": "Redpanda Data, Inc.", "version": "v23.3.0-dev-2457-g76dc896f8c" }, "profiles": [ "cloud" ], "version": "1.0.0" }, "severity_id": 1, "time": 1700533393776, "type_uid": 600303, "activity_id": 3, "actor": { "authorizations": [ { "decision": "authorized", // This shows a superuser level authorization "policy": { "desc": "superuser", "name": "aclAuthorization" } } ], "user": { "name": "admin", "type_id": 2 } }, "api": { // The API operation performed "operation": "create_acls", "service": { "name": "kafka rpc protocol" } }, "cloud": { "provider": "" }, "dst_endpoint": { "ip": "127.0.0.1", "port": 19092, "svc_name": "kafka rpc protocol" }, // List of resources accessed "resources": [ // The created ACL { "name": "create acl", "type": "acl_binding", "data": { "resource_type": "topic", "resource_name": "*", "pattern_type": "literal", "acl_principal": "{type user name user}", "acl_host": "{{any_host}}", "acl_operation": "all", "acl_permission": "allow" } }, // Below indicates that the user had cluster level authorization { "name": "kafka-cluster", "type": "cluster" } ], "src_endpoint": { "ip": "127.0.0.1", "name": "rpk", "port": 50276 }, "status_id": 1, "unmapped": { // Provides a more parsable output of how the // authorization decision was made "authorization_metadata": { "acl_authorization": { "host": "", "op": "", "permission_type": "AUTHORIZED", "principal": "" }, "resource": { "name": "", "pattern": "", "type": "" } } } } ``` Authorization matched on a group ACL This example shows an API Activity (6003) where the authorization decision matched an ALLOW ACL on a `Group:` principal. 
The `actor.user.groups` field includes the matched group with type `idp_group`, and the `authorization_metadata` shows the group ACL that granted access. See [Group-Based Access Control](../../../security/authorization/gbac/). ```json { "category_uid": 6, "class_uid": 6003, "metadata": { "product": { "name": "Redpanda", "uid": "0", "vendor_name": "Redpanda Data, Inc.", "version": "v26.1.0" }, "version": "1.0.0" }, "severity_id": 1, "time": 1774544504327, "type_uid": 600303, "activity_id": 3, "actor": { "authorizations": [ { "decision": "authorized", "policy": { "desc": "acl: {principal type {group} name {/sales} host {{any_host}} op all perm allow}, resource: type {topic} name {sales-topic} pattern {literal}", "name": "aclAuthorization" } } ], // The matched group appears in the user's groups field "user": { "name": "alice", "type_id": 1, "groups": [ { "type": "idp_group", "name": "/sales" } ] } }, "api": { "operation": "produce", "service": { "name": "kafka rpc protocol" } }, "dst_endpoint": { "ip": "127.0.1.1", "port": 9092, "svc_name": "kafka rpc protocol" }, "resources": [ { "name": "sales-topic", "type": "topic" } ], "src_endpoint": { "ip": "127.0.0.1", "name": "rdkafka", "port": 42728 }, "status_id": 1, "unmapped": { "authorization_metadata": { "acl_authorization": { "host": "{{any_host}}", "op": "all", "permission_type": "allow", "principal": "type {group} name {/sales}" }, "resource": { "name": "sales-topic", "pattern": "literal", "type": "topic" } } } } ``` Metadata request (with counts) This shows a message for a scenario where a user requests a set of metadata using rpk. It provides detailed information on the type of request and the information sent to the user. This is a describe type event. 
```json { "category_uid": 6, "class_uid": 6003, // If present, indicates that >1 of the same authz check was performed // within the period of the audit log collecting entries // This provides start and end time (the time period these events were // observed) "count": 2, "end_time": 1700533480725, "metadata": { "product": { "name": "Redpanda", "uid": "0", "vendor_name": "Redpanda Data, Inc.", "version": "v23.3.0-dev-2457-g76dc896f8c" }, "profiles": [ "cloud" ], "version": "1.0.0" }, "severity_id": 1, "start_time": 1700533480724, "time": 1700533480724, "type_uid": 600303, "activity_id": 3, "actor": { "authorizations": [ { "decision": "authorized", // Represents a policy for a non-super user "policy": { "desc": "acl: {principal {type user name user} host {{any_host}} op all perm allow}, resource: type {topic} name {*} pattern {literal}", "name": "aclAuthorization" } } ], "user": { "name": "user", "type_id": 1 } }, "api": { "operation": "metadata", "service": { "name": "kafka rpc protocol" } }, "cloud": { "provider": "" }, "dst_endpoint": { "ip": "127.0.0.1", "port": 19092, "svc_name": "kafka rpc protocol" }, "resources": [ // The topics accessed { "name": "test", "type": "topic" } ], "src_endpoint": { "ip": "127.0.0.1", "name": "rpk", "port": 53602 }, "status_id": 1, "unmapped": { "authorization_metadata": { "acl_authorization": { "host": "{{any_host}}", "op": "all", "permission_type": "allow", "principal": "{type user name user}" }, "resource": { "name": "*", "pattern": "literal", "type": "topic" } } } } ``` --- # Page 417: Cluster Maintenance **URL**: https://docs.redpanda.com/redpanda-cloud/manage/cluster-maintenance.md --- # Cluster Maintenance --- title: Cluster Maintenance latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-maintenance/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud 
page-relative-src-path: cluster-maintenance/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/cluster-maintenance/index.adoc description: Learn about cluster maintenance and configuration properties. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-05-07" --- - [Cluster State](cluster-state/) Learn about the current status of a cluster. - [Upgrades and Maintenance](../maintenance/) Learn how Redpanda Cloud manages maintenance operations. - [Configure Cluster Properties](config-cluster/) Learn how to configure cluster properties to enable and manage features. - [Audit Logging](../audit-logging/) Learn how to use Redpanda's audit logging capabilities. - [About Client Throughput Quotas](about-throughput-quotas/) Understand how Redpanda's user-based and client ID-based throughput quotas work, including entity hierarchy, precedence rules, and quota tracking behavior. - [Manage Throughput](manage-throughput/) Configure broker-wide and client-specific throughput quotas to prevent resource exhaustion and noisy-neighbor issues. - [Configure Client Connections](configure-client-connections/) Learn about guidelines for configuring client connections in Redpanda clusters for optimal availability. 
--- # Page 418: About Client Throughput Quotas **URL**: https://docs.redpanda.com/redpanda-cloud/manage/cluster-maintenance/about-throughput-quotas.md --- # About Client Throughput Quotas --- title: About Client Throughput Quotas latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-maintenance/about-throughput-quotas page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-maintenance/about-throughput-quotas.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/cluster-maintenance/about-throughput-quotas.adoc description: Understand how Redpanda's user-based and client ID-based throughput quotas work, including entity hierarchy, precedence rules, and quota tracking behavior. learning-objective-1: Describe the difference between user-based and client ID-based quotas learning-objective-2: Determine which quota type to use for your use case learning-objective-3: Explain quota precedence rules and how Redpanda tracks quota usage page-git-created-date: "2026-03-31" page-git-modified-date: "2026-03-31" --- Redpanda uses throughput quotas to limit the rate of produce and consume requests from clients. Understanding how quotas work helps you prevent individual clients from disproportionately consuming resources and causing performance degradation for other clients (also known as the "noisy-neighbor" problem), and ensure fair resource sharing across users and applications. After reading this page, you will be able to: - Describe the difference between user-based and client ID-based quotas - Determine which quota type to use for your use case - Explain quota precedence rules and how Redpanda tracks quota usage To configure and manage throughput quotas, see [Manage Throughput](../manage-throughput/). 
## [](#throughput-control-overview)Throughput control overview Redpanda provides two ways to control throughput: - Broker-wide limits: Configured using cluster properties. For details, see [Broker-wide throughput limits](../manage-throughput/#broker-wide-throughput-limits). - Client throughput quotas: Configured using the Kafka API. Client quotas enable per-user and per-client rate limiting with fine-grained control through entity hierarchy and precedence rules. This page focuses on client quotas. ## [](#supported-quota-types)Supported quota types Redpanda supports three Kafka API-based quota types: | Quota type | Description | | --- | --- | | producer_byte_rate | Limit throughput of produce requests (bytes per second) | | consumer_byte_rate | Limit throughput of fetch requests (bytes per second) | | controller_mutation_rate | Limit rate of topic mutation requests (partitions created or deleted per second) | All quota types can be applied to groups of client connections based on user principals, client IDs, or combinations of both. ## [](#quota-entities)Quota entities Redpanda uses two pieces of identifying information from each client connection to determine which quota applies: - Client ID: An ID that clients self-declare. Quotas can target an exact client ID (`client-id`) or a prefix (`client-id-prefix`). Multiple client connections that share a client ID or ID prefix are grouped into a single quota entity. - User [principal](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#principal): An authenticated identity verified through SASL, mTLS, or OIDC. Connections that share the same user are considered one entity. You can configure quotas that target either entity type, or combine both for fine-grained control. ### [](#client-id-based-quotas)Client ID-based quotas Client ID-based quotas apply to clients identified by their `client-id` field, which is set by the client application. 
The client ID is typically a configurable property when you create a client with Kafka libraries. When using client ID-based quotas, multiple clients using the same client ID share the same quota tracking. Client ID-based quotas rely on clients honestly reporting their identity and correctly setting the `client-id` property. This makes client ID-based quotas unsuitable for guaranteeing isolation between tenants. Use client ID-based quotas when: - Authentication is not enabled. - Grouping by application or service name is sufficient. - You operate a single-tenant environment where all clients are trusted. - You need simple rate limiting without user-level isolation. ### [](#user-based-quotas)User-based quotas > ❗ **IMPORTANT** > > User-based quotas require [authentication](../../../security/cloud-authentication/) to be enabled on your cluster. User-based quotas apply to authenticated user principals. Each user has a separate quota, providing a way to limit the impact of individual users on the cluster. User-based quotas rely on Redpanda’s authentication system to verify user identity. The user principal is extracted from SASL credentials, mTLS certificates, or OIDC tokens and cannot be forged by clients. Use user-based quotas when: - You operate a multi-tenant environment, such as SaaS platforms or enterprises with departments. - You require isolation between users or tenants, to avoid noisy-neighbor issues. - You need per-user billing or metering. ### [](#combined-user-and-client-quotas)Combined user and client quotas You can combine user and client identities for fine-grained control over specific (user, client) combinations. Use combined quotas when: - You need fine-grained control, for example: user `alice` using a specific application. - Different rate limits apply to different apps used by the same user. For example, `alice`'s `payment-processor` gets 10 MB/s, but `alice`'s `analytics-consumer` gets 50 MB/s. 
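The combined-quota scenario in the list above could be configured roughly as follows. This is a sketch under assumptions: it treats the 10 MB/s limit as a produce quota and the 50 MB/s limit as a consume quota (the scenario does not say which), it uses decimal megabytes, and the helper names `mb_per_s_to_bytes` and `combined_quota_cmd` are hypothetical. The sketch prints the `rpk cluster quotas alter` commands instead of running them.

```bash
# Convert MB/s (decimal) to bytes per second, the unit used by byte-rate quotas.
mb_per_s_to_bytes() {
  echo $(( $1 * 1000000 ))
}

# Print (do not run) a command that sets a quota for one (user, client-id) pair.
# Arguments: quota type, limit in MB/s, user, client ID.
combined_quota_cmd() {
  printf 'rpk cluster quotas alter --add %s=%s --name user=%s --name client-id=%s\n' \
    "$1" "$(mb_per_s_to_bytes "$2")" "$3" "$4"
}

combined_quota_cmd producer_byte_rate 10 alice payment-processor
combined_quota_cmd consumer_byte_rate 50 alice analytics-consumer
```

Because each command names both a user and a client ID, each (user, client ID) pair gets its own quota rather than sharing one limit across all of `alice`'s clients.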
See [Quota precedence and tracking](#quota-precedence-and-tracking) for examples.

## [](#quota-precedence-and-tracking)Quota precedence and tracking

When a request arrives, Redpanda resolves which quota to apply by matching the request’s authenticated user principal and client ID against configured quotas. Redpanda applies the most specific match, using the precedence order in the following table (highest priority first).

The precedence level that matches also determines how quota usage is tracked. Redpanda tracks quota usage using a tracker key that determines which connections share the same quota bucket. How connections are grouped into buckets depends on the type of entity the quota targets.

To get independent quota tracking per user and client ID combination, configure quotas that include both dimensions, such as `/config/users/<user>/clients/<client-id>` or `/config/users/<user>/clients/<default>`.

| Level | Match type | Config path | Tracker key | Isolation behavior |
| --- | --- | --- | --- | --- |
| 1 | Exact user + exact client | `/config/users/<user>/clients/<client-id>` | (user, client-id) | Each unique (user, client-id) pair tracked independently |
| 2 | Exact user + client prefix | `/config/users/<user>/client-id-prefix/<prefix>` | (user, client-id-prefix) | Clients matching the prefix share tracking within that user |
| 3 | Exact user + default client | `/config/users/<user>/clients/<default>` | (user, client-id) | Each unique (user, client-id) pair tracked independently |
| 4 | Exact user only | `/config/users/<user>` | user | All clients for that user share a single tracking bucket |
| 5 | Default user + exact client | `/config/users/<default>/clients/<client-id>` | (user, client-id) | Each unique (user, client-id) pair tracked independently |
| 6 | Default user + client prefix | `/config/users/<default>/client-id-prefix/<prefix>` | (user, client-id-prefix) | Clients matching the prefix share tracking within each user |
| 7 | Default user + default client | `/config/users/<default>/clients/<default>` | (user, client-id) | Each unique (user, client-id) pair tracked independently |
| 8 | Default user only | `/config/users/<default>` | user | All clients for each user share a single tracking bucket (per user) |
| 9 | Exact client only | `/config/clients/<client-id>` | client-id | All users with that client ID share a single tracking bucket |
| 10 | Client prefix only | `/config/client-id-prefix/<prefix>` | client-id-prefix | All clients matching the prefix share a single bucket across all users |
| 11 | Default client only | `/config/clients/<default>` | client-id | Each unique client ID tracked independently |
| 12 | No quota configured | N/A | N/A | No tracking / unlimited throughput |

> ❗ **IMPORTANT**
>
> The `<default>` entity matches any user or client that doesn’t have a more specific quota configured. This is different from an empty/unauthenticated user (`user=""`), or undeclared client ID (`client-id=""`), which are treated as specific entities.

### [](#unauthenticated-connections)Unauthenticated connections

Unauthenticated connections have an empty user principal (`user=""`) and are not treated as `user=<default>`.

Unauthenticated connections:

- Fall back to client-only quotas.
- Have unlimited throughput only if no client-only quota matches.

### [](#example-precedence-resolution)Example: Precedence resolution

Given these configured quotas:

```bash
rpk cluster quotas alter --add consumer_byte_rate=5000000 --name user=alice --name client-id=app-1
rpk cluster quotas alter --add consumer_byte_rate=10000000 --name user=alice
rpk cluster quotas alter --add consumer_byte_rate=20000000 --name client-id=app-1
```

| User + Client ID | Precedence match |
| --- | --- |
| user=alice, client-id=app-1 | Level 1: Exact user + exact client |
| user=alice, client-id=app-2 | Level 4: Exact user only |
| user=bob, client-id=app-1 | Level 9: Exact client only |
| user=bob, client-id=app-2 | Level 12: No quota configured |

When no quota matches (level 12), the connection is not throttled.
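The lookup in this example can be sketched as a small shell function. This is a simplified illustration, not Redpanda's implementation: it models only exact-user and exact-client quotas (levels 1, 4, 9, and 12), ignores prefixes and defaults, and treats an empty client ID as "no client quota". The names `QUOTAS`, `has_quota`, and `resolve_level` are hypothetical.

```bash
# Quotas from the example, one "user|client-id" entry per line.
# An empty field means the quota does not constrain that dimension.
QUOTAS='alice|app-1
alice|
|app-1'

# Succeeds if a quota exists for exactly this (user, client-id) entry.
has_quota() {
  printf '%s\n' "$QUOTAS" | grep -Fqx -- "$1|$2"
}

# Print the most specific matching precedence level for a connection.
resolve_level() {
  user=$1
  client=$2
  if [ -n "$user" ] && [ -n "$client" ] && has_quota "$user" "$client"; then
    echo "Level 1: Exact user + exact client"
  elif [ -n "$user" ] && has_quota "$user" ""; then
    echo "Level 4: Exact user only"
  elif [ -n "$client" ] && has_quota "" "$client"; then
    echo "Level 9: Exact client only"
  else
    echo "Level 12: No quota configured"
  fi
}

resolve_level alice app-2   # prints: Level 4: Exact user only
resolve_level bob app-1     # prints: Level 9: Exact client only
```

Calling `resolve_level "" app-1` models an unauthenticated connection: the user checks are skipped and the client-only quota applies.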
### [](#example-user-only-quota)Example: User-only quota If you configure a 10 MB/s produce quota for user `alice`: ```bash rpk cluster quotas alter --add producer_byte_rate=10000000 --name user=alice ``` Then `alice` connecting with client ID `app-1` and `alice` connecting with client ID `app-2` share the same 10 MB/s produce limit. To give each of `alice`'s clients an independent 10 MB/s limit, configure: ```bash rpk cluster quotas alter --add producer_byte_rate=10000000 --name user=alice --default client-id ``` ### [](#example-user-default-quota)Example: User default quota If you configure a default 10 MB/s produce quota for all users: ```bash rpk cluster quotas alter --add producer_byte_rate=10000000 --default user ``` This quota applies to all users who don’t have a more specific quota configured. Each user is tracked independently: `alice` gets her own 10 MB/s bucket, `bob` gets his own 10 MB/s bucket, and so on. Within each user, all client ID values share that user’s bucket. `alice` connecting with client ID `app-1` and `alice` connecting with client ID `app-2` share the same 10 MB/s produce limit, while `bob`'s connections have a separate 10 MB/s limit. ## [](#throttling-enforcement)Throughput throttling enforcement > 📝 **NOTE** > > As of v24.2, Redpanda enforces all throughput limits per broker, including client throughput. Redpanda enforces throughput limits by applying backpressure to clients. When a connection exceeds its throughput limit, Redpanda throttles the connection to bring the rate back within the allowed level: 1. Redpanda adds a `throttle_time_ms` field to responses, indicating how long the client should wait. 2. If the client doesn’t honor the throttle time, Redpanda inserts delays on the connection’s next read operation. In Redpanda Cloud, the throttling delay is set to 30 seconds. ## [](#default-behavior)Default behavior Quotas are opt-in restrictions and not enforced by default. 
When no quotas are configured, clients have unlimited throughput. ## [](#next-steps)Next steps - [Configure throughput quotas](../manage-throughput/) - [Enable authentication for user-based quotas](../../../security/cloud-authentication/) --- # Page 419: Cluster State **URL**: https://docs.redpanda.com/redpanda-cloud/manage/cluster-maintenance/cluster-state.md --- # Cluster State --- title: Cluster State latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-maintenance/cluster-state page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-maintenance/cluster-state.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/cluster-maintenance/cluster-state.adoc description: Learn about the current status of a cluster. page-git-created-date: "2025-07-23" page-git-modified-date: "2025-07-24" --- The cluster state shows the current status of a cluster. Redpanda Cloud updates the state automatically, allowing you to monitor a cluster’s health and availability. ## Serverless | State | Description | | --- | --- | | Creating | Cluster is in the process of having its control plane state created. | | Placing | Cluster is in the process of being placed on a cell with sufficient resources in the data plane. | | Ready | Cluster is running and accepting external requests. | | Deleting | Cluster is in the process of having its control plane state removed. Resources dedicated to the cluster in the data plane are released. | | Failed | Cluster is unable to enter the Ready state from either the Creating or Placing states. Try re-creating the cluster. | | Suspended | Cluster is running but blocks all external requests. This can happen when credits run out. Enter a credit card to return to the Ready state.
| ## BYOC/Dedicated | State | Description | | --- | --- | | Creating agent | Cluster is in the process of having its control plane state created, and the Redpanda Cloud agent is being deployed. | | Creating | Cluster is in the process of having its control plane state created. | | Ready | Cluster is running and accepting external requests. | | Deleting | Cluster is in the process of having its control plane state removed. Resources dedicated to the cluster in the data plane are released. | | Deleting agent | Cluster is in the process of having its control plane state and Redpanda Cloud agent removed. | | Upgrading | Cluster is undergoing a rolling upgrade or a scaling operation. | | Failed | Cluster is unable to enter the Ready state from either the Creating or the Creating agent states. Try re-creating the cluster. | | Suspended | Cluster is running but blocks all external requests. This can happen when credits run out. Enter a credit card to return to the Ready state. | --- # Page 420: Configure Cluster Properties **URL**: https://docs.redpanda.com/redpanda-cloud/manage/cluster-maintenance/config-cluster.md --- # Configure Cluster Properties --- title: Configure Cluster Properties latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-maintenance/config-cluster page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-maintenance/config-cluster.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/cluster-maintenance/config-cluster.adoc description: Learn how to configure cluster properties to enable and manage features. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-08-27" --- Cluster configuration properties are set to their default values and are automatically replicated across all brokers.
You can use cluster properties to enable and manage features such as [Iceberg topics](../../iceberg/about-iceberg-topics/), [data transforms](../../../develop/data-transforms/), and [audit logging](../../audit-logging/). For a complete list of the cluster properties available in Redpanda Cloud, see [Cluster Configuration Properties](../../../reference/properties/cluster-properties/) and [Object Storage Properties](../../../reference/properties/object-storage-properties/). > 📝 **NOTE** > > Some properties are read-only and cannot be changed. For example, `cluster_id` is a read-only property that is automatically set when the cluster is created. ## [](#prerequisites)Prerequisites - **`rpk` version 25.1.2+**: To check your current version, see [Install or Update rpk](../../rpk/rpk-install/). - **Redpanda version 25.1.2+**: You can find the version on your cluster’s Overview page in the Redpanda Cloud UI. To verify that you’re logged into the Redpanda control plane and have the correct `rpk` profile configured for your target cluster, run `rpk cloud login` and select your cluster. ## [](#limitations)Limitations Cluster properties are supported on BYOC and Dedicated clusters running on AWS and GCP. - They are not available on BYOC and Dedicated clusters running on Azure. - They are not available on Serverless clusters. ## [](#set-cluster-configuration-properties)Set cluster configuration properties You can set cluster configuration properties using the `rpk` command-line tool or the Cloud API. ### rpk Use `rpk cluster config` to set cluster properties. 
For example, to enable audit logging, set [`audit_enabled`](../../../reference/properties/cluster-properties/#audit_enabled) to `true`: ```bash rpk cluster config set audit_enabled true ``` To set a cluster property with a secret, you must use the following notation: ```bash rpk cluster config set iceberg_rest_catalog_client_secret '${secrets.}' ``` > 📝 **NOTE** > > Some properties require a rolling restart, and it can take several minutes for the update to complete. The `rpk cluster config set` command returns the operation ID. ### Cloud API Use the Cloud API to set cluster properties: - Create a cluster by making a [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster) request. Edit `cluster_configuration` in the request body with a key-value pair for `custom_properties`. - Update a cluster by making a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request, passing the cluster ID as a parameter. Include the properties to update in the request body. For example, to set [`audit_enabled`](../../../reference/properties/cluster-properties/#audit_enabled) to `true`: ```bash # Store your cluster ID in a variable. export RP_CLUSTER_ID= # Retrieve a Redpanda Cloud access token. The token endpoint returns JSON, so extract the access_token field (this assumes jq is installed). export RP_CLOUD_TOKEN=$(curl -s -X POST "https://auth.prd.cloud.redpanda.com/oauth/token" \ -H "content-type: application/x-www-form-urlencoded" \ -d "grant_type=client_credentials" \ -d "client_id=" \ -d "client_secret=" | jq -r .access_token) # Update your cluster configuration to enable audit logging.
curl -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" -X PATCH \ "https://api.cloud.redpanda.com/v1/clusters/${RP_CLUSTER_ID}" \ -H 'accept: application/json' \ -H 'content-type: application/json' \ -d '{"cluster_configuration":{"custom_properties": {"audit_enabled":true}}}' ``` The [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request returns the ID of a long-running operation. You can check the status of the operation by polling the [`GET /v1/operations/{id}`](/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation) endpoint. To set a cluster property with a secret, you must use the following notation with the secret name: ```bash curl -H "Authorization: Bearer " -X PATCH \ "https://api.cloud.redpanda.com/v1/clusters/" \ -H 'accept: application/json' \ -H 'content-type: application/json' \ -d '{"cluster_configuration": { "custom_properties": { "iceberg_rest_catalog_client_secret": "${secrets.}" } } }' ``` > 📝 **NOTE** > > Some properties require a rolling restart for the update to take effect. This triggers a [long-running operation](../../api/cloud-byoc-controlplane-api/#lro) that can take several minutes to complete. ## [](#view-cluster-property-values)View cluster property values You can see the value of a cluster configuration property using `rpk` or the Cloud API. ### rpk Use `rpk cluster config get` to view the current cluster property value. For example, to view the current value of [`audit_enabled`](../../../reference/properties/cluster-properties/#audit_enabled), run: ```bash rpk cluster config get audit_enabled ``` ### Cloud API Use the Cloud API to get the current configuration property values for a cluster. Make a [`GET /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_getcluster) request, passing the cluster ID as a parameter. The response body contains the current `computed_properties` values.
For example, to get the current value of [`audit_enabled`](../../../reference/properties/cluster-properties/#audit_enabled): ```bash # Store your cluster ID in a variable. export RP_CLUSTER_ID= # Retrieve a Redpanda Cloud access token. The token endpoint returns JSON, so extract the access_token field (this assumes jq is installed). export RP_CLOUD_TOKEN=$(curl -s -X POST "https://auth.prd.cloud.redpanda.com/oauth/token" \ -H "content-type: application/x-www-form-urlencoded" \ -d "grant_type=client_credentials" \ -d "client_id=" \ -d "client_secret=" | jq -r .access_token) # Get your cluster configuration property values. curl -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" -X GET \ "https://api.cloud.redpanda.com/v1/clusters/${RP_CLUSTER_ID}" \ -H 'accept: application/json' \ -H 'content-type: application/json' ``` ## [](#suggested-reading)Suggested reading - [Introduction to rpk](../../rpk/intro-to-rpk/) - [Redpanda Cloud API Overview](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) - [Redpanda Cloud API Quickstart](/api/doc/cloud-controlplane/topic/topic-quickstart) --- # Page 421: Configure Client Connections **URL**: https://docs.redpanda.com/redpanda-cloud/manage/cluster-maintenance/configure-client-connections.md --- # Configure Client Connections --- title: Configure Client Connections latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-maintenance/configure-client-connections page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-maintenance/configure-client-connections.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/cluster-maintenance/configure-client-connections.adoc description: Learn about guidelines for configuring client connections in Redpanda clusters for optimal availability. page-git-created-date: "2025-11-19" page-git-modified-date: "2025-11-19" --- Optimize the availability of your clusters by configuring and tuning properties.
> 💡 **TIP** > > Before you configure connection limits or reconnection settings, start by gathering detailed data about your client connections. > > - Use the [`redpanda_rpc_active_connections` metric](../../../reference/public-metrics-reference/#redpanda_rpc_active_connections) to view current Kafka client connections. > > - For clusters on v25.3 and later, use [`rpk cluster connections list`](../../../reference/rpk/rpk-cluster/rpk-cluster-connections-list/) or the `GET /v1/monitoring/kafka/connections` endpoint in the Data Plane API to identify: > > - Which clients and applications are connected > > - Long-lived connections and long-running requests > > - Connections with no activity > > - Whether any clients are causing excessive load > > > By reviewing connection details, you can make informed decisions about tuning connection limits and troubleshooting issues. > > > See also: [Data Plane API reference](/api/doc/cloud-dataplane/operation/operation-monitoringservice_listkafkaconnections), [Monitor Redpanda Cloud](../../monitor-cloud/#throughput) ## [](#limit-client-connections)Limit client connections To mitigate the risk of a client creating too many connections and using too many system resources, you can configure a Redpanda cluster to impose limits on the number of client connections that can be created. The following Redpanda cluster properties limit the number of connections: - [`kafka_connections_max_per_ip`](../../../reference/properties/cluster-properties/#kafka_connections_max_per_ip): Similar to Kafka’s `max.connections.per.ip`, this sets the maximum number of connections accepted per IP address by a broker. - [`kafka_connections_max_overrides`](../../../reference/properties/cluster-properties/#kafka_connections_max_overrides): A list of IP addresses for which `kafka_connections_max_per_ip` is overridden and doesn’t apply. > 📝 **NOTE** > > - These connection limit properties are disabled by default. You must manually enable them. 
> > - The total number of connections is not equal to the number of clients, because a client can open multiple connections. As a conservative estimate, for a cluster with N brokers, plan for N + 2 connections per client. ### [](#configure-connection-count-limit-by-client-ip)Configure connection count limit by client IP Configure the `kafka_connections_max_per_ip` property to limit the number of connections from each client IP address. > ❗ **IMPORTANT** > > Per-IP connection controls require Redpanda to see individual client IPs. If clients connect through private link endpoints, NAT gateways, or other shared-IP egress, the per-IP limit applies to the shared IP, affecting all clients behind it and preventing isolation of a single offending client. Similarly, multiple clients running on the same host will share the same IP address, and the limit applies collectively to all those clients. See also: [Configure Cluster Properties](../config-cluster/) #### [](#configure-the-limit)Configure the limit To configure `kafka_connections_max_per_ip` safely without disrupting legitimate clients, follow these steps: 1. Set up your monitoring stack for your cluster. See [Monitor Redpanda Cloud](../../monitor-cloud/). 2. Monitor current connection patterns using the `redpanda_rpc_active_connections` metric with the `redpanda_server="kafka"` filter: ```none redpanda_rpc_active_connections{redpanda_id="CLOUD_CLUSTER_ID", redpanda_server="kafka"} ``` 3. Analyze the connection data to identify the normal range of connections for each broker during typical traffic cycles. For example, in the following Grafana screenshot, the normal range is around 200-300 connections: ![Range of active connections over time](../../../shared/_images/monitor_connections.png) 4. Set the `kafka_connections_max_per_ip` value based on your analysis. Use the upper bound of normal connections observed, or use a lower value if you know how many connections per client IP are being opened. 5. 
Continue monitoring the connection metrics after applying the limit to ensure that legitimate clients are not affected and that the problematic client is properly controlled. > 📝 **NOTE** > > If you find a high load of unexpected connections from multiple IP addresses, `kafka_connections_max_per_ip` alone may be insufficient. If offending IPs outnumber legitimate client IPs, you may need to set `kafka_connections_max_per_ip` so low that it affects legitimate clients. If this is the case, use `kafka_connections_max_overrides` to exempt known legitimate client IPs from the connection limit. #### [](#limitations)Limitations - Decreasing the limit does not terminate any currently open Kafka API connections. - This limit does not apply to Kafka HTTP Proxy connections. - Clients behind NAT gateways or private links share the same IP address as seen by Redpanda brokers. - The limit may negatively affect tail latencies across all client connections. - All clients behind a shared IP are collectively subject to the single `kafka_connections_max_per_ip` limit. - Connection rejections occur randomly among clients when the limit is reached. For example, suppose `kafka_connections_max_per_ip` is set to 100, but clients behind a NAT gateway collectively need 150 connections. When the limit is reached, only some connection attempts succeed and the rest are rejected, which can leave a client in a non-working state. - Redpanda may modify this property during internal operations. - Availability incidents caused by misconfiguring this feature are excluded from the Redpanda Cloud SLA. ## [](#configure-client-reconnections)Configure client reconnections You can configure the Kafka client backoff and retry properties to change the default behavior of the clients to suit your failure-handling requirements.
Set the following Kafka client properties on your application’s producer or consumer to manage client reconnections: - `reconnect.backoff.ms`: Base amount of time to wait before attempting to reconnect to the broker. The default is 50 milliseconds. - `reconnect.backoff.max.ms`: Maximum amount of time in milliseconds to wait when reconnecting to a broker. The backoff increases exponentially for each consecutive connection failure, up to this maximum. The default is 1000 milliseconds (1 second). Additionally, you can use Kafka producer properties to control message retry behavior. Delivery fails when either the delivery timeout expires or the retry limit is reached, whichever comes first. - `delivery.timeout.ms`: Upper bound on the time allowed for message delivery, including retries, so that messages are not retried forever. The default is 120000 milliseconds (2 minutes). - `retries`: Number of times a producer can retry sending a message before marking it as failed. The default value is 2147483647 for Kafka >= 2.1, or 0 for Kafka <= 2.0. - `retry.backoff.ms`: Amount of time to wait before attempting to retry a failed request to a given topic partition. The default is 100 milliseconds.
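To make the exponential reconnect backoff concrete, the following shell sketch prints the base wait before each consecutive reconnect attempt, using the default `reconnect.backoff.ms` and `reconnect.backoff.max.ms` values listed above. It assumes that the wait doubles after every failed attempt, as the Apache Kafka Java client does, and it ignores the random jitter that clients add on top. The function name `reconnect_backoff_schedule` is hypothetical, and other client libraries may use a different growth factor.

```bash
#!/usr/bin/env bash
# Sketch of an exponential reconnect backoff schedule (jitter omitted).
# reconnect_backoff_schedule is a hypothetical helper, not a Kafka or rpk command.
# Arguments: base backoff in ms, maximum backoff in ms, number of attempts.
reconnect_backoff_schedule() {
  backoff_ms="$1"
  max_ms="$2"
  attempts="$3"
  i=1
  while [ "$i" -le "$attempts" ]; do
    echo "attempt $i: wait ${backoff_ms} ms"
    # Assumption: the wait doubles after each consecutive failure.
    backoff_ms=$((backoff_ms * 2))
    # Never exceed the configured maximum.
    if [ "$backoff_ms" -gt "$max_ms" ]; then
      backoff_ms="$max_ms"
    fi
    i=$((i + 1))
  done
}

# Defaults from the list above: 50 ms base, 1000 ms maximum.
reconnect_backoff_schedule 50 1000 6
```

With the defaults, the wait reaches the 1000 ms cap by the sixth consecutive failure. Raising `reconnect.backoff.max.ms` spreads reconnect attempts further apart during a long outage, at the cost of slower recovery once the broker is reachable again.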
## [](#see-also)See also - [Configure Producers](../../../develop/produce-data/configure-producers/) - [Manage Throughput](../manage-throughput/) --- # Page 422: Manage Throughput **URL**: https://docs.redpanda.com/redpanda-cloud/manage/cluster-maintenance/manage-throughput.md --- # Manage Throughput --- title: Manage Throughput latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cluster-maintenance/manage-throughput page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cluster-maintenance/manage-throughput.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/cluster-maintenance/manage-throughput.adoc description: Configure broker-wide and client-specific throughput quotas to prevent resource exhaustion and noisy-neighbor issues. learning-objective-1: Set user-based throughput quotas learning-objective-2: Set client ID-based quotas learning-objective-3: Monitor quota usage and throttling behavior page-git-created-date: "2025-08-19" page-git-modified-date: "2026-03-31" --- Redpanda throttles throughput on ingress and egress independently, which prevents clients from causing unbounded network and disk usage on brokers. You can configure limits at two levels: - Broker limits: These apply to all clients connected to the broker and restrict total traffic on the broker. See [Broker-wide throughput limits](#broker-wide-throughput-limits). - Client limits: These apply to authenticated users or clients defined by their client ID.
You can manage client quotas with [`rpk cluster quotas`](../../../reference/rpk/rpk-cluster/rpk-cluster-quotas/), with the Redpanda Cloud UI, with the [Redpanda Cloud Data Plane API](https://docs.redpanda.com/api/doc/cloud-dataplane/operation/operation-quotaservice_listquotas), or with the Kafka API. When no quotas apply, the client has unlimited throughput. > 📝 **NOTE** > > Throughput throttling is supported for BYOC and Dedicated clusters only. After reading this page, you will be able to: - Set user-based throughput quotas - Set client ID-based quotas - Monitor quota usage and throttling behavior ## [](#view-connected-client-details)View connected client details Before configuring throughput quotas, check the [current produce and consume throughput](../../monitor-cloud/#throughput) of a client. Use the [`rpk cluster connections list`](../../../reference/rpk/rpk-cluster/rpk-cluster-connections-list/) command or the [`GET /v1/monitoring/kafka/connections`](/api/doc/cloud-dataplane/operation/operation-monitoringservice_listkafkaconnections) Data Plane API endpoint to view detailed information about active Kafka client connections. 
For example, to view a cluster’s connected clients in order of highest current produce throughput, run: ### rpk ```bash rpk cluster connections list --order-by="recent_request_statistics.produce_bytes desc" ``` ```bash UID STATE USER CLIENT-ID IP:PORT NODE SHARD OPEN-TIME IDLE PROD-TPUT/SEC FETCH-TPUT/SEC REQS/MIN b20601a3-624c-4a8c-ab88-717643f01d56 OPEN UNAUTHENTICATED perf-producer-client 127.0.0.1:55012 0 0 9s 0s 78.9MB 0B 292 36338ca5-86b7-4478-ad23-32d49cfaef61 OPEN UNAUTHENTICATED rpk 127.0.0.1:49722 0 0 13s 13.694243104s 0B 0B 1 7e277ef6-0176-4007-b100-6581bfde570f OPEN UNAUTHENTICATED rpk 127.0.0.1:49736 0 0 13s 10.093957335s 0B 0B 2 567d9918-d3dc-4c74-ab5d-85f70cd3ee35 OPEN UNAUTHENTICATED rpk 127.0.0.1:49748 0 0 13s 0.591413542s 0B 0B 5 08616f21-08f9-46e7-8f06-964bd8240d9b OPEN UNAUTHENTICATED rpk 127.0.0.1:49764 0 0 13s 10.094602845s 0B 0B 2 e4d5b57e-5c76-4975-ada8-17a88d68a62d OPEN UNAUTHENTICATED rpk 127.0.0.1:54992 0 0 10s 0.302090085s 0B 14.5MB 27 b41584f3-2662-4185-a4b8-0d8510f5c780 OPEN UNAUTHENTICATED perf-producer-client 127.0.0.1:55002 0 0 8s 7.743592270s 0B 0B 1 62fde947-411d-4ea8-9461-3becc2631b46 CLOSED UNAUTHENTICATED rpk 127.0.0.1:48578 0 0 26s 0.000737836s 0B 0B 1 95387e2e-2ec4-4040-aa5e-4257a3efa1a2 CLOSED UNAUTHENTICATED rpk 127.0.0.1:48564 0 0 26s 0.208180826s 0B 0B 1 ``` ### Data Plane API ```bash curl \ --request GET 'https:///v1/monitoring/kafka/connections' \ --header "Authorization: Bearer $ACCESS_TOKEN" \ --data '{ "filter": "", "order_by": "recent_request_statistics.produce_bytes desc" }' ``` Show example API response ```json { "connections": [ { "node_id": 0, "shard_id": 0, "uid": "b20601a3-624c-4a8c-ab88-717643f01d56", "state": "KAFKA_CONNECTION_STATE_OPEN", "open_time": "2025-10-15T14:15:15.755065000Z", "close_time": "1970-01-01T00:00:00.000000000Z", "authentication_info": { "state": "AUTHENTICATION_STATE_UNAUTHENTICATED", "mechanism": "AUTHENTICATION_MECHANISM_UNSPECIFIED", "user_principal": "" }, "listener_name": "", 
"tls_info": { "enabled": false }, "source": { "ip_address": "127.0.0.1", "port": 55012 }, "client_id": "perf-producer-client", "client_software_name": "apache-kafka-java", "client_software_version": "3.9.0", "transactional_id": "my-tx-id", "group_id": "", "group_instance_id": "", "group_member_id": "", "api_versions": { "18": 4, "22": 3, "3": 12, "24": 3, "0": 7 }, "idle_duration": "0s", "in_flight_requests": { "sampled_in_flight_requests": [ { "api_key": 0, "in_flight_duration": "0.000406892s" } ], "has_more_requests": false }, "total_request_statistics": { "produce_bytes": "78927173", "fetch_bytes": "0", "request_count": "4853", "produce_batch_count": "4849" }, "recent_request_statistics": { "produce_bytes": "78927173", "fetch_bytes": "0", "request_count": "4853", "produce_batch_count": "4849" } }, ... ], "total_size": "9" } ``` To view connections for a specific client, you can use a filter expression: ### rpk ```bash rpk cluster connections list --client-id="perf-producer-client" ``` ```bash UID STATE USER CLIENT-ID IP:PORT NODE SHARD OPEN-TIME IDLE PROD-TPUT/SEC FETCH-TPUT/SEC REQS/MIN b41584f3-2662-4185-a4b8-0d8510f5c780 OPEN UNAUTHENTICATED perf-producer-client 127.0.0.1:55002 0 0 8s 7.743592270s 0B 0B 1 b20601a3-624c-4a8c-ab88-717643f01d56 OPEN UNAUTHENTICATED perf-producer-client 127.0.0.1:55012 0 0 9s 0s 78.9MB 0B 292 ``` The `USER` field in the connection list shows the authenticated principal. Unauthenticated connections show `UNAUTHENTICATED`, which corresponds to an empty user principal (`user=""`) in quota configurations, not `user=`. 
### Data Plane API ```bash curl \ --request GET 'https:///v1/monitoring/kafka/connections' \ --header "Authorization: Bearer $ACCESS_TOKEN" \ --data '{ "filter": "client_id = \"perf-producer-client\"" }' ``` Show example API response ```json { "connections": [ { "node_id": 0, "shard_id": 0, "uid": "b41584f3-2662-4185-a4b8-0d8510f5c780", "state": "KAFKA_CONNECTION_STATE_OPEN", "open_time": "2025-10-15T14:15:15.219538000Z", "close_time": "1970-01-01T00:00:00.000000000Z", "authentication_info": { "state": "AUTHENTICATION_STATE_UNAUTHENTICATED", "mechanism": "AUTHENTICATION_MECHANISM_UNSPECIFIED", "user_principal": "" }, "listener_name": "", "tls_info": { "enabled": false }, "source": { "ip_address": "127.0.0.1", "port": 55002 }, "client_id": "perf-producer-client", "client_software_name": "apache-kafka-java", "client_software_version": "3.9.0", "transactional_id": "", "group_id": "", "group_instance_id": "", "group_member_id": "", "api_versions": { "18": 4, "3": 12, "10": 4 }, "idle_duration": "7.743592270s", "in_flight_requests": { "sampled_in_flight_requests": [], "has_more_requests": false }, "total_request_statistics": { "produce_bytes": "0", "fetch_bytes": "0", "request_count": "3", "produce_batch_count": "0" }, "recent_request_statistics": { "produce_bytes": "0", "fetch_bytes": "0", "request_count": "3", "produce_batch_count": "0" } }, ... ], "total_size": "2" } ``` The user principal field in the connection list shows the authenticated principal. Unauthenticated connections show `AUTHENTICATION_STATE_UNAUTHENTICATED`, which corresponds to an empty user principal (`user=""`) in quota configurations, not `user=`. To view connections for a specific authenticated user: ```bash rpk cluster connections list --user alice ``` This shows all connections from user `alice`, useful for monitoring clients that are subject to user-based quotas. 
## [](#broker-wide-throughput-limits)Broker-wide throughput limits Broker-wide throughput limits account for all Kafka API traffic going into or out of the broker, as data is produced to or consumed from a topic. The limit values represent the allowed rate of data in bytes per second passing through in each direction. Redpanda also provides administrators the ability to exclude clients from throughput throttling and to fine-tune which Kafka request types are subject to throttling limits. ## [](#client-throughput-limits)Client throughput limits Redpanda provides configurable throughput quotas for individual clients or authenticated users. Quotas are managed through the Kafka-compatible AlterClientQuotas and DescribeClientQuotas APIs, accessible with `rpk`, Redpanda Console, or Kafka client libraries. Redpanda supports two types of client throughput quotas: - Client ID-based quotas: Limit throughput based on the self-declared `client-id` field. - User-based quotas: Limit throughput based on authenticated user [principal](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#principal). Requires [authentication](../../../security/cloud-authentication/). You can also combine both types for fine-grained control (for example, limiting a specific user when using a specific client application). For conceptual information about quota types, entity hierarchy, precedence rules, and how Redpanda tracks and enforces quotas through throttling, see [About Client Throughput Quotas](../about-throughput-quotas/). ### [](#set-user-based-quotas)Set user-based quotas > ❗ **IMPORTANT** > > User-based quotas require authentication to be enabled. To set up authentication, see [Authentication](../../../security/cloud-authentication/). 
#### [](#quota-for-a-specific-user)Quota for a specific user To limit throughput for a specific authenticated user across all clients: ```bash rpk cluster quotas alter --add producer_byte_rate=2000000 --name user=alice ``` This limits user `alice` to 2 MB/s for produce requests regardless of the client ID used. To view quotas for a user: ```bash rpk cluster quotas describe --name user=alice ``` Expected output: ```bash user=alice producer_byte_rate=2000000 ``` #### [](#default-quota-for-all-users)Default quota for all users To set a fallback quota for any user without a more specific quota: ```bash rpk cluster quotas alter --add consumer_byte_rate=5000000 --default user ``` This applies a 5 MB/s fetch quota to all authenticated users who don’t have a more specific quota configured. ### [](#remove-a-user-quota)Remove a user quota To remove a quota for a specific user: ```bash rpk cluster quotas alter --delete consumer_byte_rate --name user=alice ``` To remove all quotas for a user: ```bash rpk cluster quotas delete --name user=alice ``` ### [](#set-client-id-based-quotas)Set client ID-based quotas Client ID-based quotas apply to all users using a specific client ID. These quotas do not require authentication. Because the client ID is self-declared, client ID-based quotas are not suitable for guaranteeing isolation between tenants. For multi-tenant environments, Redpanda recommends user-based quotas for per-tenant isolation. #### [](#individual-client-id-throughput-limit)Individual client ID throughput limit > 📝 **NOTE** > > The following sections show how to manage throughput with `rpk`. You can also manage throughput with the [Redpanda Cloud Data Plane API](https://docs.redpanda.com/api/doc/cloud-dataplane/operation/operation-quotaservice_listquotas). To view current throughput quotas set through the Kafka API, run [`rpk cluster quotas describe`](../../../reference/rpk/rpk-cluster/rpk-cluster-quotas-describe/). 
For example, to see the quotas for client ID `consumer-1`: ```bash rpk cluster quotas describe --name client-id=consumer-1 ``` ```bash client-id=consumer-1 producer_byte_rate=140000 ``` To set a throughput quota for a single client, use the [`rpk cluster quotas alter`](../../../reference/rpk/rpk-cluster/rpk-cluster-quotas-alter/) command. ```bash rpk cluster quotas alter --add consumer_byte_rate=200000 --name client-id=consumer-1 ``` ```bash ENTITY STATUS client-id=consumer-1 OK ``` #### [](#group-of-clients-throughput-limit)Group of clients throughput limit Alternatively, you can view or configure throughput quotas for a group of clients based on a match on client ID prefix. The following example sets the `consumer_byte_rate` quota to client IDs prefixed with `consumer-`: ```bash rpk cluster quotas alter --add consumer_byte_rate=200000 --name client-id-prefix=consumer- ``` > 📝 **NOTE** > > A `client-id-prefix` quota group is not related to Kafka consumer groups. The client ID is an application-defined identifier sent with every request. Client libraries typically default to their own name (such as `kgo`, `rdkafka`, `sarama`, or `perf-producer-client`), but applications can set it using the [`client.id`](https://kafka.apache.org/documentation/#consumerconfigs_client.id) configuration property. This makes prefix-based quotas useful for grouping related applications (for example, `inventory-service-` to match `inventory-service-1`, `inventory-service-2`, etc.). #### [](#default-client-throughput-limit)Default client throughput limit You can apply default throughput limits to clients. Redpanda applies the default limits if no quotas are configured for a specific client ID or prefix. 
To specify a produce quota of 1 GB/s through the Kafka API (applies across all produce requests to a single broker), run: ```bash rpk cluster quotas alter --default client-id --add producer_byte_rate=1000000000 ``` ### [](#set-combined-user-and-client-quotas)Set combined user and client quotas You can set quotas for specific (user, client ID) combinations for fine-grained control. #### [](#user-with-specific-client)User with specific client To limit a specific user when using a specific client: ```bash rpk cluster quotas alter --add consumer_byte_rate=1000000 --name user=alice --name client-id=consumer-1 ``` User `alice` using `client-id=consumer-1` is limited to a 1 MB/s fetch rate. The same user with a different client ID would use a different quota (or fall back to less specific matches). To view combined quotas: ```bash rpk cluster quotas describe --name user=alice --name client-id=consumer-1 ``` #### [](#user-with-client-prefix)User with client prefix To set a shared quota for a user across multiple clients matching a prefix: ```bash rpk cluster quotas alter --add producer_byte_rate=3000000 --name user=bob --name client-id-prefix=app- ``` All clients used by user `bob` with a client ID starting with `app-` share a combined 3 MB/s produce quota. #### [](#default-user-with-specific-client)Default user with specific client To set a quota for a specific client across all users: ```bash rpk cluster quotas alter --add producer_byte_rate=500000 --default user --name client-id=payment-processor ``` Any user using `client-id=payment-processor` is limited to a 500 KB/s produce rate, unless they have a more specific quota configured. ### [](#bulk-manage-client-throughput-limits)Bulk manage client throughput limits To more easily manage multiple quotas, you can use the `cluster quotas describe` and [`cluster quotas import`](../../../reference/rpk/rpk-cluster/rpk-cluster-quotas-import/) commands to do a bulk export and update. 
For example, to export all client quotas in JSON format: ```bash rpk cluster quotas describe --format json ``` `rpk cluster quotas import` accepts the output string from `rpk cluster quotas describe --format `: ```bash rpk cluster quotas import --from '{"quotas":[{"entity":[{"name":"analytics-consumer","type":"client-id"}],"values":[{"key":"consumer_byte_rate","values":"10000000"}]},{"entity":[{"name":"analytics-","type":"client-id-prefix"}],"values":[{"key":"producer_byte_rate","values":"10000000"},{"key":"consumer_byte_rate","values":"5000000"}]}]}' ``` You can also save the JSON or YAML output to a file and pass the file path in the `--from` flag. ### [](#view-throughput-limits-in-redpanda-cloud)View throughput limits in Redpanda Cloud You can also use Redpanda Cloud to view enforced limits. In the side menu, go to **Quotas**. ### [](#monitor-client-throughput)Monitor client throughput The following metrics provide insights into client throughput quota usage: - Client quota throughput per rule and quota type: - `/public_metrics` - [`redpanda_kafka_quotas_client_quota_throughput`](../../../reference/public-metrics-reference/#redpanda_kafka_quotas_client_quota_throughput) - Client quota throttling delay per rule and quota type, in seconds: - `/public_metrics` - [`redpanda_kafka_quotas_client_quota_throttle_time`](../../../reference/public-metrics-reference/#redpanda_kafka_quotas_client_quota_throttle_time) To identify which clients are actively connected and generating traffic, see [View connected client details](#view-connected-client-details). Quota metrics use the `redpanda_quota_rule` label to identify which quota was applied to a request. The label distinguishes between different entity types (user, client, or combinations). See the label values in [`redpanda_kafka_quotas_client_quota_throughput`](../../../reference/public-metrics-reference/#redpanda_kafka_quotas_client_quota_throughput). 
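To look only at the quota series named above, you can filter a metrics scrape. The following is a minimal sketch, assuming you have already saved the Prometheus text output of `/public_metrics` to a local file. The `filter_quota_metrics` function and the `metrics.txt` filename are illustrative names, not part of `rpk` or Redpanda.

```bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print only the client quota series named in this section.
# Comment lines (# HELP, # TYPE) are dropped because they do not start
# with the metric name.
filter_quota_metrics() {
  grep -E '^redpanda_kafka_quotas_client_quota_(throughput|throttle_time)'
}

# Usage (metrics.txt is an assumed filename for a saved scrape):
#   filter_quota_metrics < metrics.txt
```

Each matching line carries the `redpanda_quota_rule` label, so the filtered output can be grouped by that label to see which rule was applied.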
## [](#see-also)See also - [About Client Throughput Quotas](../about-throughput-quotas/) - [Configure Client Connections](../configure-client-connections/) - [Authentication](../../../security/cloud-authentication/) --- # Page 423: Disaster Recovery **URL**: https://docs.redpanda.com/redpanda-cloud/manage/disaster-recovery.md --- # Disaster Recovery --- title: Disaster Recovery latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: disaster-recovery/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: disaster-recovery/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/disaster-recovery/index.adoc description: Learn about disaster recovery options for Redpanda Cloud. page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Shadowing complements Redpanda’s existing availability and recovery capabilities. High availability actively protects your day-to-day operations, handling reads and writes seamlessly during node or availability zone failures within a region. Shadowing is your safety net for catastrophic regional disasters. Shadowing delivers near real-time, cross-region replication for mission-critical applications that require rapid failover with minimal data loss. > 📝 **NOTE** > > Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later. - [Shadowing](shadowing/) Learn about shadowing for disaster recovery in Redpanda Cloud. 
--- # Page 424: Shadowing **URL**: https://docs.redpanda.com/redpanda-cloud/manage/disaster-recovery/shadowing.md --- # Shadowing --- title: Shadowing latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: disaster-recovery/shadowing/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: disaster-recovery/shadowing/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/disaster-recovery/shadowing/index.adoc description: Learn about shadowing for disaster recovery in Redpanda Cloud. page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- > 📝 **NOTE** > > Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later. - [Shadowing Overview](overview/) Overview of shadowing for disaster recovery in Redpanda Cloud. - [Configure Shadowing](setup/) Learn how to configure shadowing for disaster recovery. - [Monitor Shadowing](monitor/) Learn how to monitor shadowing for disaster recovery. - [Configure Failover](failover/) Learn how to configure failover for disaster recovery. - [Failover Runbook](failover-runbook/) Step-by-step runbook for failover procedures in disaster recovery. 
--- # Page 425: Failover Runbook **URL**: https://docs.redpanda.com/redpanda-cloud/manage/disaster-recovery/shadowing/failover-runbook.md --- # Failover Runbook --- title: Failover Runbook latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: disaster-recovery/shadowing/failover-runbook page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: disaster-recovery/shadowing/failover-runbook.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/disaster-recovery/shadowing/failover-runbook.adoc description: Step-by-step runbook for failover procedures in disaster recovery. page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- This guide provides step-by-step procedures for emergency failover when your primary Redpanda cluster becomes unavailable. Follow these procedures only during active disasters when immediate failover is required. > ❗ **IMPORTANT** > > This is an emergency procedure. For planned failover testing or day-to-day shadow link management, see [Configure Failover](../failover/). Ensure you have completed the [disaster readiness checklist](../overview/#disaster-readiness-checklist) before an emergency occurs. > 📝 **NOTE** > > Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later. ## [](#emergency-failover-procedure)Emergency failover procedure Follow these steps during an active disaster: 1. [Assess the situation](#assess-situation) 2. [Verify shadow cluster status](#verify-shadow-status) 3. [Document current state](#document-state) 4. [Initiate failover](#initiate-failover) 5. [Monitor failover progress](#monitor-progress) 6. [Update application configuration](#update-applications) 7. [Verify application functionality](#verify-functionality) 8. 
[Clean up and stabilize](#cleanup-stabilize) ### [](#assess-situation)Assess the situation Confirm that failover is necessary: ```bash # Check if the primary cluster is responding rpk cluster info --brokers prod-cluster-1.example.com:9092,prod-cluster-2.example.com:9092 # If primary cluster is down, check shadow cluster health rpk cluster info --brokers shadow-cluster-1.example.com:9092,shadow-cluster-2.example.com:9092 ``` **Decision point**: If the primary cluster is responsive, consider whether failover is actually needed. Partial outages may not require full disaster recovery. **Examples that require full failover:** - Primary cluster is completely unreachable (network partition, regional outage) - Multiple broker failures preventing writes to critical topics - Data center failure affecting majority of brokers - Persistent authentication or authorization failures across the cluster **Examples that may NOT require failover:** - Single broker failure with sufficient replicas remaining - Temporary network connectivity issues affecting some clients - High latency or performance degradation (but cluster still functional) - Non-critical topic or partition unavailability ### [](#verify-shadow-status)Verify shadow cluster status Check the health of your shadow links: #### Cloud UI 1. From the **Shadow Link** page, select the shadow link you want to view. 2. The **Overview** tab shows the state of the shadow link and its topics. #### rpk ```bash # List all shadow links rpk shadow list # Check the configuration of your shadow link rpk shadow describe # Check the status of your disaster recovery link rpk shadow status ``` For detailed command options, see [`rpk shadow list`](../../../../reference/rpk/rpk-shadow/rpk-shadow-list/), [`rpk shadow describe`](../../../../reference/rpk/rpk-shadow/rpk-shadow-describe/), and [`rpk shadow status`](../../../../reference/rpk/rpk-shadow/rpk-shadow-status/). 
#### Cloud API ```bash # List all shadow links curl "https://api.redpanda.com/v1/shadow-links" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" # Check the configuration of your shadow link curl "https://api.redpanda.com/v1/shadow-links/" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" # Get Data Plane API URL of shadow cluster export DATAPLANE_API_URL=`curl https://api.cloud.redpanda.com/v1/clusters/ \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" | jq .cluster.dataplane_api` # Check the status of your disaster recovery link curl "https://$DATAPLANE_API_URL/v1/shadowlinks/" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` Verify that the following conditions exist before proceeding with failover: - Shadow link state should be `ACTIVE`. - Topics should be in `ACTIVE` state (not `FAULTED`). - Replication lag should be reasonable for your RPO requirements. #### [](#understanding-replication-lag)Understanding replication lag Use [`rpk shadow status`](../../../../reference/rpk/rpk-shadow/rpk-shadow-status/) or the [Data Plane API](/api/doc/cloud-dataplane/operation/operation-shadowlinkservice_listshadowlinktopics) to check lag, which shows the message count difference between source and shadow partitions: - **Acceptable lag examples**: 0-1000 messages for low-throughput topics, 0-10000 messages for high-throughput topics - **Concerning lag examples**: Growing lag over 50,000 messages, or lag that continuously increases without recovering - **Critical lag examples**: Lag exceeding your data loss tolerance (for example, if you can only afford to lose 1 minute of data, lag should represent less than 1 minute of typical message volume) ### [](#document-state)Document current state Record the current lag and status before proceeding: #### Cloud UI Capture the status from the **Shadow Link** page. 
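When you record lag, it can help to translate the message count into an approximate time window. The following is a rough sketch, assuming you know the topic's typical produce rate in messages per second. The `lag_seconds` helper and both of its inputs are illustrative and not part of `rpk`.

```bash
# Hypothetical helper: estimate how many seconds of data a given lag represents.
# $1 = lag in messages, $2 = typical produce rate in messages per second.
lag_seconds() {
  local lag=$1 rate=$2
  if [ "$rate" -le 0 ]; then
    echo "rate must be positive" >&2
    return 1
  fi
  # Round up so the estimate errs toward more potential data loss.
  echo $(( (lag + rate - 1) / rate ))
}

lag_seconds 50000 1000   # prints 50
```

With a data loss tolerance of 1 minute, a result above 60 would put the topic in the critical range.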
#### rpk ```bash # Capture current status for post-mortem analysis rpk shadow status  > failover-status-$(date +%Y%m%d-%H%M%S).log ``` Example output showing healthy replication before failover: shadow link: Overview: NAME UID STATE ACTIVE Tasks: Name Broker\_ID State Reason 1 ACTIVE 2 ACTIVE Topics: Name: , State: ACTIVE Partition SRC\_LSO SRC\_HWM DST\_HWM Lag 0 1234 1468 1456 12 1 2345 2579 2568 11 #### Cloud API ```bash # Capture current status for post-mortem analysis curl "https://$DATAPLANE_API_URL/v1/shadowlinks//topic" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" > failover-status-$(date +%Y%m%d-%H%M%S).log ``` The partition information shows the following: | Field | Description | | --- | --- | | source_last_stable_offset | Source partition last stable offset | | source_high_watermark | Source partition high watermark | | high_watermark | Shadow (destination) partition high watermark | | Lag | Message count difference between source and shadow partitions | > ❗ **IMPORTANT** > > Note the replication lag to estimate potential data loss during failover. The `Tasks` section shows the health of shadow link replication tasks. For details about what each task does, see [Shadow link tasks](../overview/#shadow-link-tasks). ### [](#initiate-failover)Initiate failover A complete cluster failover is appropriate if you observe that the source cluster is no longer reachable: #### Cloud UI 1. On your **Shadow Link** page, click **Failover All Topics**. 2. Click to confirm the failover action. The failover process promotes all topics to writable status. #### rpk ```bash # Fail over all topics in the shadow link rpk shadow failover --all ``` For detailed command options, see [`rpk shadow failover`](../../../../reference/rpk/rpk-shadow/rpk-shadow-failover/).
#### Cloud API ```bash # Fail over all topics in the shadow link curl -X POST "$DATAPLANE_API_URL/v1/shadowlink//failover" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` For selective topic failover (when only specific services are affected): #### Cloud UI 1. On your **Shadow Link** page, click the **Failover** button for the topics you want to failover. 2. Click to confirm the failover action. The failover process promotes the selected topics to writable status. #### rpk ```bash # Fail over individual topics rpk shadow failover --topic rpk shadow failover --topic ``` #### Cloud API ```bash # Fail over individual topics curl -X POST "$DATAPLANE_API_URL/v1/shadowlinks//failover" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" \ -d '{ "shadowTopicName": "" }' curl -X POST "$DATAPLANE_API_URL/v1/shadowlinks//failover" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" \ -d '{ "shadowTopicName": "" }' ``` ### [](#monitor-progress)Monitor failover progress Track the failover process: #### Cloud UI 1. From the **Shadow Link** page, select the shadow link you want to view. 2. Click the **Tasks** tab to view all tasks and their status. #### rpk ```bash # Monitor status until all topics show FAILED_OVER watch -n 5 "rpk shadow status " # Check detailed topic status and lag during emergency rpk shadow status --print-topic ``` Example output during successful failover: shadow link: Overview: NAME UID STATE ACTIVE Tasks: Name Broker\_ID State Reason 1 ACTIVE 2 ACTIVE Topics: Name: , State: FAILED\_OVER Name: , State: FAILED\_OVER Name: , State: FAILING\_OVER #### Cloud API ```bash # Monitor status watch -n 5 'curl "https://$DATAPLANE_API_URL/v1/shadowlinks/" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" | jq .' 
# Check detailed topic status and lag during emergency curl "https://$DATAPLANE_API_URL/v1/shadowlinks//topic" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` **Wait for**: All critical topics to reach `FAILED_OVER` state before proceeding. ### [](#update-applications)Update application configuration Redirect your applications to the shadow cluster by updating connection strings in your applications to point to shadow cluster brokers. If using DNS-based service discovery, update DNS records accordingly. Restart applications to pick up new connection settings and verify connectivity from application hosts to shadow cluster. ### [](#verify-functionality)Verify application functionality Test critical application workflows: ```bash # Verify applications can produce messages rpk topic produce --brokers :9092 # Verify applications can consume messages rpk topic consume --brokers :9092 --num 1 ``` Test message production and consumption, consumer group functionality, and critical business workflows to ensure everything is working properly. ### [](#cleanup-stabilize)Clean up and stabilize After all applications are running normally: #### Cloud UI 1. On your **Shadow Link** page, click **Delete**. 2. Type "delete" to confirm the action. #### rpk ```bash # Optional: Delete the shadow link (no longer needed) rpk shadow delete ``` For detailed command options, see [`rpk shadow delete`](../../../../reference/rpk/rpk-shadow/rpk-shadow-delete/). #### Cloud API ```bash # Optional: Delete the shadow link (no longer needed) curl -X DELETE https://api.redpanda.com/v1/shadow-links/ \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` For the full API reference, see [Control Plane API reference](/api/doc/cloud-controlplane/operation/operation-shadowlinkservice_deleteshadowlink). > 📝 **NOTE** > > This operation [force deletes](#force-delete-warning) the shadow link. 
Document the time of failover initiation and completion, applications affected and recovery times, data loss estimates based on replication lag, and issues encountered during failover. ## [](#troubleshoot-common-issues)Troubleshoot common issues ### [](#topics-stuck-in-failing_over-state)Topics stuck in FAILING\_OVER state **Problem**: Topics remain in `FAILING_OVER` state for extended periods **Solution**: Check shadow cluster logs for specific error messages and ensure sufficient cluster resources (CPU, memory, disk space) are available on the shadow cluster. Verify network connectivity between shadow cluster nodes and confirm that all shadow topic partitions have elected leaders and the controller partition is properly replicated with an active leader. If topics remain stuck after addressing these cluster health issues and you need immediate failover, you can force delete the shadow link to fail over all topics: #### Cloud UI All failover actions in the Cloud UI include force delete functionality by default. When you fail over a shadow link, all topics are immediately promoted to writable status. #### rpk ```bash # Force delete the shadow link to fail over all topics rpk shadow delete ``` `rpk shadow delete` force deletes the shadow link by default in Redpanda Cloud. #### Cloud API ```bash curl -X DELETE https://api.redpanda.com/v1/shadow-links/ \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` The `DELETE /shadow-links/` endpoint of the Control Plane API force deletes the shadow link by default in Redpanda Cloud. > ⚠️ **WARNING** > > Force deleting a shadow link immediately fails over all topics in the link. This action is irreversible and should only be used when topics are stuck and you need immediate access to all replicated data.
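Before resorting to a force delete, it can help to list exactly which topics have not finished failing over. The following is a minimal sketch, assuming the status output contains topic lines of the form `Name: <topic>, State: <STATE>`, as in the example output earlier in this runbook. The `pending_topics` function is an illustrative name, not an `rpk` command, and the parsing may need adjusting if your `rpk` version prints a different layout.

```bash
# Hypothetical helper: read shadow link status text on stdin and print the
# names of topics whose state is anything other than FAILED_OVER.
# Assumes topic lines look like "Name: <topic>, State: <STATE>".
pending_topics() {
  sed -n 's/^[[:space:]]*Name: \(.*\), State: \([A-Z_]*\)[[:space:]]*$/\1 \2/p' |
    awk '$2 != "FAILED_OVER" { print $1 }'
}

# Usage: pipe the output of `rpk shadow status` for your shadow link into
# pending_topics. Empty output means every listed topic is FAILED_OVER.
```

If the list stays non-empty after you have addressed the cluster health checks above, those are the topics that a force delete would promote immediately.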
### [](#topics-in-faulted-state)Topics in FAULTED state **Problem**: Topics show `FAULTED` state and are not replicating **Solution**: Check for authentication issues, network connectivity problems, or source cluster unavailability. Verify that the shadow link service account still has the required permissions on the source cluster. Review shadow cluster logs for specific error messages about the faulted topics. ### [](#application-connection-failures)Application connection failures **Problem**: Applications cannot connect to shadow cluster after failover **Solution**: Verify shadow cluster broker endpoints are correct and check security group and firewall rules. Confirm authentication credentials are valid for the shadow cluster and test network connectivity from application hosts. ### [](#consumer-group-offset-issues)Consumer group offset issues **Problem**: Consumers start from beginning or wrong positions **Solution**: Verify consumer group offsets were replicated (check your filters) and use `rpk group describe ` to check offset positions. If necessary, manually reset offsets to appropriate positions. See [How to manage consumer group offsets in Redpanda](https://support.redpanda.com/hc/en-us/articles/23499121317399-How-to-manage-consumer-group-offsets-in-Redpanda) for detailed reset procedures. ## [](#next-steps)Next steps After successful failover, focus on recovery planning and process improvement. Begin by assessing the source cluster failure and determining whether to restore the original cluster or permanently promote the shadow cluster as your new primary. **Immediate recovery planning:** 1. **Assess source cluster**: Determine root cause of the outage 2. **Plan recovery**: Decide whether to restore source cluster or promote shadow cluster permanently 3. **Data synchronization**: Plan how to synchronize any data produced during failover 4. 
**Fail forward**: Create a new shadow link with the failed over shadow cluster as source to maintain a DR cluster **Process improvement:** 1. **Document the incident**: Record timeline, impact, and lessons learned 2. **Update runbooks**: Improve procedures based on what you learned 3. **Test regularly**: Schedule regular disaster recovery drills 4. **Review monitoring**: Ensure monitoring caught the issue appropriately --- # Page 426: Configure Failover **URL**: https://docs.redpanda.com/redpanda-cloud/manage/disaster-recovery/shadowing/failover.md --- # Configure Failover --- title: Configure Failover latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: disaster-recovery/shadowing/failover page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: disaster-recovery/shadowing/failover.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/disaster-recovery/shadowing/failover.adoc description: Learn how to configure failover for disaster recovery. page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Failover is the process of modifying shadow topics or an entire shadow cluster from read-only replicas to fully writable resources, and ceasing replication from the source cluster. You can fail over individual topics for selective workload migration or fail over the entire cluster for comprehensive disaster recovery. This critical operation transforms your shadow resources into operational production assets, allowing you to redirect application traffic when the source cluster becomes unavailable. You can failover a shadow link using the Redpanda Cloud UI, `rpk`, or the Data Plane API. > ❗ **IMPORTANT: Experiencing an active disaster?** > > See [Failover Runbook](../failover-runbook/) for immediate step-by-step disaster procedures. 
> 📝 **NOTE** > > Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later. ## [](#failover-behavior)Failover behavior When you initiate failover, Redpanda performs the following operations: 1. **Stops replication**: Halts all data fetching from the source cluster for the specified topics or entire shadow link 2. **Failover topics**: Converts read-only shadow topics into regular, writable topics 3. **Updates topic state**: Changes topic status from `ACTIVE` to `FAILING_OVER`, then `FAILED_OVER` Topic failover is irreversible. Once failed over, topics cannot return to shadow mode, and automatic fallback to the original source cluster is not supported. > 📝 **NOTE** > > To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity. ## [](#failover-commands)Failover commands ### [](#get-data-plane-api-url)Get Data Plane API URL If using the Data Plane API, run the following to get the Data Plane API URL of the shadow cluster: ```bash export DATAPLANE_API_URL=`curl https://api.cloud.redpanda.com/v1/clusters/ \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" | jq .cluster.dataplane_api` ``` You can perform failover at different levels of granularity to match your disaster recovery needs: ### [](#individual-topic-failover)Individual topic failover To fail over a specific shadow topic while leaving other topics in the shadow link still replicating, run: #### Cloud UI 1. On the **Shadow Link** page, select your shadow link. 2. For any of the topics you want to failover, click the corresponding **Failover** button. 3. Click to confirm the failover action. The failover process promotes the selected topics to writable status. #### rpk ```bash rpk shadow failover --topic ``` For detailed command options, see [`rpk shadow failover`](../../../../reference/rpk/rpk-shadow/rpk-shadow-failover/). 
#### Data Plane API Send a `POST /shadowlink/{shadow_link_name}/failover` request to the Data Plane API. Specify the name of the shadow topic in the request body: ```bash curl -X POST "$DATAPLANE_API_URL/v1/shadowlink//failover" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" \ -d '{ "shadowTopicName": "" }' ``` Use this approach when you need to selectively failover specific workloads or when testing failover procedures. ### [](#complete-shadow-link-failover-cluster-failover)Complete shadow link failover (cluster failover) To fail over all shadow topics associated with the shadow link simultaneously, run: #### Cloud UI 1. On the **Shadow Link** page, select your shadow link. 2. Click **Failover All Topics**. 3. Click to confirm the failover action. The failover process promotes all topics to writable status. #### rpk ```bash rpk shadow failover --all ``` #### Data Plane API Send a `POST /shadowlink/{shadow_link_name}/failover` request to the Data Plane API. If you do not specify a shadow topic in the request body, this command requests a failover of all shadow topics associated with the shadow link: ```bash curl -X POST "$DATAPLANE_API_URL/v1/shadowlink//failover" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` Use this approach during a complete regional disaster when you need to activate the entire shadow cluster as your new production environment. ### [](#force-delete-shadow-link-emergency-failover)Force delete shadow link (emergency failover) #### Cloud UI All failover actions in the Cloud UI include force delete functionality by default. When you failover a shadow link, all topics are immediately promoted to writable status. 
#### rpk `rpk shadow delete` force deletes the shadow link by default in Redpanda Cloud: ```bash rpk shadow delete ``` #### Control Plane API Use the Control Plane API to force delete a shadow link: ```bash curl -X DELETE 'https://api.redpanda.com/v1/shadow-links/' \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` > ⚠️ **WARNING** > > Force deleting a shadow link is irreversible and immediately fails over all topics in the link, bypassing the normal failover state transitions. This action should only be used as a last resort when topics are stuck in transitional states and you need immediate access to all replicated data. ## [](#failover-states)Failover states ### [](#shadow-link-states)Shadow link states The shadow link itself has a simple state model: - **`ACTIVE`**: Shadow link is operating normally, replicating data - **`PAUSED`**: Shadow link replication is temporarily halted by user action Shadow links do not have dedicated failover states. Instead, the link’s operational status is determined by the collective state of its shadow topics. ### [](#shadow-topic-states)Shadow topic states Individual shadow topics progress through specific states during failover: - **`ACTIVE`**: Normal replication state before failover - **`FAULTED`**: Shadow topic has encountered an error and is not replicating - **`FAILING_OVER`**: Failover initiated, replication stopping - **`FAILED_OVER`**: Failover completed successfully, topic fully writable - **`PAUSED`**: Replication temporarily halted by user action ## [](#monitor-failover-progress)Monitor failover progress To monitor failover progress using the status command, run: ### Cloud UI Track the progress of failover operations from the **Shadow Link** page in the Cloud UI. ### rpk ```bash rpk shadow status ``` The output shows individual topic states and any issues encountered during the failover process. 
For detailed command options, see [`rpk shadow status`](../../../../reference/rpk/rpk-shadow/rpk-shadow-status/). ### Data Plane API ```bash curl "https://$DATAPLANE_API_URL/v1/shadowlinks/" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` Task states during monitoring: - **`ACTIVE`**: Task is operating normally and replicating data - **`FAULTED`**: Task encountered an error and requires attention - **`NOT_RUNNING`**: Task is not currently executing - **`LINK_UNAVAILABLE`**: Task cannot communicate with the source cluster For detailed information about shadow link tasks and their roles, see [Shadow link tasks](../overview/#shadow-link-tasks). ## [](#post-failover-cluster-behavior)Post-failover cluster behavior After successful failover, your shadow cluster exhibits the following characteristics: **Topic accessibility:** - Failed-over topics become fully writable and readable. - Applications can produce and consume messages normally. - All Kafka APIs are available for failed-over topics. - Original offsets and timestamps are preserved. **Shadow link status:** - The shadow link remains but stops replicating data. - Link status shows topics in `FAILED_OVER` state. - You can safely delete the shadow link after successful failover. **Operational limitations:** - No automatic fallback mechanism to the original source cluster. - Data transforms remain disabled until you manually re-enable them. - Audit log history from the source cluster is not available (new audit logs begin immediately). ## [](#failover-considerations-and-limitations)Failover considerations and limitations Before implementing failover procedures, understand these key considerations that affect your disaster recovery strategy and operational planning. **Data consistency:** - Some data loss may occur due to replication lag at the time of failover. - Consumer group offsets are preserved, allowing applications to resume from their last committed position.
- In-flight transactions at the source cluster are not replicated and will be lost. **Recovery-point-objective (RPO):** The amount of potential data loss depends on replication lag when disaster occurs. Monitor lag metrics to understand your effective RPO. **Network partitions:** If the source cluster becomes accessible again after failover, do not attempt to write to both clusters simultaneously. This creates a scenario with potential data inconsistencies, since metadata starts to diverge. **Testing requirements:** Regularly test failover procedures in non-production environments to validate your disaster recovery processes and measure RTO. ## [](#next-steps)Next steps After completing failover: - Update your application connection strings to point to the shadow cluster - Verify that applications can produce and consume messages normally - Consider deleting the shadow link if failover was successful and permanent For emergency situations, see [Failover Runbook](../failover-runbook/). --- # Page 427: Monitor Shadowing **URL**: https://docs.redpanda.com/redpanda-cloud/manage/disaster-recovery/shadowing/monitor.md --- # Monitor Shadowing --- title: Monitor Shadowing latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: disaster-recovery/shadowing/monitor page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: disaster-recovery/shadowing/monitor.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/disaster-recovery/shadowing/monitor.adoc description: Learn how to monitor shadowing for disaster recovery. page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Monitor your [shadow links](../setup/) to ensure proper replication performance and understand your disaster recovery readiness. 
Use `rpk` commands, metrics, and status information to track shadow link health and troubleshoot issues. > ❗ **IMPORTANT: Experiencing an active disaster?** > > See [Failover Runbook](../failover-runbook/) for immediate step-by-step disaster procedures. ## [](#status-commands)Status commands To list existing shadow links: ### Cloud UI At the organization level of the Cloud UI, navigate to **Shadow Link**. ### rpk ```bash rpk shadow list ``` ### Control Plane API ```bash curl 'https://api.redpanda.com/v1/shadow-links' \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` To view shadow link configuration details: ### Cloud UI 1. From the **Shadow Link** page, select the shadow link you want to view. 2. Click the **Tasks** tab to view all tasks and their status. ### rpk ```bash rpk shadow describe ``` For detailed command options, see [`rpk shadow list`](../../../../reference/rpk/rpk-shadow/rpk-shadow-list/) and [`rpk shadow describe`](../../../../reference/rpk/rpk-shadow/rpk-shadow-describe/). This command shows the complete configuration of the shadow link, including connection settings, filters, and synchronization options. ### Control Plane API ```bash curl 'https://api.redpanda.com/v1/shadow-links/' \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` To check your shadow link status and ensure proper operation: ### Cloud UI 1. From the **Shadow Link** page, select the shadow link you want to view. 2. Click the **Tasks** tab to view all tasks and their status. ### rpk ```bash rpk shadow status ``` For troubleshooting specific issues, you can use command options to show individual status sections. See [`rpk shadow status`](../../../../reference/rpk/rpk-shadow/rpk-shadow-status/) for available status options. 
### Cloud API

```bash
# Get Data Plane API URL of shadow cluster
export DATAPLANE_API_URL=`curl https://api.cloud.redpanda.com/v1/clusters/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" | jq .cluster.dataplane_api`

curl "https://$DATAPLANE_API_URL/v1/shadowlinks/" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${RP_CLOUD_TOKEN}"

# View topic state
curl "https://$DATAPLANE_API_URL/v1/shadowlinks//topic" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${RP_CLOUD_TOKEN}"
```

The status includes the following:

- **Shadow link state**: Overall operational state (`ACTIVE`, `PAUSED`).
- **Individual topic states**: Current state of each replicated topic (`ACTIVE`, `FAULTED`, `FAILING_OVER`, `FAILED_OVER`, `PAUSED`).
- **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`). For details about shadow link tasks, see [Shadow link tasks](../overview/#shadow-link-tasks).
- **Lag information**: Replication lag per partition showing source vs shadow high watermarks (HWM).

## [](#shadow-link-metrics)Metrics

Shadowing exposes metrics for tracking replication performance and health through the [`public_metrics`](../../../../reference/public-metrics-reference/) endpoint.

| Metric | Type | Description |
| --- | --- | --- |
| `redpanda_shadow_link_shadow_lag` | Gauge | The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition. |
| `redpanda_shadow_link_total_bytes_fetched` | Count | The total number of bytes fetched by a sharded replicator (bytes received by the client). Labeled by `shadow_link_name` and `shard` to track data transfer volume from the source cluster. |
| `redpanda_shadow_link_total_bytes_written` | Count | The total number of bytes written by a sharded replicator (bytes written to the `write_at_offset_stm`). Uses `shadow_link_name` and `shard` labels to monitor data written to the shadow cluster. |
| `redpanda_shadow_link_client_errors` | Count | The number of errors seen by the client. Track by `shadow_link_name` and `shard` to identify connection or protocol issues between clusters. |
| `redpanda_shadow_link_shadow_topic_state` | Gauge | Number of shadow topics in the respective states. Labeled by `shadow_link_name` and `state` to monitor topic state distribution across your shadow links. |
| `redpanda_shadow_link_total_records_fetched` | Count | The total number of records fetched by the sharded replicator (records received by the client). Monitor by `shadow_link_name` and `shard` to track message throughput from the source. |
| `redpanda_shadow_link_total_records_written` | Count | The total number of records written by a sharded replicator (records written to the `write_at_offset_stm`). Uses `shadow_link_name` and `shard` labels to monitor message throughput to the shadow cluster. |

See also: [Metrics Reference](../../../../reference/public-metrics-reference/)

## [](#monitoring-best-practices)Monitoring best practices

### [](#health-check-procedures)Health check procedures

Establish regular monitoring workflows to ensure shadow link health:

#### Cloud UI

1. From the **Shadow Link** page, select the shadow link you want to view.
2. Click the **Tasks** tab to view all tasks and their status.
#### rpk ```bash # Check all shadow links are active rpk shadow list | grep -v "ACTIVE" || echo "All shadow links healthy" # Monitor lag for critical topics rpk shadow status | grep -E "LAG|Lag" ``` #### Cloud API ```bash # Check all shadow links are active curl 'https://api.redpanda.com/v1/shadow-links' \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" | \ jq -r 'if all(.state == "SHADOW_LINK_STATE_ACTIVE") then "All shadow links healthy" else .[] | select(.state != "SHADOW_LINK_STATE_ACTIVE") end' # Monitor lag for critical topics curl "https://$DATAPLANE_API_URL/v1/shadowlinks//topic" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" ``` ### [](#alert-conditions)Alert conditions Configure monitoring alerts for the following conditions, which indicate problems with Shadowing: - **High replication lag**: When `redpanda_shadow_link_shadow_lag` exceeds your RPO requirements - **Connection errors**: When `redpanda_shadow_link_client_errors` increases rapidly - **Topic state changes**: When topics move to `FAULTED` state - **Task failures**: When replication tasks enter `FAULTED` or `NOT_RUNNING` states - **Throughput drops**: When bytes/records fetched drops significantly - **Link unavailability**: When tasks show `LINK_UNAVAILABLE` indicating source cluster connectivity issues For more information about shadow link tasks and their states, see [Shadow link tasks](../overview/#shadow-link-tasks). 
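
The high replication lag condition above can be checked with a short script. The following is a minimal sketch, assuming the metrics endpoint returns standard Prometheus text exposition format and that the lag series carries `topic` and `partition` labels exactly as named in the metrics table; the actual label names on your cluster may differ. The helper names `parse_samples` and `partitions_over_lag` are hypothetical and not part of any Redpanda tooling.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value" (escaped quotes allowed).
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text, metric):
    """Return (labels, value) pairs for every sample of `metric` in `text`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((labels, float(match.group("value"))))
    return samples


def partitions_over_lag(text, max_lag):
    """List (topic, partition, lag) for partitions whose lag exceeds max_lag.

    Assumption: `max_lag` is an offset count derived from your RPO target.
    """
    return [
        (labels.get("topic"), labels.get("partition"), value)
        for labels, value in parse_samples(text, "redpanda_shadow_link_shadow_lag")
        if value > max_lag
    ]
```

Feed the function the text scraped from the metrics endpoint and page on any non-empty result. In a production setup the same threshold would more commonly be expressed as an alerting rule in your monitoring system.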
--- # Page 428: Shadowing Overview **URL**: https://docs.redpanda.com/redpanda-cloud/manage/disaster-recovery/shadowing/overview.md --- # Shadowing Overview --- title: Shadowing Overview latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: disaster-recovery/shadowing/overview page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: disaster-recovery/shadowing/overview.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/disaster-recovery/shadowing/overview.adoc description: Overview of shadowing for disaster recovery in Redpanda Cloud. page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- > 📝 **NOTE** > > Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later. Shadowing is Redpanda’s enterprise-grade disaster recovery solution that establishes asynchronous, offset-preserving replication between two distinct Redpanda clusters. A cluster is able to create a dedicated client that continuously replicates source cluster data, including offsets, timestamps, and cluster metadata. This creates a read-only shadow cluster that you can quickly failover to handle production traffic during a disaster. Shadowing keeps data flowing, even during regional outages. > ❗ **IMPORTANT: Experiencing an active disaster?** > > See [Failover Runbook](../failover-runbook/) for immediate step-by-step disaster procedures. Unlike traditional replication tools that re-produce messages, Shadowing copies data at the byte level, ensuring shadow topics contain identical copies of source topics with preserved offsets and timestamps. 
Shadowing replicates: - **Topic data**: All records with preserved offsets and timestamps - **Topic configurations**: Partition counts, retention policies, and other topic properties - **Consumer group offsets**: Enables seamless consumer resumption after failover - **Access control lists (ACLs)**: User permissions and security policies - **Schema Registry data**: Schema definitions and compatibility settings ## [](#how-shadowing-fits-into-disaster-recovery)How Shadowing fits into disaster recovery Shadowing addresses enterprise disaster recovery requirements driven by regulatory compliance and business continuity needs. Organizations typically want to minimize both recovery time objective (RTO) and recovery point objective (RPO), and Shadowing asynchronous replication helps you achieve both goals by reducing data loss during regional outages and enabling rapid application recovery. The architecture follows an active-passive pattern. The source cluster processes all production traffic while the shadow cluster remains in read-only mode, continuously receiving updates. If a disaster occurs, you can failover the shadow topics, making them fully writable. At that point, you can redirect your applications to the shadow cluster, which becomes the new production cluster. > 📝 **NOTE** > > To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity. Shadowing complements Redpanda’s existing availability and recovery capabilities. High availability actively protects your day-to-day operations, handling reads and writes seamlessly during node or availability zone failures within a region. Shadowing is your safety net for catastrophic regional disasters. Shadowing delivers near real-time, cross-region replication for mission-critical applications that require rapid failover with minimal data loss. 
## [](#limitations)Limitations Shadowing for disaster recovery currently has the following limitations: - Shadowing is designed for active-passive disaster recovery scenarios. Each shadow cluster can maintain only one shadow link. - Shadowing operates exclusively in asynchronous mode and doesn’t support active-active configurations. This means there will always be some replication lag. - [Data transforms](../../../../develop/data-transforms/) are not supported on shadow clusters while Shadowing is active. Writing to shadow topics is blocked. - During a disaster, [audit log](../../../audit-logging/) history from the source cluster is lost, though the shadow cluster begins generating new audit logs immediately after the failover. - After you failover shadow topics, automatic fallback to the original source cluster is not supported. ## [](#shadow-link-tasks)Shadow link tasks Shadow linking operates through specialized tasks that handle different aspects of replication. If you use a `shadow-config.yaml` configuration file to create the shadow link, each task corresponds to a section in the file. Tasks run continuously to maintain synchronization with the source cluster. #### Source Topic Sync The **Source Topic Sync task** manages topic discovery and metadata synchronization. This task periodically queries the source cluster to discover available topics, applies your configured topic filters to determine which topics should become shadow topics, and synchronizes topic properties between clusters. The task is controlled by the `topic_metadata_sync_options` section in the configuration file. 
It includes: - **Auto-creation filters**: Determines which source topics automatically become shadow topics - **Property synchronization**: Controls which topic properties replicate from source to shadow - **Starting offset**: Sets where new shadow topics begin replication (earliest, latest, or timestamp-based) - **Sync interval**: How frequently to check for new topics and property changes When this task discovers a new topic that matches your filters, it creates the corresponding shadow topic and begins replication from your configured starting offset. #### Consumer Group Shadowing The **Consumer Group Shadowing task** replicates consumer group offsets and membership information from the source cluster. This ensures that consumer applications can resume processing from the correct position after failover. The task is controlled by the `consumer_offset_sync_options` section in the configuration file. It includes: - **Group filters**: Determines which consumer groups have their offsets replicated - **Sync interval**: How frequently to synchronize consumer group offsets - **Offset clamping**: Automatically adjusts replicated offsets to valid ranges on the shadow cluster This task runs on brokers that host the `__consumer_offsets` topic and continuously tracks consumer group coordinators to optimize offset synchronization. #### Security Migrator The **Security Migrator task** replicates security policies, primarily ACLs (access control lists), from the source cluster to maintain consistent authorization across both environments. The task is controlled by the `security_sync_options` section in the configuration file. It includes: - **ACL filters**: Determines which security policies replicate - **Sync interval**: How frequently to synchronize security settings By default, all ACLs replicate to ensure your shadow cluster maintains the same security posture as your source cluster. 
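
To illustrate the offset clamping listed for the Consumer Group Shadowing task, here is a minimal sketch. It assumes that a "valid range" means the interval between the shadow partition's log start offset and its high watermark; the source does not define the range precisely, so treat this as an interpretation rather than Redpanda's implementation. The function name `clamp_offset` is hypothetical.

```python
def clamp_offset(committed: int, log_start: int, high_watermark: int) -> int:
    """Clamp a replicated committed offset into [log_start, high_watermark].

    Assumption: the shadow partition's valid range is bounded by its log
    start offset and its high watermark.
    """
    if log_start > high_watermark:
        raise ValueError("log_start must not exceed high_watermark")
    return max(log_start, min(committed, high_watermark))
```

Under this reading, a consumer group that committed offset 150 on the source while the shadow partition has only replicated up to offset 100 would resume at 100 after failover, rather than pointing past the end of the shadow log.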
### [](#task-status-and-monitoring)Task status and monitoring Each task reports its status through the shadow link status API. Task states include: - **`ACTIVE`**: Task is running normally and performing synchronization - **`PAUSED`**: Task has been manually paused through configuration - **`FAULTED`**: Task encountered an error and requires attention - **`NOT_RUNNING`**: Task is not currently executing - **`LINK_UNAVAILABLE`**: Task cannot communicate with the source cluster You can pause individual tasks by setting the `paused` field to `true` in the corresponding configuration section. This allows you to selectively disable parts of the replication process without affecting the entire shadow link. For monitoring task health and troubleshooting task issues, see [Monitor Shadowing](../monitor/). ## [](#what-gets-replicated)What gets replicated Shadowing replicates your topic data with complete fidelity, preserving all message records with their original offsets, timestamps, headers, and metadata. The partition structure remains identical between source and shadow clusters, ensuring applications can resume processing from the exact same position after failover. Consumer group data flows according to your group filters, replicating offsets and membership information for matched groups. ACLs replicate based on your security filters. Schema Registry data synchronizes schema definitions, versions, and compatibility settings. Partition count is always replicated to ensure the shadow topic matches the source topic’s partition structure. ### [](#topic-properties-replication)Topic properties replication The [Source Topic Sync task](#shadow-link-tasks) handles topic property replication. 
For topic properties, Redpanda follows these replication rules: **Never replicated** - `redpanda.remote.readreplica` - `redpanda.remote.recovery` - `redpanda.remote.allowgaps` - `redpanda.virtual.cluster.id` - `redpanda.leaders.preference` - `redpanda.cloud_topic.enabled` **Always replicated** - `max.message.bytes` - `cleanup.policy` - `message.timestamp.type` **Always replicated (unless `exclude_default` is `true`)** - `compression.type` - `retention.bytes` - `retention.ms` - `delete.retention.ms` - `replication.factor` - `min.compaction.lag.ms` - `max.compaction.lag.ms` To replicate additional topic properties, explicitly list them in `synced_shadow_topic_properties`. The filtering system you configure determines the precise scope of replication across all components, allowing you to balance comprehensive disaster recovery with operational efficiency. ## [](#best-practices)Best practices To ensure reliable disaster recovery with Shadowing: - **Do not modify shadow topic properties**: Avoid modifying synced topic properties on shadow topics, as these properties automatically revert to source topic values. ## [](#implementation-overview)Implementation overview Choose your implementation approach: - **[Setup and Configuration](../setup/)**: Initial shadow configuration, authentication, and topic selection - **[Monitoring and Operations](../monitor/)**: Health checks, lag monitoring, and operational procedures - **[Planned Failover](../failover/)**: Controlled disaster recovery testing and migrations - **[Failover Runbook](../failover-runbook/)**: Rapid disaster response procedures > 💡 **TIP** > > You can create and manage shadow links with the Redpanda Cloud UI, the [Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview), or `rpk`, giving you flexibility in how you interact with your disaster recovery infrastructure. 
## [](#next-steps)Next steps After setting up Shadowing for your Redpanda clusters, consider these additional steps: - **Test your disaster recovery procedures**: Regularly practice failover scenarios in a non-production environment. See [Failover Runbook](../failover-runbook/) for step-by-step disaster procedures. - **Monitor shadow link health**: Set up alerting on the metrics described above to ensure early detection of replication issues. - **Implement automated failover**: Consider developing automation scripts that can detect outages and initiate failover based on predefined criteria. - **Review security policies**: Ensure your ACL filters replicate the appropriate security settings for your disaster recovery environment. - **Document your configuration**: Maintain up-to-date documentation of your shadow link configuration, including network settings, authentication details, and filter definitions. --- # Page 429: Configure Shadowing **URL**: https://docs.redpanda.com/redpanda-cloud/manage/disaster-recovery/shadowing/setup.md --- # Configure Shadowing --- title: Configure Shadowing latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: disaster-recovery/shadowing/setup page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: disaster-recovery/shadowing/setup.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/disaster-recovery/shadowing/setup.adoc description: Learn how to configure shadowing for disaster recovery. page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- You can create and manage shadow links with the Redpanda Cloud UI, the [Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview), or `rpk`, giving you flexibility in how you interact with your disaster recovery infrastructure. 
> 💡 **TIP** > > Deploy clusters in different geographic regions to protect against regional disasters. ## [](#prerequisites)Prerequisites ### [](#license-and-cluster-requirements)License and cluster requirements Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later. ### [](#cluster-configuration)Cluster configuration The shadow cluster must have the [`enable_shadow_linking`](../../../../reference/properties/cluster-properties/#enable_shadow_linking) cluster property set to `true`. > 📝 **NOTE** > > Starting with Redpanda v25.3, this cluster property is enabled by default on new Redpanda Cloud clusters. For existing clusters on versions earlier than v25.3, you must enable this property manually. See [Configure Cluster Properties](../../../cluster-maintenance/config-cluster/). ### [](#replication-service-permissions)Replication service permissions You must configure a service account on the source cluster with the following [ACL](../../../../security/authorization/acl/) permissions for shadow link replication: - **Topics**: `read` permission on all topics you want to replicate - **Topic configurations**: `describe_configs` permission on topics for configuration synchronization - **Consumer groups**: `describe` and `read` permission on consumer groups for offset replication - **ACLs**: `describe` permission on ACL resources to replicate security policies - **Cluster**: `describe` permission on the cluster resource to access ACLs This service account authenticates from the shadow cluster to the source cluster and performs the actual data replication. The credentials for this account are provided when you set up the shadow link. ### [](#network-and-authentication)Network and authentication You must configure network connectivity between clusters with appropriate firewall rules to allow the shadow cluster to connect to the source cluster for data replication. 
Shadowing uses a pull-based architecture where the shadow cluster fetches data from the source cluster. For detailed networking configuration, see [Networking](#networking). If using [authentication](../../../../security/cloud-authentication/) for the shadow link connection, configure the source cluster with your chosen authentication method (SASL/SCRAM, TLS, mTLS) and ensure the shadow cluster has the proper credentials to authenticate to the source cluster. ## [](#set-up-shadowing)Set up Shadowing To set up Shadowing, you need to create a shadow link and configure filters to select which topics, consumer groups, ACLs, and Schema Registry data to replicate. If using the Cloud API to set up Shadowing, you must [authenticate](/api/doc/cloud-controlplane/authentication) to the API by including an access token in your requests. ### [](#create-a-shadow-link)Create a shadow link Any BYOC or Dedicated cluster can create a shadow link to a source cluster. > 💡 **TIP** > > You can use `rpk` to generate a sample configuration file with common filter patterns: > > ```bash > # Generate a sample configuration file with placeholder values > rpk shadow config generate --for-cloud -o shadow-config.yaml > ``` > > This creates a complete YAML configuration file that you can customize for your environment. The template includes all available fields with comments explaining their purpose. For detailed command options, see [`rpk shadow config generate --for-cloud`](../../../../reference/rpk/rpk-shadow/rpk-shadow-config-generate/). Explore the configuration file ```yaml # Sample ShadowLinkConfig YAML with all fields name: # Unique name for this shadow link, example: "production-dr" cloud_options: # Use either source_redpanda_id or bootstrap_servers: only one is required. 
source_redpanda_id: # Optional: 20 character lowercase ID of the cluster # Example: m7xtv2qq5njbhwruk88f shadow_redpanda_id: # 20 character lowercase ID of the cluster # Example: m7xtv2qq5njbhwruk88f client_options: bootstrap_servers: # Source cluster brokers to connect to - : # Example: "prod-kafka-1.example.com:9092" - : # Example: "prod-kafka-2.example.com:9092" - : # Example: "prod-kafka-3.example.com:9092" source_cluster_id: # Optional: UUID assigned by Redpanda # Example: a882bc98-7aca-40f6-a657-36a0b4daf1fd # This UUID is not available in Redpanda Cloud. # TLS settings using PEM strings tls_settings: enabled: true tls_pem_settings: ca: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- key: ${secrets.} cert: |- -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- # Create SASL credentials in the source cluster. # Then, with this configuration, ensure the shadow cluster uses the credentials # to authenticate to the source cluster. authentication_configuration: # SASL/SCRAM authentication scram_configuration: username: # SASL/SCRAM username, example: "shadow-replication-user" password: ${secrets.} # ID of secret containing SASL/SCRAM password scram_mechanism: SCRAM_SHA_256 # SCRAM mechanism: "SCRAM_SHA_256" or "SCRAM_SHA_512" # Connection tuning - adjust based on network characteristics metadata_max_age_ms: 10000 # How often to refresh cluster metadata (default: 10000ms) connection_timeout_ms: 1000 # Connection timeout (default: 1000ms, increase for high latency) retry_backoff_ms: 100 # Backoff between retries (default: 100ms) fetch_wait_max_ms: 500 # Max time to wait for fetch requests (default: 500ms) fetch_min_bytes: 5242880 # Min bytes per fetch (default: 5MB) fetch_max_bytes: 20971520 # Max bytes per fetch (default: 20MB) fetch_partition_max_bytes: 1048576 # Max bytes per partition fetch (default: 1MB) topic_metadata_sync_options: interval: 30s # How often to sync topic metadata (examples: "30s", "1m", "5m") 
auto_create_shadow_topic_filters: # Filters for automatic topic creation - pattern_type: LITERAL # Include all topics (wildcard) filter_type: INCLUDE name: '*' - pattern_type: PREFIX # Exclude topics with specific prefix filter_type: EXCLUDE name: # Examples: "temp-", "test-", "debug-" synced_shadow_topic_properties: # Additional topic properties to sync (beyond defaults) - retention.ms # Topic retention time - segment.ms # Segment roll time exclude_default: false # Include default properties (compression, retention, etc.) start_at_earliest: {} # Start from the beginning of source topics (default) paused: false # Enable topic metadata synchronization consumer_offset_sync_options: interval: 30s # How often to sync consumer group offsets paused: false # Enable consumer offset synchronization group_filters: # Filters for consumer groups to sync - pattern_type: LITERAL filter_type: INCLUDE name: '*' # Include all consumer groups security_sync_options: interval: 30s # How often to sync security settings paused: false # Enable security settings synchronization acl_filters: # Filters for ACLs to sync - resource_filter: resource_type: TOPIC # Resource type: "TOPIC", "GROUP", "CLUSTER" pattern_type: PREFIXED # Pattern type: "LITERAL", "PREFIXED" name: # Examples: "prod-", "app-data-" access_filter: principal: User: # Principal name, example: "User:app-service" operation: ANY # Operation: "READ", "WRITE", "CREATE", "DELETE", "ALTER", "DESCRIBE", "ANY" permission_type: ALLOW # Permission: "ALLOW" or "DENY" host: '*' # Host pattern, examples: "*", "10.0.0.0/8", "app-server.example.com" schema_registry_sync_options: # Schema Registry synchronization options shadow_schema_registry_topic: {} # Enable byte-for-byte _schemas topic replication ``` Because the shadow cluster pulls from the source cluster, the shadow cluster requires credentials to connect to the source cluster. 
Because you cannot store plaintext passwords in Redpanda Cloud, you must create a secret to hold the password for the user on the source cluster. If using mTLS, you must also create a secret to hold the key of the client certificate that the client uses to authenticate. Reference that secret in `client_options.tls_settings.tls_pem_settings.key` in the configuration file.

1. In the shadow cluster, create the secret:

#### Cloud UI

In the shadow cluster, go to the **Secrets Store** page and create a secret for the source cluster user, scoped to Redpanda Cluster. If necessary, first create the user with all ACLs enabled in the source cluster.

#### rpk

In the shadow cluster, create a secret to store the authentication credential that the cluster will use (`"scram_configuration": "password"` in the example configuration in the next step). Your secret must be scoped to "Redpanda Cluster". Use [`rpk security secret create`](../../../../reference/rpk/rpk-security/rpk-security-secret-create/) to create the secret from the command line.

#### Data Plane API

In the shadow cluster, create a secret to store the authentication credential that the cluster will use (`"scram_configuration": "password"` in the example configuration in the next step). Your secret must be scoped to "Redpanda Cluster". Use the [Data Plane API](../../../api/cloud-dataplane-api/) to programmatically create the secret.

2. In the shadow cluster, create a shadow link to the source cluster.

#### Cloud UI

1. At the organization level of the Cloud UI, navigate to **Shadow Link**.
2. Click **Create shadow link**.
3. Enter a unique name for the shadow link. The name must start and end with a lowercase alphanumeric character; hyphens are allowed in between.
4. Select the source cluster from which data will be replicated. You can select an existing Redpanda Cloud cluster, or you can enter a bootstrap server URL to connect to any Kafka-compatible cluster. For an existing Redpanda Cloud cluster, you select the specific cluster on the next page.
5. Enter the authorization and authentication details from the source cluster, including the user and the name of the secret containing the password created in the previous step.
6. Optionally, expand **Advanced options** to configure client connection properties.
7. Click **Save** to apply changes.

#### rpk

1. Run `rpk cloud login`. Select your shadow cluster when prompted.
2. To create a shadow link with the source cluster using `rpk`, run the following command from the shadow cluster:

```bash
# When logged in, optionally create a new rpk profile to easily
# switch to the shadow cluster
rpk profile create --from-cloud shadow-cluster

# Use the generated configuration file to create the shadow link
rpk shadow create --config-file shadow-config.yaml
```

For detailed command options, see [`rpk shadow create`](../../../../reference/rpk/rpk-shadow/rpk-shadow-create/).

> 💡 **TIP**
>
> Use [`rpk profile`](../../../rpk/config-rpk-profile/) to save your cluster connection details and credentials for both source and shadow clusters. This allows you to easily switch between the two configurations.
#### Control Plane API To create a shadow link using the Control Plane API, make a `POST /shadow-links` request from the shadow cluster: ```bash curl -X POST 'https://api.redpanda.com/v1/shadow-links' \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" \ -d '{ "shadow_link": { "shadow_redpanda_id": "", "name": "", "client_options": { "bootstrap_servers": [":", ":", ":"], "tls_settings": { "enabled": true }, "authentication_configuration": { "scram_configuration": { "username": "", "password": "${secrets.}", "scram_mechanism": "SCRAM_MECHANISM_SCRAM_SHA_256" } } }, "topic_metadata_sync_options": { "interval": "30s", "auto_create_shadow_topic_filters": [ { "name": "*", "filter_type": "FILTER_TYPE_INCLUDE", "pattern_type": "PATTERN_TYPE_LITERAL" }, { "name": "", "filter_type": "FILTER_TYPE_EXCLUDE", "pattern_type": "PATTERN_TYPE_PREFIX" } ], "start_at_earliest": {}, "paused": false }, "consumer_offset_sync_options": { "paused": true }, "security_sync_options": { "paused": true } } }' ``` Replace the placeholders with your own values: - ``: ID of the shadow (destination) cluster. - ``: Unique name for this shadow link, for example, `production-dr`. - `:`, `: …​`: Source cluster brokers to connect to, for example, `prod-kafka-1.example.com:9092`, `prod-kafka-2.example.com:9092`. - ``: SASL/SCRAM username, for example, `shadow-replication-user`. You create this user in the source cluster. - ``: The name of the secret containing the SASL/SCRAM password from the source cluster. - ``: Exclude topics that use this prefix, for example, `temp-`, `test-`, `debug-`. The response object represents the [long-running operation](../../../api/cloud-byoc-controlplane-api/#lro) of creating a shadow link. For the full API reference, see [Control Plane API reference](/api/doc/cloud-controlplane/operation/operation-shadowlinkservice_createshadowlink). 
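
If you generate shadow link requests from a script, building the body programmatically helps avoid quoting mistakes in the nested JSON. The following is a minimal sketch that mirrors the field names in the `curl` example above. The function name `build_shadow_link_request` and its parameters are hypothetical, and the `${secrets.NAME}` reference format is inferred from the example, so verify both against the API reference before relying on them.

```python
import json


def build_shadow_link_request(shadow_cluster_id, name, bootstrap_servers,
                              username, password_secret, exclude_prefixes=()):
    """Build the JSON body for a POST /shadow-links request.

    Assumption: field names and enum values are copied from the curl
    example in this section; this helper is illustrative only.
    """
    if not bootstrap_servers:
        raise ValueError("at least one bootstrap server is required")

    # Include every topic, then exclude the given prefixes.
    topic_filters = [{
        "name": "*",
        "filter_type": "FILTER_TYPE_INCLUDE",
        "pattern_type": "PATTERN_TYPE_LITERAL",
    }]
    for prefix in exclude_prefixes:
        topic_filters.append({
            "name": prefix,
            "filter_type": "FILTER_TYPE_EXCLUDE",
            "pattern_type": "PATTERN_TYPE_PREFIX",
        })

    return {
        "shadow_link": {
            "shadow_redpanda_id": shadow_cluster_id,
            "name": name,
            "client_options": {
                "bootstrap_servers": list(bootstrap_servers),
                "tls_settings": {"enabled": True},
                "authentication_configuration": {
                    "scram_configuration": {
                        "username": username,
                        # Reference the secret by name; never inline the password.
                        "password": f"${{secrets.{password_secret}}}",
                        "scram_mechanism": "SCRAM_MECHANISM_SCRAM_SHA_256",
                    }
                },
            },
            "topic_metadata_sync_options": {
                "interval": "30s",
                "auto_create_shadow_topic_filters": topic_filters,
                "start_at_earliest": {},
                "paused": False,
            },
            "consumer_offset_sync_options": {"paused": True},
            "security_sync_options": {"paused": True},
        }
    }
```

Serialize the result with `json.dumps` and send it with the HTTP client of your choice, using the same `Authorization: Bearer` header as the `curl` example.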
### [](#set-filters)Set filters

Filters determine which resources Shadowing automatically creates when establishing your shadow link.

Topic filters select which topics Shadowing automatically creates as shadow topics when they appear on the source cluster. After Shadowing creates a shadow topic, it continues replicating until you fail over the topic, delete it, or delete the entire shadow link.

Consumer group and ACL filters control which groups and security policies replicate to maintain application functionality.

#### [](#filter-types-and-patterns)Filter types and patterns

Each filter uses two key settings:

- **Pattern type**: Determines how names are matched
  - `LITERAL`: Matches names exactly (including the special wildcard `*` to match all items)
  - `PREFIX`: Matches names that start with the specified string
- **Filter type**: Specifies whether to INCLUDE or EXCLUDE matching items
  - `INCLUDE`: Replicate items that match the pattern
  - `EXCLUDE`: Skip items that match the pattern

#### [](#filter-processing-rules)Filter processing rules

Redpanda processes filters in the order you define them, with EXCLUDE filters taking precedence. Design your filter lists carefully:

1. **Exclude filters win**: If any EXCLUDE filter matches a resource, it is excluded regardless of INCLUDE filters.
2. **Order matters for INCLUDE filters**: Among INCLUDE filters, the first match determines the result.
3. **Default behavior**: Items that don’t match any filter are excluded from replication.
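As a rough illustration of the processing rules above, the following Python sketch evaluates a filter list against a resource name. The `NameFilter` class and the `matches` and `is_replicated` functions are hypothetical names invented for this example; they are not part of any Redpanda API. The sketch assumes that a `LITERAL` filter named `*` matches every name, and it ignores the separate restrictions that apply to system topics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameFilter:
    """Hypothetical stand-in for one filter entry in a shadow link configuration."""
    name: str
    pattern_type: str  # "LITERAL" or "PREFIX"
    filter_type: str   # "INCLUDE" or "EXCLUDE"


def matches(f: NameFilter, resource: str) -> bool:
    """Return True if the filter's pattern matches the resource name."""
    if f.pattern_type == "LITERAL":
        # A LITERAL filter named "*" is treated as the match-all wildcard.
        return f.name == "*" or f.name == resource
    if f.pattern_type == "PREFIX":
        return resource.startswith(f.name)
    raise ValueError(f"unknown pattern type: {f.pattern_type}")


def is_replicated(filters: list[NameFilter], resource: str) -> bool:
    """Apply the three processing rules to decide whether a resource replicates."""
    matching = [f for f in filters if matches(f, resource)]
    # Rule 1: any matching EXCLUDE filter wins, regardless of INCLUDE filters.
    if any(f.filter_type == "EXCLUDE" for f in matching):
        return False
    # Rule 2: otherwise, the first matching INCLUDE filter decides (include).
    # Rule 3: a resource that matches no filter at all is excluded.
    return any(f.filter_type == "INCLUDE" for f in matching)


# Example: replicate everything except topics that start with "test-".
filters = [
    NameFilter("test-", "PREFIX", "EXCLUDE"),
    NameFilter("*", "LITERAL", "INCLUDE"),
]
for topic in ("orders", "test-orders"):
    print(topic, is_replicated(filters, topic))
```

Because every filter resolves to either include or exclude, the position of an INCLUDE filter does not change the outcome in this simplified model. The sketch is meant only to make the precedence of EXCLUDE filters and the exclude-by-default behavior concrete.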
#### [](#common-filtering-patterns)Common filtering patterns Replicate all topics except test topics: ```yaml topic_metadata_sync_options: auto_create_shadow_topic_filters: - pattern_type: PREFIX filter_type: EXCLUDE name: test- # Exclude all test topics - pattern_type: LITERAL filter_type: INCLUDE name: '*' # Include all other topics ``` Replicate only production topics: ```yaml topic_metadata_sync_options: auto_create_shadow_topic_filters: - pattern_type: PREFIX filter_type: INCLUDE name: prod- # Include production topics - pattern_type: PREFIX filter_type: INCLUDE name: production- # Alternative production prefix ``` Replicate specific consumer groups: ```yaml consumer_offset_sync_options: group_filters: - pattern_type: LITERAL filter_type: INCLUDE name: critical-app-consumers # Include specific consumer group - pattern_type: PREFIX filter_type: INCLUDE name: prod-consumer- # Include production consumers ``` #### [](#schema-registry-synchronization)Schema Registry synchronization Shadowing can replicate Schema Registry data by shadowing the `_schemas` system topic. When enabled, this provides byte-for-byte replication of schema definitions, versions, and compatibility settings. To enable Schema Registry synchronization, add the following to your shadow link configuration: ```yaml schema_registry_sync_options: shadow_schema_registry_topic: {} ``` Requirements: - The `_schemas` topic must exist on the source cluster - The `_schemas` topic must not exist on the shadow cluster, or must be empty - Once enabled, the `_schemas` topic will be replicated completely Important: After the `_schemas` topic becomes a shadow topic, it cannot be stopped without either failing over the topic or deleting it entirely. #### [](#system-topic-filtering-rules)System topic filtering rules Redpanda system topics have the following specific filtering restrictions: - Literal filters for `__consumer_offsets` and `_redpanda.audit_log` are rejected. 
- Prefix filters for topics starting with `_redpanda` or `__redpanda` are rejected. - Wildcard `*` filters will not match topics that start with `_redpanda` or `__redpanda`. - To shadow specific system topics, you must provide explicit literal filters for those individual topics. #### [](#acl-filtering)ACL filtering ACLs are replicated by the [Security Migrator task](../overview/#shadow-link-tasks). This is recommended to ensure that your shadow cluster has the same permissions as your source cluster. To configure ACL filters: ```yaml security_sync_options: acl_filters: # Include read permissions for production topics - resource_filter: resource_type: TOPIC # Filter by topic resource pattern_type: PREFIXED # Match by prefix name: prod- # Production topic prefix access_filter: principal: User:app-user # Application service user operation: READ # Read operation permission_type: ALLOW # Allow permission host: '*' # Any host # Include consumer group permissions - resource_filter: resource_type: GROUP # Filter by consumer group pattern_type: LITERAL # Exact match name: '*' # All consumer groups access_filter: principal: User:app-user # Same application user operation: READ # Read operation permission_type: ALLOW # Allow permission host: '*' # Any host ``` #### [](#consumer-group-filtering-and-behavior)Consumer group filtering and behavior Consumer group filters determine which consumer groups have their offsets replicated to the shadow cluster by the [Consumer Group Shadowing task](../overview/#shadow-link-tasks). Offset replication operates selectively within each consumer group. Only committed offsets for active shadow topics are synchronized, even if the consumer group has offsets for additional topics that aren’t being shadowed. For example, if consumer group "app-consumers" has committed offsets for "orders", "payments", and "inventory" topics, but only "orders" is an active shadow topic, then only the "orders" offsets will be replicated to the shadow cluster. 
```yaml
consumer_offset_sync_options:
  interval: 30s # How often to sync consumer group offsets
  paused: false # Enable consumer offset synchronization
  group_filters:
    - pattern_type: PREFIX
      filter_type: INCLUDE
      name: prod-consumer- # Include production consumer groups
    - pattern_type: LITERAL
      filter_type: EXCLUDE
      name: test-consumer-group # Exclude specific test groups
```

##### [](#important-consumer-group-considerations)Important consumer group considerations

**Avoid name conflicts:** If you plan to consume data from the shadow cluster, do not use the same consumer group names as those used on the source cluster. While this won’t break shadow linking, it can impact your RPO/RTO because conflicting group names may interfere with offset replication and consumer resumption during disaster recovery.

**Offset clamping:** When Redpanda replicates consumer group offsets from the source cluster, offsets are automatically "clamped" during the commit process on the shadow cluster. If a committed offset from the source cluster is above the high watermark (HWM) of the corresponding shadow partition, Redpanda clamps the offset to the shadow partition’s HWM before committing it to the shadow cluster. This ensures offsets remain valid and prevents consumers from seeking beyond available data on the shadow cluster.

#### [](#starting-offset-for-new-shadow-topics)Starting offset for new shadow topics

When the [Source Topic Sync task](../overview/#shadow-link-tasks) creates a shadow topic for the first time, you can control where replication begins on the source topic. This setting only applies to empty shadow partitions and is crucial for disaster recovery planning. Changing this configuration only affects new shadow topics; existing shadow topics continue replicating from their current position.
```yaml topic_metadata_sync_options: start_at_earliest: {} ``` Alternatively, to start from the most recent offset: ```yaml topic_metadata_sync_options: start_at_latest: {} ``` Or to start from a specific timestamp: ```yaml topic_metadata_sync_options: start_at_timestamp: 2024-01-01T00:00:00Z ``` Starting offset options: - **`earliest`** (default): This replicates all existing data from the source topic. Use this for complete disaster recovery where you need full data history. - **`latest`**: This starts replication from the current end of the source topic, skipping existing data. Use this when you only need new data for disaster recovery and want to minimize initial replication time. - **`timestamp`**: This starts replication from the first record with a timestamp at or after the specified time. Use this for point-in-time disaster recovery scenarios. > ❗ **IMPORTANT** > > The starting offset only affects **new shadow topics**. After a shadow topic exists and has data, changing this setting has no effect on that topic’s replication. #### [](#networking)Networking Configure network connectivity between your source and shadow clusters to enable shadow link replication. The shadow cluster initiates connections to the source cluster using a pull-based architecture. For additional details about networking, see [Network and authentication](#network-and-authentication). ##### [](#connection-requirements)Connection requirements - **Direction**: Shadow cluster connects to source cluster (outbound from shadow, inbound to source) - **Protocol**: Kafka protocol over TCP (default port 9092, or your configured listener ports) - **Persistence**: Connections remain active for continuous replication ##### [](#firewall-configuration)Firewall configuration You must configure firewall rules to allow the shadow cluster to reach the source cluster. **On the source cluster network:** - Allow inbound TCP connections on Kafka listener ports (typically 9092). 
- Allow connections from the shadow cluster’s IP addresses or subnets. **On the shadow cluster network:** - Allow outbound TCP connections to the source cluster’s Kafka listener ports. - Ensure DNS resolution works for source cluster hostnames. ##### [](#bootstrap-servers)Bootstrap servers Specify multiple bootstrap servers in your shadow link configuration for high availability: ```yaml client_options: bootstrap_servers: # Source cluster brokers to connect to - : # Example: "prod-kafka-1.example.com:9092" - : # Example: "prod-kafka-2.example.com:9092" - : # Example: "prod-kafka-3.example.com:9092" ``` The shadow cluster uses these addresses to discover all brokers in the source cluster. If one bootstrap server is unavailable, the shadow cluster tries the next one in the list. ##### [](#network-security)Network security For production deployments, secure the network connection between clusters: TLS encryption: ```yaml client_options: tls_settings: enabled: true # Enable TLS tls_pem_settings: ca: |- # CA certificate in PEM format -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- key: ${secrets.} # Client private key (can use secrets reference) cert: |- # Optional: Client certificate in PEM format for mutual TLS -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- do_not_set_sni_hostname: false # Optional: Skip SNI hostname when using TLS (default: false) ``` Authentication: ```yaml client_options: authentication_configuration: # SASL/SCRAM authentication. # Create SASL credentials in the source cluster. # Then, with this configuration, ensure the shadow cluster uses the credentials # to authenticate to the source cluster. 
scram_configuration: username: # SASL/SCRAM username, example: "shadow-replication-user" password: ${secrets.} # ID of secret containing SASL/SCRAM password scram_mechanism: SCRAM_SHA_256 # SCRAM mechanism: "SCRAM_SHA_256" or "SCRAM_SHA_512" ``` ##### [](#connection-tuning)Connection tuning Adjust connection parameters based on your network characteristics. For example: ```yaml client_options: # Connection and metadata settings connection_timeout_ms: 1000 # Default 1000ms, increase for high-latency networks retry_backoff_ms: 100 # Default 100ms, backoff between connection retries metadata_max_age_ms: 10000 # Default 10000ms, how often to refresh cluster metadata # Fetch request settings fetch_wait_max_ms: 500 # Default 500ms, max time to wait for fetch requests fetch_min_bytes: 5242880 # Default 5MB, minimum bytes to fetch per request fetch_max_bytes: 20971520 # Default 20MB, maximum bytes to fetch per request fetch_partition_max_bytes: 1048576 # Default 1MB, maximum bytes to fetch per partition ``` ## [](#update-an-existing-shadow-link)Update an existing shadow link To modify a shadow link configuration after creation, run: ### Cloud UI 1. At the organization level of the Cloud UI, navigate to **Shadow Link**. 2. Select the shadow link you want to modify, and click **Edit**. 3. Edit the shadow link settings or the shadowing behavior by specifying which content from the source cluster to shadow (topics, ACLs, consumer groups, Schema Registry). You can also enable additional topic properties to be shadowed or disable optional topic properties from being included in the shadowing. 4. Click **Save** to apply changes. ### rpk ```bash rpk shadow update ``` For detailed command options, see [`rpk shadow update`](../../../../reference/rpk/rpk-shadow/rpk-shadow-update/). This opens your default editor to modify the shadow link configuration. Only changed fields are updated on the server. 
The shadow link name cannot be changed. To rename a shadow link, you must delete and recreate it.

### Control Plane API

```bash
curl -X PATCH 'https://api.redpanda.com/v1/shadow-links/' \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" \
  -d '{
    "security_sync_options": {
      "paused": false
    }
  }'
```

This endpoint returns a [long-running operation](../../../api/cloud-byoc-controlplane-api/#lro).

For the full API reference, see [Control Plane API reference](/api/doc/cloud-controlplane/operation/operation-shadowlinkservice_updateshadowlink).

---

# Page 430: Integrate Redpanda with Iceberg

**URL**: https://docs.redpanda.com/redpanda-cloud/manage/iceberg.md

---

# Integrate Redpanda with Iceberg

---
title: Integrate Redpanda with Iceberg
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: iceberg/index
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: iceberg/index.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/index.adoc
description: Generate Iceberg tables for your Redpanda topics for data lakehouse access.
page-git-created-date: "2025-04-04"
page-git-modified-date: "2025-07-30"
---

- [About Iceberg Topics](about-iceberg-topics/) Learn how Redpanda can integrate topics with Apache Iceberg.
- [Specify Iceberg Schema](specify-iceberg-schema/) Learn about supported Iceberg modes and how you can integrate schemas with Iceberg topics.
- [Use Iceberg Catalogs](use-iceberg-catalogs/) Learn how to access Redpanda topic data stored in Iceberg tables, using table metadata or a catalog integration.
- [Integrate with REST Catalogs](rest-catalog/) Integrate Redpanda topics with managed Iceberg REST Catalogs.
- [Query Iceberg Topics](query-iceberg-topics/) Query Redpanda topic data stored in Iceberg tables, based on the topic Iceberg mode and schema. - [Migrate to Iceberg Topics](migrate-to-iceberg-topics/) Migrate existing Iceberg integrations to Redpanda Iceberg topics. --- # Page 431: About Iceberg Topics **URL**: https://docs.redpanda.com/redpanda-cloud/manage/iceberg/about-iceberg-topics.md --- # About Iceberg Topics --- title: About Iceberg Topics latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: iceberg/about-iceberg-topics page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: iceberg/about-iceberg-topics.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/about-iceberg-topics.adoc description: Learn how Redpanda can integrate topics with Apache Iceberg. page-git-created-date: "2025-04-04" page-git-modified-date: "2025-09-23" --- The Apache Iceberg integration for Redpanda allows you to store topic data in the cloud in the Iceberg open table format. This makes your streaming data immediately available in downstream analytical systems, including data warehouses like Snowflake, Databricks, ClickHouse, and Redshift, without setting up and maintaining additional ETL pipelines. You can also integrate your data directly into commonly-used big data processing frameworks, such as Apache Spark and Flink, standardizing and simplifying the consumption of streams as tables in a wide variety of data analytics pipelines. Redpanda supports [version 2](https://iceberg.apache.org/spec/#format-versioning) of the Iceberg table format. ## [](#iceberg-concepts)Iceberg concepts [Apache Iceberg](https://iceberg.apache.org) is an open source format specification for defining structured tables in a data lake. 
The table format lets you quickly and easily manage, query, and process huge amounts of structured and unstructured data. This is similar to the way you would manage and run SQL queries against relational data in a database or data warehouse. The open format lets you use many different languages, tools, and applications to process the same data in a consistent way, so you can avoid vendor lock-in. This data management system is also known as a _data lakehouse_. In the Iceberg specification, tables consist of the following layers: - **Data layer**: Stores the data in data files. The Iceberg integration currently supports the Parquet file format. Parquet files are column-based and suitable for analytical workloads at scale. They come with compression capabilities that optimize files for object storage. - **Metadata layer**: Stores table metadata separately from data files. The metadata layer allows multiple writers to stage metadata changes and apply updates atomically. It also supports database snapshots, and time travel queries that query the database at a previous point in time. - Manifest files: Track data files and contain metadata about these files, such as record count, partition membership, and file paths. - Manifest list: Tracks all the manifest files belonging to a table, including file paths and upper and lower bounds for partition fields. - Metadata file: Stores metadata about the table, including its schema, partition information, and snapshots. Whenever a change is made to the table, a new metadata file is created and becomes the latest version of the metadata in the catalog. For Iceberg-enabled topics, the manifest files are in JSON format. - **Catalog**: Contains the current metadata pointer for the table. Clients reading and writing data to the table see the same version of the current state of the table. The Iceberg integration supports two [catalog integration](../use-iceberg-catalogs/) types. 
You can configure Redpanda to catalog files stored in the same object storage bucket or container where the Iceberg data files are located, or you can configure Redpanda to use an [Iceberg REST catalog](https://iceberg.apache.org/terms/#decoupling-using-the-rest-catalog) endpoint to update an externally-managed catalog when there are changes to the Iceberg data and metadata. ![Redpanda’s Iceberg integration](../../../shared/_images/iceberg-integration-optimized.png) When you enable the Iceberg integration for a Redpanda topic, Redpanda brokers store streaming data in the Iceberg-compatible format in Parquet files in object storage, in addition to the log segments uploaded using Tiered Storage. Storing the streaming data in Iceberg tables in the cloud allows you to derive real-time insights through many compatible data lakehouse, data engineering, and business intelligence [tools](https://iceberg.apache.org/vendors/). ## [](#prerequisites)Prerequisites To enable Iceberg for Redpanda topics, you must have the following: - A running [BYOC](../../../get-started/cluster-types/byoc/) or BYOVPC cluster on Redpanda version 25.1 or later. The Iceberg integration is supported only for BYOC and BYOVPC, and the cluster properties to configure Iceberg are available with v25.1. - rpk: See [Install or Update rpk](../../rpk/rpk-install/). - Familiarity with the Redpanda Cloud API. You must [authenticate](/api/doc/cloud-controlplane/authentication) to the Cloud API and use the Control Plane API to update your cluster configuration. ## [](#limitations)Limitations - It is not possible to append topic data to an existing Iceberg table that is not created by Redpanda. - If you enable the Iceberg integration on an existing Redpanda topic, Redpanda does not backfill the generated Iceberg table with topic data. - JSON schemas are supported starting with Redpanda version 25.2. 
## [](#enable-iceberg-integration)Enable Iceberg integration

To create an Iceberg table for a Redpanda topic, you must set the cluster configuration property [`iceberg_enabled`](../../../reference/properties/cluster-properties/#iceberg_enabled) to `true`, and also configure the topic property `redpanda.iceberg.mode`. You can choose to provide a schema if you need the Iceberg table to be structured with defined columns.

1. Set the `iceberg_enabled` configuration option on your cluster to `true`.

   #### rpk

   ```bash
   rpk cloud login
   rpk profile create --from-cloud
   rpk cluster config set iceberg_enabled true
   ```

   #### Cloud API

   ```bash
   # Store your cluster ID in a variable
   export RP_CLUSTER_ID=

   # Retrieve a Redpanda Cloud access token.
   # The token endpoint returns a JSON document, so extract the
   # access_token field from it (this requires jq).
   export RP_CLOUD_TOKEN=$(curl -s -X POST "https://auth.prd.cloud.redpanda.com/oauth/token" \
       -H "content-type: application/x-www-form-urlencoded" \
       -d "grant_type=client_credentials" \
       -d "client_id=" \
       -d "client_secret=" | jq -r '.access_token')

   # Update cluster configuration to enable Iceberg topics
   curl -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" -X PATCH \
       "https://api.cloud.redpanda.com/v1/clusters/${RP_CLUSTER_ID}" \
       -H 'accept: application/json' \
       -H 'content-type: application/json' \
       -d '{"cluster_configuration":{"custom_properties": {"iceberg_enabled":true}}}'
   ```

   The [`PATCH /clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request returns the ID of a long-running operation. The operation may take up to ten minutes to complete. You can check the status of the operation by polling the [`GET /operations/{id}`](/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation) endpoint.

2. (Optional) Create a new topic.

   ```bash
   rpk topic create
   ```

   ```bash
   TOPIC  STATUS
          OK
   ```

3. Configure `redpanda.iceberg.mode` for the topic.
   You can choose one of the following [Iceberg modes](../specify-iceberg-schema/):

   - `key_value`: Creates an Iceberg table using a simple schema, consisting of two columns, one for the record metadata including the key, and another binary column for the record’s value.
   - `value_schema_id_prefix`: Creates an Iceberg table whose structure matches the Redpanda schema for this topic, with columns corresponding to each field. You must register a schema in the Schema Registry (see next step), and producers must write to the topic using the Schema Registry wire format.
   - `value_schema_latest`: Creates an Iceberg table whose structure matches the latest schema registered for the subject in the Schema Registry.
   - `disabled` (default): Disables writing to an Iceberg table for this topic.

   ```bash
   rpk topic alter-config --set redpanda.iceberg.mode=
   ```

   ```bash
   TOPIC  STATUS
          OK
   ```

4. Register a schema for the topic. This step is required for the `value_schema_id_prefix` and `value_schema_latest` modes.

   ```bash
   rpk registry schema create --schema --type
   ```

   ```bash
   SUBJECT  VERSION  ID  TYPE
            1        1   PROTOBUF
   ```

### [](#access-iceberg-data)Access Iceberg data

To query the Iceberg table, you need access to the object storage bucket or container where the Iceberg data is stored.

For BYOC clusters, the bucket name and table location are as follows:

| Cloud provider | Bucket or container name | Iceberg table location |
| --- | --- | --- |
| AWS | redpanda-cloud-storage- | redpanda-iceberg-catalog/redpanda/ |
| Azure | The Redpanda cluster ID is also used as the container name (ID) and the storage account ID. | |
| GCP | redpanda-cloud-storage- | |

For BYOVPC clusters, the bucket name is the name you chose when you created the object storage bucket as a customer-managed resource.

For Azure clusters, you must add the public IP addresses or ranges from the REST catalog service, or other clients requiring access to the Iceberg data, to your cluster’s allow list.
Alternatively, add subnet IDs to the allow list if the requests originate from the same Azure region. For example, to add subnet IDs to the allow list through the Control Plane API [`PATCH /v1/clusters/`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) endpoint, run: ```bash curl -X PATCH https://api.cloud.redpanda.com/v1/clusters/ \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" \ -d @- << EOF { "cloud_storage": { "azure": { "allowed_subnet_ids": [ ] } } } EOF ``` As you produce records to the topic, the data also becomes available in object storage for Iceberg-compatible clients to consume. You can use the same analytical tools to [read the Iceberg topic data](../query-iceberg-topics/) in a data lake as you would for a relational database. See also: [Schema types translation](../specify-iceberg-schema/#schema-types-translation). ### [](#iceberg-data-retention)Iceberg data retention Data in an Iceberg-enabled topic is consumable from Kafka based on the configured [topic retention policy](../../../develop/topics/create-topic/). Conversely, data written to Iceberg remains queryable as Iceberg tables indefinitely. The Iceberg table persists unless you: - Delete the Redpanda topic associated with the Iceberg table. This is the default behavior set by the `[iceberg_delete](../../../reference/properties/cluster-properties/#iceberg_delete)` cluster property and the `redpanda.iceberg.delete` topic property. If you set this property to `false`, the Iceberg table remains even after you delete the topic. - Explicitly delete data from the Iceberg table using a query engine. - Disable the Iceberg integration for the topic and delete the Parquet files in object storage. The DLQ table (`~dlq`) follows the same persistence rules as the main Iceberg table. 
## [](#schema-evolution)Schema evolution

Redpanda supports schema evolution in accordance with the [Iceberg specification](https://iceberg.apache.org/spec/#schema-evolution). Permitted schema evolutions include reordering fields and promoting field types. When you update the schema in Schema Registry, Redpanda automatically updates the Iceberg table schema to match the new schema.

For example, suppose you produce records to a topic `demo-topic` with the following Avro schema:

schema\_1.avsc

```avro
{
  "type": "record",
  "name": "ClickEvent",
  "fields": [
    { "name": "user_id", "type": "int" },
    { "name": "event_type", "type": "string" }
  ]
}
```

```bash
rpk registry schema create demo-topic-value --schema schema_1.avsc
echo '{"user_id":23, "event_type":"BUTTON_CLICK"}' | rpk topic produce demo-topic --format='%v\n' --schema-id=topic
```

Then, you update the schema to add a new field `ts`, and produce records with the updated schema:

schema\_2.avsc

```avro
{
  "type": "record",
  "name": "ClickEvent",
  "fields": [
    { "name": "user_id", "type": "int" },
    { "name": "event_type", "type": "string" },
    {
      "name": "ts",
      "type": [
        "null",
        { "type": "long", "logicalType": "timestamp-millis" }
      ],
      "default": null
    }
  ]
}
```

The `ts` field can be either null or a long representing epoch milliseconds. The default value for the new field is null.
```bash
rpk registry schema create demo-topic-value --schema schema_2.avsc
echo '{"user_id":858, "event_type":"BUTTON_CLICK", "ts":1737998723230}' | rpk topic produce demo-topic --format='%v\n' --schema-id=topic
```

A query against the Iceberg table for `demo-topic` now returns the new column `ts`:

```bash
+---------+--------------+--------------------------+
| user_id | event_type   | ts                       |
+---------+--------------+--------------------------+
| 858     | BUTTON_CLICK | 2025-01-27T17:25:23.230Z |
| 23      | BUTTON_CLICK | NULL                     |
+---------+--------------+--------------------------+
```

## [](#troubleshoot-errors)Troubleshoot errors

If Redpanda encounters an error while writing a record to the Iceberg table, Redpanda by default writes the record to a separate dead-letter queue (DLQ) Iceberg table named `~dlq`. The following can cause errors to occur when translating records in the `value_schema_id_prefix` and `value_schema_latest` modes to the Iceberg table format:

- Redpanda cannot find the embedded schema ID in the Schema Registry.
- Redpanda fails to translate one or more schema data types to an Iceberg type.
- In `value_schema_id_prefix` mode, you do not use the Schema Registry wire format with the magic byte.

The DLQ table itself uses the `key_value` schema, consisting of two columns: the record metadata including the key, and a binary column for the record’s value.

> 📝 **NOTE**
>
> Topic property misconfiguration, such as [overriding the default behavior of `value_schema_latest` mode](../specify-iceberg-schema/#override-value-schema-latest-default) but not specifying the fully qualified Protobuf message name, does not cause records to be written to the DLQ table. Instead, Redpanda pauses the topic data translation to the Iceberg table until you fix the misconfiguration.
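The third cause above depends on the Schema Registry wire format, in which a serialized value begins with a zero magic byte followed by a 4-byte big-endian schema ID. The following Python sketch shows one way to check whether a record value carries that header before you produce it. The function name `schema_id_from_wire_format` and the sample payloads are hypothetical and chosen for illustration only; this is not how Redpanda itself performs the check.

```python
import struct
from typing import Optional


def schema_id_from_wire_format(value: bytes) -> Optional[int]:
    """Return the schema ID if the value starts with a wire-format header, else None."""
    # Header layout: 1 magic byte (0x00) followed by a 4-byte big-endian schema ID.
    if len(value) < 5 or value[0] != 0x00:
        return None
    (schema_id,) = struct.unpack(">I", value[1:5])
    return schema_id


# A value framed with schema ID 1, and a plain JSON value with no header.
framed = b"\x00" + struct.pack(">I", 1) + b"<serialized payload>"
plain_json = b'{"user_id":2324,"event_type":"BUTTON_CLICK"}'

print(schema_id_from_wire_format(framed))
print(schema_id_from_wire_format(plain_json))
```

A leading zero byte alone does not prove that a value was serialized against a registered schema, so treat this check as a quick heuristic for spotting unframed values such as raw JSON.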
### [](#inspect-dlq-table)Inspect DLQ table You can inspect the DLQ table for records that failed to write to the Iceberg table, and you can take further action on these records, such as transforming and reprocessing them, or debugging issues that occurred upstream. The following example produces a record to a topic named `ClickEvent` and does not use the Schema Registry wire format that includes the magic byte and schema ID: ```bash echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent --format='%k %v\n' ``` Querying the DLQ table returns the record that was not translated: ```sql SELECT value FROM ."ClickEvent~dlq"; -- Fully qualified table name ``` ```bash +-------------------------------------------------+ | value | +-------------------------------------------------+ | 7b 22 75 73 65 72 5f 69 64 22 3a 32 33 32 34 2c | | 22 65 76 65 6e 74 5f 74 79 70 65 22 3a 22 42 55 | | 54 54 4f 4e 5f 43 4c 49 43 4b 22 2c 22 74 73 22 | | 3a 22 32 30 32 34 2d 31 31 2d 32 35 54 32 30 3a | | 32 33 3a 35 39 2e 33 38 30 5a 22 7d | +-------------------------------------------------+ ``` The data is in binary format, and the first byte is not `0x00`, indicating that it was not produced with a schema. ### [](#reprocess-dlq-records)Reprocess DLQ records You can apply a transformation and reprocess the record in your data lakehouse to the original Iceberg table. In this case, you have a JSON value represented as a UTF-8 binary. Depending on your query engine, you might need to decode the binary value first before extracting the JSON fields. 
Some engines may automatically decode the binary value for you: ClickHouse SQL example to reprocess DLQ record ```sql SELECT CAST(jsonExtractString(json, 'user_id') AS Int32) AS user_id, jsonExtractString(json, 'event_type') AS event_type, jsonExtractString(json, 'ts') AS ts FROM ( SELECT CAST(value AS String) AS json FROM .`ClickEvent~dlq` -- Ensure that the table name is properly parsed ); ``` ```bash +---------+--------------+--------------------------+ | user_id | event_type | ts | +---------+--------------+--------------------------+ | 2324 | BUTTON_CLICK | 2024-11-25T20:23:59.380Z | +---------+--------------+--------------------------+ ``` You can now insert the transformed record back into the main Iceberg table. Redpanda recommends employing a strategy for exactly-once processing to avoid duplicates when reprocessing records. ### [](#drop-invalid-records)Drop invalid records To disable the default behavior and drop an invalid record, set the `redpanda.iceberg.invalid.record.action` topic property to `drop`. You can also configure the default cluster-wide behavior for invalid records by setting the `iceberg_invalid_record_action` property. ## [](#performance-considerations)Performance considerations When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see most of your cluster’s CPU utilization increase. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically applies backpressure to producers to prevent Iceberg tables from lagging further. This ensures that Iceberg tables keep up with the volume of incoming data, but sacrifices ingress throughput of the cluster. You may need to increase the size of your Redpanda cluster to accommodate the additional workload. To ensure that your cluster is sized appropriately, contact the Redpanda Customer Success team. 
### [](#use-custom-partitioning)Use custom partitioning

To improve query performance, consider implementing custom [partitioning](https://iceberg.apache.org/docs/nightly/partitioning/) for the Iceberg topic. Use the `redpanda.iceberg.partition.spec` topic property to define the partitioning scheme:

```bash
# Create new topic with five topic partitions, replication factor 3, and custom table partitioning for Iceberg
rpk topic create -p5 -r3 -c redpanda.iceberg.mode=value_schema_id_prefix -c "redpanda.iceberg.partition.spec=(, , ...)"
```

Valid `` values include a source column name or a transformation of a column. The columns referenced can be Redpanda-defined (such as `redpanda.timestamp`) or user-defined based on a schema that you register for the topic. Based on this specification, the Iceberg table stores records with different partition key values in separate files.

For example:

- To partition the table by a single key, such as a column `col1`, use: `redpanda.iceberg.partition.spec=(col1)`.
- To partition by multiple columns, use a comma-separated list: `redpanda.iceberg.partition.spec=(col1, col2)`.
- To partition by the year of a timestamp column `ts1`, and a string column `col1`, use: `redpanda.iceberg.partition.spec=(year(ts1), col1)`.

To learn more about how partitioning schemes can affect query performance, and for details on the partitioning specification such as allowed transforms, see the [Apache Iceberg documentation](https://iceberg.apache.org/spec/#partitioning).

> 💡 **TIP**
>
> - Partition by columns that you frequently use in queries. Columns with relatively few unique values, also known as low cardinality, are also good candidates for partitioning.
>
> - If you must partition based on columns with high cardinality, for example timestamps, use Iceberg’s available transforms such as extracting the year, month, or day to avoid creating too many partitions.
Too many partitions can be detrimental to performance because more files need to be scanned and managed. ### [](#avoid-high-column-count)Avoid high column count A high column count or schema field count results in more overhead when translating topics to the Iceberg table format. Small message sizes can also increase CPU utilization. To minimize the performance impact on your cluster, keep to a low column count and large message size for Iceberg topics. ## [](#next-steps)Next steps - [Use Iceberg Catalogs](../use-iceberg-catalogs/) - [Migrate existing Iceberg integrations to Iceberg Topics](../migrate-to-iceberg-topics/) ## [](#suggested-reading)Suggested reading - [Understanding Apache Kafka Schema Registry](https://www.redpanda.com/blog/schema-registry-kafka-streaming#how-does-serialization-work-with-schema-registry-in-kafka) --- # Page 432: Query Iceberg Topics using AWS Glue **URL**: https://docs.redpanda.com/redpanda-cloud/manage/iceberg/iceberg-topics-aws-glue.md --- # Query Iceberg Topics using AWS Glue --- title: Query Iceberg Topics using AWS Glue page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: iceberg/iceberg-topics-aws-glue page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: iceberg/iceberg-topics-aws-glue.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc description: Add Redpanda topics as Iceberg tables that you can access through the AWS Glue Data Catalog. # Beta release status page-beta: "true" page-git-created-date: "2025-08-05" page-git-modified-date: "2025-08-05" release-status: beta - This is a beta feature. 
Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. --- beta This guide walks you through querying Redpanda topics as Iceberg tables stored in AWS S3, using a catalog integration with [AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro). For general information about Iceberg catalog integrations in Redpanda, see [Use Iceberg Catalogs](../use-iceberg-catalogs/). ## [](#prerequisites)Prerequisites - An AWS account with access to [AWS Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html). - AWS Glue Data Catalog must be in the same AWS account and region as the cluster. - Redpanda version 25.2 or later. - [`rpk`](../../rpk/rpk-install/) installed or updated to the latest version. - You can also use the Redpanda Cloud API to [reference secrets in your cluster configuration](../../cluster-maintenance/config-cluster/#set-cluster-configuration-properties). - Admin permissions to create IAM policies and roles in AWS. ## [](#limitations)Limitations ### [](#lowercase-field-names-required)Lowercase field names required Use only lowercase field names. AWS Glue converts all table column names to lowercase, and Redpanda requires exact column name matches to manage schemas. Using uppercase letters prevents Redpanda from finding matching columns, which breaks schema management. ### [](#nested-partition-spec-support)Nested partition spec support AWS Glue does not support partitioning on nested fields. If Redpanda detects that the default partitioning `(hour(redpanda.timestamp))` based on the record metadata is in use, it will instead apply an empty partition spec `()`, which means the table will not be partitioned. To use partitioning, you must implement custom partitioning using your own partition columns (that is, columns that are not nested). 
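As a rough illustration of the constraint above, the following Python sketch assembles an `rpk topic create` command whose partition spec references only top-level columns. The helper names, the topic name `click-events`, and the columns `event_ts` and `event_type` are hypothetical. Treating any dotted name as a nested field is a simplifying assumption, not Redpanda's actual validation logic.

```python
import shlex
from typing import List


def glue_partition_spec(*expressions: str) -> str:
    """Build a partition spec string, rejecting dotted (nested) references.

    Assumption: a dot in an expression, as in hour(redpanda.timestamp),
    marks a nested field that AWS Glue cannot partition on.
    """
    for expr in expressions:
        if "." in expr:
            raise ValueError(f"nested field in partition expression: {expr!r}")
    return "(" + ", ".join(expressions) + ")"


def rpk_topic_create(topic: str, mode: str, spec: str) -> List[str]:
    """Return the argument list for creating an Iceberg topic with a custom spec."""
    return [
        "rpk", "topic", "create", topic,
        "-c", f"redpanda.iceberg.mode={mode}",
        "-c", f"redpanda.iceberg.partition.spec={spec}",
    ]


if __name__ == "__main__":
    # Hypothetical top-level columns from a schema registered for the topic.
    spec = glue_partition_spec("day(event_ts)", "event_type")
    args = rpk_topic_create("click-events", "value_schema_id_prefix", spec)
    # Print a shell-quoted command line instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in args))
```

The script only prints the command, so you can review the partition spec before running it against a cluster.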
> 📝 **NOTE**
>
> In Redpanda versions 25.2.1 and earlier, an empty partition spec `()` can cause a known issue that prevents certain engines like Amazon Redshift from successfully querying the table. To resolve this issue, specify custom partitioning, or upgrade Redpanda to version 25.2.2 or later.

### [](#manual-deletion-of-iceberg-tables)Manual deletion of Iceberg tables

The AWS Glue catalog integration does not support automatic deletion of Iceberg tables from Redpanda. To manually delete Iceberg tables in AWS Glue, you must either:

- Set the cluster property [`iceberg_delete`](../../../reference/properties/cluster-properties/#iceberg_delete) to `false` when you configure the catalog integration.
- Override the cluster property `iceberg_delete` by setting the topic property `redpanda.iceberg.delete` to `false` for the topic you want to delete.

When `iceberg_delete` or the topic override `redpanda.iceberg.delete` is set to `false`, you can delete the Redpanda topic, and then delete the table in AWS Glue and the Iceberg data and metadata files in the S3 bucket. If you plan to re-create the topic after deleting it, you must delete the table data entirely before re-creating the topic.

## [](#authorize-access-to-aws-glue)Authorize access to AWS Glue

For BYOC clusters created in March 2026 or later, the required AWS Glue IAM policy is automatically provisioned and attached to the cluster’s IAM role when Iceberg is enabled. You don’t need to manually create IAM policies or roles for Glue access.

For clusters created before March 2026, you must re-run `rpk byoc apply` to provision the Glue IAM policy before enabling Iceberg. This is a one-time operation that updates the cluster’s IAM role with the necessary Glue permissions.
## [](#configure-authentication-and-credentials)Configure authentication and credentials

You can configure credentials for the AWS Glue Data Catalog integration in either of the following ways:

- Allow Redpanda to use the same object storage credential properties already configured for S3. This is the recommended approach, especially in BYOC deployments where the cluster’s existing AWS credentials already include the necessary Glue permissions.

  For an example cluster configuration that uses the same IAM credentials for both S3 and AWS Glue, see the **Use cluster’s IAM credentials** tab in the [next section](#update-cluster-configuration).

- If you want to configure authentication to AWS Glue separately from authentication to S3, there are equivalent credential configuration properties named `iceberg_rest_catalog_aws_*` that override the object storage credentials. These properties only apply to REST catalog authentication, and never to S3 authentication:

  - [`iceberg_rest_catalog_credentials_source`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_credentials_source). To use the cluster’s IAM role, set the property to `aws_instance_metadata`. To use static credentials, set to `config_file`.
  - [`iceberg_rest_catalog_aws_access_key`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_aws_access_key) (static credentials only)
  - [`iceberg_rest_catalog_aws_secret_key`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_aws_secret_key) (static credentials only), added as a secret value (see the [next section](#update-cluster-configuration) for details)
  - [`iceberg_rest_catalog_aws_region`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_aws_region)

  For an example cluster configuration that uses separate access keys for AWS Glue, see the **Use static credentials (override IAM)** tab in the [next section](#update-cluster-configuration).
## [](#update-cluster-configuration)Update cluster configuration To configure your Redpanda cluster to enable Iceberg on a topic and integrate with the AWS Glue Data Catalog: 1. Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below. Use `rpk` as shown in the following examples, or [use the Cloud API](../../cluster-maintenance/config-cluster/#set-cluster-configuration-properties) to update these cluster properties. The update might take several minutes to complete. ### Use cluster’s IAM credentials ```bash # Glue requires Redpanda Iceberg tables to be manually deleted # so iceberg_delete is set to false. rpk cloud login rpk profile create --from-cloud rpk cluster config set \ iceberg_enabled=true \ iceberg_delete=false \ iceberg_catalog_type=rest \ iceberg_rest_catalog_endpoint=https://glue..amazonaws.com/iceberg \ iceberg_rest_catalog_authentication_mode=aws_sigv4 \ iceberg_rest_catalog_credentials_source=aws_instance_metadata \ iceberg_rest_catalog_aws_region= \ iceberg_rest_catalog_base_location=s3:/// ``` ### Use static credentials (override IAM) ```bash # Glue requires Redpanda Iceberg tables to be manually deleted # so iceberg_delete is set to false. rpk cluster config set \ iceberg_enabled=true \ iceberg_delete=false \ iceberg_catalog_type=rest \ iceberg_rest_catalog_endpoint=https://glue..amazonaws.com/iceberg \ iceberg_rest_catalog_authentication_mode=aws_sigv4 \ iceberg_rest_catalog_credentials_source=config_file \ iceberg_rest_catalog_aws_region= \ iceberg_rest_catalog_aws_access_key= \ iceberg_rest_catalog_aws_secret_key='${secrets.}' \ iceberg_rest_catalog_base_location=s3:/// ``` Use your own values for the following placeholders: - ``: The AWS region where your Data Catalog is located. 
The region in the AWS Glue endpoint must match the region specified in your `[iceberg_rest_catalog_aws_region](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_aws_region)` property. - `` and ``: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3:///iceberg`. - Bucket name: For BYOC clusters, the bucket name is `redpanda-cloud-storage-`. For BYOVPC clusters, use the name of the object storage bucket you created as a [customer-managed resource](../../../get-started/cluster-types/byoc/aws/vpc-byo-aws/#configure-the-redpanda-network-and-cluster). This must be the same bucket used for your cluster’s object storage. You cannot specify a different bucket for Iceberg data. - Warehouse: This is a name you choose as the logical name (such as `iceberg`) for the warehouse represented by all Redpanda Iceberg topic data in the cluster. As a security best practice, do not use the bucket root for the base location. Always specify a subfolder to avoid interfering with the rest of your cluster’s data in object storage. - `` (static credentials only): The AWS access key ID for your Glue service account. - `` (static credentials only): The name of the secret that stores the AWS secret access key for your Glue service account. To reference a secret in a cluster property, for example `iceberg_rest_catalog_aws_secret_key`, you must first [store the secret value](../use-iceberg-catalogs/#store-a-secret-for-rest-catalog-authentication). ```bash Successfully updated configuration. New configuration version is 2. ``` 2. Enable the integration for a topic by configuring the topic property `redpanda.iceberg.mode`. The following examples show how to use [`rpk`](../../rpk/rpk-install/) to either create a new topic or alter the configuration for an existing topic and set the Iceberg mode to `key_value`. 
The `key_value` mode creates a two-column Iceberg table for the topic, with one column for the record metadata including the key, and another binary column for the record’s value. See [Specify Iceberg Schema](../specify-iceberg-schema/) for more details on Iceberg modes.

   Create a new topic and set `redpanda.iceberg.mode`:

   ```bash
   rpk topic create --topic-config=redpanda.iceberg.mode=key_value
   ```

   Set `redpanda.iceberg.mode` for an existing topic:

   ```bash
   rpk topic alter-config --set redpanda.iceberg.mode=key_value
   ```

3. Produce to the topic. For example:

   ```bash
   # printf interprets \n in every shell, so this sends three key-value records
   printf 'hello world\nfoo bar\nbaz qux\n' | rpk topic produce --format='%k %v\n'
   ```

   You should see the topic as a table with data in AWS Glue Data Catalog. The data may take some time to become visible, depending on your [`iceberg_target_lag_ms`](../../../reference/properties/cluster-properties/#iceberg_target_lag_ms) setting.

   1. In AWS Glue Studio, go to Databases.
   2. Select the `redpanda` database. The `redpanda` database and the table within are automatically added for you. The table name is the same as the topic name.

## [](#query-iceberg-table)Query Iceberg table

You can query the Iceberg table using different engines, such as Amazon Athena, PyIceberg, or Apache Spark. To query the table or view the table data in AWS Glue, ensure that your account has the necessary permissions to access the catalog, database, and table.

To query the table in Amazon Athena:

1. On the list of tables in AWS Glue Studio, click "Table data" under the **View data** column.
2. Click "Proceed" to be redirected to the Athena query editor.
3. In the query editor, select AwsDataCatalog as the data source, and select the `redpanda` database.
4. The SQL query editor should be pre-populated with a query that selects 10 rows from the Iceberg table. Run the query to see a preview of the table data.
```sql SELECT * FROM "AwsDataCatalog"."redpanda"."" limit 10; ``` Your query results should look like the following: ```sql +-----------------------------------------------------+----------------+ | redpanda | value | +-----------------------------------------------------+----------------+ | {partition=0, offset=0, timestamp=2025-07-21 | 77 6f 72 6c 64 | | 18:11:25.070000, headers=null, key=[B@1900af31} | | +-----------------------------------------------------+----------------+ ``` ## [](#suggested-reading)Suggested reading - [Query Iceberg Topics](../query-iceberg-topics/) --- # Page 433: Query Iceberg Topics using Databricks and Unity Catalog **URL**: https://docs.redpanda.com/redpanda-cloud/manage/iceberg/iceberg-topics-databricks-unity.md --- # Query Iceberg Topics using Databricks and Unity Catalog --- title: Query Iceberg Topics using Databricks and Unity Catalog latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: iceberg/iceberg-topics-databricks-unity page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: iceberg/iceberg-topics-databricks-unity.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/iceberg-topics-databricks-unity.adoc description: Add Redpanda topics as Iceberg tables that you can query in Databricks managed by Unity Catalog. page-git-created-date: "2025-06-12" page-git-modified-date: "2025-07-30" --- This guide walks you through querying Redpanda topics as managed Iceberg tables in Databricks, with AWS S3 as object storage and a catalog integration using [Unity Catalog](https://docs.databricks.com/aws/en/data-governance/unity-catalog). For general information about Iceberg catalog integrations in Redpanda, see [Use Iceberg Catalogs](../use-iceberg-catalogs/). 
## [](#prerequisites)Prerequisites - A Databricks workspace in the same region as your S3 bucket. See the [list of supported AWS regions](https://docs.databricks.com/aws/en/resources/supported-regions#supported-regions-list). - Unity Catalog enabled in your Databricks workspace. See the [Databricks documentation](https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started) to set up Unity Catalog for your workspace. - [Predictive optimization](https://docs.databricks.com/aws/en/optimizations/predictive-optimization#enable-predictive-optimization) enabled for Unity Catalog. > 📝 **NOTE** > > When you enable predictive optimization, you must also set the following configurations in your Databricks workspace. These configurations allow predictive optimization to automatically generate column statistics and carry out background compaction for Iceberg tables: > > ```sql > SET spark.databricks.delta.liquid.lazyClustering.backfillStats=true; > SET spark.databricks.delta.computeStats.autoConflictResolution=true; > > /* > After setting these configurations, you can optionally run OPTIMIZE to > immediately trigger compaction and liquid clustering, or let predictive > optimization handle it automatically later. > */ > OPTIMIZE ``.redpanda.``; > ``` - [External data access](https://docs.databricks.com/aws/en/external-access/admin) enabled in your metastore. - Workspace admin privileges to complete the steps to create a Unity Catalog storage credential and external location that connects your cluster’s Tiered Storage bucket to Databricks. ## [](#limitations)Limitations The following data types are not currently supported for managed Iceberg tables: | Iceberg type | Equivalent Avro type | | --- | --- | | uuid | uuid | | fixed(L) | fixed | | time | time-millis, time-micros | There are no limitations for Protobuf types. 
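Before enabling Iceberg on a topic, it can help to scan its Avro schema for the types listed in the table above. The following Python sketch walks a parsed Avro schema and reports the fields that use them. The function name and the example schema are hypothetical, and the walk is a heuristic: it does not resolve references to named types defined elsewhere in the schema.

```python
import json

# Avro logical types that map to Iceberg types listed as unsupported above.
UNSUPPORTED_LOGICAL_TYPES = {"uuid", "time-millis", "time-micros"}

# Hypothetical example schema used only for illustration.
EXAMPLE_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "ClickEvent",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "session", "type": {"type": "string", "logicalType": "uuid"}},
    {"name": "at", "type": ["null", {"type": "int", "logicalType": "time-millis"}]},
    {"name": "digest", "type": {"type": "fixed", "name": "Md5", "size": 16}}
  ]
}
""")


def find_unsupported(schema, path="$"):
    """Return (path, type_name) pairs for unsupported Avro types in a schema."""
    found = []
    if isinstance(schema, list):  # a union: check every branch
        for i, branch in enumerate(schema):
            found += find_unsupported(branch, f"{path}[{i}]")
    elif isinstance(schema, dict):
        if schema.get("logicalType") in UNSUPPORTED_LOGICAL_TYPES:
            found.append((path, schema["logicalType"]))
        kind = schema.get("type")
        if kind == "fixed":
            found.append((path, "fixed"))
        elif kind == "record":
            for field in schema.get("fields", []):
                found += find_unsupported(field["type"], f"{path}.{field['name']}")
        elif kind == "array":
            found += find_unsupported(schema.get("items"), f"{path}[]")
        elif kind == "map":
            found += find_unsupported(schema.get("values"), f"{path}{{}}")
        elif isinstance(kind, (dict, list)):  # nested type definition
            found += find_unsupported(kind, path)
    return found


if __name__ == "__main__":
    for location, type_name in find_unsupported(EXAMPLE_SCHEMA):
        print(f"{location}: {type_name}")
```

An empty result does not guarantee compatibility, but a non-empty one points to fields worth changing before you create the managed Iceberg table.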
## [](#create-a-unity-catalog-storage-credential)Create a Unity Catalog storage credential A storage credential is a Databricks object that controls access to external object storage, in this case S3. You associate a storage credential with an AWS IAM role that defines what actions Unity Catalog can perform in the S3 bucket. Follow the steps in the [Databricks documentation](https://docs.databricks.com/aws/en/connect/unity-catalog/cloud-storage/storage-credentials) to create an AWS IAM role that has the required permissions for the bucket. When you have completed these steps, you should have the following configured in AWS and Databricks: - A self-assuming IAM role, meaning you’ve defined the role trust policy so the role trusts itself. - Two IAM policies attached to the IAM role. The first policy grants Unity Catalog read and write access to the bucket. The second policy allows Unity Catalog to configure file events. - A storage credential in Databricks associated with the IAM role, using the role’s ARN. You also use the storage credential’s external ID in the role’s trust relationship policy to make the role self-assuming. ## [](#create-a-unity-catalog-external-location)Create a Unity Catalog external location The external location stores the Unity Catalog-managed Iceberg metadata, and the Iceberg data written by Redpanda. You must use the same bucket configured for [Tiered Storage](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#tiered-storage) for your Redpanda cluster. For BYOC clusters, the bucket name is `redpanda-cloud-storage-`, where `` is the ID of your Redpanda cluster. For BYOVPC clusters, the bucket name is the name you chose when you created the object storage bucket as a customer-managed resource. Follow the steps in the [Databricks documentation](https://docs.databricks.com/aws/en/connect/unity-catalog/cloud-storage/external-locations) to **manually** create an external location. 
You can create the external location in the Catalog Explorer or with SQL. You must create the external location manually because the location needs to be associated with the existing Tiered Storage bucket URL, `s3://`. ## [](#create-a-new-catalog)Create a new catalog Follow the steps in the Databricks documentation to [create a standard catalog](https://docs.databricks.com/aws/en/catalogs/create-catalog). When you create the catalog, specify the external location you created in the previous step as the storage location. You use the catalog name when you set the Iceberg cluster configuration properties in Redpanda in a later step. ## [](#authorize-access-to-unity-catalog)Authorize access to Unity Catalog Redpanda recommends using OAuth for service principals to grant Redpanda access to Unity Catalog. 1. Follow the steps in the [Databricks documentation](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m) to create a service principal, and then generate an OAuth secret. You use the client ID and secret to set Iceberg cluster configuration properties in Redpanda in the next step. 2. Open your catalog in the Catalog Explorer, then click **Permissions**. 3. Click **Grant** to grant the service principal the following permissions on the catalog: - `ALL PRIVILEGES` - `EXTERNAL USE SCHEMA` The Iceberg integration for Redpanda also supports using bearer tokens. ## [](#update-cluster-configuration)Update cluster configuration To configure your Redpanda cluster to enable Iceberg on a topic and integrate with Unity Catalog: 1. Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below. Use `rpk` like in the following example, or use the Cloud API to [update these cluster properties](../../cluster-maintenance/config-cluster/#set-cluster-configuration-properties). The update might take several minutes to complete. 
To reference a secret in a cluster property, you must first [store the secret value](../use-iceberg-catalogs/#store-a-secret-for-rest-catalog-authentication). ```bash rpk cloud login rpk profile create --from-cloud rpk cluster config set \ iceberg_enabled=true \ iceberg_catalog_type=rest \ iceberg_rest_catalog_endpoint=https:///api/2.1/unity-catalog/iceberg-rest \ iceberg_rest_catalog_authentication_mode=oauth2 \ iceberg_rest_catalog_oauth2_server_uri=https:///oidc/v1/token \ iceberg_rest_catalog_oauth2_scope=all-apis \ iceberg_rest_catalog_client_id= \ iceberg_rest_catalog_client_secret='${secrets.}' \ iceberg_rest_catalog_warehouse= \ iceberg_disable_snapshot_tagging=true ``` Use your own values for the following placeholders: - ``: The URL of your [Databricks workspace instance](https://docs.databricks.com/aws/en/workspace/workspace-details#workspace-instance-names-urls-and-ids); for example, `cust-success.cloud.databricks.com`. - ``: The client ID of the service principal you created in an earlier step. - ``: The name of the client secret of the service principal you created in an earlier step. - ``: The name of your catalog in Unity Catalog. ```bash Successfully updated configuration. New configuration version is 2. ``` 2. Enable the integration for a topic by configuring the topic property `redpanda.iceberg.mode`. The following examples show how to use [`rpk`](../../rpk/rpk-install/) to either create a new topic or alter the configuration for an existing topic and set the Iceberg mode to `key_value`. The `key_value` mode creates an Iceberg table for the topic consisting of two columns, one for the record metadata including the key, and another binary column for the record’s value. See [Specify Iceberg Schema](../specify-iceberg-schema/) for more details on Iceberg modes. 
   Create a new topic and set `redpanda.iceberg.mode`:

   ```bash
   rpk topic create --topic-config=redpanda.iceberg.mode=key_value
   ```

   Set `redpanda.iceberg.mode` for an existing topic:

   ```bash
   rpk topic alter-config --set redpanda.iceberg.mode=key_value
   ```

3. Produce to the topic. For example:

   ```bash
   # printf interprets \n in every shell, so this sends three key-value records
   printf 'hello world\nfoo bar\nbaz qux\n' | rpk topic produce --format='%k %v\n'
   ```

   You should see the topic as a table with data in Unity Catalog. The data may take some time to become visible, depending on your [`iceberg_target_lag_ms`](../../../reference/properties/cluster-properties/#iceberg_target_lag_ms) setting.

   1. In Catalog Explorer, open your catalog. You should see a `redpanda` schema, in addition to `default` and `information_schema`.
   2. The schema and the table residing within it are automatically added for you. The table name is the same as the topic name.

## [](#query-iceberg-table-using-databricks-sql)Query Iceberg table using Databricks SQL

You can query the Iceberg table using different engines, such as Databricks SQL, PyIceberg, or Apache Spark. To query the table or view the table data in Catalog Explorer, ensure that your account has the necessary permissions to read the table. Review the Databricks documentation on [granting permissions to objects](https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/?language=SQL#grant-permissions-on-objects-in-a-unity-catalog-metastore) and [Unity Catalog privileges](https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/privileges) for details.

The following example shows how to query the Iceberg table using SQL in Databricks SQL.

1. In the Databricks console, open **SQL Editor**.
2.
In the query editor, run: ```sql -- Ensure that the catalog and table name are correctly parsed in case they contain special characters SELECT * FROM ``.redpanda.`` LIMIT 10; ``` Your query results should look like the following: ```sql -- Example for redpanda.iceberg.mode=key_value with 1 record produced to topic +----------------------------------------------------------------------+------------+ | redpanda | value | +----------------------------------------------------------------------+------------+ | {"partition":0,"offset":"0","timestamp":"2025-04-02T18:57:11.127Z", | 776f726c64 | | "headers":null,"key":"68656c6c6f"} | | +----------------------------------------------------------------------+------------+ ``` ## [](#suggested-reading)Suggested reading - [Query Iceberg Topics](../query-iceberg-topics/) --- # Page 434: Migrate to Iceberg Topics **URL**: https://docs.redpanda.com/redpanda-cloud/manage/iceberg/migrate-to-iceberg-topics.md --- # Migrate to Iceberg Topics --- title: Migrate to Iceberg Topics latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: iceberg/migrate-to-iceberg-topics page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: iceberg/migrate-to-iceberg-topics.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/migrate-to-iceberg-topics.adoc description: Migrate existing Iceberg integrations to Redpanda Iceberg topics. 
page-topic-type: how-to learning-objective-1: Compare external Iceberg integrations with Iceberg Topics architectures learning-objective-2: Implement data merge strategies using SQL patterns learning-objective-3: Execute validation checks and perform cutover procedures page-git-created-date: "2026-02-28" page-git-modified-date: "2026-02-28" --- Migrate existing Iceberg pipelines to Redpanda Iceberg topics to simplify your architecture and reduce operational overhead. After reading this page, you will be able to: - Compare external Iceberg integrations with Iceberg Topics architectures - Implement data merge strategies using SQL patterns - Execute validation checks and perform cutover procedures ## [](#why-migrate-to-iceberg-topics)Why migrate to Iceberg Topics Redpanda’s built-in Iceberg-enabled topics offer a simpler alternative to external Iceberg integrations for writing streaming data to Iceberg tables. > 📝 **NOTE** > > This page focuses on migrating from Kafka Connect Iceberg Sink. The migration patterns and SQL examples can be adapted for other Iceberg sources such as Apache Flink or Spark. ### [](#kafka-connect-iceberg-sink-comparison)Kafka Connect Iceberg Sink comparison The following table compares Kafka Connect Iceberg Sink with Redpanda Iceberg Topics: | Aspect | Kafka Connect Iceberg Sink | Iceberg Topics | | --- | --- | --- | | Infrastructure | Requires external Kafka Connect cluster | Built into Redpanda brokers | | Dependencies | Separate service to manage | No external dependencies | | Setup time | Medium (deploy connector) | Fast (enable topic property and post schema) | ## [](#prerequisites)Prerequisites To migrate from an existing Iceberg integration to Iceberg Topics, you must have: - [Iceberg Topics](../about-iceberg-topics/) enabled on your Redpanda cluster. - Understanding of your current schema format (Avro, Protobuf, or JSON Schema). 
- For Kafka Connect migrations, knowledge of your Kafka Connect configuration, especially if using `iceberg.tables.route-field` for multi-table routing. - If migrating multi-table fan-out patterns, [data transforms](../../../develop/data-transforms/how-transforms-work/) enabled on your cluster. - Access to both source and target (Iceberg Topics) tables in your query engine. - Query engine access (Snowflake, Databricks, ClickHouse, or Spark) for data merging. ## [](#migration-steps)Migration steps Redpanda recommends following a phased approach to ensure data consistency and minimize risk: 1. Enable Iceberg on target topics and verify new data flows. 2. Run both systems concurrently during transition. 3. Choose a strategy to combine historical and new data. 4. Verify data completeness and accuracy. 5. Disable the external Iceberg integration. > ❗ **IMPORTANT** > > Iceberg Topics cannot append to existing Iceberg tables that are not created by Redpanda. You must create new Iceberg tables and merge historical data separately. ### [](#enable-iceberg-topics)Enable Iceberg Topics For simple migrations (one topic mapping to one Iceberg table), enable the Iceberg integration for your Redpanda topics. 1. 
Set the `iceberg_enabled` configuration property on your cluster to `true`:

   ###### rpk

   ```bash
   rpk cloud login
   rpk profile create --from-cloud
   rpk cluster config set iceberg_enabled true
   ```

   ###### Cloud API

   ```bash
   # Store your cluster ID in a variable
   export RP_CLUSTER_ID=

   # Retrieve a Redpanda Cloud access token.
   # The token endpoint returns a JSON document, so extract the
   # access_token field from it (this step requires jq).
   export RP_CLOUD_TOKEN=$(curl -s -X POST "https://auth.prd.cloud.redpanda.com/oauth/token" \
       -H "content-type: application/x-www-form-urlencoded" \
       -d "grant_type=client_credentials" \
       -d "client_id=" \
       -d "client_secret=" | jq -r '.access_token')

   # Update cluster configuration to enable Iceberg topics
   curl -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" -X PATCH \
       "https://api.cloud.redpanda.com/v1/clusters/${RP_CLUSTER_ID}" \
       -H 'accept: application/json' \
       -H 'content-type: application/json' \
       -d '{"cluster_configuration":{"custom_properties": {"iceberg_enabled":true}}}'
   ```

2. Configure the `redpanda.iceberg.mode` property for the topic:

   ```bash
   rpk topic alter-config --set redpanda.iceberg.mode=
   ```

   Choose the mode based on your message format and schema configuration. For Kafka Connect migrations, use this mapping:

   | Kafka Connect Converter | Recommended Iceberg Mode |
   | --- | --- |
   | io.confluent.connect.avro.AvroConverter | value_schema_id_prefix (messages already use Schema Registry wire format) |
   | io.confluent.connect.protobuf.ProtobufConverter | value_schema_id_prefix (messages already use Schema Registry wire format) |
   | org.apache.kafka.connect.json.JsonConverter with schemas | value_schema_latest (Schema Registry resolves schema automatically) |
   | org.apache.kafka.connect.json.JsonConverter with embedded schemas | key_value (schema included with each message) |

   See [Specify Iceberg Schema](../specify-iceberg-schema/) to learn more about the different Iceberg modes.

3.
If using `value_schema_id_prefix` or `value_schema_latest` modes, register a schema for the topic: ```bash rpk registry schema create -value --schema --type ``` > ❗ **IMPORTANT** > > If using the `value_schema_id_prefix` mode, schema subjects must use the `-value` [naming convention](../../schema-reg/schema-id-validation/#set-subject-name-strategy-per-topic) (TopicNameStrategy). Note the schema ID returned, in case you need it for troubleshooting. 4. Verify that new records are being written to the Iceberg table: - Check that data appears in your query engine. - Validate that the schema translation is correct. - Confirm record counts are increasing. #### [](#multi-table-fan-out-pattern)Multi-table fan-out pattern If your existing integration routes records to multiple Iceberg tables based on a field value (for example, Kafka Connect’s `iceberg.tables.route-field` property), you need to implement equivalent routing logic. You create separate Iceberg-enabled topics for each target table, and Redpanda automatically creates corresponding Iceberg tables. Use either of the following approaches to route records to the correct topic: ##### [](#option-1-data-transforms-with-separate-topics-recommended)Option 1: Data transforms with separate topics (recommended) Use a data transform to read the routing field from each message and write records to separate Iceberg-enabled topics. This approach keeps routing logic within Redpanda and avoids external dependencies. When using Iceberg modes that require schema validation, the transform can register schemas dynamically and encode messages with the appropriate format. 1. Enable data transforms on your cluster: ```bash rpk cluster config set data_transforms_enabled true ``` 2. 
Create output topics and enable Iceberg with Schema Registry validation: ```bash rpk topic create rpk topic alter-config --set redpanda.iceberg.mode=value_schema_id_prefix rpk topic alter-config --set redpanda.iceberg.mode=value_schema_id_prefix rpk topic alter-config --set redpanda.iceberg.mode=value_schema_id_prefix ``` 3. Implement a transform function that: 1. Reads the routing field from each input message. 2. If using Schema Registry validation, registers schemas dynamically and encodes messages with the appropriate format. 3. Writes to a specific output topic based on the routing field. 4. Deploy the transform, specifying multiple output topics: ```bash rpk transform deploy \ --file transform.wasm \ --name \ --input-topic \ --output-topic \ --output-topic \ --output-topic ``` 5. Validate the fanout by checking that each output topic receives the correct records. For a complete implementation example with dynamic schema registration, see [Multi-topic fan-out with Schema Registry](../../../develop/data-transforms/build/#multi-topic-fanout). The example demonstrates Schema Registry wire format encoding for use with `value_schema_id_prefix` mode. ##### [](#option-2-external-stream-processor)Option 2: External stream processor Use an external stream processor for complex routing logic: 1. Use a stream processor ([Redpanda Connect](../../../develop/connect/about/) or Flink) to split records. 2. Write to separate Iceberg-enabled topics. This approach is more complex but offers more flexibility for advanced routing requirements not supported by data transforms. ### [](#validate-schema-registry-integration)Validate Schema Registry integration If using [`value_schema_id_prefix`](../specify-iceberg-schema/#value_schema_id_prefix) mode, verify that messages use the Schema Registry [wire format](../../schema-reg/schema-reg-overview/#wire-format). 
```bash rpk topic consume --num=1 --format='%v\n' | xxd | head -n 1 ``` If the first byte is not `00` (magic byte), you must configure your producer to use the wire format. The `value_schema_id_prefix` mode also requires that schema subjects follow the TopicNameStrategy: `-value`. Verify your schemas use the correct naming: ```bash rpk registry schema list ``` #### [](#verify-no-records-in-dlq)Verify no records in DLQ Check that no records failed validation and were written to the dead-letter queue. If records are present, see [Records in DLQ table](#records-in-dlq-table) for resolution steps. ```sql SELECT COUNT(*) FROM ."~dlq"; ``` ### [](#run-systems-in-parallel)Run systems in parallel Keep your existing Iceberg integration running while Iceberg Topics is enabled. This provides a safety net during the transition period: - New data flows to both the source tables and new Iceberg Topics tables. - You can validate data consistency between both systems. - You have a fallback option if issues arise. Run a query to compare record counts between systems: ```sql -- Source table SELECT COUNT(*) AS source_count FROM .; -- Iceberg Topics table SELECT COUNT(*) AS iceberg_topics_count FROM .; ``` Record counts should increase at similar rates, accounting for the time Iceberg Topics was enabled. Check for DLQ records (see [Records in DLQ table](#records-in-dlq-table)). 
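While both systems run in parallel, the wire-format check from [Validate Schema Registry integration](#validate-schema-registry-integration) can also be scripted to inspect more than one record. The following Python sketch assumes the standard Schema Registry framing (a single zero magic byte followed by a 4-byte big-endian schema ID). The `parse_wire_header` helper and the sample payloads are hypothetical and are not part of `rpk` or any Redpanda API:

```python
import struct

MAGIC_BYTE = 0  # first byte of a Schema Registry wire-format message


def parse_wire_header(payload: bytes):
    """Return the schema ID if the payload starts with a wire-format header, else None."""
    if len(payload) < 5 or payload[0] != MAGIC_BYTE:
        return None
    # Bytes 1-4 hold the schema ID as a big-endian unsigned 32-bit integer.
    (schema_id,) = struct.unpack(">I", payload[1:5])
    return schema_id


# Hypothetical sample values for illustration only.
framed = bytes([0x00, 0x00, 0x00, 0x00, 0x07]) + b"serialized-record"
plain = b'{"user_id": 2324}'

print(parse_wire_header(framed))  # 7
print(parse_wire_header(plain))   # None
```

A `None` result corresponds to the case where the first byte is not `00`, meaning the producer is not using the wire format.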
Monitor Iceberg topic metrics to validate that data is flowing at expected rates: - `redpanda_iceberg_translation_parquet_rows_added`: Track rows written to Iceberg tables (compare with source write rate) - `redpanda_iceberg_translation_translations_finished`: Number of completed translation executions - `redpanda_iceberg_translation_invalid_records`: Records that failed validation - `redpanda_iceberg_translation_dlq_files_created`: Dead-letter queue activity - `redpanda_iceberg_rest_client_num_commit_table_update_requests_failed`: Failed table commits to catalog If using data transforms for multi-table fanout, also monitor: - `redpanda_transform_processor_lag`: Records pending processing in transform input topic For a complete list of Iceberg metrics, see the [Iceberg metrics reference](../../../reference/public-metrics-reference/#iceberg-metrics). > 💡 **TIP** > > Run both systems for at least 24-48 hours to ensure stability before proceeding with data merge. ### [](#merge-historical-data)Merge historical data Choose a strategy to combine your historical data with new Iceberg Topics data. #### [](#option-1-insert-into-pattern-recommended)Option 1: INSERT INTO pattern (recommended) Use this approach to create a unified table with all data, taking into consideration the following: - You want a single table for queries. - You can afford the one-time data copy cost. - You need optimal query performance. 
This SQL pattern uses partition and offset metadata to identify and copy only records not yet in the target table: ```sql -- Step 1: Find the latest offset per partition in the target (Iceberg Topics) table WITH latest_offsets AS ( SELECT partition, MAX(offset) AS max_offset FROM target_iceberg_topics_table GROUP BY partition ) -- Step 2: Insert records from source table that don't exist in target INSERT INTO target_iceberg_topics_table SELECT s.* FROM source_table AS s LEFT JOIN latest_offsets AS t ON s.partition = t.partition WHERE t.max_offset IS NULL -- Partition not seen before in target OR s.offset > t.max_offset; -- Record is newer than target's latest offset ``` - The `latest_offsets` CTE finds the highest offset in the target table for each partition. - The `LEFT JOIN` ensures you include partitions never seen before in the target (`t.max_offset IS NULL`). - The `WHERE` clause filters to only records with offsets greater than the target’s latest. - This avoids duplicates by using Kafka partition and offset as the deduplication key. This approach may take significant time for large datasets. Consider executing this process during low-query periods. You can also execute on an incremental basis to ease the load on your query engine, for example, by date or partition ranges. #### [](#option-2-view-based-query-federation)Option 2: View-based query federation Use this approach to query both tables without copying data if: - You cannot afford data copy time or cost. - You need immediate access to a unified view. - Query complexity and performance are acceptable with federated queries. - You may consolidate data later. 
Create a view that queries both tables and deduplicates on the fly: ```sql CREATE VIEW unified_iceberg_view AS WITH latest_offsets AS ( SELECT partition, MAX(offset) AS max_offset FROM target_iceberg_topics_table GROUP BY partition ), historical_data AS ( SELECT s.* FROM source_table AS s LEFT JOIN latest_offsets AS t ON s.partition = t.partition WHERE t.max_offset IS NULL OR s.offset <= t.max_offset -- Only historical records not in target ), new_data AS ( SELECT * FROM target_iceberg_topics_table ) SELECT * FROM historical_data UNION ALL SELECT * FROM new_data; ``` Most Iceberg-compatible query engines support views, including Snowflake, Databricks, ClickHouse, and Spark. ### [](#validate-the-migration)Validate the migration After completing the data merge, verify the migration before cutting over: - Record counts match between source and target: ```sql -- Compare record counts SELECT 'Source' AS table_name, COUNT(*) AS record_count FROM . UNION ALL SELECT 'Target', COUNT(*) FROM .; ``` - All partitions are represented in the target: ```sql -- Check for missing partitions SELECT DISTINCT partition FROM . EXCEPT SELECT DISTINCT partition FROM .; -- Should return no rows ``` - Date ranges cover the full historical period. Compare `MIN(timestamp)` and `MAX(timestamp)` between source and target tables to ensure the target covers the same time range. - No gaps in offset sequences: ```sql -- Check for offset gaps (may indicate missing data) WITH offset_check AS ( SELECT partition, offset, LAG(offset) OVER (PARTITION BY partition ORDER BY offset) AS prev_offset FROM . ) SELECT * FROM offset_check WHERE offset - prev_offset > 1; -- Should return no rows ``` - Sample queries return expected results. Spot check specific records by ID to verify data accuracy. - Schema translation is correct. Run `DESCRIBE` on both tables and verify all fields are present with correct data types. - New records are flowing to Iceberg Topics. 
Check record count for a recent time window (for example, the last hour). - Query performance is acceptable. - Monitoring and alerts are configured. - No records in DLQ (see [Records in DLQ table](#records-in-dlq-table)). ### [](#troubleshoot-common-migration-issues)Troubleshoot common migration issues #### [](#records-in-dlq-table)Records in DLQ table Iceberg Topics write records that fail validation to a dead-letter queue (DLQ) table. Records may appear in the DLQ due to: - Schema Registry issues. For example, using the wrong schema subject name, or Redpanda cannot find the embedded schema ID in Schema Registry. - When using `value_schema_id_prefix` mode: messages not encoded with Schema Registry wire format. - Incompatible schema changes. For example, changing field types or removing required fields. - Data type translation failures. To check for DLQ records during migration: ```sql SELECT COUNT(*) FROM ."~dlq"; ``` If the count is greater than zero, inspect the failed records. See [Troubleshoot errors](../about-iceberg-topics/#troubleshoot-errors) for steps to inspect and reprocess DLQ records. #### [](#multi-table-fan-out-transform-issues)Multi-table fan-out transform issues If the transform does not process messages, check if: - The specified output topics don’t exist or aren’t enabled with Iceberg. - The routing logic in the transform is incorrect, or the routing field is missing from input messages. - (When using Schema Registry validation) The schema registration failed during initialization, preventing the transform from starting. 
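To rule out the routing logic as the cause, it can help to exercise it in isolation before redeploying. The following Python sketch is not the data transforms SDK; it only models the routing decision for JSON records. The `ROUTES` table, the topic names, the `type` routing field, and the `route_record` helper are all hypothetical:

```python
import json

# Hypothetical mapping from routing-field value to Iceberg-enabled output topic.
ROUTES = {"orders": "orders_iceberg", "payments": "payments_iceberg"}


def route_record(value: bytes, route_field: str = "type"):
    """Return the output topic for a JSON record, or None if it cannot be routed."""
    try:
        record = json.loads(value)
    except ValueError:
        return None  # payload is not valid JSON
    if not isinstance(record, dict):
        return None  # payload is JSON but not an object
    key = record.get(route_field)
    if not isinstance(key, str):
        return None  # routing field is missing or has an unexpected type
    return ROUTES.get(key)  # None when the value has no configured topic


print(route_record(b'{"type": "orders", "id": 1}'))  # orders_iceberg
print(route_record(b'{"id": 1}'))                    # None
```

Records that return `None` here are the ones a transform would need to drop, log, or send to a fallback topic, which is a design decision for your own transform.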
To check the transform status:

```bash
rpk transform list
```

To view logs and check for errors:

```bash
rpk transform logs
```

To check for routing errors:

```bash
rpk transform logs | grep -i "unknown\|error"
```

If using Schema Registry validation, verify schema registration:

```bash
# Check transform logs for schema registration messages
rpk transform logs | grep -i "schema"

# List registered schemas
rpk registry schema list
```

### [](#plan-for-rollback)Plan for rollback

Before cutting over, ensure you have a rollback strategy. See the [Pre-cutover checklist](#pre-cutover-checklist) in the cutover section to verify you’re ready.

#### [](#rollback-during-parallel-operation)Rollback during parallel operation

If you discover issues while both systems are running:

1. Keep producing to both systems.
2. Point consumers back to source tables.
3. Investigate Iceberg Topics issues using [Troubleshoot common migration issues](#troubleshoot-common-migration-issues).
4. Fix issues and re-validate.
5. Attempt cutover again when ready.

#### [](#rollback-after-external-integration-disabled)Rollback after external integration disabled

> ⚠️ **WARNING**
>
> Rolling back after stopping your external Iceberg integration may result in data loss or gaps.

If you must roll back after disabling the external integration:

1. Restart your external Iceberg integration immediately.
2. Identify data written only to Iceberg Topics during the gap.
3. Export that data from Iceberg Topics tables:

```sql
SELECT * FROM iceberg_topics_table WHERE timestamp > '';
```

4. Write exported data back to the source system (for example, Kafka Connect input topics or directly to source tables).
5. Verify data completeness across both systems.
6. Resume operations on the external integration.

Redpanda recommends maintaining the ability to roll back for at least seven days after cutover to allow for issue discovery.
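Step 2 of the rollback procedure above can also be approached with the partition-and-offset key that [Merge historical data](#merge-historical-data) uses for deduplication, as an alternative to a timestamp cutoff. The following is a minimal Python sketch, assuming you have already queried the highest offset per partition from the source tables and the `(partition, offset)` pairs from the Iceberg Topics table. The function name and sample values are hypothetical:

```python
def rows_missing_from_source(source_max_offsets, target_rows):
    """Return (partition, offset) pairs present in the target but not yet in the source.

    source_max_offsets: dict mapping partition -> highest offset in the source tables.
    target_rows: iterable of (partition, offset) pairs from the Iceberg Topics table.
    """
    missing = []
    for partition, offset in target_rows:
        last_in_source = source_max_offsets.get(partition)
        # A row is missing if the source never saw the partition,
        # or if the row is newer than the source's last offset for it.
        if last_in_source is None or offset > last_in_source:
            missing.append((partition, offset))
    return missing


# Hypothetical sample values for illustration only.
gap = rows_missing_from_source({0: 10}, [(0, 9), (0, 11), (1, 0)])
print(gap)  # [(0, 11), (1, 0)]
```

The resulting pairs identify the records to export and write back to the source system in steps 3 and 4.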
### [](#cut-over-to-iceberg-topics)Cut over to Iceberg Topics #### [](#pre-cutover-checklist)Pre-cutover checklist Before disabling your external Iceberg integration, ensure you have completed all validation steps: - All historical data is successfully merged (see [Merge historical data](#merge-historical-data)). - Parallel operation is complete and stable for at least 24-48 hours. - All validation queries pass (see [Validate the migration](#validate-the-migration)). - No records in DLQ tables, or all DLQ records are investigated and resolved. - Query performance meets requirements. - Downstream consumers are successfully tested with Iceberg Topics tables. - Monitoring and alerts are configured. - Rollback plan is verified and documented. #### [](#cutover-procedure)Cutover procedure 1. Set an appropriate maintenance window, ideally during low-traffic periods. 2. Stop your external Iceberg integration. **For Kafka Connect:** ```bash # Stop connector curl -X PUT http:///kafka-connect/clusters/iceberg-sink-connector/stop # Or delete connector (permanent) curl -X DELETE http:///kafka-connect/clusters/iceberg-sink-connector ``` 3. Monitor Iceberg Topics to ensure data continues flowing. 4. Verify that no new records are being written to source tables: ```sql SELECT MAX(timestamp) FROM .; -- Should not change after integration is stopped ``` 5. Run validation queries from [Validate the migration](#validate-the-migration) after 1-2 hours of operation. 6. Wait for a short period, such as 24-48 hours, to monitor and validate stability. 7. If migrating to a unified table of historical plus new data, optionally delete old source tables after an extended validation period (for example, at least seven days): > 📝 **NOTE** > > Ensure you have backups before deleting historical data. Some organizations keep old tables for compliance or audit purposes. ```sql DROP TABLE .; ``` 8. Decommission external Iceberg infrastructure after an extended safety period (30+ days, for example). 
If any issues arise during cutover, see [Plan for rollback](#plan-for-rollback). ## [](#next-steps)Next steps - [Query Iceberg Topics](../query-iceberg-topics/) - [About Iceberg Topics](../about-iceberg-topics/) --- # Page 435: Query Iceberg Topics **URL**: https://docs.redpanda.com/redpanda-cloud/manage/iceberg/query-iceberg-topics.md --- # Query Iceberg Topics --- title: Query Iceberg Topics latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: iceberg/query-iceberg-topics page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: iceberg/query-iceberg-topics.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/query-iceberg-topics.adoc description: Query Redpanda topic data stored in Iceberg tables, based on the topic Iceberg mode and schema. page-git-created-date: "2025-04-04" page-git-modified-date: "2025-09-23" --- When you access Iceberg topics from a data lakehouse or other Iceberg-compatible tools, how you consume the data depends on the topic [Iceberg mode](../specify-iceberg-schema/) and whether you’ve registered a schema for the topic in the [Redpanda Schema Registry](../../schema-reg/schema-reg-overview/). You do not need to rely on complex ETL jobs or pipelines to access real-time data from Redpanda. ## [](#access-iceberg-tables)Access Iceberg tables Redpanda generates an Iceberg table with the same name as the topic. Depending on the processing engine and your Iceberg catalog implementation, you may also need to define the table (for example using `CREATE TABLE`) to point the data lakehouse to its location in the catalog. 
For BYOC clusters, the bucket name and table location are as follows:

| Cloud provider | Bucket or container name | Iceberg table location |
| --- | --- | --- |
| AWS | redpanda-cloud-storage- | redpanda-iceberg-catalog/redpanda/ |
| Azure | The Redpanda cluster ID is also used as the container name (ID) and the storage account ID. | redpanda-iceberg-catalog/redpanda/ |
| GCP | redpanda-cloud-storage- | redpanda-iceberg-catalog/redpanda/ |

For BYOVPC clusters, the bucket name is the name you chose when you created the object storage bucket as a customer-managed resource.

For Azure clusters, you must add the public IP addresses or ranges from the REST catalog service, or other clients requiring access to the Iceberg data, to your cluster’s allow list. Alternatively, add subnet IDs to the allow list if the requests originate from the same Azure region.

For example, to add subnet IDs to the allow list through the Control Plane API [`PATCH /v1/clusters/`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) endpoint, run:

```bash
curl -X PATCH https://api.cloud.redpanda.com/v1/clusters/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" \
  -d @- << EOF
{
  "cloud_storage": {
    "azure": {
      "allowed_subnet_ids": [ ]
    }
  }
}
EOF
```

Some query engines may require you to manually refresh the Iceberg table snapshot (for example, by running a command like `ALTER TABLE REFRESH;`) to see the latest data.

If your engine needs the full JSON metadata path, use the following:

```none
redpanda-iceberg-catalog/redpanda//metadata/v.metadata.json
```

This provides read access to all snapshots written as of the specified table version (denoted by `version-number`).

> 📝 **NOTE**
>
> Redpanda automatically removes expired snapshots on a periodic basis. Snapshot expiry helps maintain a smaller metadata size and reduces the window available for [time travel](#time-travel-queries).
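If you generate the metadata path programmatically, a small helper can keep it consistent. The following Python sketch assumes the path has the shape `redpanda-iceberg-catalog/redpanda/<topic>/metadata/v<version>.metadata.json`, with the topic name as the table directory and the table version number after `v`. Both the shape and the `metadata_path` helper are assumptions for illustration, so confirm the actual path in your object storage before relying on it:

```python
CATALOG_PREFIX = "redpanda-iceberg-catalog/redpanda"


def metadata_path(topic: str, version: int) -> str:
    """Build the assumed metadata JSON path for a topic's Iceberg table version."""
    if not topic:
        raise ValueError("topic name must not be empty")
    if version < 0:
        raise ValueError("version number must not be negative")
    return f"{CATALOG_PREFIX}/{topic}/metadata/v{version}.metadata.json"


# Hypothetical topic name and version number.
print(metadata_path("ClickEvent", 3))
```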
## [](#query-examples)Query examples To follow along with the examples on this page, suppose you produce the same stream of events to a topic `ClickEvent`, which uses a schema, and another topic `ClickEvent_key_value`, which uses the key-value mode. The topic’s Iceberg data is stored in an AWS S3 bucket. A sample record contains the following data: ```bash {"user_id": 2324, "event_type": "BUTTON_CLICK", "ts": "2024-11-25T20:23:59.380Z"} ``` ### [](#topic-with-schema-value_schema_id_prefix-mode)Topic with schema (`value_schema_id_prefix` mode) > 📝 **NOTE** > > The steps in this section also apply to the `value_schema_latest` mode, except the produce step. The `value_schema_latest` mode is not compatible with the Schema Registry wire format. The [`rpk topic produce`](#reference:rpk/rpk-topic/rpk-topic-produce) command embeds the wire format header, so you must use your own producer code with `value_schema_latest`. Assume that you have created the `ClickEvent` topic, set `redpanda.iceberg.mode` to `value_schema_id_prefix`, and are connecting to a REST-based Iceberg catalog. The following is an Avro schema for `ClickEvent`: `schema.avsc` ```avro { "type" : "record", "namespace" : "com.redpanda.examples.avro", "name" : "ClickEvent", "fields" : [ { "name": "user_id", "type" : "int" }, { "name": "event_type", "type" : "string" }, { "name": "ts", "type": "string" } ] } ``` 1. Register the schema under the `ClickEvent-value` subject: ```bash rpk registry schema create ClickEvent-value --schema path/to/schema.avsc --type avro ``` 2. 
Produce to the `ClickEvent` topic using the following format: ```bash echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent --format='%k %v\n' --schema-id=topic ``` The `value_schema_id_prefix` mode requires that you produce to a topic using the [Schema Registry wire format](../../schema-reg/schema-reg-overview/#wire-format), which includes the magic byte and schema ID in the prefix of the message payload. This allows Redpanda to identify the correct schema version in the Schema Registry for a record. 3. The following Spark SQL query returns values from columns in the `ClickEvent` table, with the table structure derived from the schema, and column names matching the schema fields. If you’ve integrated a catalog, query engines such as Spark SQL provide Iceberg integrations that allow easy discovery and access to existing Iceberg tables in object storage. ```sql SELECT * FROM ``.redpanda.ClickEvent; ``` ```bash +-----------------------------------+---------+--------------+--------------------------+ | redpanda | user_id | event_type | ts | +-----------------------------------+---------+--------------+--------------------------+ | {"partition":0,"offset":0,"timestamp":2025-03-05 15:09:20.436,"headers":null,"key":null} | 2324 | BUTTON_CLICK | 2024-11-25T20:23:59.380Z | +-----------------------------------+---------+--------------+--------------------------+ ``` ### [](#topic-in-key-value-mode)Topic in key-value mode In `key_value` mode, you do not associate the topic with a schema in the Schema Registry, which means using semi-structured data in Iceberg. The record keys and values can have an arbitrary structure, so Redpanda stores them in [binary format](https://apache.github.io/iceberg/spec/?h=spec#primitive-types) in Iceberg. In this example, assume that you have created the `ClickEvent_key_value` topic, and set `redpanda.iceberg.mode` to `key_value`. 1. 
Produce to the `ClickEvent_key_value` topic using the following format: ```bash echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent_key_value --format='%k %v\n' ``` 2. The following Spark SQL query returns the semi-structured data in the `ClickEvent_key_value` table. The table consists of two columns: one named `redpanda`, containing the record key and other metadata, and another binary column named `value` for the record’s value: ```sql SELECT * FROM ``.redpanda.ClickEvent_key_value; ``` ```bash +-----------------------------------+------------------------------------------------------------------------------+ | redpanda | value | +-----------------------------------+------------------------------------------------------------------------------+ | {"partition":0,"offset":0,"timestamp":2025-03-05 15:14:30.931,"headers":null,"key":key1} | {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"} | +-----------------------------------+------------------------------------------------------------------------------+ ``` Depending on your query engine, you might need to first decode the binary value to display the record key and value using a SQL helper function. For example, see the [`decode` and `unhex`](https://spark.apache.org/docs/latest/api/sql/index.html#unhex) Spark SQL functions, or the [HEX\_DECODE\_STRING](https://docs.snowflake.com/en/sql-reference/functions/hex_decode_string) Snowflake function. Some engines may also automatically decode the binary value for you. ### [](#time-travel-queries)Time travel queries Some query engines, such as Spark, support time travel with Iceberg, allowing you to query the table as it existed at a specific point in the past. You can run a time travel query by specifying a timestamp or version number. Redpanda automatically removes expired snapshots on a periodic basis, which also reduces the window available for time travel queries. 
By default, Redpanda retains snapshots for five days, so you can query Iceberg tables as of up to five days ago. The following example queries a `ClickEvent` table at a specific timestamp in Spark: ```sql SELECT * FROM ``.redpanda.ClickEvent TIMESTAMP AS OF '2025-03-02 10:00:00'; ``` --- # Page 436: Query Iceberg Topics using Snowflake and Open Catalog **URL**: https://docs.redpanda.com/redpanda-cloud/manage/iceberg/redpanda-topics-iceberg-snowflake-catalog.md --- # Query Iceberg Topics using Snowflake and Open Catalog --- title: Query Iceberg Topics using Snowflake and Open Catalog latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: iceberg/redpanda-topics-iceberg-snowflake-catalog page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc description: Add Redpanda topics as Iceberg tables that you can query in Snowflake using an Open Catalog integration. page-git-created-date: "2025-05-21" page-git-modified-date: "2026-03-06" --- This guide walks you through querying Redpanda topics as Iceberg tables in [Snowflake](https://docs.snowflake.com/en/user-guide/tables-iceberg), with AWS S3 as object storage and a catalog integration using [Open Catalog](https://other-docs.snowflake.com/en/opencatalog/overview). ## [](#prerequisites)Prerequisites - `rpk` or familiarity with the Redpanda Cloud API to use secrets in your cluster configuration. For `rpk`, see [Install or Update rpk](../../rpk/rpk-install/). For the Cloud API, you must [authenticate](/api/cloud-controlplane/authentication) using a service account. - A Snowflake account. - An Open Catalog account. 
To [create an Open Catalog account](https://other-docs.snowflake.com/en/opencatalog/create-open-catalog-account), you require ORGADMIN access in Snowflake. - An internal catalog created in Open Catalog with your Tiered Storage AWS S3 bucket configured as external storage. Follow this guide to [create a catalog](https://other-docs.snowflake.com/en/opencatalog/create-catalog#create-a-catalog-using-amazon-simple-storage-service-amazon-s3) with the S3 bucket configured as external storage. You require admin permissions to carry out these steps in AWS: 1. If you don’t already have one, create an IAM policy that gives Open Catalog read and write access to your S3 bucket. 2. Create an IAM role and attach the IAM policy to the role. 3. After creating a new catalog in Open Catalog, grant the catalog’s AWS IAM user access to the S3 bucket. - A Snowflake [external volume](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume) set up using the Tiered Storage bucket. Follow this guide to [configure the external volume with S3](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3). You can use the same IAM policy as the catalog for the external volume’s IAM role and user. ## [](#set-up-catalog-integration-using-open-catalog)Set up catalog integration using Open Catalog ### [](#create-a-new-open-catalog-service-connection-for-redpanda)Create a new Open Catalog service connection for Redpanda To create a new service connection to integrate the Iceberg-enabled topics into Open Catalog: 1. In Open Catalog, select **Connections**, then **\+ Connection**. 2. In **Configure Service Connection**, provide a name. Open Catalog creates a new principal with this name. 3. Make sure **Create new principal role** is selected. 4. Enter a name for the principal role. Then, click **Create**. After you create the connection, get the client ID and client secret. Save these credentials to add to your cluster configuration in a later step. 
### [](#create-a-catalog-role)Create a catalog role Grant privileges to the principal created in the previous step: 1. In Open Catalog, select **Catalogs**, and select your catalog. 2. On the **Roles** tab of your catalog, click **\+ Catalog Role**. 3. Give the catalog role a name. 4. Under **Privileges**, select `CATALOG_MANAGE_CONTENT`. This provides full management [privileges](https://other-docs.snowflake.com/en/opencatalog/access-control#catalog-privileges) for the catalog. Then, click **Create**. 5. On the **Roles** tab of the catalog, click **Grant to Principal Role**. 6. Select the catalog role you just created. 7. Select the principal role you created earlier. Click **Grant**. ### [](#update-cluster-configuration)Update cluster configuration To configure your Redpanda cluster to enable Iceberg on a topic and integrate with Open Catalog: 1. [Store the Open Catalog client secret in your cluster](../use-iceberg-catalogs/#store-a-secret-for-rest-catalog-authentication) using `rpk` or the Data Plane API. 2. [Edit your cluster configuration](../use-iceberg-catalogs/#use-a-secret-in-cluster-configuration) to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below using `rpk` or the Control Plane API. 
For example, to use `rpk cluster config set`, run: ```bash rpk cluster config set \ iceberg_enabled=true \ iceberg_catalog_type=rest \ iceberg_rest_catalog_endpoint=https://-.snowflakecomputing.com/polaris/api/catalog \ iceberg_rest_catalog_authentication_mode=oauth2 \ iceberg_rest_catalog_client_id= \ iceberg_rest_catalog_client_secret='${secrets.}' \ iceberg_rest_catalog_warehouse= # Optional properties: # iceberg_translation_interval_ms_default=1000 # iceberg_catalog_commit_interval_ms=1000 ``` Use your own values for the following placeholders: - `` and ``: Your [Open Catalog account URI](https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters) is composed of these values. > 💡 **TIP** > > In Snowflake, navigate to **Admin**, then **Accounts**. Click the ellipsis near your Open Catalog account name, and select **Manage URLs**. The **Current URL** contains `` and ``. - ``: The client ID of the service connection you created in an earlier step. - ``: The name of the secret you created in the previous step. You must pass the secret name to the `${secrets.}` placeholder, not the secret value itself. - ``: The name of your catalog in Open Catalog. ```bash Successfully updated configuration. New configuration version is 2. ``` 3. Enable the integration for a topic by configuring the topic property `redpanda.iceberg.mode`. This mode creates an Iceberg table for the topic consisting of two columns: one for the record metadata including the key, and another binary column for the record’s value. See [Enable Iceberg integration](../about-iceberg-topics/#enable-iceberg-integration) for more details on Iceberg modes. Use any of the following to set `redpanda.iceberg.mode`: - `rpk`. See the following examples to run `rpk topic` commands. - The Cloud UI. 
Navigate to **Topics** to create a new topic and specify `redpanda.iceberg.mode` in **Additional Configuration**, or edit an existing topic under the topic’s **Configuration** tab.
- The Data Plane API to [create a new topic](/api/doc/cloud-dataplane/operation/operation-topicservice_createtopic) or [update a property for an existing topic](/api/doc/cloud-dataplane/operation/operation-topicservice_updatetopicconfigurations). Specify the key-value pair for `redpanda.iceberg.mode` in the request body.

The following examples show how to use `rpk` to create a new topic or alter the configuration for an existing topic, setting the Iceberg mode to `key_value`.

Create a new topic and set `redpanda.iceberg.mode`:

```bash
rpk topic create --topic-config=redpanda.iceberg.mode=key_value
```

Set `redpanda.iceberg.mode` for an existing topic:

```bash
rpk topic alter-config --set redpanda.iceberg.mode=key_value
```

4. Produce to the topic. For example:

```bash
printf 'hello world\nfoo bar\nbaz qux\n' | rpk topic produce --format='%k %v\n'
```

You should see the topic as a table in Open Catalog.

1. In Open Catalog, select **Catalogs**, then open your catalog.
2. Under your catalog, you should see the `redpanda` namespace and a table with the name of your topic. The namespace and the table are automatically added for you.

## [](#query-iceberg-table-in-snowflake)Query Iceberg table in Snowflake

To query the topic in Snowflake, you must create a [catalog integration](https://docs.snowflake.com/en/user-guide/tables-iceberg#catalog-integration) so that Snowflake has access to the table data and metadata.

### [](#configure-catalog-integration-with-snowflake)Configure catalog integration with Snowflake

1. 
Run the [`CREATE CATALOG INTEGRATION`](https://docs.snowflake.com/sql-reference/sql/create-catalog-integration-open-catalog) command in Snowflake: ```sql CREATE CATALOG INTEGRATION CATALOG_SOURCE = POLARIS TABLE_FORMAT = ICEBERG CATALOG_NAMESPACE = 'redpanda' REST_CONFIG = ( CATALOG_URI = '' WAREHOUSE = '' ) REST_AUTHENTICATION = ( TYPE = OAUTH OAUTH_CLIENT_ID = '' OAUTH_CLIENT_SECRET = '' OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL') ) REFRESH_INTERVAL_SECONDS = 30 ENABLED = TRUE; ``` Use your own values for the following placeholders: - ``: Provide a name for your Iceberg catalog integration in Snowflake. - ``: Your [Open Catalog account URI](https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters) (`[https://-.snowflakecomputing.com/polaris/api/catalog](https://-.snowflakecomputing.com/polaris/api/catalog)`). - ``: The name of your catalog in Open Catalog. - ``: The client ID of the service connection you created in an earlier step. - ``: The client secret of the service connection you created in an earlier step. 2. Run the following command to verify that the catalog is integrated correctly: ```sql SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG(''); ``` ```bash # Example result for redpanda.iceberg.mode=key_value +-----------------------------------------------------------------------+ | SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('') | +-----------------------------------------------------------------------+ | [{"namespace":"redpanda","name":""}] | +-----------------------------------------------------------------------+ ``` ### [](#create-iceberg-table-in-snowflake)Create Iceberg table in Snowflake After creating the catalog integration, you must create an externally-managed table in Snowflake. You must run your Snowflake queries against this table. In your Snowflake database, run the [CREATE ICEBERG TABLE](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-rest) command. 
The following example also specifies that the table should automatically refresh metadata: ```sql CREATE ICEBERG TABLE CATALOG = '' EXTERNAL_VOLUME = '' CATALOG_TABLE_NAME = '' AUTO_REFRESH = TRUE ``` Use your own values for the following placeholders: - ``: Provide a name for your table in Snowflake. - ``: The name of the catalog integration you configured in an earlier step. - ``: The name of the external volume you configured using the Tiered Storage bucket. - ``: The name of the table in your catalog, which is the same as your Redpanda topic name. ### [](#query-table)Query table To verify that Snowflake has successfully created the table containing the topic data, run the following: ```sql SELECT * FROM ; ``` Your query results should look like the following: ```bash # Example for redpanda.iceberg.mode=key_value with 3 records produced to topic +--------------------------------------------------------------------------------------------------------------+------------+ | REDPANDA | VALUE | +--------------------------------------------------------------------------------------------------------------+------------+ | { "partition": 0, "offset": 0, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "68656C6C6F"} | 776F726C64 | | { "partition": 0, "offset": 1, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "666F6F"} | 626172 | | { "partition": 0, "offset": 2, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "62617A" } | 717578 | +--------------------------------------------------------------------------------------------------------------+------------+ ``` --- # Page 437: Integrate with REST Catalogs **URL**: https://docs.redpanda.com/redpanda-cloud/manage/iceberg/rest-catalog.md --- # Integrate with REST Catalogs --- title: Integrate with REST Catalogs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: iceberg/rest-catalog/index 
page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: iceberg/rest-catalog/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/rest-catalog/index.adoc description: Integrate Redpanda topics with managed Iceberg REST Catalogs. page-git-created-date: "2025-08-05" page-git-modified-date: "2025-11-27" --- > 💡 **TIP** > > These guides are for integrating Iceberg topics with managed REST catalogs. Integrating with a REST catalog is recommended for production deployments. If it is not possible to use a REST catalog, you can use the [filesystem-based catalog](../use-iceberg-catalogs/#object-storage). For an example of using the filesystem-based catalog to access Iceberg topics, see the [Getting Started with Iceberg Topics on Redpanda BYOC](https://www.redpanda.com/blog/iceberg-topics-redpanda-cloud-byoc-setup) blog post. - [Query Iceberg Topics using AWS Glue](../iceberg-topics-aws-glue/) Add Redpanda topics as Iceberg tables that you can access through the AWS Glue Data Catalog. - [Query Iceberg Topics using Databricks and Unity Catalog](../iceberg-topics-databricks-unity/) Add Redpanda topics as Iceberg tables that you can query in Databricks managed by Unity Catalog. - [Query Iceberg Topics using Snowflake and Open Catalog](../redpanda-topics-iceberg-snowflake-catalog/) Add Redpanda topics as Iceberg tables that you can query in Snowflake using an Open Catalog integration. 

---

# Page 438: Specify Iceberg Schema

**URL**: https://docs.redpanda.com/redpanda-cloud/manage/iceberg/specify-iceberg-schema.md

---

# Specify Iceberg Schema

---
title: Specify Iceberg Schema
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: iceberg/specify-iceberg-schema
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: iceberg/specify-iceberg-schema.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/specify-iceberg-schema.adoc
description: Learn about supported Iceberg modes and how you can integrate schemas with Iceberg topics.
page-git-created-date: "2025-07-31"
page-git-modified-date: "2025-07-31"
---

In [Iceberg-enabled clusters](../about-iceberg-topics/#enable-iceberg-integration), the `redpanda.iceberg.mode` topic property determines how Redpanda maps topic data to the Iceberg table structure. You can have the generated Iceberg table match the structure of a schema in the Schema Registry, or you can use the `key_value` mode where Redpanda stores the record values as-is in the table.

## [](#supported-iceberg-modes)Supported Iceberg modes

Redpanda supports the following modes for Iceberg topics:

### [](#key_value)key\_value

Creates an Iceberg table using a simple schema, consisting of two columns, one for the record metadata including the key, and another binary column for the record’s value.

### [](#value_schema_id_prefix)value\_schema\_id\_prefix

Creates an Iceberg table whose structure matches the Redpanda schema for the topic, with columns corresponding to each field. You must register a schema in the [Schema Registry](../../schema-reg/schema-reg-overview/) and producers must write to the topic using the Schema Registry wire format.
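
As a rough illustration of the framing that `value_schema_id_prefix` mode expects, the following Python sketch builds and parses a record value by hand. It assumes the common Schema Registry layout of a single zero "magic byte" followed by a 4-byte big-endian schema ID (Protobuf payloads typically carry additional message-index bytes that this sketch ignores). The function names `frame` and `parse` and the schema ID `42` are hypothetical. In practice, a Schema Registry serializer library produces this header for you.

```python
import struct

MAGIC_BYTE = 0  # assumption: the wire format starts with a single zero byte


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized payload with the magic byte and schema ID."""
    # ">bI" = big-endian, 1 signed byte (magic) + 4-byte unsigned int (schema ID)
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def parse(record_value: bytes) -> tuple[int, bytes]:
    """Split a framed record value into (schema_id, payload)."""
    if len(record_value) < 5:
        raise ValueError("record value is too short to carry a wire-format header")
    magic, schema_id = struct.unpack(">bI", record_value[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, record_value[5:]


if __name__ == "__main__":
    framed = frame(42, b"serialized-avro-bytes")  # 42 is a hypothetical schema ID
    print(framed[:5].hex())
    print(parse(framed))
```

A record without this 5-byte header has no schema ID to look up, which is why producers to a `value_schema_id_prefix` topic must serialize with the wire format.
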

In the [Schema Registry wire format](../../schema-reg/schema-reg-overview/#wire-format), a "magic byte" and schema ID are embedded in the message payload header. Producers to the topic must use the wire format in the serialization process so Redpanda can determine the schema used for each record, use the schema to define the Iceberg table, and store the topic values in the corresponding table columns.

### [](#value_schema_latest)value\_schema\_latest

Creates an Iceberg table whose structure matches the latest schema registered for the subject in the Schema Registry. You must register a schema in the Schema Registry. Producers cannot use the wire format in `value_schema_latest` mode. Redpanda expects the serialized message as-is without the magic byte or schema ID prefix in the record value.

> 📝 **NOTE**
>
> The `value_schema_latest` mode is not compatible with the [`rpk topic produce`](#reference:rpk/rpk-topic/rpk-topic-produce) command which embeds the wire format header. You must use your own producer code to produce to topics in `value_schema_latest` mode.

The latest schema is cached periodically. The cache period is defined by the cluster property `iceberg_latest_schema_cache_ttl_ms` (default: 5 minutes).

### [](#disabled)disabled

Default for `redpanda.iceberg.mode`. Disables writing to an Iceberg table for the topic.

> 📝 **NOTE**
>
> The following modes are compatible with producing to an Iceberg topic using Redpanda Console:
>
> - `key_value`
>
> - Starting in version 25.2, `value_schema_latest` with a JSON schema
>
> Otherwise, records may fail to write to the Iceberg table and instead write to the [dead-letter queue](../about-iceberg-topics/#manage-dead-letter-queue).

## [](#configure-iceberg-mode-for-a-topic)Configure Iceberg mode for a topic

You can set the Iceberg mode for a topic when you create the topic, or you can update the mode for an existing topic.

Option 1. Create a new topic and set `redpanda.iceberg.mode`:

```bash
rpk topic create --topic-config=redpanda.iceberg.mode=
```

Option 2. Set `redpanda.iceberg.mode` for an existing topic:

```bash
rpk topic alter-config --set redpanda.iceberg.mode=
```

### [](#override-value-schema-latest-default)Override `value_schema_latest` default

In `value_schema_latest` mode, you only need to set the property value to the string `value_schema_latest`. This enables the default behavior of `value_schema_latest` mode, which determines the subject for the topic using the TopicNameStrategy. For example, if your topic is named `sensor`, the schema is looked up in the `sensor-value` subject. For Protobuf data, the default behavior also deserializes records using the first message defined in the corresponding Protobuf schema stored in the Schema Registry.

If you use a strategy other than the topic name to derive the subject name, you can override the default behavior of `value_schema_latest` mode and explicitly set the subject name.

To override the default behavior, use the following optional syntax:

```bash
value_schema_latest:subject=,protobuf_name=
```

- For both Avro and Protobuf, specify a different subject name by using the key-value pair `subject=`, for example `value_schema_latest:subject=sensor-data`.
- For Protobuf only:
  - Specify a different message definition by using a key-value pair `protobuf_name=`. You must use the fully qualified name, which includes the package name, for example, `value_schema_latest:protobuf_name=com.example.manufacturing.SensorData`.
  - To specify both a different subject and message definition, separate the key-value pairs with a comma, for example: `value_schema_latest:subject=my_protobuf_schema,protobuf_name=com.example.manufacturing.SensorData`.

> 📝 **NOTE**
>
> If you don’t specify the fully qualified Protobuf message name, Redpanda pauses the data translation to the Iceberg table until you fix the topic misconfiguration.
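
The override syntax above is simple enough to assemble programmatically. The following Python sketch is only an illustration of the documented string format: the helper names `default_subject` and `value_schema_latest_mode` are hypothetical and not part of `rpk` or any Redpanda library. It builds the value you would pass to `redpanda.iceberg.mode`.

```python
from typing import Optional


def default_subject(topic: str) -> str:
    """Subject used by default (TopicNameStrategy): '<topic>-value'."""
    return f"{topic}-value"


def value_schema_latest_mode(subject: Optional[str] = None,
                             protobuf_name: Optional[str] = None) -> str:
    """Build the redpanda.iceberg.mode value for value_schema_latest.

    With no overrides, the plain mode name is returned. Overrides are
    appended after a colon and separated by commas.
    """
    options = []
    if subject:
        options.append(f"subject={subject}")
    if protobuf_name:
        # Per the docs, this must be the fully qualified message name.
        options.append(f"protobuf_name={protobuf_name}")
    if not options:
        return "value_schema_latest"
    return "value_schema_latest:" + ",".join(options)


if __name__ == "__main__":
    print(default_subject("sensor"))
    print(value_schema_latest_mode())
    print(value_schema_latest_mode(subject="sensor-data"))
    print(value_schema_latest_mode(
        subject="my_protobuf_schema",
        protobuf_name="com.example.manufacturing.SensorData",
    ))
```

The resulting string can then be used as the mode value in the `rpk topic create` or `rpk topic alter-config` commands shown earlier.
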

## [](#how-iceberg-modes-translate-to-table-format)How Iceberg modes translate to table format

Redpanda generates an Iceberg table with the same name as the topic. In each mode, Redpanda writes to a `redpanda` table column that stores a single Iceberg [struct](https://iceberg.apache.org/spec/#nested-types) per record, containing nested columns of the metadata from each record, including the record key, headers, timestamp, the partition it belongs to, and its offset.

For example, if you produce to a topic `ClickEvent` according to the following Avro schema:

```avro
{
  "type": "record",
  "name": "ClickEvent",
  "fields": [
    { "name": "user_id", "type": "int" },
    { "name": "event_type", "type": "string" },
    { "name": "ts", "type": "string" }
  ]
}
```

The `key_value` mode writes to the following table format:

```sql
CREATE TABLE ClickEvent (
    redpanda struct<
        partition: integer,
        timestamp: timestamptz,
        offset: long,
        headers: array<struct<key: binary, value: binary>>,
        key: binary,
        timestamp_type: integer
    >,
    value binary
)
```

Use `key_value` mode if you want to use the Iceberg data in its semi-structured format.

The `value_schema_id_prefix` and `value_schema_latest` modes can use the schema to translate to the following table format:

```sql
CREATE TABLE ClickEvent (
    redpanda struct<
        partition: integer,
        timestamp: timestamptz,
        offset: long,
        headers: array<struct<key: binary, value: binary>>,
        key: binary,
        timestamp_type: integer
    >,
    user_id integer NOT NULL,
    event_type string,
    ts string
)
```

As you produce records to the topic, the data also becomes available in object storage for Iceberg-compatible clients to consume. You can use the same analytical tools to [read the Iceberg topic data](../query-iceberg-topics/) in a data lake as you would for a relational database.

If Redpanda fails to translate the record to the columnar format as defined by the schema, it writes the record to a dead-letter queue (DLQ) table. See [Troubleshoot errors](../about-iceberg-topics/#troubleshoot-errors) for more information.

> 📝 **NOTE**
>
> You cannot use schemas to parse or decode record keys for Iceberg. The record keys are always stored in binary format in the `redpanda.key` column.

### [](#schema-types-translation)Schema types translation

Redpanda supports direct translations of the following types to Iceberg value domains:

#### Avro

| Avro type | Iceberg type |
| --- | --- |
| boolean | boolean |
| int | int |
| long | long |
| float | float |
| double | double |
| bytes | binary |
| string | string |
| record | struct |
| array | list |
| map | map |
| fixed | fixed* |
| decimal | decimal |
| uuid | uuid* |
| date | date |
| time | time* |
| timestamp | timestamp |

\*These types are not currently supported in Unity Catalog managed Iceberg tables.

There are some cases where the Avro type does not map directly to an Iceberg type and Redpanda applies the following transformations:

- Enums are translated into the Iceberg `string` type.
- Different flavors of time (such as `time-millis`) and timestamp (such as `timestamp-millis`) types are translated to the same Iceberg `time` and `timestamp` types, respectively.
- Avro unions are flattened to Iceberg structs with optional fields. For example:
  - The union `["int", "long", "float"]` is represented as an Iceberg struct `struct<0 INT NULLABLE, 1 LONG NULLABLE, 2 FLOAT NULLABLE>`.
  - The union `["int", "null", "float"]` is represented as an Iceberg struct `struct<0 INT NULLABLE, 1 FLOAT NULLABLE>`.
- Two-field unions that contain `null` are represented as a single optional field only (no struct). For example, the union `["null", "long"]` is represented as `long`.

Some Avro types are not supported:

- The Avro `duration` logical type is ignored.
- The Avro `null` type is ignored and not represented in the Iceberg schema.
- Recursive types are not supported.
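
To make the union rules above concrete, here is a small Python sketch that reproduces the three documented examples. It is an illustration of the rules as described on this page, not Redpanda's implementation: the function name `union_to_iceberg` is hypothetical, only primitive branches are handled, and unions not covered by the examples (such as a single-branch union) may behave differently in practice.

```python
# Primitive Avro -> Iceberg names, taken from the table above.
PRIMITIVES = {
    "boolean": "boolean",
    "int": "int",
    "long": "long",
    "float": "float",
    "double": "double",
    "bytes": "binary",
    "string": "string",
}


def union_to_iceberg(branches: list[str]) -> str:
    """Describe the Iceberg type for an Avro union of primitive branches."""
    # The Avro null type is dropped and not represented in the Iceberg schema.
    non_null = [b for b in branches if b != "null"]
    if not non_null:
        raise ValueError("a union needs at least one non-null branch")
    # Two-field union containing null: a single optional field, no struct.
    if len(branches) == 2 and len(non_null) == 1:
        return PRIMITIVES[non_null[0]]
    # Otherwise: a struct with one nullable field per non-null branch,
    # named by its position among the non-null branches.
    fields = ", ".join(
        f"{i} {PRIMITIVES[b].upper()} NULLABLE" for i, b in enumerate(non_null)
    )
    return f"struct<{fields}>"


if __name__ == "__main__":
    for union in (["int", "long", "float"], ["int", "null", "float"], ["null", "long"]):
        print(union, "->", union_to_iceberg(union))
```
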

#### Protobuf

| Protobuf type | Iceberg type |
| --- | --- |
| bool | boolean |
| double | double |
| float | float |
| int32 | int |
| sint32 | int |
| int64 | long |
| sint64 | long |
| sfixed32 | int |
| sfixed64 | long |
| string | string |
| bytes | binary |
| map | map |
| message | struct |

There are some cases where the Protobuf type does not map directly to an Iceberg type and Redpanda applies the following transformations:

- Repeated values are translated into Iceberg `list` types.
- Enums are translated into the Iceberg `string` type.
- `uint32` and `fixed32` are translated into Iceberg `long` types as that is the existing semantic for unsigned 32-bit values in Iceberg.
- `uint64` and `fixed64` values are translated into their Base-10 string representation.
- `google.protobuf.Timestamp` is translated into `timestamp` in Iceberg.

Recursive types are not supported.

#### JSON Schema

Requirements:

- Only JSON Schema Draft-07 is currently supported.
- You must declare the JSON Schema dialect using the `$schema` keyword, for example `"$schema": "http://json-schema.org/draft-07/schema#"`.
- You must use a JSON Schema that constrains JSON documents to a strict type so Redpanda can translate to Iceberg. In most cases this means each subschema uses the `type` keyword, but a subschema can also use `$ref` if the referenced schema resolves to a strict type.

Valid JSON Schema example

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "productId": { "type": "integer" },
    "tags": {
      "type": "array",
      "items": { "type": "string" }
    }
  }
}
```

| JSON type | Iceberg type | Notes |
| --- | --- | --- |
| array | list | The keywords `items` and `additionalItems` must be used to constrain element types. |
| boolean | boolean | |
| null | | The `null` type is only supported as a nullability marker, either in a type array (for example, `["string", "null"]`) or in an exclusive `oneOf` nullable pattern. |
| number | double | |
| integer | long | |
| string | string | The `format` keyword can be used for custom Iceberg types. See format annotation translation for details. |
| object | struct or map | Use `properties` to define struct fields and constrain their types. `additionalProperties: false` is supported for closed objects. If `additionalProperties` contains a schema, it translates to an Iceberg map. You cannot combine `properties` and `additionalProperties` in an object if `additionalProperties` is set to a schema. |

| `format` value | Iceberg type |
| --- | --- |
| date-time | timestamptz |
| date | date |
| time | time |

The following keywords have specific behavior:

- The `$ref` keyword is supported for internal references resolved from schema resources declared in the same document (using `$id`), including relative and absolute URI forms. References to external resources and references to unknown keywords are not supported. A root-level `$ref` schema is not supported.
- The `oneOf` keyword is supported only for the nullable serializer pattern where exactly one branch is `{"type":"null"}` and the other branch is a non-null schema (`T|null`).
- In Iceberg output, Redpanda writes all fields as nullable regardless of serializer nullability annotations.
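
If you generate JSON Schemas programmatically, it can help to check the `oneOf` restriction before registering a schema. The following Python sketch is a hypothetical pre-flight check, not part of Redpanda or the Schema Registry. It assumes the supported pattern has exactly two branches, as the `T|null` description above suggests, and returns the non-null branch when the pattern matches.

```python
from typing import Optional

NULL_BRANCH = {"type": "null"}


def nullable_oneof_branch(schema: dict) -> Optional[dict]:
    """Return the non-null branch if `oneOf` is the T|null pattern, else None."""
    branches = schema.get("oneOf")
    # Assumption: the nullable pattern has exactly two branches.
    if not isinstance(branches, list) or len(branches) != 2:
        return None
    nulls = [b for b in branches if b == NULL_BRANCH]
    others = [b for b in branches if b != NULL_BRANCH]
    # Exactly one branch must be {"type": "null"}; the other is the real type.
    if len(nulls) == 1 and len(others) == 1:
        return others[0]
    return None


if __name__ == "__main__":
    ok = {"oneOf": [{"type": "null"}, {"type": "string"}]}
    bad = {"oneOf": [{"type": "string"}, {"type": "integer"}]}
    print(nullable_oneof_branch(ok))
    print(nullable_oneof_branch(bad))
```

A schema for which this check returns `None` while still using `oneOf` falls under the unsupported Boolean combinations and would need to be restructured.
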

The following are not supported for JSON Schema:

- The `$dynamicRef` keyword
- The `default` keyword
- Conditional typing (`if`, `then`, `else`, `dependencies` keywords)
- Boolean JSON Schema combinations (`allOf`, `anyOf`, and non-nullable `oneOf` patterns)
- Dynamic object members with the `patternProperties` keyword
- The `additionalProperties` keyword when set to `true`

---

# Page 439: Use Iceberg Catalogs

**URL**: https://docs.redpanda.com/redpanda-cloud/manage/iceberg/use-iceberg-catalogs.md

---

# Use Iceberg Catalogs

---
title: Use Iceberg Catalogs
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: iceberg/use-iceberg-catalogs
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: iceberg/use-iceberg-catalogs.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
description: Learn how to access Redpanda topic data stored in Iceberg tables, using table metadata or a catalog integration.
page-git-created-date: "2025-04-04"
page-git-modified-date: "2025-09-23"
---

To read from the Redpanda-generated [Iceberg table](../about-iceberg-topics/), your Iceberg-compatible client or tool needs access to the catalog to retrieve the table metadata and know the current state of the table. The catalog provides the current table metadata, which includes locations for all the table’s data files. You can configure Redpanda to either connect to a REST-based catalog, or use a filesystem-based catalog.

For production deployments, Redpanda recommends [using an external REST catalog](#rest) to manage Iceberg metadata. This enables built-in table maintenance, safely handles multiple engines and tools accessing tables at the same time, facilitates data governance, and maximizes data discovery. However, if it is not possible to use a REST catalog, you can [use the filesystem-based catalog](#object-storage) (`object_storage` catalog type), which does not require you to maintain a separate service to access the Iceberg data.

In either case, you use the catalog to load, query, or refresh the Iceberg table as you produce to the Redpanda topic. See the documentation for your query engine or Iceberg-compatible tool for specific guidance on adding the Iceberg tables to your data warehouse or lakehouse using the catalog.

After you have selected a catalog type at the cluster level and [enabled the Iceberg integration](../about-iceberg-topics/#enable-iceberg-integration) for a topic, you cannot switch to another catalog type.

## [](#rest)Connect to a REST catalog

> 📝 **NOTE**
>
> Redpanda connects to an Iceberg catalog that you provision and manage. Redpanda does not create or manage the catalog service, its databases, or any associated network configuration.

Connect to an Iceberg REST catalog using the standard [REST API](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml) supported by many catalog providers. Use this catalog integration type with REST-enabled Iceberg catalog services, such as [Databricks Unity](https://docs.databricks.com/en/data-governance/unity-catalog/index.html) and [Snowflake Open Catalog](https://other-docs.snowflake.com/en/opencatalog/overview).

> 💡 **TIP**
>
> This section provides general guidance on using REST catalogs with Redpanda. For instructions on integrating with specific REST catalog services, see the following:
>
> - [AWS Glue Data Catalog](../iceberg-topics-aws-glue/)
>
> - [Databricks Unity Catalog](../iceberg-topics-databricks-unity/)
>
> - [Snowflake with Open Catalog](../redpanda-topics-iceberg-snowflake-catalog/)

### [](#prerequisites)Prerequisites

For BYOVPC clusters, you must:

1. Enable secrets management, which allows you to store and use secrets in your cluster’s Iceberg catalog authentication properties.

   Secrets management is enabled by default for AWS if you follow the guide to [creating a new BYOVPC cluster](../../../get-started/cluster-types/byoc/aws/vpc-byo-aws/).

   For GCP, follow the guides to enable secrets management for a [new BYOVPC cluster](../../../get-started/cluster-types/byoc/gcp/vpc-byo-gcp/) or an [existing BYOVPC cluster](../../../get-started/cluster-types/byoc/gcp/enable-secrets-byovpc-gcp/).

2. Ensure that your network security settings allow egress traffic from the Redpanda network to the catalog service endpoints.

### [](#limitations)Limitations

The Iceberg integration for Redpanda Cloud supports multiple Iceberg catalogs across different cloud platforms, with progressive levels of release maturity. Each combination of cloud provider and catalog integration is tested and released independently.

The following matrix shows the current status of Iceberg integrations across different cloud providers and catalogs. Check this matrix regularly as Redpanda Cloud continues to expand GA coverage for Iceberg topics.

| | Databricks Unity Catalog | Snowflake Open Catalog | AWS Glue Data Catalog | Google BigQuery |
| --- | --- | --- | --- | --- |
| AWS | Supported | Beta | Beta | N/A |
| GCP | Supported | Beta | N/A | Beta |
| Azure | Beta | Beta | N/A | N/A |

Other REST catalogs, such as Apache Polaris, Dremio Nessie (to be [merged with Polaris](https://www.dremio.com/newsroom/polaris-catalog-to-be-merged-with-nessie-now-available-on-github/)), and the Apache reference implementation, have been tested but are not regularly verified. For more information, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new).

### [](#set-cluster-properties)Set cluster properties

To connect to a REST catalog, set the following cluster configuration properties:

- [`iceberg_catalog_type`](../../../reference/properties/cluster-properties/#iceberg_catalog_type): `rest`
- [`iceberg_rest_catalog_endpoint`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_endpoint): The endpoint URL for your Iceberg catalog. You either manage this directly, or you have this managed by an external catalog service.

> 📝 **NOTE**
>
> You must set `iceberg_rest_catalog_endpoint` at the same time that you set `iceberg_catalog_type` to `rest`.

#### [](#configure-authentication)Configure authentication

To authenticate with the REST catalog, set the following cluster properties:

- [`iceberg_rest_catalog_authentication_mode`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_authentication_mode): The authentication mode to use for the REST catalog. Choose from `oauth2`, `aws_sigv4`, `bearer`, or `none` (default). You must use `aws_sigv4` for [AWS Glue Data Catalog](../iceberg-topics-aws-glue/). Redpanda generally recommends using `oauth2` for REST catalogs.
  - For `oauth2`, also configure the following properties:
    - [`iceberg_rest_catalog_oauth2_server_uri`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_oauth2_server_uri): The OAuth endpoint URI used to retrieve tokens for REST catalog authentication. If left unset, the deprecated catalog endpoint `/v1/oauth/tokens` is used as the token endpoint instead.
    - [`iceberg_rest_catalog_client_id`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_client_id): The ID used to query the OAuth token endpoint for REST catalog authentication.
    - [`iceberg_rest_catalog_client_secret`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_client_secret): The secret used with the client ID to query the OAuth token endpoint for REST catalog authentication.
  - For `bearer`, configure the [`iceberg_rest_catalog_token`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_token) property with your bearer token. Redpanda uses the bearer token unconditionally and does not attempt to refresh the token. Only use the bearer authentication mode for ad hoc or testing purposes.

For REST catalogs that use self-signed certificates, also configure these properties:

- [`iceberg_rest_catalog_trust`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_trust): The contents of a certificate chain to trust for the REST catalog.
- [`iceberg_rest_catalog_crl`](../../../reference/properties/cluster-properties/#iceberg_rest_catalog_crl): The contents of a certificate revocation list for `iceberg_rest_catalog_trust`.

See [Cluster Configuration Properties](../../../reference/properties/cluster-properties/) for the full list of cluster properties to configure for a catalog integration.

### [](#store-a-secret-for-rest-catalog-authentication)Store a secret for REST catalog authentication

To store a secret that you can reference in your catalog authentication cluster properties, you must create the secret using `rpk` or the Data Plane API. Secrets are stored in the secret management solution of your cloud provider. Redpanda retrieves the secrets at runtime. For more information, see [Introduction to rpk](../../rpk/intro-to-rpk/) and [Cloud API Overview](/api/doc/cloud-dataplane/topic/topic-cloud-api-overview).

If you need to configure any of the following properties, you must set their values using secrets:

- `iceberg_rest_catalog_client_secret`
- `iceberg_rest_catalog_crl`
- `iceberg_rest_catalog_token`
- `iceberg_rest_catalog_trust`

To create a new secret:

#### rpk

Run the following `rpk` command:

```bash
rpk security secret create --name --value --scopes redpanda_cluster
```

Replace the placeholders with your own values:

- ``: The name of the secret you want to add. The secret name is also its ID. Use only the following characters: `^[A-Z][A-Z0-9_]*$`.
- ``: The value of the secret.

#### Cloud API

1. Authenticate and make a `GET /v1/clusters/{id}` request to [retrieve the Data Plane API URL](../../api/cloud-dataplane-api/#get-data-plane-api-url) for your cluster.
2. Make a request to [`POST /v1/secrets`](/api/doc/cloud-dataplane/operation/operation-secretservice_createsecret). You must use a Base64-encoded secret.

   ```bash
   curl -X POST "https:///v1/secrets" \
     -H 'accept: application/json' \
     -H 'authorization: Bearer ' \
     -H 'content-type: application/json' \
     -d '{"id":"","scopes":["SCOPE_REDPANDA_CLUSTER"],"secret_data":""}'
   ```

   You must include the following values:

   - ``: The base URL for the Data Plane API.
   - ``: The API key you generated during authentication.
   - ``: The name of the secret you want to add. The secret name is also its ID. Use only the following characters: `^[A-Z][A-Z0-9_]*$`.
   - ``: The Base64-encoded secret.
   - This scope: `"SCOPE_REDPANDA_CLUSTER"`.

   The response returns the name and scope of the secret.

You can now [reference the secret in your cluster configuration](#use-a-secret-in-cluster-configuration).

### [](#use-a-secret-in-cluster-configuration)Use a secret in cluster configuration

To set the cluster property to use the value of the secret, use `rpk` or the Control Plane API.

For example, to use a secret for the `iceberg_rest_catalog_client_secret` property, run:

#### rpk

```bash
rpk cluster config set iceberg_rest_catalog_client_secret '${secrets.}'
```

#### Cloud API

Make a request to the [`PATCH /v1/clusters/`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) endpoint of the Control Plane API.

```bash
curl -H "Authorization: Bearer " -X PATCH \
  "https://api.cloud.redpanda.com/v1/clusters/" \
  -H 'accept: application/json' \
  -H 'content-type: application/json' \
  -d '{"cluster_configuration": { "custom_properties": { "iceberg_rest_catalog_client_secret": "${secrets.}" } } }'
```

You must include the following values:

- ``: The ID of the Redpanda cluster.
- ``: The API key you generated during authentication.
- ``: The name of the secret you created earlier.

### [](#example-rest-catalog-configuration)Example REST catalog configuration

Suppose you configure the following Redpanda cluster properties for connecting to a REST catalog:

```yaml
iceberg_catalog_type: rest
iceberg_rest_catalog_endpoint: http://catalog-service:8181
iceberg_rest_catalog_authentication_mode: oauth2
iceberg_rest_catalog_oauth2_server_uri:
iceberg_rest_catalog_client_id:
iceberg_rest_catalog_client_secret:
```

If you use Apache Spark as a processing engine, your Spark configuration might look like the following. This example uses a catalog named `streaming`:

```spark
spark.sql.catalog.streaming = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.streaming.type = rest
spark.sql.catalog.streaming.uri = http://catalog-service:8181
spark.sql.catalog.streaming.warehouse =
# You may need to configure additional properties based on your object storage provider.
# See https://iceberg.apache.org/docs/latest/spark-configuration/#catalog-configuration and https://spark.apache.org/docs/latest/configuration.html
# For example, for AWS S3:
# spark.sql.catalog.streaming.io-impl = org.apache.iceberg.aws.s3.S3FileIO
# spark.sql.catalog.streaming.s3.endpoint = http://
```

> 📝 **NOTE**
>
> Redpanda recommends setting credentials in environment variables so Spark can securely access your Iceberg data in object storage. For example, for AWS, use `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.

The Spark engine can use the REST catalog to automatically discover the topic’s Iceberg table.
Using Spark SQL, you can query the Iceberg table directly by specifying the catalog name, the namespace, and the table name: ```sql SELECT * FROM streaming.redpanda.; ``` The Iceberg table name is the name of your Redpanda topic. > 💡 **TIP** > > You may need to explicitly create a table for the Iceberg data in your query engine. For an example, see [Query Iceberg Topics using Snowflake and Open Catalog](../redpanda-topics-iceberg-snowflake-catalog/). ## [](#object-storage)Integrate filesystem-based catalog (`object_storage`) By default, Iceberg topics use the filesystem-based catalog (`[iceberg_catalog_type](../../../reference/properties/cluster-properties/#iceberg_catalog_type)` cluster property set to `object_storage`). Redpanda stores the table metadata in [HadoopCatalog](https://iceberg.apache.org/docs/latest/java-api-quickstart/#using-a-hadoop-catalog) format in the same object storage bucket or container as the data files. If using the `object_storage` catalog type, you provide the object storage URI of the table’s `metadata.json` file to an Iceberg client so it can access the catalog and data files for your Redpanda Iceberg tables. > 📝 **NOTE** > > The `metadata.json` file points to a specific Iceberg table snapshot. In your query engine, you must update your tables whenever a new snapshot is created so that they point to the latest snapshot. See the [official Iceberg documentation](https://iceberg.apache.org/docs/latest/maintenance/) for more information, and refer to the documentation for your query engine or Iceberg-compatible tool for specific guidance on Iceberg table update or refresh. 
### [](#example-filesystem-based-catalog-configuration)Example filesystem-based catalog configuration To configure Apache Spark to use a filesystem-based catalog, specify at least the following properties: ```spark spark.sql.catalog.streaming = org.apache.iceberg.spark.SparkCatalog spark.sql.catalog.streaming.type = hadoop # URI for table metadata: AWS S3 example spark.sql.catalog.streaming.warehouse = s3a:///redpanda-iceberg-catalog # You may need to configure additional properties based on your object storage provider. # See https://iceberg.apache.org/docs/latest/spark-configuration/#spark-configuration and https://spark.apache.org/docs/latest/configuration.html # For example, for AWS S3: # spark.hadoop.fs.s3.impl = org.apache.hadoop.fs.s3a.S3AFileSystem # spark.hadoop.fs.s3a.endpoint = http:// # spark.sql.catalog.streaming.s3.endpoint = http:// ``` > 📝 **NOTE** > > Redpanda recommends setting credentials in environment variables so Spark can securely access your Iceberg data in object storage. For example, for AWS, use `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. Depending on your processing engine, you may need to also create a new table to point the data lakehouse to the table location. ### [](#specify-metadata-location)Specify metadata location The base path for the filesystem-based catalog if using the `object_storage` catalog type is `redpanda-iceberg-catalog`. > 💡 **TIP** > > For an end-to-end example of using the filesystem-based catalog to access Iceberg topics, see the [Getting Started with Iceberg Topics on Redpanda BYOC](https://www.redpanda.com/blog/iceberg-topics-redpanda-cloud-byoc-setup) blog post. 
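To show how the base path fits into the Spark properties from the filesystem-based example, here is a small sketch that assembles them for an S3 bucket. It assumes the `s3a://` scheme and the catalog name `streaming` used earlier in this section. The helper `hadoop_catalog_conf` and the bucket name `my-example-bucket` are hypothetical, and your provider may need more properties than these.

```python
from typing import Dict

# Base path documented for the object_storage catalog type.
CATALOG_BASE_PATH = "redpanda-iceberg-catalog"


def hadoop_catalog_conf(bucket: str, catalog: str = "streaming") -> Dict[str, str]:
    """Build the minimum Spark properties for a filesystem-based Iceberg catalog."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a bare bucket name")
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        # Warehouse URI = bucket + documented base path.
        f"{prefix}.warehouse": f"s3a://{bucket}/{CATALOG_BASE_PATH}",
    }


if __name__ == "__main__":
    # "my-example-bucket" is a placeholder, not a real bucket.
    for key, value in hadoop_catalog_conf("my-example-bucket").items():
        print(f"{key} = {value}")
```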
## [](#next-steps)Next steps - [Query Iceberg Topics](../query-iceberg-topics/) - [Query Iceberg Topics using AWS Glue](../iceberg-topics-aws-glue/) - [Query Iceberg Topics using Databricks and Unity Catalog](../iceberg-topics-databricks-unity/) - [Query Iceberg Topics using Snowflake and Open Catalog](../redpanda-topics-iceberg-snowflake-catalog/) --- # Page 440: Upgrades and Maintenance **URL**: https://docs.redpanda.com/redpanda-cloud/manage/maintenance.md --- # Upgrades and Maintenance --- title: Upgrades and Maintenance latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: maintenance page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: maintenance.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/maintenance.adoc description: Learn how Redpanda Cloud manages maintenance operations. page-git-created-date: "2025-03-11" page-git-modified-date: "2026-03-12" --- As a fully-managed service, the Redpanda Cloud [control plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#control-plane) handles all maintenance operations, such as upgrades to your software and infrastructure. Here, _control plane_ refers to the Redpanda Cloud managed service that orchestrates cluster operations, not the Kubernetes control plane. For BYOC and Dedicated deployments, Redpanda manages all maintenance operations for the underlying infrastructure and Kubernetes, ensuring high availability. This includes Kubernetes version upgrades (both the Kubernetes control plane and worker nodes), security patches, and VM image updates. You do not need to act on Kubernetes end-of-life or deprecation notices from your cloud provider (for example, EKS, GKE, or AKS version warnings). 
Redpanda handles these upgrades on your behalf, targeting completion before the Kubernetes version reaches end of life.

Redpanda runs maintenance operations on clusters in a rolling fashion, accompanied by a series of health checks, so there is no disruption to the availability of your service. As part of the Kafka protocol, recycling nodes triggers client connections to be restarted. All mainstream client libraries support automatic reconnections when a restart occurs.

## [](#maintenance-windows)Maintenance windows

Redpanda Cloud may run maintenance operations on any day, at any time. You can override this default and schedule a specific maintenance window on your cluster’s **Cluster settings** page.

If you select a **Scheduled** maintenance window, then Redpanda Cloud runs operations on the day and time specified. Maintenance windows typically take six hours. All operations begin during your maintenance window, but some operations may complete after the window closes. All times are in Coordinated Universal Time (UTC).

> 💡 **TIP**
>
> Redpanda Cloud maintenance cycles always start on Tuesdays. Clusters scheduled for maintenance on Tuesdays are updated first, and clusters scheduled on Mondays are updated last. Keep this in mind when sequencing updates for multiple clusters.

## [](#minor-upgrades)Minor upgrades

During your defined maintenance window, Redpanda Cloud runs minor upgrades. Minor upgrades include standard Redpanda state changes that clients handle gracefully, such as leader elections.

| Category | Details |
| --- | --- |
| Impact | Minimal. |
| Examples | Patches to known issues. Cluster rolling restart. Upgrade Redpanda to a fully backward-compatible version. |
| Frequency | Minor upgrades could happen multiple times a day. |
| Communication | Prior communication happens only if necessary. There could be email notifications, updated documentation, release notes, or communication from your Redpanda account team. |
| Timing | At Redpanda’s discretion during your defined maintenance window. |

## [](#major-upgrades)Major upgrades

Major upgrades may require code changes to customer applications, such as Kafka clients or API integrations.

| Category | Details |
| --- | --- |
| Impact | Potentially large. |
| Examples | Upgrade Kafka to a version that is not fully backward-compatible with the previous version. Update an API version. Security update that materially changes cluster or client throughput. |
| Frequency | Rare. |
| Communication | Email notifications may be sent to registered users with details about the change and available options. There could be updated documentation, release notes, and communication from your Redpanda account team. |
| Timing | Major upgrades may be coordinated with customers, but the final date set by Redpanda is not negotiable. |

## [](#deprecations)Deprecations

Deprecations indicate future removal of features that you can currently use. There is no guarantee of equivalent functionality in new versions. Deprecations could be included in major upgrades.

| Category | Details |
| --- | --- |
| Impact | Potentially large, if you depend on the feature being deprecated. |
| Examples | Remove a feature from the UI. Shut down an API version. Remove a connector as an option. |
| Frequency | Rare. |
| Communication | Email notifications may be sent to registered users with details about the change and available alternatives. There could be updated documentation, release notes, and communication from your Redpanda account team. |
| Timing | At Redpanda’s discretion. |

See also: [Cloud API Deprecation Policy](/api/doc/cloud-controlplane/topic/topic-deprecation-policy)

### [](#deprecated-features)Deprecated features

| Deprecated in | Feature | Details |
| --- | --- | --- |
| November 2025 | Subset of TLS v1.2 cipher suites | The following TLSv1.2 cipher suites will no longer be used for managed services such as Schema Registry, HTTP Proxy, and Kafka API: `AES128-GCM-SHA256`, `AES256-GCM-SHA384`, `ECDHE-RSA-AES128-SHA`, `AES128-SHA`, `AES128-CCM`, `ECDHE-RSA-AES256-SHA`, `AES256-SHA`, `AES256-CCM`. See also: Cloud API Deprecation Policy |
| May 2025 | Cloud API beta versions | The Cloud Control Plane API versions v1beta1 and v1beta2, and Data Plane API versions v1alpha1 and v1alpha2 are deprecated. These Cloud API versions will be removed in a future release and are not recommended for use. The deprecation timeline is: Announcement date: May 27, 2025. End-of-support date: November 28, 2025. Retirement date: May 28, 2026. See the Cloud API Deprecation Policy for more information. |
| March 2025 | Serverless Standard | For a better customer experience, the Serverless Standard and Serverless Pro products merged into a single offering. Serverless Standard is deprecated. All existing Serverless Standard clusters will be migrated to the new Serverless platform (with higher usage limits, 99.9% SLA, and additional regions) on August 31, 2025. Retirement date: August 30, 2025 |
| February 2025 | Private Service Connect v1 | The Redpanda GCP Private Service Connect v2 service provides the ability to allow requests from Private Service Connect endpoints to stay within the same availability zone, avoiding additional networking costs. To check the version of your Private Service Connect attachment, run: `gcloud compute service-attachments list --filter="region:( ${GCP_REGION} )"` The attachment name should show the suffix `psc2`; for example, `projects/my-gcp-project/regions/us-west1/serviceAttachments/rp-d0f0mqk5ktzznib2j9g-psc2`. If the name shows the suffix `psc`, then you have the deprecated version. To upgrade, contact Redpanda Support. |

---

# Page 441: Monitor Redpanda Cloud

**URL**: https://docs.redpanda.com/redpanda-cloud/manage/monitor-cloud.md

---

# Monitor Redpanda Cloud

---
title: Monitor Redpanda Cloud
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: monitor-cloud
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: monitor-cloud.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/monitor-cloud.adoc
description: Learn how to configure monitoring on your BYOC or Dedicated cluster to maintain system health and optimize performance.
page-git-created-date: "2024-06-06"
page-git-modified-date: "2025-12-11"
---

You can configure monitoring on your cluster to maintain system health and optimize performance. You can monitor Redpanda with [Prometheus](https://prometheus.io/) or with any other monitoring and alerting tool, such as Datadog, New Relic, Elastic Cloud, Google Cloud, or Azure.

Redpanda Cloud exports Redpanda metrics for all brokers and connectors from a single OpenMetrics endpoint. This endpoint can be found on the **Overview** page for your cluster, under **How to connect** and **Prometheus**.

> 📝 **NOTE**
>
> - To maximize performance, Redpanda exports some metrics only when the underlying feature is in use. For example, a metric for consumer groups, [`redpanda_kafka_consumer_group_committed_offset`](../../reference/public-metrics-reference/#redpanda_kafka_consumer_group_committed_offset), is only exported when groups are registered.
>
> - Operating system-level and node-level metrics (such as CPU, memory, disk, and network usage) are not available through this endpoint.
For infrastructure monitoring, use your cloud provider’s native monitoring tools (such as Azure Monitor, AWS CloudWatch, or Google Cloud Monitoring). ## [](#configure-redpanda-monitoring)Configure Redpanda monitoring To monitor a Redpanda Cloud cluster: 1. On the Redpanda Cloud **Overview** page for your cluster, under **How to connect**, click the **Prometheus** tab. 2. Click the copy icon for **Prometheus YAML** to copy the contents to your clipboard. The YAML contains the Prometheus scrape target configuration, as well as authentication, for the cluster. ![How to connect screenshot](../../shared/_images/cloud_metrics.png) ```yaml - job_name: redpandaCloud-sample static_configs: - targets: - console-..byoc.cloud.redpanda.com metrics_path: /api/cloud/prometheus/public_metrics basic_auth: username: prometheus password: "" scheme: https ``` 3. Save these settings to Prometheus or another monitoring tool, replacing the following placeholders: - `.`: ID and identifier from the HTTPS endpoint. - ``: Copy and paste the onscreen Prometheus password. ## [](#configure-datadog)Configure Datadog To monitor a BYOC or Dedicated cluster in [Datadog](https://www.datadoghq.com/): 1. Follow the steps to configure Redpanda monitoring. 2. In Datadog, define the `openmetrics_endpoint` URL for that monitored cluster. The integration configuration should look similar to the following: ```yaml instances: # The endpoint to collect metrics from. - openmetrics_endpoint: https://console-..fmc.cloud.redpanda.com/api/cloud/prometheus/public_metrics use_openmetrics: true collect_counters_with_distributions: true auth_type: basic username: prom_user password: prom_pass ``` 3. Restart the Datadog agent. > 📝 **NOTE** > > Because the OpenMetrics endpoint in Redpanda Cloud aggregates Redpanda metrics for all cluster services, only a single Datadog agent is required. The agent must run in a container in your own container infrastructure. 
Redpanda does not support launching this container inside a Dedicated or BYOC Kubernetes cluster. For more information, see the [Datadog documentation](https://docs.datadoghq.com/integrations/redpanda/?tab=host) and [Redpanda Datadog integration](https://github.com/DataDog/integrations-extras/tree/master/redpanda). ## [](#use-redpanda-monitoring-examples)Use Redpanda monitoring examples For hands-on learning, Redpanda provides a repository with examples of monitoring Redpanda with Prometheus and Grafana: [redpanda-data/observability](https://github.com/redpanda-data/observability/tree/main/cloud). ![Example Redpanda Ops Dashboard^](https://github.com/redpanda-data/observability/blob/main/docs/images/Ops%20Dashboard.png?raw=true) It includes [example Grafana dashboards](https://github.com/redpanda-data/observability#grafana-dashboards) and a [sandbox environment](https://github.com/redpanda-data/observability#sandbox-environment) in which you launch a Dockerized Redpanda cluster and create a custom workload to monitor with dashboards. > 💡 **TIP** > > Use [`rpk generate grafana-dashboard`](../../reference/rpk/rpk-generate/rpk-generate-grafana-dashboard/) to generate a sample dashboard from the examples repository that you can import into a Grafana instance. > > For example, to generate the sample Serverless dashboard, run: > > ```bash > rpk generate grafana-dashboard --dashboard serverless > ``` ## [](#monitor-health-and-performance)Monitor health and performance This section provides guidelines and example queries using Redpanda’s public metrics to optimize your system’s performance and monitor its health. To help detect and mitigate anomalous system behaviors, capture baseline metrics of your healthy system at different stages (at start-up, under high load, in steady state) so you can set thresholds and alerts according to those baselines. 
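One way to turn captured baselines into alert thresholds is sketched below. It assumes a simple "mean plus a few standard deviations" rule, which is a common heuristic and not something Redpanda prescribes. The function names, the stage names, and the sample values are all hypothetical.

```python
import statistics
from typing import Dict, Sequence


def baseline_threshold(samples: Sequence[float], num_stddevs: float = 3.0) -> float:
    """Return mean + num_stddevs * sample standard deviation of a baseline."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    return statistics.mean(samples) + num_stddevs * statistics.stdev(samples)


def thresholds_by_stage(
    baselines: Dict[str, Sequence[float]], num_stddevs: float = 3.0
) -> Dict[str, float]:
    """Compute one threshold per stage (for example start-up, high load, steady state)."""
    return {
        stage: baseline_threshold(values, num_stddevs)
        for stage, values in baselines.items()
    }


if __name__ == "__main__":
    # Hypothetical p99 produce latency samples, in seconds, for two stages.
    baselines = {
        "steady_state": [0.010, 0.012, 0.011, 0.013],
        "high_load": [0.030, 0.034, 0.029, 0.036],
    }
    for stage, limit in thresholds_by_stage(baselines).items():
        print(f"{stage}: alert above {limit:.4f}s")
```

Keeping a separate threshold per stage avoids alerting on behavior that is normal under high load but abnormal in steady state.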
> 💡 **TIP** > > For counter type metrics, a broker restart causes the count to reset to zero in tools like Prometheus and Grafana. Redpanda recommends wrapping counter metrics in a rate query to account for broker restarts, for example: > > ```promql > rate(redpanda_kafka_records_produced_total[5m]) > ``` ### [](#redpanda-architecture)Redpanda architecture Understanding the unique aspects of Redpanda’s architecture and data path can improve your performance, debugging, and tuning skills: - Redpanda replicates partitions across brokers in a cluster using [Raft](https://raft.github.io/), where each partition is a Raft consensus group. A message written from the Kafka API flows down to the Raft implementation layer that eventually directs it to a broker to be stored. Metrics about the Raft layer can reveal the health of partitions and data flowing within Redpanda. - Redpanda is designed with a [thread-per-core](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#thread-per-core) model that it implements with the [Seastar](https://seastar.io/) library. With each application thread pinned to a CPU core, when observing or analyzing the behavior of a specific application, monitor the relevant metrics with the label for the specific [shard](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#shard), if available. ### [](#infrastructure-resources)Infrastructure resources The underlying infrastructure of your system should have sufficient margins to handle peaks in processing, storage, and I/O loads. Monitor infrastructure health with the following queries. #### [](#cpu-usage)CPU usage For the total CPU uptime, monitor [`redpanda_uptime_seconds_total`](../../reference/public-metrics-reference/#redpanda_uptime_seconds_total). 
Monitoring its rate of change with the following query can help detect unexpected dips in uptime: ```promql rate(redpanda_uptime_seconds_total[5m]) ``` For the total CPU busy (non-idle) time, monitor [`redpanda_cpu_busy_seconds_total`](../../reference/public-metrics-reference/#redpanda_cpu_busy_seconds_total). To detect unexpected idling, you can query the rate of change as a fraction of the shard that is in use at a given point in time. ```promql rate(redpanda_cpu_busy_seconds_total[5m]) ``` > 💡 **TIP** > > While CPU utilization at the host-level might appear high (for example, 99-100% utilization) when I/O events like message arrival occur, the actual Redpanda process utilization is likely low. System-level metrics such as those provided by the `top` command can be misleading. > > This high host-level CPU utilization happens because Redpanda uses Seastar, which runs event loops on every core (also referred to as a _reactor_), constantly polling for the next task. This process never blocks and will increment clock ticks. It doesn’t necessarily mean that Redpanda is busy. > > Use [`redpanda_cpu_busy_seconds_total`](../../reference/public-metrics-reference/#redpanda_cpu_busy_seconds_total) to monitor the actual Redpanda CPU utilization. When it indicates close to 100% utilization over a given period of time, make sure to also monitor produce and consume [latency](#latency) as they may then start to increase as a result of resources becoming overburdened. #### [](#memory-availability-and-pressure)Memory availability and pressure To monitor memory, use [`redpanda_memory_available_memory`](../../reference/public-metrics-reference/#redpanda_memory_available_memory), which includes both free memory and reclaimable memory from the batch cache. This provides a more accurate picture than using allocated memory alone, since allocated does not include reclaimable cache memory. 
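As a rough numeric illustration of why available memory differs from free memory, the sketch below computes the available fraction and its complement, the memory pressure, for one shard. Total memory is taken as free plus allocated. The function names and the sample byte counts are hypothetical.

```python
def memory_available_fraction(available: float, free: float, allocated: float) -> float:
    """Fraction of total memory (free + allocated) that is available."""
    total = free + allocated
    if total <= 0:
        raise ValueError("free + allocated must be positive")
    return available / total


def memory_pressure(available: float, free: float, allocated: float) -> float:
    """Fraction of total memory that is in use and not reclaimable."""
    return 1.0 - memory_available_fraction(available, free, allocated)


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical shard: 2 GiB free, 6 GiB allocated, of which 3 GiB is
    # reclaimable batch cache, so 5 GiB counts as available.
    free, allocated, available = 2 * gib, 6 * gib, 5 * gib
    print(memory_available_fraction(available, free, allocated))  # 0.625
    print(memory_pressure(available, free, allocated))  # 0.375
```

In this example, looking at free memory alone would suggest only 25% headroom, while the available figure shows 62.5% because the batch cache can be reclaimed.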
To monitor the fraction of memory available:

```promql
min(redpanda_memory_available_memory / (redpanda_memory_free_memory + redpanda_memory_allocated_memory))
```

To monitor memory pressure (fraction of memory being used), which may be more intuitive for alerting:

```promql
1 - min(redpanda_memory_available_memory / (redpanda_memory_free_memory + redpanda_memory_allocated_memory))
```

You can also monitor the lowest available memory since the process started to understand historical memory pressure:

```promql
min(redpanda_memory_available_memory_low_water_mark / (redpanda_memory_free_memory + redpanda_memory_allocated_memory))
```

#### [](#disk-used)Disk used

To monitor the fraction of disk consumed, use a formula with [`redpanda_storage_disk_free_bytes`](../../reference/public-metrics-reference/#redpanda_storage_disk_free_bytes) and [`redpanda_storage_disk_total_bytes`](../../reference/public-metrics-reference/#redpanda_storage_disk_total_bytes):

```promql
1 - (sum(redpanda_storage_disk_free_bytes) / sum(redpanda_storage_disk_total_bytes))
```

Also monitor [`redpanda_storage_disk_free_space_alert`](../../reference/public-metrics-reference/#redpanda_storage_disk_free_space_alert) for an alert when available disk space is low or degraded.

#### [](#iops)IOPS

For read and write I/O operations per second (IOPS), monitor the [`redpanda_io_queue_total_read_ops`](../../reference/public-metrics-reference/#redpanda_io_queue_total_read_ops) and [`redpanda_io_queue_total_write_ops`](../../reference/public-metrics-reference/#redpanda_io_queue_total_write_ops) counters, using one query for each:

```promql
rate(redpanda_io_queue_total_read_ops[5m])
rate(redpanda_io_queue_total_write_ops[5m])
```

### [](#throughput)Throughput

While maximizing the rate of messages moving from producers to brokers then to consumers depends on tuning each of those components, the total throughput of all topics provides a system-level metric to monitor.
When you observe abnormal, unhealthy spikes or dips in producer or consumer throughput, look for correlation with changes in the number of active connections ([`redpanda_rpc_active_connections`](../../reference/public-metrics-reference/#redpanda_rpc_active_connections)) and logged errors to drill down to the root cause. The total throughput of a cluster can be measured by the producer and consumer rates across all topics. To observe the total producer and consumer rates of a cluster, monitor [`redpanda_rpc_received_bytes`](../../reference/public-metrics-reference/#redpanda_rpc_received_bytes) for producer traffic and [`redpanda_rpc_sent_bytes`](../../reference/public-metrics-reference/#redpanda_rpc_sent_bytes) for consumer traffic. Filter both metrics using the `redpanda_server` label with the value `kafka`. #### [](#producer-throughput)Producer throughput For the produce rate, create a query to get the produce rate across all topics: ```promql rate(redpanda_rpc_received_bytes{redpanda_server="kafka"}[$__rate_interval]) ``` #### [](#consumer-throughput)Consumer throughput For the consume rate, create a query to get the total consume rate across all topics: ```promql rate(redpanda_rpc_sent_bytes{redpanda_server="kafka"}[$__rate_interval]) ``` #### [](#identify-high-throughput-clients)Identify high-throughput clients Use [`rpk cluster connections list`](../../reference/rpk/rpk-cluster/rpk-cluster-connections-list/) or the [`GET /v1/monitoring/kafka/connections`](/api/doc/cloud-dataplane/operation/operation-monitoringservice_listkafkaconnections) endpoint in the Data Plane API to identify which client connections are driving the majority of, or the change in, the produce or consume throughput for a cluster. 
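If you prefer to rank clients locally after fetching the connection list, the following sketch filters and sorts an already-parsed response. It assumes the response contains a `connections` array whose entries carry `recent_request_statistics.produce_bytes` as a string-encoded integer. The function name `top_producers` and the threshold default are hypothetical.

```python
from typing import Any, Dict, List


def _recent_produce_bytes(connection: Dict[str, Any]) -> int:
    """Read the recent produce byte count, which is assumed to be a string."""
    stats = connection.get("recent_request_statistics", {})
    return int(stats.get("produce_bytes", "0"))


def top_producers(
    response: Dict[str, Any], min_bytes: int = 1_000_000
) -> List[Dict[str, Any]]:
    """Return connections producing more than min_bytes, largest first."""
    busy = [
        conn
        for conn in response.get("connections", [])
        if _recent_produce_bytes(conn) > min_bytes
    ]
    return sorted(busy, key=_recent_produce_bytes, reverse=True)


if __name__ == "__main__":
    # Hand-written sample data, not real cluster output.
    sample = {
        "connections": [
            {"client_id": "a", "recent_request_statistics": {"produce_bytes": "5000000"}},
            {"client_id": "b", "recent_request_statistics": {"produce_bytes": "78927173"}},
            {"client_id": "c", "recent_request_statistics": {"produce_bytes": "500"}},
        ]
    }
    for conn in top_producers(sample):
        print(conn["client_id"], _recent_produce_bytes(conn))
```

Filtering on the server side, as in the commands in this section, is usually preferable for large clusters because it avoids transferring every connection record.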
For example, to list connections with a current produce throughput greater than 1 MB, run:

##### rpk

```bash
rpk cluster connections list --filter-raw="recent_request_statistics.produce_bytes > 1000000" --order-by="recent_request_statistics.produce_bytes desc"
```

```bash
UID                                   STATE  USER             CLIENT-ID             IP:PORT          NODE  SHARD  OPEN-TIME  IDLE  PROD-TPUT/SEC  FETCH-TPUT/SEC  REQS/MIN
b20601a3-624c-4a8c-ab88-717643f01d56  OPEN   UNAUTHENTICATED  perf-producer-client  127.0.0.1:55012  0     0      9s         0s    78.9MB         0B              292
```

##### Data Plane API

```bash
curl \
  --request GET 'https:///v1/monitoring/kafka/connections' \
  --header "Authorization: Bearer $ACCESS_TOKEN" \
  --data '{"filter":"recent_request_statistics.produce_bytes > 1000000", "order_by":"recent_request_statistics.produce_bytes desc"}' | jq
```

Show example API response

```json
{
  "connections": [
    {
      "node_id": 0,
      "shard_id": 0,
      "uid": "b20601a3-624c-4a8c-ab88-717643f01d56",
      "state": "KAFKA_CONNECTION_STATE_OPEN",
      "open_time": "2025-10-15T14:15:15.755065000Z",
      "close_time": "1970-01-01T00:00:00.000000000Z",
      "authentication_info": {
        "state": "AUTHENTICATION_STATE_UNAUTHENTICATED",
        "mechanism": "AUTHENTICATION_MECHANISM_UNSPECIFIED",
        "user_principal": ""
      },
      "listener_name": "",
      "tls_info": { "enabled": false },
      "source": { "ip_address": "127.0.0.1", "port": 55012 },
      "client_id": "perf-producer-client",
      "client_software_name": "apache-kafka-java",
      "client_software_version": "3.9.0",
      "transactional_id": "my-tx-id",
      "group_id": "",
      "group_instance_id": "",
      "group_member_id": "",
      "api_versions": { "18": 4, "22": 3, "3": 12, "24": 3, "0": 7 },
      "idle_duration": "0s",
      "in_flight_requests": {
        "sampled_in_flight_requests": [
          { "api_key": 0, "in_flight_duration": "0.000406892s" }
        ],
        "has_more_requests": false
      },
      "total_request_statistics": {
        "produce_bytes": "78927173",
        "fetch_bytes": "0",
        "request_count": "4853",
        "produce_batch_count": "4849"
      },
      "recent_request_statistics": {
        "produce_bytes": "78927173",
        "fetch_bytes": "0",
        "request_count": "4853",
        "produce_batch_count": "4849"
      }
    }
  ]
}
```

You can adjust the filter and sorting criteria as necessary.

### [](#latency)Latency

Latency should be consistent between produce and fetch sides. It should also be consistent over time. Take periodic snapshots of produce and fetch latencies, including at upper percentiles (95%, 99%), and watch out for significant changes over a short duration.

In Redpanda, the latency of produce and fetch requests includes the latency of inter-broker RPCs that arise from Redpanda’s internal implementation using Raft.

#### [](#kafka-consumer-latency)Kafka consumer latency

To monitor Kafka consumer request latency, use the [`redpanda_kafka_request_latency_seconds`](../../reference/public-metrics-reference/#redpanda_kafka_request_latency_seconds) histogram with the label `redpanda_request="consume"`. For example, create a query for the 99th percentile:

```promql
histogram_quantile(0.99, sum(rate(redpanda_kafka_request_latency_seconds_bucket{redpanda_request="consume"}[5m])) by (le, provider, region, instance, namespace, pod))
```

You can monitor the rate of Kafka consumer requests using `redpanda_kafka_request_latency_seconds_count` with the `redpanda_request="consume"` label:

```promql
rate(redpanda_kafka_request_latency_seconds_count{redpanda_request="consume"}[5m])
```

#### [](#kafka-producer-latency)Kafka producer latency

To monitor Kafka producer request latency, use the [`redpanda_kafka_request_latency_seconds`](../../reference/public-metrics-reference/#redpanda_kafka_request_latency_seconds) histogram with the `redpanda_request="produce"` label.
For example, create a query for the 99th percentile: ```promql histogram_quantile(0.99, sum(rate(redpanda_kafka_request_latency_seconds_bucket{redpanda_request="produce"}[5m])) by (le, provider, region, instance, namespace, pod)) ``` You can monitor the rate of Kafka producer requests with `redpanda_kafka_request_latency_seconds_count` with the `redpanda_request="produce"` label: ```promql rate(redpanda_kafka_request_latency_seconds_count{redpanda_request="produce"}[5m]) ``` #### [](#internal-rpc-latency)Internal RPC latency To monitor Redpanda internal RPC latency, use the [`redpanda_rpc_request_latency_seconds`](../../reference/public-metrics-reference/#redpanda_rpc_request_latency_seconds) histogram with the `redpanda_server="internal"` label. For example, create a query for the 99th percentile latency: ```promql histogram_quantile(0.99, (sum(rate(redpanda_rpc_request_latency_seconds_bucket{redpanda_server="internal"}[5m])) by (le, provider, region, instance, namespace, pod))) ``` You can monitor the rate of internal RPC requests with [`redpanda_rpc_request_latency_seconds`](../../reference/public-metrics-reference/#redpanda_rpc_request_latency_seconds) histogram’s count: ```promql rate(redpanda_rpc_request_latency_seconds_count[5m]) ``` ### [](#partition-health)Partition health The health of Kafka partitions often reflects the health of the brokers that host them. Thus, when alerts occur for conditions such as under-replicated partitions or more frequent leadership transfers, check for unresponsive or unavailable brokers. With Redpanda’s internal implementation of the Raft consensus protocol, the health of partitions is also reflected in any errors in the internal RPCs exchanged between Raft peers. #### [](#leadership-changes)Leadership changes Stable clusters have a consistent balance of leaders across all brokers, with few to no leadership transfers between brokers. 
To observe changes in leadership, monitor the [`redpanda_raft_leadership_changes`](../../reference/public-metrics-reference/#redpanda_raft_leadership_changes) counter. For example, use a query to get the total rate of increase of leadership changes for a cluster: ```promql sum(rate(redpanda_raft_leadership_changes[5m])) ``` #### [](#under-replicated-partitions)Under-replicated partitions A healthy cluster has partition data fully replicated across its brokers. An under-replicated partition is at higher risk of data loss. It also adds latency because messages must be replicated before being committed. To know when a partition isn’t fully replicated, create an alert for the [`redpanda_kafka_under_replicated_replicas`](../../reference/public-metrics-reference/#redpanda_kafka_under_replicated_replicas) gauge when it is greater than zero: ```promql redpanda_kafka_under_replicated_replicas > 0 ``` Under-replication can be caused by unresponsive brokers. When an alert on `redpanda_kafka_under_replicated_replicas` is triggered, identify the problem brokers and examine their logs. #### [](#leaderless-partitions)Leaderless partitions A healthy cluster has a leader for every partition. A partition without a leader cannot exchange messages with producers or consumers. To identify when a partition doesn’t have a leader, create an alert for the [`redpanda_cluster_unavailable_partitions`](../../reference/public-metrics-reference/#redpanda_cluster_unavailable_partitions) gauge when it is greater than zero: ```promql redpanda_cluster_unavailable_partitions > 0 ``` Leaderless partitions can be caused by unresponsive brokers. When an alert on `redpanda_cluster_unavailable_partitions` is triggered, identify the problem brokers and examine their logs. #### [](#raft-rpcs)Raft RPCs Redpanda’s Raft implementation exchanges periodic status RPCs between a broker and its peers. 
The [`redpanda_node_status_rpcs_timed_out`](../../reference/public-metrics-reference/#redpanda_node_status_rpcs_timed_out) gauge increases when a status RPC times out for a peer, which indicates that a peer may be unresponsive and may lead to problems with partition replication that Raft manages. Monitor for non-zero values of this gauge, and correlate it with any logged errors or changes in partition replication. ### [](#consumers)Consumer group lag Consumer group lag is an important performance indicator that measures the difference between the broker’s latest (max) offset and the consumer group’s last committed offset. The lag indicates how current the consumed data is relative to real-time production. A high or increasing lag means that consumers are processing messages slower than producers are generating them. A decreasing or stable lag implies that consumers are keeping pace with producers, ensuring real-time or near-real-time data consumption. By monitoring consumer lag, you can identify performance bottlenecks and make informed decisions about scaling consumers, tuning configurations, and improving processing efficiency. A high maximum lag may indicate that a consumer is experiencing connectivity problems or cannot keep up with the incoming workload. A high or increasing total lag (lag sum) suggests that the consumer group lacks sufficient resources to process messages at the rate they are produced. In such cases, scaling the number of consumers within the group can help, but only up to the number of partitions available in the topic. If lag persists despite increasing consumers, repartitioning the topic may be necessary to distribute the workload more effectively and improve processing efficiency. Redpanda provides the following methods for monitoring consumer group lag: - [Dedicated gauges](#dedicated-gauges): Redpanda brokers can internally calculate consumer group lag and expose two dedicated gauges. 
This method is recommended for environments where your observability platform does not support complex queries required to calculate the lag from offset metrics. Enabling these gauges may add a small amount of additional processing overhead to the brokers. - [Offset-based calculation](#offset-based-calculation): You can use your observability platform to calculate consumer group lag from offset metrics. Use this method if your observability platform supports functions, such as `max()`, and you prefer to avoid additional processing overhead on the broker. #### [](#dedicated-gauges)Dedicated gauges Redpanda can internally calculate consumer group lag and expose it as two dedicated gauges. - [`redpanda_kafka_consumer_group_lag_max`](../../reference/public-metrics-reference/#redpanda_kafka_consumer_group_lag_max): Reports the maximum lag observed among all partitions for a consumer group. This metric helps pinpoint the partition with the greatest delay, indicating potential performance or configuration issues. - [`redpanda_kafka_consumer_group_lag_sum`](../../reference/public-metrics-reference/#redpanda_kafka_consumer_group_lag_sum): Aggregates the lag across all partitions, providing an overall view of data consumption delay for the consumer group. To enable these dedicated gauges, you must enable consumer group metrics in your cluster properties. Add the following to your Redpanda configuration: - [`enable_consumer_group_metrics`](../../reference/properties/cluster-properties/#enable_consumer_group_metrics): A list of properties to enable for consumer group metrics. You must add the `consumer_lag` property to enable consumer group lag metrics. Set this value equal to the scrape interval of your metrics collection system. Aligning these intervals ensures synchronized data collection, reducing the likelihood of missing or misaligned lag measurements. 
For example: ```bash rpk cluster config set enable_consumer_group_metrics '["group", "partition", "consumer_lag"]' ``` When these properties are enabled, Redpanda computes and exposes the `redpanda_kafka_consumer_group_lag_max` and `redpanda_kafka_consumer_group_lag_sum` gauges to the `/public_metrics` endpoint. #### [](#offset-based-calculation)Offset-based calculation If your environment is sensitive to the performance overhead of the [dedicated gauges](#dedicated-gauges), use the offset-based calculation method to calculate consumer group lag. This method requires your observability platform to support functions like `max()`. Redpanda provides two metrics to calculate consumer group lag: - [`redpanda_kafka_max_offset`](../../reference/public-metrics-reference/#redpanda_kafka_max_offset): The broker’s latest offset for a partition. - [`redpanda_kafka_consumer_group_committed_offset`](../../reference/public-metrics-reference/#redpanda_kafka_consumer_group_committed_offset): The last committed offset for a consumer group on that partition. For example, here’s a typical query to compute consumer lag: ```promql max by(redpanda_namespace, redpanda_topic, redpanda_partition)(redpanda_kafka_max_offset{redpanda_namespace="kafka"}) - on(redpanda_topic, redpanda_partition) group_right max by(redpanda_group, redpanda_topic, redpanda_partition)(redpanda_kafka_consumer_group_committed_offset) ``` ### [](#services)Services Monitor the health of specific Redpanda services with the following metrics. 
#### [](#schema-registry)Schema Registry Schema Registry request latency: ```promql histogram_quantile(0.99, (sum(rate(redpanda_schema_registry_request_latency_seconds_bucket[5m])) by (le, provider, region, instance, namespace, pod))) ``` Schema Registry request rate: ```promql rate(redpanda_schema_registry_request_latency_seconds_count[5m]) + sum without(redpanda_status)(rate(redpanda_schema_registry_request_errors_total[5m])) ``` Schema Registry request error rate: ```promql rate(redpanda_schema_registry_request_errors_total[5m]) ``` #### [](#rest-proxy)REST proxy REST proxy request latency: ```promql histogram_quantile(0.99, (sum(rate(redpanda_rest_proxy_request_latency_seconds_bucket[5m])) by (le, provider, region, instance, namespace, pod))) ``` REST proxy request rate: ```promql rate(redpanda_rest_proxy_request_latency_seconds_count[5m]) + sum without(redpanda_status)(rate(redpanda_rest_proxy_request_errors_total[5m])) ``` REST proxy request error rate: ```promql rate(redpanda_rest_proxy_request_errors_total[5m]) ``` ### [](#data-transforms)Data transforms See [Monitor Data Transforms](../../develop/data-transforms/monitor/). ## [](#references)References - [Metrics Reference](../../reference/public-metrics-reference/) --- # Page 442: Mountable Topics **URL**: https://docs.redpanda.com/redpanda-cloud/manage/mountable-topics.md --- # Mountable Topics --- title: Mountable Topics latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: mountable-topics page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: mountable-topics.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/mountable-topics.adoc description: Safely attach and detach Tiered Storage topics to and from a cluster. 
page-git-created-date: "2024-12-04" page-git-modified-date: "2025-04-08" --- For topics with Tiered Storage enabled, you can unmount a topic to safely detach it from a cluster and keep the topic data in the cluster’s object storage bucket or container. You can remount the detached topic to the origin cluster, allowing you to hibernate a topic and free up system resources taken up by the topic. ## [](#prerequisites)Prerequisites [Install `rpk`](../rpk/rpk-install/) or [authenticate](/api/doc/cloud-dataplane/authentication) to the Cloud API. If using the API, make sure that you have the correct [Data Plane API URL](../api/cloud-dataplane-api/#get-data-plane-api-url). ## [](#unmount-a-topic-from-a-cluster-to-object-storage)Unmount a topic from a cluster to object storage When you unmount a topic, all incoming writes to the topic are blocked as Redpanda unmounts the topic from the cluster to object storage. Producers and consumers of the topic receive a message in the protocol replies indicating that the topic is no longer available: - Produce requests receive an `invalid_topic_exception` or `resource_is_being_migrated` response from the broker. - Consume requests receive an `invalid_topic_exception` response from the broker. An unmounted topic in object storage is detached from all clusters. The original cluster releases ownership of the topic. > 📝 **NOTE** > > The unmounted topic is deleted in the source cluster, but can be mounted back again from object storage. ### rpk In your cluster, run this command to unmount a topic to object storage: ```none rpk cluster storage unmount / ``` ### Cloud API To unmount topics from a cluster using the Cloud API, issue a POST request to the `/v1alpha2/cloud-storage/topics/unmount` endpoint.
Specify the names of the desired topics in the request body: ```bash curl -X POST "/v1alpha2/cloud-storage/topics/unmount" \ -H "Authorization: Bearer " \ -H "accept: application/json" \ -H "content-type: application/json" \ -d '{"topics":""}' ``` You can use the ID returned by the command to [monitor the progress](#monitor-progress) of the unmount operation using `rpk` or the API. ## [](#mount-a-topic-to-a-cluster)Mount a topic to a cluster ### rpk 1. In your target cluster, run this command to list the topics that are available to mount from object storage: ```none rpk cluster storage list-mountable ``` The command output returns a `LOCATION` value in the format `//`. Redpanda assigns an `initial-revision` number to a topic upon creation. The location value uniquely identifies a topic in object storage if multiple topics had the same name when they were unmounted from different origin clusters. For example: ```none TOPIC NAMESPACE LOCATION testtopic kafka testtopic/67f5505a-32f3-4677-bcad-3c75a1a702a6/10 ``` You can use the location as the topic reference instead of just the topic name to uniquely identify a topic to mount in the next step. 2. Mount a topic from object storage: ```none rpk cluster storage mount ``` Replace `` with the name of the topic to mount. If there are multiple topics with the same name in object storage, you must use the location value from `rpk cluster storage list-mountable` to uniquely identify a topic. You can also specify a new name for the topic as you mount it to the target cluster: ```none rpk cluster storage mount --to ``` The new name applies only in the target cluster. This name does not persist if you unmount this topic again. Redpanda keeps the original name in object storage if you remount the topic later. ### Cloud API 1. List the topics that are available to mount from object storage by making a GET request to the `/v1alpha2/cloud-storage/topics/mountable` endpoint.
```none curl "/v1alpha2/cloud-storage/topics/mountable" ``` The response object contains an array of topics: ```bash "topics": [ { "name": "topic-1-name", "topic_location": "topic-1-name//" }, { "name": "topic-2-name", "topic_location": "topic-2-name//" } ] ``` The `topic_location` is the unique topic location in object storage, in the format `//`. Redpanda assigns the number `initial-revision` to a topic upon creation. You can use `topic_location` as the topic reference instead of just the topic name to identify a unique topic to mount in the next step. 2. To mount topics to a target cluster using the Cloud API, make a POST request to the `/cloud-storage/topics/mount` endpoint. Specify the names of the topics in the request body: ```none curl -X POST "/v1alpha2/cloud-storage/topics/mount" -d '{ "topics": [ { "alias": "", "source_topic_reference": "//" }, { "source_topic_reference": "" } ] }' ``` - You may have multiple topics with the same name that are available to mount from object storage. This can happen if you have unmounted topics with this name from different clusters. To uniquely identify a source topic, use `//` as the topic reference. - To rename a topic in the target cluster, use the optional `alias` field in the request body. The preceding example shows how to specify a new name for topic 1 in the target cluster, while topic 2 retains its original name in the target cluster. You can use the ID returned by the operation to [monitor its progress](#monitor-progress) using `rpk` or the API. When the mount operation is complete, the target cluster handles produce and consume workloads for the topics.
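The mount request body described above can also be assembled programmatically. The following Python sketch only builds the JSON payload and does not call the API. The field names `topics`, `alias`, and `source_topic_reference` come from the request body shown on this page; the function name `build_mount_body`, the alias `testtopic-restored`, and the topic name `other-topic` are hypothetical values chosen for illustration.

```python
import json
from typing import Dict, List, Optional, Tuple


def build_mount_body(topics: List[Tuple[str, Optional[str]]]) -> str:
    """Build a JSON body for a mount request (illustrative helper, not part of any SDK).

    Each item is (source_topic_reference, alias). Pass None as the alias
    to keep the topic's original name in the target cluster.
    """
    entries: List[Dict[str, str]] = []
    for reference, alias in topics:
        entry = {"source_topic_reference": reference}
        if alias:  # alias is optional, so it is only included when provided
            entry["alias"] = alias
        entries.append(entry)
    return json.dumps({"topics": entries}, indent=2)


if __name__ == "__main__":
    # The first reference uses the location format from the list-mountable example.
    print(build_mount_body([
        ("testtopic/67f5505a-32f3-4677-bcad-3c75a1a702a6/10", "testtopic-restored"),
        ("other-topic", None),
    ]))
```

You could pass the resulting string as the `-d` argument of the `curl` command, or as the body of a POST request made with an HTTP client of your choice.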
## [](#monitor-progress)Monitor progress ### rpk To list active mount and unmount operations, run the command: ```none rpk cluster storage list-mount ``` ### Cloud API Issue a GET request to the `/cloud-storage/mount-tasks` endpoint to view the status of topic mount and unmount operations: ```bash curl "/v1alpha2/cloud-storage/mount-tasks" \ -H "Authorization: Bearer " \ -H "accept: application/json" ``` You can also retrieve the status of a specific operation by running the command: ### rpk ```none rpk cluster storage status-mount ``` ### Cloud API ```bash curl "/v1alpha2/cloud-storage/mount-tasks/" \ -H "Authorization: Bearer " ``` `` is the unique identifier of the operation. Redpanda returns this ID when you start a mount or unmount. You can also retrieve the ID by listing [existing operations](#monitor-progress). The response returns the IDs and state of existing mount and unmount operations ("migrations"):

| State | Unmount operation (outbound) | Mount operation (inbound) |
| --- | --- | --- |
| planned | Redpanda validates the mount or unmount operation definition. | Redpanda validates the mount or unmount operation definition. |
| preparing | Redpanda flushes topic data, including topic manifests, to the destination bucket or container in object storage. | Redpanda recreates the topics in a disabled state in the target cluster. The cluster allocates partitions but does not add log segments yet. Topic metadata is populated from the topic manifests found in object storage. |
| prepared | The operation is ready to execute. In this state, the cluster still accepts client reads and writes for the topics. | Topics exist in the cluster but clients do not yet have access to consume or produce. |
| executing | The cluster rejects client reads and writes for the topics. Redpanda uploads any remaining topic data that has not yet been copied to object storage. Uncommitted transactions involving the topic are aborted. | The target cluster checks that the topic to be mounted has not already been mounted in any cluster. |
| executed | All unmounted topic data from the cluster is available in object storage. | The target cluster has verified that the topic has not already been mounted. |
| cut_over | Redpanda deletes topic metadata from the cluster, and marks the data in object storage as available for mount operations. | The topic data in object storage is no longer available to mount to any clusters. |
| finished | The operation is complete. | The operation is complete. The target cluster starts to handle produce and consume requests. |
| canceling | Redpanda is in the process of canceling the mount or unmount operation. | Redpanda is in the process of canceling the mount or unmount operation. |
| cancelled | The mount or unmount operation is cancelled. | The mount or unmount operation is cancelled. |

## [](#cancel-a-mount-or-unmount-operation)Cancel a mount or unmount operation You can cancel a topic mount or unmount by running the command: ### rpk ```none rpk cluster storage cancel-mount ``` ### Cloud API ```bash curl -X POST "/v1alpha2/cloud-storage/mount-tasks/" \ -H "Authorization: Bearer " \ -H "accept: application/json" \ -H "content-type: application/json" \ -d '{"action":"ACTION_CANCEL"}' ``` You cannot cancel mount and unmount operations in the following [states](#monitor-progress): - `planned` (but you may still delete a planned mount or unmount) - `cut_over` - `finished` - `canceling` - `cancelled` ## [](#additional-considerations)Additional considerations Redpanda prevents you from mounting the same topic to multiple clusters at once. This ensures that multiple clusters don’t write to the same location in object storage and corrupt the topic. If you attempt to mount a topic where the name matches a topic already in the target cluster, Redpanda fails the operation and emits a warning message in the logs.
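As a quick reference for the cancellation rules listed earlier on this page, the following Python sketch encodes the operation states as a lookup. It assumes that every state not listed as non-cancelable can be canceled, which the page implies but does not state outright. The names `ALL_STATES`, `NON_CANCELABLE_STATES`, and `can_cancel` are hypothetical and are not part of `rpk` or the Cloud API.

```python
# States from the table in the Monitor progress section, in order.
ALL_STATES = [
    "planned", "preparing", "prepared", "executing", "executed",
    "cut_over", "finished", "canceling", "cancelled",
]

# States in which this page says an operation cannot be canceled.
NON_CANCELABLE_STATES = {"planned", "cut_over", "finished", "canceling", "cancelled"}


def can_cancel(state: str) -> bool:
    """Return True if an operation in this state may be canceled.

    Assumption: any known state that is not listed as non-cancelable
    is treated as cancelable.
    """
    if state not in ALL_STATES:
        raise ValueError(f"unknown state: {state}")
    return state not in NON_CANCELABLE_STATES


if __name__ == "__main__":
    for state in ALL_STATES:
        print(f"{state}: {'cancelable' if can_cancel(state) else 'not cancelable'}")
```

A script that polls the status of an operation could call a check like this before attempting a cancel request, to avoid sending a request that is expected to fail.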
--- # Page 443: Redpanda CLI **URL**: https://docs.redpanda.com/redpanda-cloud/manage/rpk.md --- # Redpanda CLI --- title: Redpanda CLI latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/rpk/index.adoc description: The rpk tool is a single binary application that provides a way to interact with your Redpanda clusters from the command line. page-git-created-date: "2024-07-25" page-git-modified-date: "2024-08-07" --- - [Introduction to rpk](intro-to-rpk/) Learn about `rpk` and how to use it to interact with your Redpanda cluster. - [Install or Update rpk](rpk-install/) Install or update `rpk` to interact with Redpanda from the command line. - [Specify Broker Addresses for rpk](broker-admin/) Learn how and when to specify Redpanda broker addresses for `rpk` commands, so `rpk` knows where to run Kafka-related commands. - [rpk Profiles](config-rpk-profile/) Use `rpk profile` to simplify your development experience with multiple Redpanda clusters by saving and reusing configurations for different clusters. 
--- # Page 444: Specify Broker Addresses for rpk **URL**: https://docs.redpanda.com/redpanda-cloud/manage/rpk/broker-admin.md --- # Specify Broker Addresses for rpk --- title: Specify Broker Addresses for rpk latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/broker-admin page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/broker-admin.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/rpk/broker-admin.adoc description: Learn how and when to specify Redpanda broker addresses for rpk commands, so rpk knows where to run Kafka-related commands. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- For `rpk` to know where to run Kafka-related commands, you must provide the broker addresses for each broker of a Redpanda cluster. You can specify these addresses as IP addresses or as hostnames, using any of these methods: - Command line flag (`-X brokers`) - Environment variable setting (`RPK_BROKERS`) - Configuration file setting in `redpanda.yaml` (`rpk.kafka_api.brokers`) Command line flag settings take precedence over environment variable settings and configuration file settings. If the command line does not contain the `-X brokers` settings, the environment variable settings are used. If the environment variables are not set, the values in the configuration file are used. ## [](#command-line-flags)Command line flags Broker addresses are required for communicating with the Kafka API. Provide these addresses with the `-X brokers` flag for commands related to Kafka broker tasks, such as [`rpk topic create`](../../../reference/rpk/rpk-topic/rpk-topic-create/), [`rpk topic produce`](../../../reference/rpk/rpk-topic/rpk-topic-produce/), and [`rpk topic consume`](../../../reference/rpk/rpk-topic/rpk-topic-consume/). 
The following table shows which `rpk` commands require the `-X brokers` flag. | Command | Address flag required | | --- | --- | | rpk cluster info | -X brokers | | rpk cluster metadata | -X brokers | | rpk group | -X brokers | | rpk security acl | -X brokers | | rpk topic | -X brokers | ## [](#environment-variable-settings)Environment variable settings Environment variable settings last for the duration of the shell session, or until you set the variable to a different setting. Configure the environment variable `RPK_BROKERS` for broker addresses, so you don’t have to include the `-X brokers` flag each time you run an `rpk` command. For example, to configure three brokers: ```bash export RPK_BROKERS="192.168.72.34:9092,192.168.72.35:9092,192.168.72.36:9092" ``` ## [](#configuration-file-settings)Configuration file settings As each Redpanda broker starts up, a `redpanda.yaml` configuration file is automatically generated for that broker. This file contains a section for `rpk` settings, which includes Kafka API settings. The `kafka_api` section contains the address and port for each broker. The default address is `0.0.0.0`, and the default port is 9092. You can edit this line and replace it with the IP addresses of your Redpanda brokers. The following example shows the addresses and port numbers for three brokers. ```yaml rpk: kafka_api: brokers: - 192.168.72.34:9092 - 192.168.72.35:9092 - 192.168.72.36:9092 ``` > 📝 **NOTE** > > If you do not update the default addresses in the `redpanda.yaml` file, you must provide the required addresses on the command line or by setting the corresponding environment variable.
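The precedence order described on this page (command line flag, then environment variable, then configuration file) can be sketched as a simple lookup. This Python example only illustrates the ordering and is not how `rpk` is implemented. The function `resolve_brokers` and its parameters are hypothetical, and the sketch assumes that addresses are comma-separated, as in the `RPK_BROKERS` example.

```python
import os
from typing import List, Mapping, Optional, Sequence


def resolve_brokers(
    flag_value: Optional[str],
    env: Mapping[str, str],
    config_brokers: Sequence[str],
) -> List[str]:
    """Pick broker addresses using flag > environment variable > config file."""
    if flag_value:  # value given on the command line with -X brokers
        return [addr.strip() for addr in flag_value.split(",")]
    env_value = env.get("RPK_BROKERS")
    if env_value:  # used only when no flag value is given
        return [addr.strip() for addr in env_value.split(",")]
    # Fall back to the rpk.kafka_api.brokers list from redpanda.yaml.
    return list(config_brokers)


if __name__ == "__main__":
    from_config = ["192.168.72.34:9092"]
    print(resolve_brokers(None, os.environ, from_config))
```

Passing the environment as a mapping, rather than reading `os.environ` inside the function, keeps the sketch easy to exercise with different combinations of settings.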
--- # Page 445: rpk Profiles **URL**: https://docs.redpanda.com/redpanda-cloud/manage/rpk/config-rpk-profile.md --- # rpk Profiles --- title: rpk Profiles latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/config-rpk-profile page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/config-rpk-profile.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/rpk/config-rpk-profile.adoc description: Use rpk profile to simplify your development experience with multiple Redpanda clusters by saving and reusing configurations for different clusters. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-08" --- Use rpk profiles to simplify your development experience using `rpk` with multiple Redpanda clusters by saving and reusing configurations for different clusters. > 💡 **TIP** > > **rpk profiles are the recommended way to configure rpk**. They provide persistent, reusable configurations that work across sessions and are easier to manage than environment variables or command-line flags. > ⚠️ **CAUTION** > > Profile files may contain sensitive information such as passwords or SASL credentials. Do not commit `rpk.yaml` files to version control systems like Git. ## [](#about-rpk-profiles)About rpk profiles An rpk profile contains a reusable configuration for a Redpanda cluster. When running `rpk`, you can create a profile, configure it for a cluster you’re working with, and use it repeatedly when running an `rpk` command for the cluster. You can create different profiles for different Redpanda clusters. For example, your local cluster, development cluster, and production cluster can each have their own profile, with all of their information managed locally by rpk. You set a unique name for each profile. A profile saves rpk-specific command properties.
For details, see [Specify command properties](../intro-to-rpk/#specify-configuration-properties). All `rpk` commands can read configuration values from a profile. You pass a profile to an `rpk` command by setting the `--profile` flag. For example, the command `rpk topic produce dev-topic --profile dev` gets its configuration from the profile named `dev`. ## [](#quickstart)Quickstart Create a profile with authentication and TLS to quickly set up cluster access instead of using environment variables or connection flags: ```bash rpk profile create \ --set brokers= \ --set admin.hosts= \ --set user= \ --set pass= \ --set sasl.mechanism= \ --set tls.enabled=true \ --description "" ``` Replace `` with your desired SASL mechanism (`SCRAM-SHA-256`, `SCRAM-SHA-512`, or `PLAIN`). When you create a profile, rpk automatically switches to use that profile so you don’t need to pass `--profile` flags every time. Check the active profile: ```bash rpk profile current ``` Now all `rpk` commands use this profile automatically: ```bash rpk topic list rpk topic create ``` You can change profiles by running: ```bash rpk profile use ``` For environment variables and other configuration methods, see [rpk -X options](../../../reference/rpk/rpk-x-options/). ## [](#work-with-rpk-profiles)Work with rpk profiles The primary tasks for working with rpk profiles: - Create one or more profiles. - Choose the profile to use. - Edit or set default values across all profiles and values for a single profile. - Call an `rpk` command with a profile. - Delete unused profiles. ### [](#create-profile)Create profile To create a new profile, run [`rpk profile create`](../../../reference/rpk/rpk-profile/rpk-profile-create/): ```bash rpk profile create [flags] ``` An rpk profile can be generated from different sources: - A `redpanda.yaml` file, using the `--from-redpanda` flag. - A different rpk profile, using the `--from-profile` flag. - A Redpanda Cloud cluster, using the `--from-cloud` flag. 
> 📝 **NOTE** > > You must provide a profile name when creating a profile that isn’t generated from a Redpanda Cloud cluster with the `--from-cloud` flag. After the profile is created, rpk switches to the newly created profile. You can specify the configuration during creation with the `--set [key=value]` flag. To simplify configuration, the `--set` flag supports autocompletion of valid keys, suggesting key names based on their `-X` format. > 📝 **NOTE** > > Always set the `--description` flag to describe your profiles. The description is printed in the output of [`rpk profile list`](../../../reference/rpk/rpk-profile/rpk-profile-list/). Created profiles are stored in an `rpk.yaml` file in a default local OS directory (for example, `~/.config/rpk/` for Linux and `~/Library/Application Support/rpk/` for macOS). All profiles created by a developer are stored in the same `rpk.yaml` file. ### [](#choose-profile-to-use)Choose profile to use If you have created multiple profiles, choose the profile to use with [`rpk profile use`](../../../reference/rpk/rpk-profile/rpk-profile-use/): ```bash rpk profile use ``` ### [](#set-or-edit-configuration-values)Set or edit configuration values You can customize settings for a single profile. To set a profile’s configuration: - Use [`rpk profile set`](../../../reference/rpk/rpk-profile/rpk-profile-set/) to set `key=value` pairs of configuration options to write to the profile’s section of `rpk.yaml`. - Use [`rpk profile edit`](../../../reference/rpk/rpk-profile/rpk-profile-edit/) to edit the profile’s section of the `rpk.yaml` file in your default editor. You can configure settings that apply to all profiles. To set these `globals`: - Use [`rpk profile set-globals`](../../../reference/rpk/rpk-profile/rpk-profile-set-globals/) to set `key=value` pairs to write to the globals section of `rpk.yaml`.
- Use [`rpk profile edit-globals`](../../../reference/rpk/rpk-profile/rpk-profile-edit-globals/) to edit the globals section of the `rpk.yaml` file in your default editor. > 💡 **TIP** > > For a list of all the available properties that can be set in your profile, see [`rpk -X options`](../../../reference/rpk/rpk-x-options/). #### [](#customize-command-prompt-per-profile)Customize command prompt per profile A configurable field of an rpk profile is the `prompt` field. It enables the customization of the command prompt for a profile, so information about the in-use profile can be displayed within your command prompt. The format string is intended for a `PS1` prompt. For details on the prompt format string, see the [`rpk profile prompt`](../../../reference/rpk/rpk-profile/rpk-profile-prompt/) reference. The `rpk profile prompt` command prints the ANSI-escaped text of the `prompt` field for the in-use profile. You can call `rpk profile prompt` in your shell’s rc configuration file to assign your `PS1`. For example, to customize your bash prompt for a `dev` rpk profile, first call `rpk profile edit dev` to set its `prompt` field: ```yaml name: dev prompt: hi-red, "[%n]" ``` - `hi-red` sets the text to high-intensity red - `%n` is a variable for the profile name Then in `.bashrc`, set `PS1` to include a call to `rpk profile prompt`: ```bash export PS1='\u@\h\n$(rpk profile prompt)% ' ``` > 📝 **NOTE** > > When setting your `PS1` variable, use single quotation marks and not double quotation marks, because a double-quoted string is expanded once, when `PS1` is assigned, and isn’t reevaluated after every command. The resulting prompt looks like this: username@hostname\[dev\]% ### [](#use-profile-with-rpk-command)Use profile with `rpk` command An rpk command that can use a profile supports the `--profile ` flag. When the `--profile` flag is set for an rpk command, the configuration for the cluster that rpk is interfacing with is read from the named profile.
See the [rpk commands reference](../../../reference/rpk/) for commands that support profiles. ### [](#delete-profile)Delete profile To delete a profile, run [`rpk profile delete`](../../../reference/rpk/rpk-profile/rpk-profile-delete/). ## [](#related-topics)Related topics For details about all commands for rpk profiles, see the [`rpk profile`](../../../reference/rpk/rpk-profile/rpk-profile/) reference page and its sub-pages. --- # Page 446: Introduction to rpk **URL**: https://docs.redpanda.com/redpanda-cloud/manage/rpk/intro-to-rpk.md --- # Introduction to rpk --- title: Introduction to rpk latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/intro-to-rpk page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/intro-to-rpk.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/rpk/intro-to-rpk.adoc description: Learn about rpk and how to use it to interact with your Redpanda cluster. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- The `rpk` command line interface tool is designed to manage your entire Redpanda cluster, without the need to run a separate script for each function, as with Apache Kafka. The `rpk` commands handle everything from configuring brokers to high-level general Redpanda tasks. For example, you can use `rpk` to monitor your cluster’s health, perform tuning, and implement access control lists (ACLs) and other security features. You can also use `rpk` to perform basic streaming tasks, such as creating topics, producing to topics, and consuming from topics. 
After you install `rpk`, you can use it to: - Manage Redpanda - Set up access control lists (ACLs) and other security features - Create topics, produce to topics, and consume from topics See also: - [Install or Update rpk](../rpk-install/) - [rpk Profiles](../config-rpk-profile/) ## [](#specify-configuration-properties)Specify configuration properties You can specify `rpk` command properties in the following ways: - Create an [`rpk profile`](../config-rpk-profile/). - Specify the appropriate flag on the command line. - Define the corresponding [environment variables](#environment-variables). Environment variable settings only last for the duration of a shell session. Command line flag settings take precedence over the corresponding environment variables, and environment variables take precedence over configuration file settings. If a required flag is not specified on the command line, `rpk` checks the corresponding environment variable. If the environment variable is not set, `rpk` uses the value in the `rpk.yaml` configuration file if that file is available. Otherwise, it uses the value in the `redpanda.yaml` configuration file. > 💡 **TIP** > > If you specify `rpk` command properties in the configuration files or as environment variables, you don’t need to specify them again on the command line. ### [](#common-configuration-properties)Common configuration properties Every `rpk` command supports a set of common configuration properties. You can set one or more options in an `rpk` command by using the `-X` flag: ```bash rpk -X -X ``` Get a list of available options with `-X list`: ```bash rpk -X list ``` Or, get a detailed description about each option with `-X help`: ```bash rpk -X help ``` Every `-X` option can be translated into an environment variable by prefixing it with `RPK_` and replacing periods (`.`) with underscores (`_`). For example, the option `tls.enabled` has the equivalent environment variable `RPK_TLS_ENABLED`.
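The naming rule above can be expressed as a small helper. The following Python sketch is illustrative only: the function name `x_option_to_env_var` is hypothetical and not part of `rpk`, and the uppercase conversion is inferred from the `RPK_TLS_ENABLED` example rather than stated as a separate rule.

```python
def x_option_to_env_var(option: str) -> str:
    """Return the environment variable name for an rpk -X option key."""
    # Prefix with RPK_ and replace periods with underscores, as described above.
    # Uppercasing is an assumption inferred from tls.enabled -> RPK_TLS_ENABLED.
    return "RPK_" + option.replace(".", "_").upper()


if __name__ == "__main__":
    for key in ("tls.enabled", "brokers"):
        print(f"{key} -> {x_option_to_env_var(key)}")
```

A helper like this can be handy when you generate shell environment files from a list of `-X` option keys.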
Some of the common configuration properties apply across all `rpk` commands as defaults. These default properties have keys with names starting with `globals`, and they’re viewable in `rpk -X list` and `rpk -X help`. For more details, see [`rpk -X options`](../../../reference/rpk/rpk-x-options/). ### [](#environment-variables)Environment variables `rpk` supports environment variables through `RPK_*` that correspond to `-X` options. For a comprehensive list and configuration examples, see: - [rpk profiles](../config-rpk-profile/) - Create and manage persistent configurations (recommended) - [rpk -X options](../../../reference/rpk/rpk-x-options/) - Complete configuration reference including environment variables ## [](#next-steps)Next steps - [Install or Update rpk](../rpk-install/) - [rpk Command reference](../../../reference/rpk/) --- # Page 447: Install or Update rpk **URL**: https://docs.redpanda.com/redpanda-cloud/manage/rpk/rpk-install.md --- # Install or Update rpk --- title: Install or Update rpk latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-install page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-install.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/rpk/rpk-install.adoc description: Install or update rpk to interact with Redpanda from the command line. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- The `rpk` tool is a single binary application that provides a way to interact with your Redpanda clusters from the command line. For example, you can use `rpk` to do the following: - Monitor your cluster’s health - Create, produce, and consume from topics - Set up access control lists (ACLs) and other security features Redpanda Cloud deployments should always use the latest version of `rpk`. 
## [](#check-rpk-version)Check rpk version To check your current version of the rpk binary, run `rpk --version`. The following example shows the output for the latest version of `rpk`. If your installed version is older than this latest version, then update `rpk`. For a list of versions, see [Redpanda releases](https://github.com/redpanda-data/redpanda/releases/). ```bash rpk --version ``` ```bash rpk version 26.1.3 (rev 65d85f6) ``` ## [](#install-or-update-rpk-on-linux)Install or update rpk on Linux To install, or update to, the latest version of `rpk` for Linux, run: ### amd64 ```bash curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-amd64.zip && mkdir -p ~/.local/bin && export PATH="$HOME/.local/bin:$PATH" && unzip rpk-linux-amd64.zip -d ~/.local/bin/ ``` ### arm64 ```bash curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-arm64.zip && mkdir -p ~/.local/bin && export PATH="$HOME/.local/bin:$PATH" && unzip rpk-linux-arm64.zip -d ~/.local/bin/ ``` > 💡 **TIP** > > You can use `rpk` on Windows only with [WSL](https://learn.microsoft.com/windows/wsl/install). However, commands that require Redpanda to be installed on your machine are not supported, such as [`rpk container`](../../../../current/reference/rpk/rpk-container/rpk-container/) commands, [`rpk iotune`](../../../../current/reference/rpk/rpk-iotune/), and [`rpk redpanda`](../../../../current/reference/rpk/rpk-redpanda/rpk-redpanda/) commands. ## [](#install-or-update-rpk-on-macos)Install or update rpk on macOS ### Homebrew 1. If you don’t have Homebrew installed, [install it](https://brew.sh/). 2. To install or update `rpk`, run: ```bash brew install redpanda-data/tap/redpanda ``` ### Manual Download To install or update `rpk` through a manual download, choose the option for your system architecture. For example, if you have an M1 or newer chip, select **Apple Silicon**.
#### Intel macOS

To install, or update to, the latest version of `rpk` for Intel macOS, run:

```bash
curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-darwin-amd64.zip &&
  mkdir -p ~/.local/bin &&
  export PATH="$HOME/.local/bin:$PATH" &&
  unzip rpk-darwin-amd64.zip -d ~/.local/bin/
```

To install, or update to, a version other than the latest, run the following, replacing `<version>` with the release you want:

```bash
curl -LO https://github.com/redpanda-data/redpanda/releases/download/v<version>/rpk-darwin-amd64.zip &&
  mkdir -p ~/.local/bin &&
  export PATH="$HOME/.local/bin:$PATH" &&
  unzip rpk-darwin-amd64.zip -d ~/.local/bin/
```

#### Apple Silicon

To install, or update to, the latest version of `rpk` for Apple Silicon, run:

```bash
curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-darwin-arm64.zip &&
  mkdir -p ~/.local/bin &&
  export PATH="$HOME/.local/bin:$PATH" &&
  unzip rpk-darwin-arm64.zip -d ~/.local/bin/
```

To install, or update to, a version other than the latest, run the following, replacing `<version>` with the release you want:

```bash
curl -LO https://github.com/redpanda-data/redpanda/releases/download/v<version>/rpk-darwin-arm64.zip &&
  mkdir -p ~/.local/bin &&
  export PATH="$HOME/.local/bin:$PATH" &&
  unzip rpk-darwin-arm64.zip -d ~/.local/bin/
```

## [](#next-steps)Next steps

For the complete list of `rpk` commands and their syntax, see the [rpk reference](../../../reference/rpk/).

---

# Page 448: Schema Registry

**URL**: https://docs.redpanda.com/redpanda-cloud/manage/schema-reg.md

---

# Schema Registry

---
title: Schema Registry
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: schema-reg/index
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: schema-reg/index.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/schema-reg/index.adoc
description: Redpanda's Schema Registry provides the interface to store and manage event schemas.
page-git-created-date: "2024-06-06" page-git-modified-date: "2025-05-07" --- - [Redpanda Schema Registry](schema-reg-overview/) Redpanda's Schema Registry provides the interface to store and manage event schemas. - [Use Schema Registry](schema-reg-ui/) Perform common Schema Registry management operations in Redpanda Cloud. - [Use the Schema Registry API](schema-reg-api/) Perform common Schema Registry management operations with the API. - [Schema Registry Authorization](schema-reg-authorization/) Learn how to set up and manage Schema Registry Authorization using ACL definitions that control user access to specific Schema Registry operations. - [Schema ID Validation](schema-id-validation/) Learn about schema ID validation for clients using SerDes that produce to Redpanda brokers, and learn how to configure Redpanda to inspect and reject records with invalid schema IDs. - [Deserialization](record-deserialization/) Learn how Redpanda Cloud deserializes messages. - [Programmable Push Filters](programmable-push-filters/) Learn how to filter Kafka records in Redpanda Cloud based on your provided JavaScript code. - [Edit Topic Configuration](edit-topic-configuration/) Use Redpanda Cloud to edit the configuration of existing topics in a cluster. 
--- # Page 449: Edit Topic Configuration **URL**: https://docs.redpanda.com/redpanda-cloud/manage/schema-reg/edit-topic-configuration.md --- # Edit Topic Configuration --- title: Edit Topic Configuration latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: schema-reg/edit-topic-configuration page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: schema-reg/edit-topic-configuration.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/schema-reg/edit-topic-configuration.adoc description: Use Redpanda Cloud to edit the configuration of existing topics in a cluster. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-04-08" --- Use Redpanda Cloud to edit the configuration of existing topics in a cluster. 1. In the menu, go to **Topics**. 2. Select a topic, and open the **Configuration** tab. 3. Click the pencil icon in the row of the property that you want to edit. 4. Make your changes, and click **Save changes**. --- # Page 450: Programmable Push Filters **URL**: https://docs.redpanda.com/redpanda-cloud/manage/schema-reg/programmable-push-filters.md --- # Programmable Push Filters --- title: Programmable Push Filters latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: schema-reg/programmable-push-filters page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: schema-reg/programmable-push-filters.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/schema-reg/programmable-push-filters.adoc description: Learn how to filter Kafka records in Redpanda Cloud based on your provided JavaScript code. 
page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- You can use push-down filters in Redpanda Cloud to search through large Kafka topics that may contain millions of records. Filters are JavaScript functions executed on the backend, evaluating each record individually. Your function must return a boolean: - `true`: record is included in the frontend results. - `false`: record is skipped. Multiple filters combine logically with `AND` conditions. ## [](#add-a-javascript-filter)Add a JavaScript filter To add a JavaScript filter: 1. Navigate to the topic’s **Messages** page. 2. Click **Add filter > JavaScript Filter**. 3. Define your JavaScript filtering logic in the provided input area. ## [](#resource-usage-and-performance)Resource usage and performance JavaScript filters are executed on the backend, consuming CPU and network resources. The performance of your filter depends on the complexity of your JavaScript code and the volume of data being processed. Complex JavaScript logic or large data volumes may increase CPU load and network usage. ## [](#available-javascript-properties)Available JavaScript properties Redpanda Cloud injects these properties into your JavaScript context: | Property | Description | Type | | --- | --- | --- | | headers | Record headers as key-value pairs (ArrayBuffers) | Object | | key | Decoded record key | String | | keySchemaID | Schema Registry ID for key (if present) | Number | | partitionId | Partition ID of the record | Number | | offset | Record offset within partition | Number | | timestamp | Timestamp as JavaScript Date object | Date | | value | Decoded record value | Object/String | | valueSchemaID | Schema Registry ID for value (if present) | Number | > 📝 **NOTE** > > Values, keys, and headers are deserialized before being injected into your script. 
## [](#javascript-filter-examples)JavaScript filter examples

### [](#filter-by-header-value)Filter by header value

**Scenario:** Records tagged with headers specifying customer plan type.

Sample header data (string value)

```json
headers: {
  "plan_type": "premium"
}
```

JavaScript filter

```javascript
let headerValue = headers["plan_type"];
if (headerValue) {
  // Header values are ArrayBuffers, so convert the bytes to a string first
  let stringValue = String.fromCharCode(...new Uint8Array(headerValue));
  return stringValue === "premium";
}
return false;
```

**Scenario:** Records include a header with JSON-encoded customer metadata.

Sample header data (JSON value)

```json
headers: {
  "customer": "{\"orgID\":\"123-abc\",\"name\":\"ACME Inc.\"}"
}
```

JavaScript filter

```javascript
let headerValue = headers["customer"];
if (headerValue) {
  // Convert the ArrayBuffer to a string before parsing it as JSON
  let stringValue = String.fromCharCode(...new Uint8Array(headerValue));
  let valueObj = JSON.parse(stringValue);
  return valueObj["orgID"] === "123-abc";
}
return false;
```

### [](#filter-by-timestamp)Filter by timestamp

**Scenario:** Retrieve records from a promotional event.

JavaScript filter

```javascript
// getMonth() is zero-based, so 10 is November
return timestamp.getMonth() === 10 && timestamp.getDate() === 24;
```

### [](#filter-by-schema-id)Filter by schema ID

**Scenario:** Filter customer activity records based on Avro schema version.

JavaScript filter

```javascript
return valueSchemaID === 204;
```

### [](#filter-json-record-values)Filter JSON record values

**Scenario:** Filter transactions by customer ID.

Sample JSON record

```json
{
  "transaction_id": "abc123",
  "customer_id": "cust789",
  "amount": 59.99
}
```

JavaScript filter (top-level property)

```javascript
return value.customer_id === "cust789";
```

**Scenario:** Filter orders by item availability.

Sample JSON record

```json
{
  "order_id": "ord456",
  "inventory": {
    "item_id": "itm001",
    "status": "in_stock"
  }
}
```

JavaScript filter (nested property)

```javascript
return value.inventory.status === "in_stock";
```

**Scenario:** Filter products missing price information.
JavaScript filter (property absence) ```javascript return !value.hasOwnProperty("price"); ``` ### [](#filter-string-keys)Filter string keys **Scenario:** Filter sensor data records by IoT device ID. JavaScript filter ```javascript return key === "sensor-device-1234"; ``` --- # Page 451: Deserialization **URL**: https://docs.redpanda.com/redpanda-cloud/manage/schema-reg/record-deserialization.md --- # Deserialization --- title: Deserialization latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: schema-reg/record-deserialization page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: schema-reg/record-deserialization.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/schema-reg/record-deserialization.adoc description: Learn how Redpanda Cloud deserializes messages. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- In Redpanda, the messages exchanged between producers and consumers contain raw bytes. Schemas work as an agreed-upon format, like a contract, for producers and consumers to serialize and deserialize those messages. If a producer breaks this contract, consumers can fail. Redpanda Cloud automatically tries to deserialize incoming messages and displays them in human-readable format. It tests different deserialization strategies until it finds one with no errors. If no deserialization attempts are successful, Redpanda Cloud renders the byte array in a hex viewer. Sometimes, the payload is displayed in hex bytes because it’s encrypted or because it uses a serializer that Redpanda Cloud cannot deserialize. When this happens, Redpanda Cloud displays troubleshooting information. You can also download the raw bytes of the message to feed it directly to your client deserializer or share it with a support team. 
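The try-each-strategy-then-fall-back behavior described above can be sketched in a few lines. This is only an illustration, not Redpanda Cloud's implementation: the two strategies shown, their order, and the names `STRATEGIES` and `deserialize` are assumptions chosen to keep the example small.

```python
import json


def try_json(raw: bytes):
    """Decode the payload as UTF-8 JSON; raises ValueError on failure."""
    return json.loads(raw.decode("utf-8"))


def try_utf8(raw: bytes):
    """Decode the payload as plain UTF-8 text; raises ValueError on failure."""
    return raw.decode("utf-8")


# Hypothetical strategy list, tried in order until one succeeds.
STRATEGIES = [("json", try_json), ("utf8", try_utf8)]


def deserialize(raw: bytes):
    """Return (strategy_name, decoded_value), or a hex dump if nothing works."""
    for name, strategy in STRATEGIES:
        try:
            return name, strategy(raw)
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            continue
    return "hex", raw.hex()


print(deserialize(b'{"a": 1}'))
print(deserialize(b"hello"))
print(deserialize(b"\xff\xfe"))
```

The last call falls through both strategies, which mirrors the case where the payload ends up in the hex viewer.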
All deserialized messages are rendered as JSON objects and can be used as JavaScript objects in [JavaScript filters (push filters)](../programmable-push-filters/). ## [](#display-messages-in-a-specific-format)Display messages in a specific format Redpanda Cloud tries to automatically identify the correct deserialization type by decoding the message’s key, value, or header with all available deserialization methods. To display your messages in another format: 1. Open your topic. 2. Click the cog icon. 3. Click **Deserialization**. 4. Choose a new deserializer for either the keys or values in your messages. Supported deserializers include: - Plain text - Kafka’s internal binary formats; for example, the `__consumer_offsets` topic - JSON - JSON with Schema Registry encoding - Smile - XML - Avro with Schema Registry encoding - Protobuf - Protobuf with Schema Registry encoding - Messagepack (for topics explicitly enabled to test MessagePack) - UTF-8 / strings - `uint8`, `uint16`, `uint32`, `uint64` ## [](#suggested-reading)Suggested reading - [Redpanda Schema Registry](../schema-reg-overview/) --- # Page 452: Schema ID Validation **URL**: https://docs.redpanda.com/redpanda-cloud/manage/schema-reg/schema-id-validation.md --- # Schema ID Validation --- title: Schema ID Validation latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: schema-reg/schema-id-validation page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: schema-reg/schema-id-validation.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/schema-reg/schema-id-validation.adoc description: Learn about schema ID validation for clients using SerDes that produce to Redpanda brokers, and learn how to configure Redpanda to inspect and reject records with invalid schema IDs. 
page-git-created-date: "2026-02-04" page-git-modified-date: "2026-02-04" --- You can use server-side schema ID validation for clients using Confluent’s SerDes format that produce to Redpanda brokers. You can also configure Redpanda to inspect and reject records with schema IDs that aren’t valid according to the configured Subject Name strategy and registered with the Schema Registry. ## [](#about-schema-id-validation)About schema ID validation Records produced to a topic may use a serializer/deserializer client library, such as Confluent’s SerDes library, to encode their keys and values according to a schema. When a client produces a record, the _schema ID_ for the topic is encoded in the record’s payload header. The schema ID must be associated with a subject and a version in the Schema Registry. That subject is determined by the _subject name strategy_, which maps the topic and schema onto a subject. A client may be misconfigured with either the wrong schema or the wrong subject name strategy, resulting in unexpected data on the topic. A produced record for an unregistered schema shouldn’t be stored by brokers or fetched by consumers. Yet, it may not be detected or dropped until after it’s been fetched and a consumer deserializes its mismatched schema ID. Schema ID validation enables brokers (servers) to detect and drop records that were produced with an incorrectly configured subject name strategy, that don’t conform to the SerDes wire format, or encode an incorrect schema ID. With schema ID validation, records associated with unregistered schemas are detected and dropped earlier, by a broker rather than a consumer. > ❗ **IMPORTANT** > > Schema ID validation doesn’t verify that a record’s payload is correctly encoded according to the associated schema. Schema ID validation only checks that the schema ID encoded in the record is registered in the Schema Registry. 
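To make the scope of the check concrete, here is a minimal sketch of reading a schema ID from a record payload. It assumes the Confluent SerDes wire format, in which the payload begins with a zero magic byte followed by a 4-byte big-endian schema ID. The helper `extract_schema_id` is a hypothetical name for illustration and does not represent Redpanda's broker code.

```python
import struct
from typing import Optional

MAGIC_BYTE = 0  # assumed: first byte of a Confluent SerDes-framed payload


def extract_schema_id(payload: bytes) -> Optional[int]:
    """Return the schema ID from a SerDes-framed payload, or None if unframed."""
    if len(payload) < 5 or payload[0] != MAGIC_BYTE:
        return None
    # Bytes 1-4 hold the schema ID as a big-endian 32-bit integer.
    (schema_id,) = struct.unpack(">I", payload[1:5])
    return schema_id


framed = bytes([MAGIC_BYTE]) + struct.pack(">I", 204) + b"encoded-body"
print(extract_schema_id(framed))
print(extract_schema_id(b"plain"))
```

Note that the sketch never looks at the bytes after the header, which matches the limitation above: only the schema ID is inspected, not the encoding of the rest of the payload.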
## [](#configure-schema-id-validation)Configure schema ID validation

To use schema ID validation:

- [Enable the feature in Redpanda](#enable-schema-id-validation)
- [Customize the subject name strategy per topic on the client](#set-subject-name-strategy-per-topic)

### [](#enable-schema-id-validation)Enable schema ID validation

By default, server-side schema ID validation is disabled in Redpanda. To enable schema ID validation, change the [`enable_schema_id_validation`](../../../reference/properties/cluster-properties/#enable_schema_id_validation) cluster property from its default value of `none` to either `redpanda` or `compat`:

- `none`: Schema validation is disabled (no schema ID checks are done). Associated topic properties cannot be modified.
- `redpanda`: Schema validation is enabled. Only Redpanda topic properties are accepted.
- `compat`: Schema validation is enabled. Both Redpanda and compatible topic properties are accepted.

See [Configure Cluster Properties](../../cluster-maintenance/config-cluster/).

### [](#set-subject-name-strategy-per-topic)Set subject name strategy per topic

Redpanda supports the following subject name strategies:

| Subject Name Strategy | Subject Name Source | Subject Name Format (Key) | Subject Name Format (Value) |
| --- | --- | --- | --- |
| TopicNameStrategy | Topic name | `<topic_name>-key` | `<topic_name>-value` |
| RecordNameStrategy | Fully-qualified record name | `<fully-qualified_record_name>` | `<fully-qualified_record_name>` |
| TopicRecordNameStrategy | Both topic name and fully-qualified record name | `<topic_name>-<fully-qualified_record_name>` | `<topic_name>-<fully-qualified_record_name>` |

When [schema ID validation is enabled](#enable-schema-id-validation), Redpanda uses `TopicNameStrategy` by default.

To customize the subject name strategy per topic, set the following client topic properties:

- Set `redpanda.key.schema.id.validation` to `true` to enable key schema ID validation for the topic, and set `redpanda.key.subject.name.strategy` to the desired subject name strategy for keys of the topic (default: `TopicNameStrategy`).
- Set `redpanda.value.schema.id.validation` to `true` to enable value schema ID validation for the topic, and set `redpanda.value.subject.name.strategy` to the desired subject name strategy for values of the topic (default: `TopicNameStrategy`). > 📝 **NOTE** > > The `redpanda.` properties have corresponding `confluent.` properties. > > | Redpanda property | Confluent property | > | --- | --- | > | redpanda.key.schema.id.validation | confluent.key.schema.validation | > | redpanda.key.subject.name.strategy | confluent.key.subject.name.strategy | > | redpanda.value.schema.id.validation | confluent.value.schema.validation | > | redpanda.value.subject.name.strategy | confluent.value.subject.name.strategy | The `redpanda.` **and `confluent.`** properties are compatible. Either or both can be set simultaneously. If `subject.name.strategy` is prefixed with `confluent.`, the available subject name strategies must be prefixed with `io.confluent.kafka.serializers.subject.`. For example, `io.confluent.kafka.serializers.subject.TopicNameStrategy`. > 📝 **NOTE** > > To support schema ID validation for compressed topics, a Redpanda broker decompresses each batch written to it so it can access the schema ID. 
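The mapping from a strategy to a subject name can be sketched as follows. The formats assume the conventional definitions of the three strategies (topic name plus a `-key` or `-value` suffix, the fully-qualified record name, or both joined by a hyphen). The function `subject_name` and the record name `com.example.SensorSample` are hypothetical and only serve the illustration.

```python
def subject_name(strategy: str, topic: str, record_name: str, is_key: bool) -> str:
    """Derive the subject a record is expected to be registered under."""
    # Accept both the short name and the io.confluent...-prefixed class name.
    short = strategy.rsplit(".", 1)[-1]
    if short == "TopicNameStrategy":
        return f"{topic}-{'key' if is_key else 'value'}"
    if short == "RecordNameStrategy":
        return record_name
    if short == "TopicRecordNameStrategy":
        return f"{topic}-{record_name}"
    raise ValueError(f"unknown subject name strategy: {strategy}")


record = "com.example.SensorSample"  # hypothetical fully-qualified record name
for strategy in (
    "TopicNameStrategy",
    "RecordNameStrategy",
    "io.confluent.kafka.serializers.subject.TopicRecordNameStrategy",
):
    print(strategy, "->", subject_name(strategy, "sensor", record, is_key=False))
```

With `TopicNameStrategy`, every schema used on a topic shares one subject per key or value. The record-based strategies allow several record types on the same topic, each tracked under its own subject.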
### [](#configuration-examples)Configuration examples

Create a topic with `RecordNameStrategy`:

```bash
rpk topic create topic_foo \
  --topic-config redpanda.value.schema.id.validation=true \
  --topic-config redpanda.value.subject.name.strategy=RecordNameStrategy \
  -X brokers=<broker-address>:9092
```

Alter a topic to `RecordNameStrategy`:

```bash
rpk topic alter-config topic_foo \
  --set redpanda.value.schema.id.validation=true \
  --set redpanda.value.subject.name.strategy=RecordNameStrategy \
  -X brokers=<broker-address>:9092
```

---

# Page 453: Use the Schema Registry API

**URL**: https://docs.redpanda.com/redpanda-cloud/manage/schema-reg/schema-reg-api.md

---

# Use the Schema Registry API

---
title: Use the Schema Registry API
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: schema-reg/schema-reg-api
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: schema-reg/schema-reg-api.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/schema-reg/schema-reg-api.adoc
description: Perform common Schema Registry management operations with the API.
page-git-created-date: "2024-07-25"
page-git-modified-date: "2026-01-12"
---

Schemas provide human-readable documentation for an API. They verify that data conforms to an API, support the generation of serializers for data, and manage the compatibility of evolving APIs, allowing new versions of services to be rolled out independently.

> 📝 **NOTE**
>
> The Schema Registry is built into Redpanda, and you can use it with the API or the UI. This section describes operations available in the [Schema Registry API](/api/doc/schema-registry/).

The Redpanda Schema Registry has API endpoints that allow you to perform the following tasks:

- Register schemas for a subject.
When data formats are updated, a new version of the schema can be registered under the same subject, allowing for backward and forward compatibility.
- Retrieve schemas of specific versions.
- Retrieve a list of subjects.
- Retrieve a list of schema versions for a subject.
- Configure schema compatibility checking.
- Query supported serialization formats.
- Delete schemas from the registry.

The following examples cover the basic functionality of the Redpanda Schema Registry based on an example Avro schema called `sensor_sample`. This schema contains fields that represent a measurement from a sensor for the value of the `sensor` topic, as defined below.

```json
{
  "type": "record",
  "name": "sensor_sample",
  "fields": [
    {
      "name": "timestamp",
      "type": "long",
      "logicalType": "timestamp-millis"
    },
    {
      "name": "identifier",
      "type": "string",
      "logicalType": "uuid"
    },
    {
      "name": "value",
      "type": "long"
    }
  ]
}
```

## [](#prerequisites)Prerequisites

To run the sample commands and code in each example, follow these steps to set up Redpanda and other tools:

1. You need a running Redpanda cluster. If you don’t have one, you can [create a cluster](../../../get-started/cluster-types/serverless/) using Redpanda Serverless.

   In these examples, it is assumed that the Schema Registry is available locally at `http://localhost:8081`. If the Schema Registry is hosted on a different address or port in your cluster, change the URLs in the examples.

2. Download the [jq utility](https://stedolan.github.io/jq/download/).

3. Install [curl](https://curl.se/) or [Python](https://www.python.org/).

You can also use [`rpk`](../../rpk/intro-to-rpk/) to interact with the Schema Registry. The [`rpk registry`](../../../reference/rpk/rpk-registry/rpk-registry/) set of commands calls the different API endpoints as shown in the curl and Python examples.
If using Python, install the [Requests module](https://requests.readthedocs.io/en/latest/user/install/#install), then create an interactive Python session:

```python
import requests
import json

def pretty(text):
    print(json.dumps(text, indent=2))

base_uri = "http://localhost:8081"
```

## [](#query-supported-schema-formats)Query supported schema formats

To get the supported data serialization formats in the Schema Registry, make a GET request to the `/schemas/types` endpoint:

### Curl

```bash
curl -s "http://localhost:8081/schemas/types" | jq .
```

### Python

```python
res = requests.get(f'{base_uri}/schemas/types').json()
pretty(res)
```

This returns the supported serialization formats:

    [
      "JSON",
      "PROTOBUF",
      "AVRO"
    ]

## [](#register-a-schema)Register a schema

A schema is registered in the registry with a _subject_, which is a name that is associated with the schema as it evolves. Subjects are typically in the form `<topic-name>-key` or `<topic-name>-value`.

To register the `sensor_sample` schema, make a POST request to the `/subjects/sensor-value/versions` endpoint with the Content-Type `application/vnd.schemaregistry.v1+json`:

### rpk

```bash
rpk registry schema create sensor-value --schema ~/code/tmp/sensor_sample.avro
```

### Curl

```bash
curl -s \
  -X POST \
  "http://localhost:8081/subjects/sensor-value/versions" \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"long\"}]}"}' \
  | jq
```

To normalize the schema, add the query parameter `?normalize=true` to the endpoint.
### Python ```python sensor_schema = { "type": "record", "name": "sensor_sample", "fields": [ { "name": "timestamp", "type": "long", "logicalType": "timestamp-millis" }, { "name": "identifier", "type": "string", "logicalType": "uuid" }, { "name": "value", "type": "long" } ] } res = requests.post( url=f'{base_uri}/subjects/sensor-value/versions', data=json.dumps({ 'schema': json.dumps(sensor_schema) }), headers={'Content-Type': 'application/vnd.schemaregistry.v1+json'}).json() pretty(res) ``` This returns the version `id` unique for the schema in the Redpanda cluster: ### rpk SUBJECT VERSION ID TYPE sensor-value 1 1 AVRO ### Curl ```json { "id": 1 } ``` When you register an evolved schema for an existing subject, the version `id` is incremented by 1. ## [](#retrieve-a-schema)Retrieve a schema To retrieve a registered schema from the registry, make a GET request to the `/schemas/ids/{id}` endpoint: ### rpk ```bash rpk registry schema get --id 1 ``` ### Curl ```bash curl -s \ "http://localhost:8081/schemas/ids/1" \ | jq . ``` ### Python ```python res = requests.get(f'{base_uri}/schemas/ids/1').json() pretty(res) ``` The rpk output returns the subject and version, and the HTTP response returns the schema: ### rpk SUBJECT VERSION ID TYPE sensor-value 1 1 AVRO ### Curl ```json { "schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"long\"}]}" } ``` ## [](#list-registry-subjects)List registry subjects To list all registry subjects, make a GET request to the `/subjects` endpoint: ### rpk ```bash rpk registry subject list --format json ``` ### Curl ```bash curl -s \ "http://localhost:8081/subjects" \ | jq . 
``` ### Python ```python res = requests.get(f'{base_uri}/subjects').json() pretty(res) ``` This returns the subject: ```json [ "sensor-value" ] ``` ## [](#retrieve-schema-versions-of-a-subject)Retrieve schema versions of a subject To query the schema versions of a subject, make a GET request to the `/subjects/{subject}/versions` endpoint. For example, to get the schema versions of the `sensor-value` subject: ### Curl ```bash curl -s \ "http://localhost:8081/subjects/sensor-value/versions" \ | jq . ``` ### Python ```python res = requests.get(f'{base_uri}/subjects/sensor-value/versions').json() pretty(res) ``` This returns the version ID: ```json [ 1 ] ``` ## [](#retrieve-a-subjects-specific-version-of-a-schema)Retrieve a subject’s specific version of a schema To retrieve a specific version of a schema associated with a subject, make a GET request to the `/subjects/{subject}/versions/{version}` endpoint: ### rpk ```bash rpk registry schema get sensor-value --schema-version 1 ``` ### Curl ```bash curl -s \ "http://localhost:8081/subjects/sensor-value/versions/1" \ | jq . ``` ### Python ```python res = requests.get(f'{base_uri}/subjects/sensor-value/versions/1').json() pretty(res) ``` The rpk output returns the subject, and for HTTP requests, its associated schema as well: ### rpk SUBJECT VERSION ID TYPE sensor-value 1 1 AVRO ### Curl ```json { "subject": "sensor-value", "id": 1, "version": 1, "schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"long\"}]}" } ``` To get the latest version, use `latest` as the version ID: ### rpk ```bash rpk registry schema get sensor-value --schema-version latest ``` ### Curl ```bash curl -s \ "http://localhost:8081/subjects/sensor-value/versions/latest" \ | jq . 
```

### Python

```python
res = requests.get(f'{base_uri}/subjects/sensor-value/versions/latest').json()
pretty(res)
```

To get only the schema, append `/schema` to the endpoint path:

### Curl

```bash
curl -s \
  "http://localhost:8081/subjects/sensor-value/versions/latest/schema" \
  | jq .
```

### Python

```python
res = requests.get(f'{base_uri}/subjects/sensor-value/versions/latest/schema').json()
pretty(res)
```

```json
{
  "type": "record",
  "name": "sensor_sample",
  "fields": [
    {
      "name": "timestamp",
      "type": "long",
      "logicalType": "timestamp-millis"
    },
    {
      "name": "identifier",
      "type": "string",
      "logicalType": "uuid"
    },
    {
      "name": "value",
      "type": "long"
    }
  ]
}
```

## [](#configure-schema-compatibility)Configure schema compatibility

Applications are often modeled around a specific business object structure. As applications change and the shape of their data changes, producer schemas and consumer schemas may no longer be compatible. You can decide how a consumer handles data from a producer that uses an older or newer schema, and reduce the chance of consumers hitting deserialization errors.

You can configure different types of schema compatibility, which are applied to a subject when a new schema is registered. The Schema Registry supports the following compatibility types:

- `BACKWARD` (**default**) - Consumers using the new schema (for example, version 10) can read data from producers using the previous schema (for example, version 9).
- `BACKWARD_TRANSITIVE` - Consumers using the new schema (for example, version 10) can read data from producers using all previous schemas (for example, versions 1-9).
- `FORWARD` - Consumers using the previous schema (for example, version 9) can read data from producers using the new schema (for example, version 10). - `FORWARD_TRANSITIVE` - Consumers using any previous schema (for example, versions 1-9) can read data from producers using the new schema (for example, version 10). - `FULL` - A new schema and the previous schema (for example, versions 10 and 9) are both backward and forward compatible with each other. - `FULL_TRANSITIVE` - Each schema is both backward and forward compatible with all registered schemas. - `NONE` - No schema compatibility checks are done. ### [](#compatibility-uses-and-constraints)Compatibility uses and constraints - A consumer that wants to read a topic from the beginning (for example, an AI learning process) benefits from backward compatibility. It can process the whole topic using the latest schema. This allows producers to remove fields and add attributes. - A real-time consumer that doesn’t care about historical events but wants to keep up with the latest data (for example, a typical streaming application) benefits from forward compatibility. Even if producers change the schema, the consumer can carry on. - Full compatibility can process historical data and future data. This is the safest option, but it limits the changes that can be done. This only allows for the addition and removal of optional fields. If you make changes that are not inherently backward-compatible, you may need to change compatibility settings or plan a transitional period, updating producers and consumers to use the new schema while the old one is still accepted. 
| Schema format | Backward-compatible tasks | Not backward-compatible tasks |
| --- | --- | --- |
| Avro | Add fields with default values<br>Make fields nullable | Remove fields<br>Change data types of fields<br>Change enum values<br>Change field constraints<br>Change record or field names |
| Protobuf | Add fields<br>Remove fields | Remove required fields<br>Change data types of fields |
| JSON | Add optional properties<br>Relax constraints, for example:<br>Decrease a minimum value or increase a maximum value<br>Decrease `minItems`, `minLength`, or `minProperties`; increase `maxItems`, `maxLength`, `maxProperties`<br>Add more property types (for example, `"type": "integer"` to `"type": ["integer", "string"]`)<br>Add more enum values<br>Reduce `multipleOf` by an integral factor<br>Relax additional properties if `additionalProperties` was not previously specified as `false`<br>Remove a `uniqueItems` property that was `false` | Remove properties<br>Add required properties<br>Change property names and types<br>Tighten or add constraints |

To set the compatibility type for a subject, make a PUT request to `/config/{subject}` with the specific compatibility type:

#### rpk

```bash
rpk registry compatibility-level set sensor-value --level BACKWARD
```

#### Curl

```bash
curl -s \
  -X PUT \
  "http://localhost:8081/config/sensor-value" \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"compatibility": "BACKWARD"}' \
  | jq .
```

#### Python

```python
res = requests.put(
    url=f'{base_uri}/config/sensor-value',
    data=json.dumps(
        {'compatibility': 'BACKWARD'}
    ),
    headers={'Content-Type': 'application/vnd.schemaregistry.v1+json'}).json()
pretty(res)
```

This returns the new compatibility type:

#### rpk

    SUBJECT       LEVEL     ERROR
    sensor-value  BACKWARD

#### Curl

```json
{
  "compatibility": "BACKWARD"
}
```

If you POST an incompatible schema change, the request returns an error.
For example, if you try to register a new schema with the `value` field’s type changed from `long` to `int`, and compatibility is set to `BACKWARD`, the request returns an error due to incompatibility: #### Curl ```bash curl -s \ -X POST \ "http://localhost:8081/subjects/sensor-value/versions" \ -H "Content-Type: application/vnd.schemaregistry.v1+json" \ -d '{"schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"int\"}]}"}' \ | jq ``` #### Python ```python sensor_schema["fields"][2]["type"] = "int" res = requests.post( url=f'{base_uri}/subjects/sensor-value/versions', data=json.dumps({ 'schema': json.dumps(sensor_schema) }), headers={'Content-Type': 'application/vnd.schemaregistry.v1+json'}).json() pretty(res) ``` The request returns this error: ```json { "error_code": 409, "message": "Schema being registered is incompatible with an earlier schema for subject \"{sensor-value}\"" } ``` For an example of a compatible change, register a schema with the `value` field’s type changed from `long` to `double`: #### Curl ```bash curl -s \ -X POST \ "http://localhost:8081/subjects/sensor-value/versions" \ -H "Content-Type: application/vnd.schemaregistry.v1+json" \ -d '{"schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"double\"}]}"}' \ | jq ``` #### Python ```python sensor_schema["fields"][2]["type"] = "double" res = requests.post( url=f'{base_uri}/subjects/sensor-value/versions', data=json.dumps({ 'schema': json.dumps(sensor_schema) }), headers={'Content-Type': 'application/vnd.schemaregistry.v1+json'}).json() pretty(res) ``` A successful registration returns the schema’s `id`: 
```json { "id": 2 } ``` ## [](#reference-a-schema)Reference a schema To build more complex schema definitions, you can add a reference to other schemas. The following example registers a Protobuf schema in subject `test-simple` with a message name `Simple`. ### rpk ```bash rpk registry schema create test-simple --schema simple.proto ``` ```none SUBJECT VERSION ID TYPE test-simple 1 2 PROTOBUF ``` ### Curl ```bash curl -X POST -H 'Content-type: application/vnd.schemaregistry.v1+json' http://127.0.0.1:8081/subjects/test-simple/versions -d '{"schema": "syntax = \"proto3\";\nmessage Simple {\n string id = 1;\n}","schemaType": "PROTOBUF"}' ``` ```json {"id":2} ``` This schema is then referenced in a new schema in a different subject named `import`. The reference points to the subject version (`1`), not the schema ID (`2`). ### rpk ```bash # --references flag takes the format {name}:{subject}:{schema version} rpk registry schema create import --schema import_schema.proto --references simple:test-simple:1 ``` ```none SUBJECT VERSION ID TYPE import 1 3 PROTOBUF ``` ### Curl ```bash curl -X POST -H 'Content-type: application/vnd.schemaregistry.v1+json' http://127.0.0.1:8081/subjects/import/versions -d '{"schema": "syntax = \"proto3\";\nimport \"simple\";\nmessage Test3 {\n Simple id = 1;\n}","schemaType": "PROTOBUF", "references": [{"name": "simple", "subject": "test-simple", "version":1}]}' ``` ```json {"id":3} ``` You cannot delete a schema when it is used as a reference.
### rpk ```bash rpk registry schema delete test-simple --schema-version 1 ``` ```none One or more references exist to the schema {magic=1,keytype=SCHEMA,subject=test-simple,version=1} ``` ### Curl ```bash curl -X DELETE -H 'Content-type: application/vnd.schemaregistry.v1+json' http://127.0.0.1:8081/subjects/test-simple/versions/1 ``` ```json {"error_code":42206,"message":"One or more references exist to the schema {magic=1,keytype=SCHEMA,subject=test-simple,version=1}"} ``` Call the `/subjects/test-simple/versions/1/referencedby` endpoint to see the schema IDs that reference version 1 for subject `test-simple`. ### rpk ```bash rpk registry schema references test-simple --schema-version 1 ``` ```none SUBJECT VERSION ID TYPE import 1 3 PROTOBUF ``` ### Curl ```bash curl -H 'Content-type: application/vnd.schemaregistry.v1+json' http://127.0.0.1:8081/subjects/test-simple/versions/1/referencedby ``` ```json [3] ``` ## [](#delete-a-schema)Delete a schema The Schema Registry API provides DELETE endpoints for deleting a single schema or all schemas of a subject: - `/subjects/{subject}/versions/{version}` - `/subjects/{subject}` A schema cannot be deleted if any other schema references it. A schema can be soft deleted (impermanently) or hard deleted (permanently), based on the boolean query parameter `permanent`. A soft deleted schema can be retrieved and re-registered. A hard deleted schema cannot be recovered. ### [](#soft-delete-a-schema)Soft delete a schema To soft delete a schema, make a DELETE request with the subject and version ID (where `permanent=false` is the default parameter value): #### rpk ```bash rpk registry schema delete sensor-value --schema-version 1 ``` #### Curl ```bash curl -s \ -X DELETE \ "http://localhost:8081/subjects/sensor-value/versions/1" \ | jq .
``` #### Python ```python res = requests.delete(f'{base_uri}/subjects/sensor-value/versions/1').json() pretty(res) ``` This returns the ID of the soft deleted schema: #### rpk ```none Successfully deleted schema. Subject: "sensor-value", version: "1" ``` #### Curl ```none 1 ``` Doing a soft delete for an already deleted schema returns an error: #### rpk ```none Subject 'sensor-value' Version 1 was soft deleted. Set permanent=true to delete permanently ``` #### Curl ```json { "error_code": 40406, "message": "Subject 'sensor-value' Version 1 was soft deleted.Set permanent=true to delete permanently" } ``` To list subjects of soft-deleted schemas, make a GET request with the `deleted` parameter set to `true`, `/subjects?deleted=true`: #### rpk ```bash rpk registry subject list --deleted ``` #### Curl ```bash curl -s \ "http://localhost:8081/subjects?deleted=true" \ | jq . ``` #### Python ```python payload = { 'deleted' : 'true' } res = requests.get(f'{base_uri}/subjects', params=payload).json() pretty(res) ``` This returns all subjects, including deleted ones: ```json [ "sensor-value" ] ``` To undo a soft deletion, first follow the steps to [retrieve the schema](#retrieve-a-schema-of-a-subject), then [register the schema](#register-a-schema). ### [](#hard-delete-a-schema)Hard delete a schema > ⚠️ **CAUTION** > > Redpanda doesn’t recommend hard (permanently) deleting schemas in a production system. > > The DELETE APIs are primarily used during the development phase, when schemas are being iterated and revised. 
To hard delete a schema, use the `--permanent` flag with the `rpk registry schema delete` command, or for curl or Python, make two DELETE requests with the second request setting the `permanent` parameter to `true` (`/subjects/{subject}/versions/{version}?permanent=true`): #### rpk ```bash rpk registry schema delete sensor-value --schema-version 1 --permanent ``` #### Curl ```bash curl -s \ -X DELETE \ "http://localhost:8081/subjects/sensor-value/versions/1" \ | jq . curl -s \ -X DELETE \ "http://localhost:8081/subjects/sensor-value/versions/1?permanent=true" \ | jq . ``` #### Python ```python res = requests.delete(f'{base_uri}/subjects/sensor-value/versions/1').json() pretty(res) payload = { 'permanent' : 'true' } res = requests.delete(f'{base_uri}/subjects/sensor-value/versions/1', params=payload).json() pretty(res) ``` Each request returns the version ID of the deleted schema: #### rpk ```none Successfully deleted schema. Subject: "sensor-value", version: "1" ``` #### Curl ```json 1 1 ``` A request for a hard-deleted schema returns an error: #### rpk ```none Subject 'sensor-value' not found. ``` #### Curl ```json { "error_code": 40401, "message": "Subject 'sensor-value' not found." } ``` ## [](#set-schema-registry-mode)Set Schema Registry mode The `/mode` endpoint allows you to put Schema Registry in read-only, read-write, or import mode. - In read-write mode (the default), you can both register and look up schemas. - In [read-only mode](#use-readonly-mode-for-disaster-recovery), you can only look up schemas. This mode is most useful for standby clusters in a disaster recovery setup. - In [import mode](#use-import-mode-for-migration), you can only register schemas. This mode is most useful for target clusters in a migration setup. If authentication is enabled on Schema Registry, only superusers can change global and subject-level modes. 
> ⚠️ **CAUTION** > > **Breaking change in Redpanda 25.3:** In Redpanda versions before 25.3, you could specify a schema ID or version when registering a schema in read-write mode. > > Starting with 25.3, read-write mode returns an error when you try to register a schema with a specific ID or version. If you have custom scripts that rely on the ability to specify an ID or version with Redpanda 25.2 and earlier, you must do either of the following: > > - Omit the ID and version fields when registering a schema. The schema will be registered under a new ID and version. > > - Change the Schema Registry or the subject to import mode. ### [](#get-global-mode)Get global mode To [query the global mode](/api/doc/schema-registry/operation/operation-get_mode) for Schema Registry: #### rpk ```bash rpk registry mode get --global ``` #### Curl ```bash curl http://localhost:8081/mode ``` ### [](#set-global-mode)Set global mode Set the mode for Schema Registry at a global level. This mode applies to all subjects that do not have a specific mode set. #### rpk ```bash rpk registry mode set --mode <mode> --global ``` #### Curl ```bash curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"mode": "<mode>"}' http://localhost:8081/mode ``` Replace the `<mode>` placeholder with the desired mode: - `READONLY` - `READWRITE` - `IMPORT` ### [](#get-mode-for-a-subject)Get mode for a subject To look up the mode for a specific subject (replace `<subject>` with the subject name): #### rpk ```bash rpk registry mode get <subject> ``` #### Curl ```bash curl "http://localhost:8081/mode/<subject>?defaultToGlobal=true" ``` This request returns the mode that is enforced. If the subject is set to a specific mode (to override the global mode), it returns the override mode. Otherwise, it returns the global mode. To retrieve the subject-level override if it exists, use: ```bash curl http://localhost:8081/mode/<subject> ``` This request returns an error if there is no specific mode set for the subject.
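The lookup rule described above (a subject override wins, otherwise the global mode applies) can be sketched as a small Python function. This is an illustrative local model under assumed names (`effective_mode`, `subject_modes`), not the Schema Registry implementation. The `default_to_global` argument mirrors the `defaultToGlobal` query parameter.

```python
# Hypothetical local model of mode resolution; names are illustrative only.
def effective_mode(global_mode, subject_modes, subject, default_to_global=True):
    """Return the mode enforced for a subject.

    subject_modes maps subject names to their override mode, if any.
    """
    override = subject_modes.get(subject)
    if override is not None:
        return override  # a subject-level mode overrides the global mode
    if default_to_global:
        return global_mode  # fall back to the global mode
    # Without the fallback, a subject that has no override is an error.
    raise LookupError(f"no mode set for subject {subject!r}")


if __name__ == "__main__":
    overrides = {"orders-value": "READONLY"}
    print(effective_mode("READWRITE", overrides, "orders-value"))
    print(effective_mode("READWRITE", overrides, "payments-value"))
```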
### [](#set-mode-for-a-subject)Set mode for a subject To set the mode for a specific subject (replace `<subject>` with the subject name): #### rpk ```bash rpk registry mode set <subject> --mode READONLY ``` #### Curl ```bash curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"mode": "READONLY"}' http://localhost:8081/mode/<subject> ``` ### [](#use-readonly-mode-for-disaster-recovery)Use READONLY mode for disaster recovery A read-only Schema Registry does not accept direct writes. An active production cluster can replicate schemas to a read-only Schema Registry to keep it in sync, for example using Redpanda’s [Schema Migration tool](https://github.com/redpanda-data/schema-migration/). Users in the disaster recovery (DR) site cannot update schemas directly, so the DR cluster has an exact replica of the schemas in production. In a failover due to a disaster or outage, you can set Schema Registry to read-write mode, taking over for the failed cluster and ensuring availability. ### [](#use-import-mode-for-migration)Use IMPORT mode for migration Set the target Schema Registry to import mode to: - Bypass compatibility checks when registering schemas. - Specify a schema ID and version for the registered schema, so you can retain the same IDs and versions from the original Schema Registry and keep topic data associated with the correct schema. To enable import mode, you must have: - Either superuser access, or a Schema Registry ACL with the `alter_configs` operation on the `registry` resource. See [Enable Schema Registry Authorization](../schema-reg-authorization/#enable-schema-registry-authorization) to learn how to enable Schema Registry authorization for your cluster. - An empty registry or subject. That is, either no schemas have ever been registered, or you must [hard-delete](#hard-delete-a-schema) all schemas that were registered.
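Because import mode accepts an explicit schema ID and version, a migration script typically builds one registration request per schema copied from the source registry. The sketch below only assembles the URL and JSON body for such a request, using the same `id` and `version` fields as the curl example in this section. The helper name `build_import_request` and the default `base_uri` are assumptions for illustration, and the sketch does not send anything.

```python
import json
from urllib.parse import quote


# Hypothetical helper: builds (but does not send) a registration request
# that preserves the schema ID and version while the target is in IMPORT mode.
def build_import_request(subject, schema_text, schema_type, schema_id, version,
                         base_uri="http://localhost:8081"):
    """Return (url, body) for POST /subjects/{subject}/versions."""
    url = f"{base_uri}/subjects/{quote(subject, safe='')}/versions"
    body = json.dumps({
        "schema": schema_text,      # schema definition as a string
        "schemaType": schema_type,  # for example, "PROTOBUF"
        "id": schema_id,            # ID to retain from the source registry
        "version": version,         # version to retain from the source registry
    })
    return url, body


if __name__ == "__main__":
    url, body = build_import_request(
        "orders-value",
        'syntax = "proto3";\nmessage Order {\n string id = 1;\n}',
        "PROTOBUF", 1, 4)
    print(url)
    print(body)
```

You could pass the returned URL and body to an HTTP client with the `Content-Type: application/vnd.schemaregistry.v1+json` header, as in the other examples on this page.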
To bypass the check for an empty registry when setting the global mode to import: #### rpk ```bash rpk registry mode set --mode IMPORT --global --force ``` #### Curl ```bash curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"mode": "IMPORT"}' "http://localhost:8081/mode?force=true" ``` Use import mode to register a schema with a specific ID and version (replace `<subject>` with the subject name): #### rpk ```bash rpk registry schema create <subject> --schema order.proto --id 1 --schema-version 4 ``` #### Curl ```bash curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "syntax = \"proto3\";\nmessage Order {\n string id = 1;\n}", "schemaType": "PROTOBUF", "id": 1, "version": 4}' http://localhost:8081/subjects/<subject>/versions ``` ## [](#retrieve-serialized-schemas)Retrieve serialized schemas Starting in Redpanda version 25.2, the following endpoints return serialized schemas (Protobuf only) using the `format=serialized` query parameter:

| Operation | Path |
| --- | --- |
| Retrieve a schema | GET /schemas/ids/{id}?format=serialized |
| Check if a schema is already registered for a subject | POST /subjects/{subject}?format=serialized |
| Retrieve a subject’s specific version of a schema | GET /subjects/{subject}/versions/{version}?format=serialized |
| Get the unescaped schema only for a subject | GET /subjects/{subject}/versions/{version}/schema?format=serialized |

The `serialized` format returns the Protobuf schema in its binary wire format, encoded as Base64. - Passing an empty string (`format=''`) returns the schema in the current (default) format. - For Avro, `resolved` is a valid value, but it is not currently supported and returns a 501 Not Implemented error. - For Protobuf, `serialized` and `ignore_extensions` are valid, but only `serialized` is currently supported; passing `ignore_extensions` returns a 501 Not Implemented error.
- Cross-schema conditions such as `resolved` with Protobuf or `serialized` with Avro are ignored and the schema is returned in the default format. ## [](#suggested-reading)Suggested reading - [Redpanda Schema Registry](../schema-reg-overview/) - [rpk registry](../../../reference/rpk/rpk-registry/rpk-registry/) - [Schema Registry API](/api/doc/schema-registry/) - [Monitor Schema Registry service-level metrics](../../monitor-cloud/#service-level-queries) - [Deserialization](../record-deserialization/#schema-registry) --- # Page 454: Schema Registry Authorization **URL**: https://docs.redpanda.com/redpanda-cloud/manage/schema-reg/schema-reg-authorization.md --- # Schema Registry Authorization --- title: Schema Registry Authorization latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: schema-reg/schema-reg-authorization page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: schema-reg/schema-reg-authorization.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/schema-reg/schema-reg-authorization.adoc description: Learn how to set up and manage Schema Registry Authorization using ACL definitions that control user access to specific Schema Registry operations. page-git-created-date: "2025-08-19" page-git-modified-date: "2025-08-19" --- Schema Registry Authorization enables fine-grained restriction of operations to Schema Registry resources by user or role through access control lists (ACLs). ## [](#about-schema-registry-authorization)About Schema Registry Authorization Schema Registry Authorization allows you to control which users and applications can perform specific operations within the Redpanda Schema Registry. This ensures that only authorized entities can read, write, modify, delete, or configure schemas and their settings. 
Before v25.2, Schema Registry supported authentication, but once a user was authenticated, they had full access to all Schema Registry operations, including reading, modifying, and deleting schemas and configuration both per-subject and globally. Starting in v25.2, Schema Registry Authorization provides fine-grained access control through ACLs. You can now restrict access to specific subjects and operations. ### [](#how-to-manage-schema-registry-authorization)How to manage Schema Registry Authorization You can manage Schema Registry Authorization in the following ways: - **rpk**: Use the [`rpk security acl create`](../../../reference/rpk/rpk-security/rpk-security-acl-create/) command, just like you would for other Kafka ACLs. - **Schema Registry API**: Use the [Redpanda Schema Registry API](/api/doc/schema-registry/operation/operation-get_security_acls) endpoints. - **Redpanda Cloud**: After enabling Schema Registry Authorization for your cluster, you can use Redpanda Cloud to manage Schema Registry ACLs. See [Configure ACLs](../../../security/authorization/acl/). ### [](#schema-registry-acl-resource-types)Schema Registry ACL resource types Schema Registry Authorization introduces two new ACL resource types in addition to the standard Kafka ACL resources (`topic`, `group`, `cluster`, and `transactional_id`): - `registry`: Controls whether or not to grant ACL access to global, or top-level Schema Registry operations. Specify using the flag `registry-global`. - `subject`: Controls ACL access for specific Schema Registry subjects. Specify using the flag `registry-subject`. ## [](#supported-operations)Supported operations Redpanda Schema Registry ACLs support the following specific subset of Schema Registry endpoints and operations: > 📝 **NOTE** > > Not all Kafka operations are supported when using Redpanda Schema Registry ACLs. 
| Endpoint | HTTP method | Operation | Resource |
| --- | --- | --- | --- |
| /config | GET | describe_configs | registry |
| /config | PUT | alter_configs | registry |
| /config/{subject} | GET | describe_configs | subject |
| /config/{subject} | PUT | alter_configs | subject |
| /config/{subject} | DELETE | alter_configs | subject |
| /mode | GET | describe_configs | registry |
| /mode | PUT | alter_configs | registry |
| /mode/{subject} | GET | describe_configs | subject |
| /mode/{subject} | PUT | alter_configs | subject |
| /mode/{subject} | DELETE | alter_configs | subject |
| /schemas/types | GET | none/open | - |
| /schemas/ids/{id} | GET | read | subject |
| /schemas/ids/{id}/versions | GET | describe | registry |
| /schemas/ids/{id}/subjects | GET | describe | registry |
| /subjects | GET | describe | subject |
| /subjects/{subject} | POST | read | subject |
| /subjects/{subject} | DELETE | delete | subject |
| /subjects/{subject}/versions | GET | describe | subject |
| /subjects/{subject}/versions | POST | write | subject |
| /subjects/{subject}/versions/{version} | GET | read | subject |
| /subjects/{subject}/versions/{version} | DELETE | delete | subject |
| /subjects/{subject}/versions/{version}/schema | GET | read | subject |
| /subjects/{subject}/versions/{version}/referencedby | GET | describe | registry |
| /compatibility/subjects/{subject}/versions/{version} | POST | read | subject |
| /status/ready | GET | none/open | - |
| /security/acls | GET | describe | cluster |
| /security/acls | POST | alter | cluster |
| /security/acls | DELETE | alter | cluster |

For additional guidance on these operations, see the [Redpanda Schema Registry API](/api/doc/schema-registry/operation/operation-get_security_acls). ### [](#operation-definitions)Operation definitions You can use the following operations to control access to Schema Registry resources: - **`read`**: Allows user to read schemas and their content.
Required for consuming messages that use Schema Registry, fetching specific schema versions, and reading schema content by ID. - **`write`**: Allows user to register new schemas and schema versions. Required for producing messages with new schemas and updating existing subjects with new schema versions. - **`delete`**: Allows user to delete schema versions and subjects. Required for cleanup operations and removing deprecated schemas. - **`describe`**: Allows user to list and describe Schema Registry resources. Required for discovering available subjects, listing schema versions, and viewing metadata. - **`describe_configs`**: Allows user to read configuration settings. Required for viewing compatibility settings, reading modes (IMPORT/READWRITE), and checking global or per-subject configurations. - **`alter_configs`**: Allows user to modify configuration settings. Required for changing compatibility levels, setting IMPORT mode for migrations, and updating global or per-subject configurations. ### [](#common-use-cases)Common use cases The following examples show which operations are required for common Schema Registry tasks: #### [](#schema-registry-migration)Schema Registry migration When migrating schemas between clusters, you must have **different ACLs for source and target clusters**. **Source cluster (read-only):** ```bash # Read schemas from source Schema Registry rpk security acl create \ --allow-principal User:migrator-user \ --operation read,describe \ --registry-global \ --brokers ``` This grants: - `read` - Read schemas by ID from source - `describe` - List all subjects in source > 📝 **NOTE** > > The `describe_configs` operation is required to read Schema Registry configuration settings, including compatibility modes and IMPORT mode status. 
**Target cluster (read-write):** ```bash # Write schemas to target Schema Registry and manage IMPORT mode rpk security acl create \ --allow-principal User:migrator-user \ --operation write,describe,alter_configs,describe_configs \ --registry-global \ --brokers ``` This grants: - `write` - Register schemas in target with preserved IDs - `describe` - List all subjects in target - `alter_configs` - Set IMPORT mode on target Schema Registry - `describe_configs` - Read compatibility settings and mode > ❗ **IMPORTANT** > > **Schema Registry ACLs are only for Schema Registry operations.** For complete data migration, you must also use Kafka ACLs: > > - **Topics:** READ (source), WRITE/CREATE/DESCRIBE/ALTER (target) > > - **Consumer groups:** READ (source), CREATE/READ (target) > > - **Cluster:** DESCRIBE (both), CREATE (target) > > > See [Configure Access Control Lists](../../../security/authorization/acl/) for Kafka ACL configuration. > 📝 **NOTE** > > The target Schema Registry must be in IMPORT mode to preserve schema IDs during migration. Only superusers or principals with `alter_configs` permission on the `registry` resource can change the global mode. See [Set global mode](../schema-reg-api/#set-global-mode). #### [](#complete-migration-setup-workflow)Complete migration setup workflow For a complete migration setup, follow this workflow: 1. **Bootstrap superusers** - Configure superusers using `.bootstrap.yaml` before enabling authentication 2. **Create migration user** - Create dedicated migration user with minimal required permissions 3. **Configure Schema Registry ACLs** - Grant read access on source, read-write access on target 4. **Configure Kafka ACLs** - Grant topic read/write, consumer group, and cluster permissions 5. **Enable SASL authentication** - Enable SASL/SCRAM-SHA-256 on both clusters 6. **Enable ACL authorization** - Enable `kafka_enable_authorization` and `schema_registry_enable_authorization` 7. 
**Set target to IMPORT mode** - Enable IMPORT mode on target Schema Registry 8. **Start migration** - Begin data and schema migration 9. **Verify ACLs** - Test that permissions work correctly and restrictions are enforced 10. **Complete migration** - Disable IMPORT mode after migration completes For a complete working example with Docker Compose, see the [Redpanda Migrator Demo](https://github.com/redpanda-data/redpanda-labs/tree/main/docker-compose/redpanda-migrator-demo). > 📝 **NOTE** > > **Schema Registry Internal Client Authentication:** When SASL authentication is enabled on your Kafka cluster, the Schema Registry’s internal Kafka client must also be configured with SASL credentials. Configure these using node-level properties: > > ```bash > --set schema_registry_client.scram_username=<username> > --set schema_registry_client.scram_password=<password> > --set schema_registry_client.sasl_mechanism=SCRAM-SHA-256 > ``` > > Without these credentials, Schema Registry operations that interact with Kafka (like storing schema data) will fail with `broker_not_available` errors. #### [](#read-only-access-for-consumers)Read-only access for consumers Applications that only consume messages with schemas require: ```bash # For consuming with schema validation rpk security acl create \ --allow-principal consumer-app \ --operation read \ --registry-subject "orders-" \ --resource-pattern-type prefixed ``` This allows: - Reading schema content by ID (embedded in messages) - Viewing specific schema versions This does _not_ allow listing all subjects or modifying schemas.
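As a rough model of how `literal` and `prefixed` resource patterns select subjects (described under Pattern-based ACLs for Schema Registry), consider the following Python sketch. The function `acl_matches` is hypothetical and only mirrors the documented matching rules. It is not the broker's authorization code.

```python
# Hypothetical model of ACL resource-pattern matching for subjects.
def acl_matches(resource_name, pattern_type, subject):
    """Return True if an ACL on resource_name applies to subject."""
    if pattern_type == "literal":
        return subject == resource_name  # exact subject name only
    if pattern_type == "prefixed":
        return subject.startswith(resource_name)  # name prefix, no wildcards
    raise ValueError(f"unsupported pattern type: {pattern_type}")


if __name__ == "__main__":
    for name, kind in (("orders-", "prefixed"), ("orders-*", "literal")):
        for subject in ("orders-key", "orders-value", "payments-value"):
            print(name, kind, subject, acl_matches(name, kind, subject))
```

In this model an asterisk is an ordinary character, so a resource name of `orders-*` matches neither `orders-key` nor `orders-value`, whichever pattern type is used.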
#### [](#producer-access)Producer access Applications that produce messages with schemas require: ```bash # For producing with new schemas rpk security acl create \ --allow-principal producer-app \ --operation read,write,describe \ --registry-subject "orders-" \ --resource-pattern-type prefixed ``` This allows: - Checking if schemas already exist (`describe`) - Reading existing schema versions (`read`) - Registering new schema versions (`write`) #### [](#schema-administrator-access)Schema administrator access Schema administrators who manage compatibility and cleanup require: ```bash # For full schema management rpk security acl create \ --allow-principal schema-admin \ --operation all \ --registry-global ``` This grants all operations, including: - Managing compatibility settings - Deleting deprecated schemas - Viewing and modifying configurations - Listing all subjects and schemas ### [](#pattern-based-acls-for-schema-registry)Pattern-based ACLs for Schema Registry To match a group of subjects by name prefix (for example, all subjects starting with `orders-`), pass the prefix without a wildcard character and always specify `--resource-pattern-type prefixed`: ```bash # Correct - matches all subjects starting with "orders-" rpk security acl create \ --allow-principal User:app \ --operation read \ --registry-subject "orders-" \ --resource-pattern-type prefixed # Incorrect - treats "orders-*" as literal subject name rpk security acl create \ --allow-principal User:app \ --operation read \ --registry-subject "orders-*" ``` Pattern types: - **`prefixed`** - Matches subjects starting with the specified string (for example, `orders-` matches `orders-value`, `orders-key`) - **`literal`** - Matches exact subject name only (default if not specified) > 💡 **TIP** > > Redpanda recommends using the topic naming strategy where subjects follow the pattern `<topic>-key` or `<topic>-value`. With this strategy, you can use a single prefixed ACL to grant access to both key and value subjects for a topic.
> > Example: `--registry-subject "orders-" --resource-pattern-type prefixed` grants access to both `orders-key` and `orders-value` subjects. ## [](#enable-schema-registry-authorization)Enable Schema Registry Authorization ### [](#prerequisites)Prerequisites Before you can enable Schema Registry Authorization, you must have: - `rpk` v25.2+ installed. For installation instructions, see [rpk installation](../../rpk/rpk-install/). - Cluster administrator permissions to modify cluster configurations. For example, to enable management of Schema Registry ACLs by the principal `schema_registry_admin`, run: `rpk security acl create --allow-principal schema_registry_admin --cluster --operation alter` ### [](#enable-authorization)Enable authorization To enable Schema Registry Authorization for your cluster, run: ```bash rpk cluster config set schema_registry_enable_authorization true ``` For details, see [`schema_registry_enable_authorization`](../../../reference/properties/cluster-properties/#schema_registry_enable_authorization). ## [](#create-and-manage-schema-registry-acls)Create and manage Schema Registry ACLs This section shows you how to create and manage ACLs for Schema Registry resources. ### [](#create-an-acl-for-a-topic-and-schema-registry-subject)Create an ACL for a topic and Schema Registry subject This example creates an ACL that allows the principal `panda` to read from both the topic `bar` and the Schema Registry subject `bar-value`. This pattern is common when you want to give a user or application access to both the Kafka topic and its associated schema.
```bash rpk security acl create --allow-principal panda --operation read --topic bar --registry-subject bar-value PRINCIPAL HOST RESOURCE-TYPE RESOURCE-NAME RESOURCE-PATTERN-TYPE OPERATION PERMISSION ERROR User:panda * SUBJECT bar-value LITERAL READ ALLOW User:panda * TOPIC bar LITERAL READ ALLOW ``` ### [](#create-an-acl-for-global-schema-registry-access)Create an ACL for global Schema Registry access This example grants the user `jane` global read and write access to the Schema Registry, plus read and write access to the topic `private`. The `--registry-global` flag creates ACLs for all [global Schema Registry operations](#supported-operations). ```bash rpk security acl create --allow-principal jane --operation read,write --topic private --registry-global PRINCIPAL HOST RESOURCE-TYPE RESOURCE-NAME RESOURCE-PATTERN-TYPE OPERATION PERMISSION ERROR User:jane * REGISTRY LITERAL READ ALLOW User:jane * REGISTRY LITERAL WRITE ALLOW User:jane * TOPIC private LITERAL READ ALLOW User:jane * TOPIC private LITERAL WRITE ALLOW ``` User `jane` now has global `read` and `write` access to the Schema Registry and to the topic `private`. ### [](#create-a-role-with-schema-registry-acls)Create a role with Schema Registry ACLs You can combine Schema Registry ACLs with [role-based access control (RBAC)](../../../security/authorization/rbac/rbac_dp/) to create reusable roles. This approach simplifies permission management when you need to assign the same set of permissions to multiple users. 
This example creates a role called `SoftwareEng` and assigns it ACLs for both topic and Schema Registry access: ```bash # Create the role rpk security role create SoftwareEng # Create ACLs for the role rpk security acl create \ --operation read,write \ --topic private \ --registry-subject private-key,private-value \ --allow-role SoftwareEng # You can add more ACLs to this role later rpk security acl create --allow-role "SoftwareEng" [additional-acl-flags] ``` After creating the role, assign it to users: ```bash rpk security role assign SoftwareEng --principal User:john,User:jane Successfully assigned role "SoftwareEng" to NAME PRINCIPAL-TYPE john User jane User ``` ### [](#troubleshooting-acl-creation)Troubleshooting ACL creation When creating ACLs that include Schema Registry subjects, you might encounter errors if the subject doesn’t exist or if there are configuration issues. #### [](#subject-not-found)Subject not found Sometimes an ACL for a Kafka topic is created successfully, but the Schema Registry subject ACL fails: ```bash rpk security acl create --allow-principal alice --operation read --topic bar --registry-subject bar-value PRINCIPAL HOST RESOURCE-TYPE RESOURCE-NAME RESOURCE-PATTERN-TYPE OPERATION PERMISSION ERROR User:alice * SUBJECT bar-value LITERAL READ ALLOW Not found User:alice * TOPIC bar LITERAL READ ALLOW ``` In this example, the ACL for topic `bar` was created successfully, but the ACL for Schema Registry subject `bar-value` failed with a "Not found" error. **Common causes:** - Incorrect Schema Registry URL configuration - Using the incorrect version of Redpanda #### [](#debugging-with-verbose-output)Debugging with verbose output To get more detailed information about ACL creation failures, use the `-v` flag for verbose logging. 
In this case, the user gets a `Not found` error after attempting to create two ACLs, one for the subject and one for the topic: ```bash rpk security acl create --allow-principal alice --operation read --topic bar --registry-subject bar-value -v 12:17:33.911 DEBUG opening connection to broker {"addr": "127.0.0.1:9092", "broker": "seed_0"} 12:17:33.912 DEBUG connection opened to broker {"addr": "127.0.0.1:9092", "broker": "seed_0"} 12:17:33.912 DEBUG issuing api versions request {"broker": "seed_0", "version": 4} 12:17:33.912 DEBUG wrote ApiVersions v4 {"broker": "seed_0", "bytes_written": 31, "write_wait": 13.416µs", "time_to_write": "17.75µs", "err": null} 12:17:33.912 DEBUG read ApiVersions v4 {"broker": "seed_0", "bytes_read": 266, "read_wait": 16.209µs", "time_to_read": "8.360666ms", "err": null} 12:17:33.920 DEBUG connection initialized successfully {"addr": "127.0.0.1:9092", "broker": "seed_0"} 12:17:33.920 DEBUG wrote CreateACLs v2 {"broker": "seed_0", "bytes_written": 43, "write_wait": 9.0985ms, "time_to_write": "14µs", "err": null} 12:17:33.935 DEBUG read CreateACLs v2 {"broker": "seed_0", "bytes_read": 19, "read_wait": 23.792µs, "time_to_read": "14.323041ms", "err": null} 12:17:33.935 DEBUG sending request {"method": "POST", "URL: "http://127.0.0.1:8081/security/acls", "has_bearer": false, "has_basic_auth": false} PRINCIPAL HOST RESOURCE-TYPE RESOURCE-NAME RESOURCE-PATTERN-TYPE OPERATION PERMISSION ERROR User:alice * SUBJECT bar-value LITERAL READ ALLOW Not found User:alice * TOPIC bar LITERAL READ ALLOW ``` The `Not found` error occurs in the request: `12:17:33.935 DEBUG sending request {"method": "POST", "URL: "http://127.0.0.1:8081/security/acls", "has_bearer": false, "has_basic_auth": false}`. This typically means the endpoint is unavailable. Verify: - You’re on Redpanda v25.2+. - `schema_registry_enable_authorization` is set to `true`. - Your rpk Schema Registry URL points to the correct host/scheme/port. 
Upgrade if needed and correct the configuration before retrying.

#### [](#inconsistent-listener-configuration)Inconsistent listener configuration

This error occurs when the user tries to create an ACL for a principal:

```bash
rpk security acl create --allow-principal "superuser" --operation "all" --registry-global -v
13:07:02.810 DEBUG opening connection to broker {"addr": "seed-036d6a67.d2hiu9c8ljef72usuu20.fmc.prd.cloud.redpanda.com:9092", "broker": "seed_0"}
...
13:07:03.304 DEBUG sending request {"method": "POST", "URL": "https://127.0.0.1:8080/security/acls", "has_bearer": false, "has_basic_auth": true}
PRINCIPAL       HOST  RESOURCE-TYPE  RESOURCE-NAME  RESOURCE-PATTERN-TYPE  OPERATION  PERMISSION  ERROR
User:superuser  *     REGISTRY                      LITERAL                ALL        ALLOW       unable to POST "https://127.0.0.1:8080/security/acls": Post "https://127.0.0.1:8080/security/acls": http: server gave HTTP response to HTTPS client
```

When using Schema Registry Authorization, ensure that your Kafka brokers and Schema Registry address target the same cluster and that the Schema Registry address uses the correct scheme/host/port. In the example above, `rpk` communicates with a remote broker (`…:9092`) but posts to a local Schema Registry address over HTTPS (`https://127.0.0.1:8080/security/acls`), while the local Schema Registry appears to be HTTP-only. To align them:

- Set the correct Schema Registry address (host and scheme) for the target cluster.
- Ensure TLS settings match the Schema Registry endpoint (HTTP vs HTTPS).
- Avoid mixing remote broker addresses with a local Schema Registry address unless it is intentional and properly configured.

See [rpk registry](../../../reference/rpk/rpk-registry/rpk-registry/) for Schema Registry configuration commands.
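The mismatch described in this section (a remote broker paired with a loopback Schema Registry address) can be caught before running `rpk`. The following is only a sketch: `check_registry_address` is a hypothetical helper, not part of `rpk` or any Redpanda library, and `seed.example.com` is a placeholder broker host. It inspects the address strings only, so it cannot tell whether the endpoint actually serves HTTP or HTTPS, and it does not handle bracketed IPv6 broker addresses.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit


def is_loopback(host: str) -> bool:
    """Return True for 'localhost' or a loopback IP literal."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example, a DNS name): treat as non-loopback.
        return False


def check_registry_address(broker_addr: str, registry_url: str) -> List[str]:
    """Hypothetical pre-flight check: return warnings about a suspicious pairing."""
    warnings = []
    broker_host = broker_addr.rsplit(":", 1)[0]  # strip the ":port" suffix
    parts = urlsplit(registry_url)
    if parts.scheme not in ("http", "https"):
        warnings.append("Schema Registry URL has no http/https scheme")
    registry_host = parts.hostname or ""
    if is_loopback(registry_host) and not is_loopback(broker_host):
        warnings.append("remote broker but local Schema Registry address")
    return warnings


# Placeholder broker host paired with the local address from the example above.
for warning in check_registry_address(
    "seed.example.com:9092", "https://127.0.0.1:8080/security/acls"
):
    print(warning)
```

An empty list means only that the two addresses look consistent; the scheme still has to match what the Schema Registry listener actually serves.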
#### [](#resource-names-do-not-appear)Resource names do not appear

The following output might suggest that resource names are missing for the registry resource types:

```bash
rpk security acl create --allow-principal jane --operation read,write --topic private --registry-global
PRINCIPAL  HOST  RESOURCE-TYPE  RESOURCE-NAME  RESOURCE-PATTERN-TYPE  OPERATION  PERMISSION  ERROR
User:jane  *     REGISTRY                      LITERAL                READ       ALLOW
User:jane  *     REGISTRY                      LITERAL                WRITE      ALLOW
User:jane  *     TOPIC          private        LITERAL                READ       ALLOW
User:jane  *     TOPIC          private        LITERAL                WRITE      ALLOW
```

When using the `--registry-global` option, be aware that `REGISTRY` resource types are global and apply to all of Schema Registry. They do not have a resource name because they are not tied to a specific resource. There are no resource names missing here.

#### [](#schema-registry-broker_not_available-errors)Schema Registry "broker\_not\_available" errors

If Schema Registry operations fail with `broker_not_available` errors after enabling SASL:

```json
{"error_code":50302,"message":"{ node: -1 }, { error_code: broker_not_available [8] }"}
```

**Cause:** The Schema Registry’s internal Kafka client is not configured with SASL credentials.

**Solution:** Configure the Schema Registry client credentials:

```bash
rpk cluster config set schema_registry_client.scram_username <username>
rpk cluster config set schema_registry_client.scram_password <password>
rpk cluster config set schema_registry_client.sasl_mechanism SCRAM-SHA-256
```

Then restart the Schema Registry service.

#### [](#pattern-based-acl-not-working)Pattern-based ACL not working

If a pattern-based ACL (like `orders-*`) is not matching expected subjects:

**Cause:** Missing `--resource-pattern-type prefixed` flag.
**Solution:** Recreate the ACL with the correct pattern type: ```bash # Delete incorrect ACL rpk security acl delete \ --allow-principal User:app \ --operation read \ --registry-subject "orders-*" # Create correct ACL with pattern type rpk security acl create \ --allow-principal User:app \ --operation read \ --registry-subject "orders-" \ --resource-pattern-type prefixed ``` > 📝 **NOTE** > > Pattern matching uses the string without the asterisk when using `prefixed` type. ## [](#suggested-reading)Suggested reading - [Redpanda Schema Registry](../schema-reg-overview/) - [rpk registry](../../../reference/rpk/rpk-registry/rpk-registry/) - [Schema Registry API](/api/doc/schema-registry/) - [Monitor Schema Registry service-level metrics](../../monitor-cloud/#service-level-queries) - [Deserialization](../record-deserialization/#schema-registry) --- # Page 455: Redpanda Schema Registry **URL**: https://docs.redpanda.com/redpanda-cloud/manage/schema-reg/schema-reg-overview.md --- # Redpanda Schema Registry --- title: Redpanda Schema Registry latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: schema-reg/schema-reg-overview page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: schema-reg/schema-reg-overview.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/schema-reg/schema-reg-overview.adoc description: Redpanda's Schema Registry provides the interface to store and manage event schemas. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- In Redpanda, the messages exchanged between producers and consumers contain raw bytes. Schemas enable producers and consumers to share the information needed to serialize and deserialize those messages. They register and retrieve the schemas they use in the Schema Registry to ensure data verification. 
Schemas are versioned, and the registry supports configurable compatibility modes between schema versions. When a producer or a consumer requests to register a schema change, the registry checks for schema compatibility and returns an error for an incompatible change. Compatibility modes can ensure that data flowing through a system is well-structured and easily evolves. > ❗ **IMPORTANT** > > **Schema size best practice**: Schema Registry works best with schemas of 128KB in size or less. Large schemas can consume significant memory resources and may cause system instability or crashes, particularly in memory-constrained environments. For Protobuf and Avro schemas, Redpanda recommends using schema [references](../schema-reg-api/#reference-a-schema) to break up large schemas into smaller constituent parts. > 📝 **NOTE** > > The Schema Registry is built directly into the Redpanda binary. It runs out of the box with Redpanda’s default configuration, and it requires no new binaries to install and no new services to deploy or maintain. You can use it with the [Schema Registry API](../schema-reg-api/) or [Redpanda Cloud](../schema-reg-ui/). ## [](#schema-terminology)Schema terminology **Schema**: A schema is an external mechanism to describe the structure of data and its encoding. Producer clients and consumer clients use a schema as an agreed-upon format for sending and receiving messages. Schemas enable a loosely coupled, data-centric architecture that minimizes dependencies in code, between teams, and between producers and consumers. **Subject**: A subject is a logical grouping for schemas. When data formats are updated, a new version of the schema can be registered under the same subject, allowing for backward and forward compatibility. A subject may have more than one schema version assigned to it, with each schema having a different numeric ID. **Serialization format**: A serialization format defines how data is converted into bytes that are transmitted and stored. 
Serialization, by producers, converts an event into bytes. Redpanda then stores these bytes in topics. Deserialization, by consumers, converts the byte arrays back into the desired data format. Redpanda’s Schema Registry supports Avro, Protobuf, and JSON serialization formats.

**Normalization**: Normalization is the process of converting a schema into a canonical form. When a schema is normalized, it can be compared and considered equivalent to another schema that may contain minor syntactic differences. Schema normalization allows you to more easily manage schema versions and compatibility by prioritizing meaningful logical changes. Normalization is supported for Avro, JSON, and Protobuf formats during both schema registration and lookup for a subject.

## [](#redpanda-design-overview)Redpanda design overview

Every broker allows mutating REST calls, so there’s no need to configure leadership or failover strategies. Schemas are stored in a compacted topic, and the registry uses optimistic concurrency control at the topic level to detect and avoid collisions.

> ❗ **IMPORTANT**
>
> The Schema Registry publishes an internal topic, `_schemas`, as its backend store. This internal topic is reserved strictly for schema metadata and support purposes. **Do not directly edit or manipulate the `_schemas` topic unless directed to do so by Redpanda Support.**

Redpanda Schema Registry uses the default port 8081.

## [](#wire-format)Wire format

With Schema Registry, producers and consumers can use a specific message format, called the wire format. The wire format facilitates a seamless transfer of data by ensuring that clients easily access the correct schema in the Schema Registry for a message.

The wire format is a sequence of bytes consisting of the following:

1. The "magic byte," a single byte that always contains the value of 0.
2. A four-byte integer containing the schema ID.
3. The rest of the serialized message.
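The three-part framing above can be sketched in a few lines of Python. This is a minimal illustration rather than the code of any particular client library: the helper names `frame_message` and `parse_message` are hypothetical, and the schema ID is assumed to be an unsigned big-endian (network byte order) integer, which the list above does not specify.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0   # first byte of every wire-format message
HEADER_SIZE = 5  # 1 magic byte + 4-byte schema ID


def frame_message(schema_id: int, payload: bytes) -> bytes:
    """Prefix a serialized payload with the magic byte and the schema ID."""
    # ">bI": big-endian, one signed byte, one 4-byte unsigned integer.
    # The big-endian byte order is an assumption of this sketch.
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def parse_message(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message is too short to follow the wire format")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError("missing magic byte: not a wire-format message")
    return schema_id, message[HEADER_SIZE:]


framed = frame_message(1, b"serialized-product")
schema_id, payload = parse_message(framed)
print(schema_id, payload)
```

In practice, the serializer and deserializer in a client SDK perform this framing for you; the sketch only shows where the schema ID sits in the byte sequence.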
![Schema Registry wire format](../../../shared/_images/schema-registry-wire-format.png) In the serialization process, the producer hands over the message to a key/value serializer that is part of the respective language-specific SDK. The serializer first checks whether the schema ID for the given subject exists in the local schema cache. The serializer derives the subject name based on several strategies, such as the topic name. You can also explicitly set the subject name. If the schema ID isn’t in the cache, the serializer registers the schema in the Schema Registry and collects the resulting schema ID in the response. In either case, when the serializer has the schema ID, it pads the beginning of the message with the magic byte and the encoded schema ID, and returns the byte sequence to the producer to write to the topic. In the deserialization process, the consumer fetches messages from the broker and hands them over to a deserializer. The deserializer first checks the presence of the magic byte and rejects the message if it doesn’t follow the wire format. The deserializer then reads the schema ID and checks whether that schema exists in its local cache. If it finds the schema, it deserializes the message according to that schema. Otherwise, the deserializer retrieves the schema from the Schema Registry using the schema ID, then the deserializer proceeds with deserialization. ## [](#schema-examples)Schema examples To experiment with schemas from applications, see the clients in [redpanda-labs](https://github.com/redpanda-data/redpanda-labs/tree/main). For a basic end-to-end example, the following Protobuf schema contains information about products: a unique ID, name, price, and category. It has a schema ID of 1, and the Topic name strategy, with a topic of Orders. (The Topic strategy is suitable when you want to group schemas by the topics to which they are associated.) 
```protobuf
syntax = "proto3";

message Product {
  int32 ProductID = 1;
  string ProductName = 2;
  double Price = 3;
  string Category = 4;
}
```

The producer then does something like this:

```python
from kafka import KafkaProducer
from productpy import Product  # Class generated from the Protobuf schema

# Create a Kafka producer
producer = KafkaProducer(bootstrap_servers='your_kafka_brokers')

# Create a Product message
product_message = Product(
    ProductID=123,
    ProductName="Example Product",
    Price=45.99,
    Category="Electronics"
)

# Produce the Product message to the "Orders" topic.
# The key is passed as bytes because no key_serializer is configured.
producer.send('Orders', key=b'product_key', value=product_message.SerializeToString())

# Block until the buffered message has been sent
producer.flush()
```

To add an additional field for product variants, like size or color, the new schema (version 2, ID 2) would look like this:

```protobuf
syntax = "proto3";

message Product {
  int32 ProductID = 1;
  string ProductName = 2;
  double Price = 3;
  string Category = 4;
  repeated string Variants = 5;
}
```

You would want the compatibility setting to accommodate adding new fields without breakage. Adding an optional new field to a schema is inherently backward-compatible. New consumers can process events written with the new schema, and older consumers can ignore the new field.

## [](#json-schema)JSON Schema

All CRUD operations are supported for JSON Schema (`json-schema`), and Redpanda supports [all published JSON Schema specifications](https://json-schema.org/specification), which include:

- draft-04
- draft-06
- draft-07
- 2019-09
- 2020-12

### [](#limitations)Limitations

Schemas are held in subjects. Subjects have a compatibility configuration associated with them, either directly specified by a user or inherited from the default. See `PUT /config` and `PUT /config/{subject}` in the [Schema Registry API](/api/doc/schema-registry/).
If you have inserted a second schema into a subject where the compatibility level is anything but `NONE`, then any JSON Schema containing the following items are rejected: - `$ref` - `$defs` (`definitions` prior to draft 2019-09) - `dependentSchemas` / `dependentRequired` (`dependencies` prior to draft 2019-09) - `prefixItems` Consequently, you cannot [structure a complex schema](https://json-schema.org/understanding-json-schema/structuring) using these features. ## [](#next-steps)Next steps - [Use the Schema Registry API](../schema-reg-api/) ## [](#suggested-reading)Suggested reading - [Schema Registry API](/api/doc/schema-registry/) - [Deserialization](../record-deserialization/) - [Monitor Schema Registry service-level metrics](../../monitor-cloud/#service-level-queries) --- # Page 456: Use Schema Registry **URL**: https://docs.redpanda.com/redpanda-cloud/manage/schema-reg/schema-reg-ui.md --- # Use Schema Registry --- title: Use Schema Registry latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: schema-reg/schema-reg-ui page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: schema-reg/schema-reg-ui.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/schema-reg/schema-reg-ui.adoc description: Perform common Schema Registry management operations in Redpanda Cloud. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- In Redpanda Cloud, the **Schema Registry** menu lists registered and verified schemas, including their serialization format and versions. Select an individual schema to see which topics it applies to. > 📝 **NOTE** > > The Schema Registry is built into Redpanda, and you can use it with the Schema Registry API or with the UI. This section describes Schema Registry operations available in the UI. 
## [](#create-or-edit-a-schema)Create or edit a schema

A schema is registered in the registry with a _subject_, which is a name that is associated with the schema as it evolves. To register a schema, click **Create new schema**.

1. On the **Create schema** page, select the strategy type for how to derive the subject name.
   - **Topic** (default): The subject name is derived from the Redpanda topic name. See [Topic strategy use case](#topic-strategy-use-case).
   - **Record**: The subject name is derived from the Kafka record name. See [Record strategy use case](#record-strategy-use-case).
   - **TopicRecord**: The subject name is derived from both topic name and record name, allowing for finer-grained schema organization. See [TopicRecord strategy use case](#topicrecord-strategy-use-case).
   - **Custom**: The subject name is user-defined.
2. Select the serialization format and enter the schema definition.
3. To build more complex schema definitions, add a reference to other schemas. For example, the two `import` statements are references to the `PhoneNumber` and `Address` schemas:

   ```protobuf
   syntax = "proto3";

   import "PhoneNumber.proto";
   import "Address.proto";

   message Person {
     string name = 1;
     string email = 2;
     PhoneNumber phone = 3;
     repeated Address address = 4;
   }
   ```

4. After registering a schema, you can add a new version to it, change its compatibility, or delete it.

### [](#topic-strategy-use-case)Topic strategy use case

The Topic strategy is suitable when you want to group schemas by the topics to which they are associated. Suppose you’re tracking product order information in a topic named `Transactions`. When a producer sends records to the `Transactions` topic, you want the record names to look something like:

- `Transactions - Record1`
- `Transactions - Record2`

Where `Record1` and `Record2` are unique identifiers. This is usually defined in your producer settings.
Create your schema with the Topic strategy, and the subject name is always `Transactions`, with all customer transactions under the same topic. ### [](#record-strategy-use-case)Record strategy use case The Record strategy is most useful when you have multiple schemas within a topic and need more granular categorization that’s influenced by the record name. Suppose there’s an `Events` topic with event types A and B. You may want each of those event types to have their own subject, their own schemas, and their own fully-qualified record names (for example, `com.example.EventTypeA`). If each event type has its own schema with the Record strategy, then when producers send these event types to the `Events` topic, their subjects are those record names: - `com.example.EventTypeA` - `com.example.EventTypeB` The record names in the Events topic look like this: - `Events-com.example.EventTypeA-Record1` - `Events-com.example.EventTypeB-Record1` - `Events-com.example.EventTypeA-Record2` - `Events-com.example.EventTypeB-Record2` ### [](#topicrecord-strategy-use-case)TopicRecord strategy use case The TopicRecord strategy is suitable when you want to organize schemas based on both topics and logical record types. Suppose there’s a microservices architecture where different services produce to the same topic: `SharedEvents`. Each microservice has a schema of its own for the shared events, but each schema uses the TopicRecord strategy. This results in the following subject names: - `SharedEvents-com.example.MicroserviceAEvent` - `SharedEvents-com.example.MicroserviceBEvent` The record names look like this: - `SharedEvents-com.example.MicroserviceAEvent-Record1` - `SharedEvents-com.example.MicroserviceBEvent-Record1` - `SharedEvents-com.example.MicroserviceAEvent-Record2` - `SharedEvents-com.example.MicroserviceBEvent-Record2` This allows for multiple schemas to govern the same shared events for different microservices, allowing granular organization. 
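The subject names produced by the three strategies above can be sketched as plain string functions. This is an illustration of the naming scheme only: the function names are hypothetical, and the optional `part` suffix on the Topic strategy reflects the `-key`/`-value` convention that many client serializers apply, which is an assumption here and not something this page states.

```python
from typing import Optional


def topic_subject(topic: str, part: Optional[str] = None) -> str:
    """Topic strategy: the subject is derived from the topic name."""
    # Assumption: some serializers append "-key" or "-value"; pass part="value"
    # to model that. With part=None the subject is the bare topic name.
    return topic if part is None else f"{topic}-{part}"


def record_subject(record_name: str) -> str:
    """Record strategy: the subject is the fully-qualified record name."""
    return record_name


def topic_record_subject(topic: str, record_name: str) -> str:
    """TopicRecord strategy: the subject combines topic and record name."""
    return f"{topic}-{record_name}"


print(topic_subject("Transactions"))
print(record_subject("com.example.EventTypeA"))
print(topic_record_subject("SharedEvents", "com.example.MicroserviceAEvent"))
```

Under the Record strategy the same subject is shared by every topic that carries that record type, while TopicRecord keeps a separate subject (and therefore a separate compatibility history) per topic.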
## [](#configure-schema-compatibility)Configure schema compatibility Applications are often modeled around a specific business object structure. As applications change and the shape of their data changes, producer schemas and consumer schemas may no longer be compatible. You can decide how a consumer handles data from a producer that uses an older or newer schema, and reduce the chance of consumers hitting deserialization errors. You can configure different types of schema compatibility, which are applied to a subject when a new schema is registered. The Schema Registry supports the following compatibility types: - `BACKWARD` (**default**) - Consumers using the new schema (for example, version 10) can read data from producers using the previous schema (for example, version 9). - `BACKWARD_TRANSITIVE` - Consumers using the new schema (for example, version 10) can read data from producers using all previous schemas (for example, versions 1-9). - `FORWARD` - Consumers using the previous schema (for example, version 9) can read data from producers using the new schema (for example, version 10). - `FORWARD_TRANSITIVE` - Consumers using any previous schema (for example, versions 1-9) can read data from producers using the new schema (for example, version 10). - `FULL` - A new schema and the previous schema (for example, versions 10 and 9) are both backward and forward compatible with each other. - `FULL_TRANSITIVE` - Each schema is both backward and forward compatible with all registered schemas. - `NONE` - No schema compatibility checks are done. ### [](#compatibility-uses-and-constraints)Compatibility uses and constraints - A consumer that wants to read a topic from the beginning (for example, an AI learning process) benefits from backward compatibility. It can process the whole topic using the latest schema. This allows producers to remove fields and add attributes. 
- A real-time consumer that doesn’t care about historical events but wants to keep up with the latest data (for example, a typical streaming application) benefits from forward compatibility. Even if producers change the schema, the consumer can carry on.
- Full compatibility lets consumers process both historical data and future data. This is the safest option, but it limits the changes that can be made: only optional fields can be added or removed.

If you make changes that are not inherently backward-compatible, you may need to change compatibility settings or plan a transitional period, updating producers and consumers to use the new schema while the old one is still accepted.

| Schema format | Backward-compatible tasks | Not backward-compatible tasks |
| --- | --- | --- |
| Avro | Add fields with default values; make fields nullable | Remove fields; change data types of fields; change enum values; change field constraints; change record or field names |
| Protobuf | Add fields; remove fields | Remove required fields; change data types of fields |
| JSON | Add optional properties. Relax constraints, for example: decrease a minimum value or increase a maximum value; decrease `minItems`, `minLength`, or `minProperties`; increase `maxItems`, `maxLength`, or `maxProperties`; add more property types (for example, `"type": "integer"` to `"type": ["integer", "string"]`); add more enum values; reduce `multipleOf` by an integral factor; relax additional properties if `additionalProperties` was not previously specified as `false`; remove a `uniqueItems` property that was `false` | Remove properties; add required properties; change property names and types; tighten or add constraints |

## [](#delete-a-schema)Delete a schema

Select a schema to soft-delete a version of it or all schemas of its subject. A schema cannot be deleted if any other schema references it. A soft-deleted schema can be recovered, but a permanently-deleted schema cannot be recovered.
Redpanda does not recommend permanently deleting schemas in a production environment. ## [](#suggested-reading)Suggested reading - [Redpanda Schema Registry](../schema-reg-overview/) --- # Page 457: Redpanda Terraform Provider **URL**: https://docs.redpanda.com/redpanda-cloud/manage/terraform-provider.md --- # Redpanda Terraform Provider --- title: Redpanda Terraform Provider latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: terraform-provider page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: terraform-provider.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/terraform-provider.adoc description: Use the Redpanda Terraform provider to create and manage Redpanda Cloud resources. page-git-created-date: "2024-10-10" page-git-modified-date: "2026-03-11" --- The [Redpanda Terraform provider](https://registry.terraform.io/providers/redpanda-data/redpanda/latest) allows you to manage your Redpanda Cloud infrastructure as code using [Terraform](https://www.terraform.io/). Terraform is an infrastructure-as-code tool that enables you to define, automate, and version-control your infrastructure configurations. 
With the Redpanda Terraform provider, you can manage: - [ACLs](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/acl) - [Clusters](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/cluster) - [Networks](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/network) - [Pipelines (Redpanda Connect)](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/pipeline) - [Resource groups](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/resource_group) - [Roles](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/role) - [Role assignments](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/role_assignments) - [Schemas](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/schema) - [Schema Registry ACLs](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/schema_registry_acl) - [Serverless clusters](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/serverless_cluster) - [Serverless private links](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/serverless_private_link) - [Topics](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/topic) - [Users](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/user) ## [](#why-use-terraform-with-redpanda)Why use Terraform with Redpanda? - **Simplicity**: Manage all your Redpanda Cloud resources in one place. - **Automation**: Create and modify resources without manual intervention. - **Version Control**: Track and roll back changes using version control systems, such as GitHub. - **Scalability**: Scale your infrastructure as your needs grow with minimal effort. 
## [](#understand-terraform-configurations)Understand Terraform configurations

Terraform configurations are written in [HCL (HashiCorp Configuration Language)](https://developer.hashicorp.com/terraform/language), which is declarative. Here are the main building blocks of a Terraform configuration:

### [](#providers)Providers

Providers tell Terraform how to communicate with the services you want to manage. For example, the Redpanda provider connects to the [Redpanda Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) using client credentials.

```hcl
provider "redpanda" {
  client_id     = ""
  client_secret = ""
}
```

### [](#resources)Resources

Resources define the infrastructure components you want to create, such as networks, clusters, or topics. Each resource block specifies the type of resource and its configuration.

```hcl
resource "redpanda_network" "example" { # (1)
  name           = "example-network"    # (2)
  cloud_provider = "aws"                # (3)
  region         = "us-east-1"          # (4)
  cidr_block     = "10.0.0.0/20"        # (5)
}
```

| Callout | Description |
| --- | --- |
| 1 | The resource type and internal name. The first part of this resource block specifies the type of resource being created. In this case, it is a `redpanda_network`, which defines a network for Redpanda Cloud. Different resource types include `redpanda_cluster`, `redpanda_topic`, and others. The second part is the internal name Terraform uses to identify this specific resource within your configuration. In this case, the internal name is `example`. This internal name allows you to reference the resource in other parts of your configuration. For example, `redpanda_network.example.id` can be used to access the unique ID of the network after it is created. The name does not affect the resource in Redpanda Cloud. It is for Terraform’s internal use. |
| 2 | A user-defined name for the resource as it will appear in Redpanda Cloud. This is the user-facing name visible in the Redpanda UI and API. |
| 3 | The cloud provider where the network is deployed, such as AWS or GCP. |
| 4 | The region where the resource will be provisioned. |
| 5 | The IP address range for the network. |

### [](#variables)Variables

Variables allow you to parameterize your configuration, making it reusable and customizable for different environments. Use `variable` blocks to define reusable values, like `region`, which can be overridden when running Terraform.

```hcl
variable "region" {
  default = "us-east-1"
}

resource "redpanda_network" "example" {
  name           = "example-network"
  cloud_provider = "aws"
  region         = var.region
  cidr_block     = "10.0.0.0/20"
}
```

### [](#outputs)Outputs

Outputs let you extract information about your infrastructure, such as cluster URLs, to use in other configurations or scripts. This example will display the cluster’s API URL after Terraform provisions the resources:

```hcl
output "cluster_api_url" {
  value = data.redpanda_cluster.example.cluster_api_url
}
```

## [](#limitations)Limitations

The following functionality is supported in the Cloud API but not in the Redpanda Terraform provider:

- Creating or deleting BYOVNet clusters on Azure
- Secrets
- Kafka Connect

> ⚠️ **WARNING**
>
> Do not modify `throughput_tier` after it is set. When `allow_deletion` is set to `true`, modifying `throughput_tier` forces replacement of the cluster: Terraform will destroy the existing cluster and create a new one, causing data loss.

## [](#prerequisites)Prerequisites

> ❗ **IMPORTANT**
>
> **Redpanda Terraform Provider - Windows Support Notice**
>
> The Redpanda Terraform provider is not supported on Windows systems. If you’re using Windows, you must use Windows Subsystem for Linux 2 (WSL2) to run the Redpanda Terraform provider.
>
> To use WSL2 with the Redpanda Terraform provider:
>
> 1. If WSL2 is not already installed, install it by running the following command in PowerShell as Administrator:
>
>     ```powershell
>     wsl --install
>     ```
>
>     Then restart your computer.
>
> 2.
Open your WSL2 Linux distribution (e.g., Ubuntu) from the Start menu or by running `wsl` in PowerShell. > > 3. Navigate to your project directory within WSL2. > > 4. Run all Terraform commands from within your WSL2 environment: > > ```bash > # Initialize Terraform and download the Redpanda provider > terraform init > > # Plan your Redpanda infrastructure changes > terraform plan > > # Apply the configuration to create Redpanda resources > terraform apply > > # View created resources > terraform show > ``` 1. Install at least version 1.0.0 of Terraform using the [official guide](https://learn.hashicorp.com/tutorials/terraform/install-cli). 2. Create a service account in Redpanda Cloud: 1. Log in to [Redpanda Cloud](https://cloud.redpanda.com). 2. Navigate to the **Organization IAM** page and select the **Service account** tab. Click **Create service account** and provide a name for the new service account. 3. Save the client ID and client secret for authentication. ## [](#set-up-the-provider)Set up the provider To set up the provider, you need to download the provider and authenticate to the Redpanda Cloud API. You can authenticate to the Redpanda Cloud API using environment variables or static credentials in your configuration file. 1. Add the Redpanda provider to your Terraform configuration: ```hcl terraform { required_providers { redpanda = { source = "redpanda-data/redpanda" version = "~> 1.0" } } } ``` 2. Initialize Terraform to download the provider: ```bash terraform init ``` 3. Add the credentials for the Redpanda Cloud service account you set in [Prerequisites](#prerequisites). In the Redpanda Cloud UI, find the client ID and client secret under **Organization IAM → Service accounts**. 
Set them as environment variables, or enter them in your Terraform configuration file: ### Environment variables ```bash REDPANDA_CLIENT_ID= REDPANDA_CLIENT_SECRET= ``` ### Static credentials ```hcl provider "redpanda" { client_id = "" client_secret = "" } ``` ## [](#examples)Examples This section provides examples of using the Redpanda Terraform provider to create and manage clusters. For descriptions of resources and data sources, see the [Redpanda Terraform Provider documentation](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs). For more information on the different cluster types mentioned in these examples, see [Redpanda Cloud cluster types](../../get-started/cloud-overview/#redpanda-cloud-cluster-types). > 💡 **TIP** > > See the full list of zones and tiers available with each cloud provider in the [Control Plane API reference](/api/doc/cloud-controlplane/topic/topic-regions-and-usage-tiers). ### [](#create-a-byoc-cluster)Create a BYOC cluster A BYOC (Bring Your Own Cloud) cluster allows you to provision a cluster in your own cloud account. This example creates a BYOC cluster on AWS with a custom network, resource group, and cluster configuration. 
```hcl terraform { required_providers { redpanda = { source = "redpanda-data/redpanda" version = "~> 1.0" } } } # Variables to parameterize the configuration variable "resource_group_name" { description = "Name of the Redpanda resource group" default = "testname" } variable "network_name" { description = "Name of the Redpanda network" default = "testname" } variable "cluster_name" { description = "Name of the Redpanda BYOC cluster" default = "test-cluster" } variable "region" { description = "Region for the Redpanda network and cluster" default = "us-east-2" } variable "cloud_provider" { description = "Cloud provider for the Redpanda network" default = "aws" } variable "zones" { description = "List of availability zones for the cluster" type = list(string) default = ["use2-az1", "use2-az2", "use2-az3"] } variable "cidr_block" { description = "CIDR block for the Redpanda network" default = "10.0.0.0/20" } variable "throughput_tier" { description = "Throughput tier for the cluster" default = "tier-1-aws-v2-x86" } # Redpanda provider configuration provider "redpanda" {} # Create a Redpanda resource group resource "redpanda_resource_group" "test" { name = var.resource_group_name } # Create a Redpanda network resource "redpanda_network" "test" { name = var.network_name resource_group_id = redpanda_resource_group.test.id cloud_provider = var.cloud_provider region = var.region cluster_type = "byoc" # Specify BYOC cluster type cidr_block = var.cidr_block } # Create a Redpanda BYOC cluster resource "redpanda_cluster" "test" { name = var.cluster_name resource_group_id = redpanda_resource_group.test.id network_id = redpanda_network.test.id cloud_provider = var.cloud_provider region = var.region cluster_type = "byoc" connection_type = "public" # Publicly accessible cluster throughput_tier = var.throughput_tier zones = var.zones allow_deletion = true # Allow the cluster to be deleted tags = { # Add metadata tags "environment" = "dev" } } ``` ### 
[](#create-a-dedicated-cluster)Create a Dedicated cluster A Dedicated cluster is fully managed by Redpanda and ensures consistent performance. This example provisions a cluster on AWS with specific zones and usage tiers. ```hcl terraform { required_providers { redpanda = { source = "redpanda-data/redpanda" version = "~> 1.0" } } } # Variables for configuration variable "resource_group_name" { description = "Name of the Redpanda resource group" default = "test-dedicated-group" } variable "network_name" { description = "Name of the Redpanda network" default = "dedicated-network" } variable "cluster_name" { description = "Name of the Redpanda dedicated cluster" default = "dedicated-cluster" } variable "region" { description = "Region for the Redpanda network and cluster" default = "us-west-1" } variable "cloud_provider" { description = "Cloud provider for the Redpanda network" default = "aws" } variable "zones" { description = "List of availability zones for the cluster" type = list(string) default = ["usw1-az1", "usw1-az2", "usw1-az3"] } variable "cidr_block" { description = "CIDR block for the Redpanda network" default = "10.1.0.0/20" } variable "throughput_tier" { description = "Throughput tier for the dedicated cluster" default = "tier-1-aws-v2-arm" } # Redpanda provider configuration provider "redpanda" {} # Create a Redpanda resource group resource "redpanda_resource_group" "test" { name = var.resource_group_name } # Create a Redpanda network resource "redpanda_network" "test" { name = var.network_name resource_group_id = redpanda_resource_group.test.id cloud_provider = var.cloud_provider region = var.region cluster_type = "dedicated" # Specify Dedicated cluster type cidr_block = var.cidr_block } # Create a Redpanda dedicated cluster resource "redpanda_cluster" "test" { name = var.cluster_name resource_group_id = redpanda_resource_group.test.id network_id = redpanda_network.test.id cloud_provider = var.cloud_provider region = var.region cluster_type = 
"dedicated" connection_type = "public" throughput_tier = var.throughput_tier zones = var.zones allow_deletion = true aws_private_link = { # Configure AWS PrivateLink for dedicated clusters enabled = true connect_console = true allowed_principals = ["arn:aws:iam::123456789024:root"] supported_regions = ["us-east-1", "us-west-2"] # Optional: Enable cross-region PrivateLink } tags = { "environment" = "dev" } } ``` ### [](#create-a-serverless-cluster)Create a Serverless cluster A Serverless cluster is cost-effective and scales automatically based on usage. This example creates a cluster in the `us-east-1` region with minimal configuration. ```hcl terraform { required_providers { redpanda = { source = "redpanda-data/redpanda" version = "~> 1.0" } } } # Redpanda provider configuration provider "redpanda" {} # Define a resource group for the Serverless cluster resource "redpanda_resource_group" "test" { name = var.resource_group_name # Name of the resource group } # Create a Serverless cluster resource "redpanda_serverless_cluster" "test" { name = var.cluster_name # Name of the Serverless cluster resource_group_id = redpanda_resource_group.test.id # Link to the resource group serverless_region = var.region # Specify the region for the cluster } # Variables for parameterizing the configuration variable "resource_group_name" { description = "Name of the Redpanda resource group" default = "testgroup" # Default name for the resource group } variable "cluster_name" { description = "Name of the Redpanda Serverless cluster" default = "testname" # Default name for the Serverless cluster } variable "region" { description = "Region for the Serverless cluster" default = "us-east-1" # Default region for the cluster } ``` ### [](#manage-an-existing-cluster)Manage an existing cluster To manage resources in existing Redpanda Cloud clusters, you must reference the cluster using the cluster ID (Redpanda ID). The following example creates a topic in a cluster with ID `byoc-cluster-id`. 
The `redpanda_topic` resource contains a field `cluster_api_url` that references the `data.redpanda_cluster.byoc.cluster_api_url` data resource. ```hcl data "redpanda_cluster" "byoc" { id = "byoc-cluster-id" } resource "redpanda_topic" "example" { name = "example-topic" partition_count = 3 replication_factor = 3 cluster_api_url = data.redpanda_cluster.byoc.cluster_api_url } ``` ### [](#manage-schema-registry-and-schema-registry-acls)Manage Schema Registry and Schema Registry ACLs You can also use Terraform to manage data plane resources, such as schemas and access controls, through the Redpanda Schema Registry. The Redpanda Schema Registry provides centralized management of schemas for producers and consumers, ensuring compatibility and consistency of data serialized with formats such as Avro, Protobuf, or JSON Schema. Using the Redpanda Terraform provider, you can create, update, and delete schemas as well as manage fine-grained access control for Schema Registry resources. You can use the following Terraform resources: - `redpanda_schema`: Defines and manages schemas in the Schema Registry. - `redpanda_schema_registry_acl`: Defines access control policies for Schema Registry subjects or registry-wide operations. #### [](#create-a-schema)Create a schema The `redpanda_schema` resource registers a schema in the Redpanda Schema Registry. Each schema is associated with a subject, which serves as the logical namespace for schema versioning. When you create or update a schema, Redpanda validates its compatibility level. 
```hcl data "redpanda_cluster" "byoc" { id = "byoc-cluster-id" } resource "redpanda_user" "schema_user" { name = "schema-user" password = var.schema_password mechanism = "scram-sha-256" cluster_api_url = data.redpanda_cluster.byoc.cluster_api_url allow_deletion = true } resource "redpanda_schema" "user_events" { cluster_id = data.redpanda_cluster.byoc.id subject = "user_events-value" schema_type = "AVRO" schema = jsonencode({ type = "record" name = "UserEvent" fields = [ { name = "user_id", type = "string" }, { name = "event_type", type = "string" }, { name = "timestamp", type = "long" } ] }) username = redpanda_user.schema_user.name password = var.schema_password } ``` In this example: - `cluster_id` identifies the Redpanda cluster where the schema is stored. - `subject` defines the logical name under which schema versions are registered. - `schema_type` specifies the serialization type (`AVRO`, `JSON`, or `PROTOBUF`). - `schema` provides the full schema definition, encoded with `jsonencode()`. - `username` and `password` authenticate the user to the Schema Registry. #### [](#store-credentials-securely)Store credentials securely Store credentials using environment variables or sensitive Terraform variables. For short-lived credentials or CI/CD usage, use provider-level environment variables: ```bash export REDPANDA_SR_USERNAME=schema-user export REDPANDA_SR_PASSWORD="your-secret-password" ``` Or, declare a sensitive Terraform variable and inject it at runtime: ```hcl variable "schema_password" { description = "Password for the Schema Registry user" sensitive = true } ``` Then, set the value securely using an environment variable before running Terraform: ```bash export TF_VAR_schema_password="your-secret-password" ``` This avoids committing secrets to source control. 
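Because both approaches above rely on exported environment variables, it can help to confirm they are set before running Terraform. The following is a minimal sketch, not part of the provider: `require_env` is a hypothetical helper name, and the variable names in the usage comment are the ones used earlier on this page.

```bash
# Hypothetical helper (not part of Terraform or the Redpanda provider):
# returns non-zero if any named variable is missing from the exported environment.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which are the ones
    # a child process such as terraform inherits.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing or empty environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (commented out so that sourcing this sketch has no side effects):
# require_env REDPANDA_CLIENT_ID REDPANDA_CLIENT_SECRET TF_VAR_schema_password \
#   && terraform plan
```

The check deliberately uses `printenv` rather than a plain shell variable test, so a variable that was assigned but never exported is reported as missing.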
#### [](#manage-schema-registry-acls)Manage Schema Registry ACLs The `redpanda_schema_registry_acl` resource configures fine-grained access control for Schema Registry subjects or registry-wide operations. Each ACL specifies which principal can perform specific operations on a subject or the registry. ```hcl resource "redpanda_schema_registry_acl" "allow_user_read" { cluster_id = data.redpanda_cluster.byoc.id principal = "User:${redpanda_user.schema_user.name}" resource_type = "SUBJECT" # SUBJECT or REGISTRY resource_name = "user_events-value" pattern_type = "LITERAL" # LITERAL or PREFIXED host = "*" operation = "READ" # READ, WRITE, DELETE, DESCRIBE, etc. permission = "ALLOW" # ALLOW or DENY username = redpanda_user.schema_user.name password = var.schema_password } ``` In this example: - `cluster_id` identifies the cluster that hosts the Schema Registry. - `principal` specifies the user or service account (for example, `User:alice`). - `resource_type` determines whether the ACL applies to a specific `SUBJECT` or the entire `REGISTRY`. - `resource_name` defines the subject name (use `*` for wildcard). - `pattern_type` controls how the resource name is matched (`LITERAL` or `PREFIXED`). - `operation` defines the permitted action (`READ`, `WRITE`, `DELETE`, etc.). - `permission` defines whether the operation is allowed or denied. - `host` specifies the host filter (typically `"*"` for all hosts). - `username` and `password` authenticate the principal to the Schema Registry. > 💡 **TIP** > > To manage Schema Registry ACLs, the user must have cluster-level `ALTER` permissions. This is typically granted through a Kafka ACL with `ALTER` on the `CLUSTER` resource. #### [](#combine-schema-and-acls)Combine schema and ACLs You can define both the schema and its ACLs in a single configuration to automate schema registration and access setup. 
```hcl data "redpanda_cluster" "byoc" { id = "byoc-cluster-id" } resource "redpanda_user" "schema_user" { name = "schema-user" password = var.schema_password mechanism = "scram-sha-256" cluster_api_url = data.redpanda_cluster.byoc.cluster_api_url allow_deletion = true } resource "redpanda_schema" "user_events" { cluster_id = data.redpanda_cluster.byoc.id subject = "user_events-value" schema_type = "AVRO" schema = jsonencode({ type = "record" name = "UserEvent" fields = [ { name = "user_id", type = "string" }, { name = "event_type", type = "string" }, { name = "timestamp", type = "long" } ] }) username = redpanda_user.schema_user.name password = var.schema_password } resource "redpanda_schema_registry_acl" "user_events_acl" { cluster_id = data.redpanda_cluster.byoc.id principal = "User:${redpanda_user.schema_user.name}" resource_type = "SUBJECT" resource_name = redpanda_schema.user_events.subject pattern_type = "LITERAL" host = "*" operation = "READ" permission = "ALLOW" username = redpanda_user.schema_user.name password = var.schema_password } ``` This configuration registers an Avro schema under the `user_events-value` subject and grants the `schema-user` user permission to read it from the Schema Registry. ## [](#delete-resources)Delete resources Terraform provides a way to clean up your infrastructure when resources are no longer needed. The `terraform destroy` command deletes all the resources defined in your configuration. > 📝 **NOTE** > > Terraform ensures that dependent resources are deleted in the correct order. For example, a cluster that depends on a network is removed before the network. ### [](#delete-all-resources)Delete all resources 1. Navigate to the directory containing your Terraform configuration. 2. Run the following command: ```bash terraform destroy ``` 3. Review the destruction plan Terraform generates. It lists all the resources to be deleted. 4. Confirm by typing `yes` when prompted. 5. Wait for the process to complete.
Terraform will delete the resources and display a summary. ### [](#delete-specific-resources)Delete specific resources If you only want to delete a specific resource rather than everything in your configuration, use the `-target` flag with `terraform destroy`. For example: ```bash terraform destroy -target=redpanda_network.example ``` This will delete only the `redpanda_network.example` resource. ## [](#suggested-reading)Suggested reading - [Redpanda Terraform Provider documentation](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs) - [Redpanda Terraform Provider examples](https://github.com/redpanda-data/terraform-provider-redpanda/tree/main/examples) - [Schema resource documentation](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/schema) - [Schema Registry ACL resource documentation](https://registry.terraform.io/providers/redpanda-data/redpanda/latest/docs/resources/schema_registry_acl) --- # Page 458: Redpanda Cloud Networking **URL**: https://docs.redpanda.com/redpanda-cloud/networking.md --- # Redpanda Cloud Networking --- title: Redpanda Cloud Networking latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/index.adoc description: Learn about Redpanda Cloud networking options and fundamentals. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-05-07" --- - [Network Design and Ports](cloud-security-network/) Learn how Redpanda Cloud manages network security and connectivity. - [Choose CIDR Ranges](cidr-ranges/) Guidelines for choosing CIDR ranges when VPC peering. 
- [Networking: Serverless](serverless/) Learn how to configure private networking with AWS PrivateLink. - [Networking: BYOC](byoc/) Learn how to create a VPC peering connection and how to configure private networking with AWS PrivateLink, Azure Private Link, and GCP Private Service Connect. - [Networking: Dedicated](dedicated/) Learn how to create a VPC peering connection and how to configure private networking with AWS PrivateLink, Azure Private Link, and GCP Private Service Connect. --- # Page 459: Configure AWS PrivateLink with the Cloud API **URL**: https://docs.redpanda.com/redpanda-cloud/networking/aws-privatelink.md --- # Configure AWS PrivateLink with the Cloud API --- title: Configure AWS PrivateLink with the Cloud API latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: aws-privatelink page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: aws-privatelink.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/aws-privatelink.adoc description: Set up AWS PrivateLink with the Cloud API. page-git-created-date: "2024-06-06" page-git-modified-date: "2026-03-02" --- > 📝 **NOTE** > > This guide is for configuring AWS PrivateLink using the Redpanda Cloud API. To configure and manage PrivateLink on an existing public cluster, you must use the Cloud API. See [Configure PrivateLink in the Cloud UI](../configure-privatelink-in-cloud-ui/) if you want to set up the endpoint service using the Redpanda Cloud Console. The Redpanda AWS PrivateLink endpoint service provides secure access to Redpanda Cloud from your own VPC. Traffic over PrivateLink does not go through the public internet because a PrivateLink connection is treated as its own private AWS service. While your VPC has access to the Redpanda VPC, Redpanda cannot access your VPC. 
Consider using the PrivateLink endpoint service if you have multiple VPCs and could benefit from a simpler approach to network management. > 📝 **NOTE** > > - Each client VPC can have one endpoint connected to the PrivateLink service. > > - PrivateLink allows overlapping [CIDR ranges](../cidr-ranges/) in VPC networks. > > - The number of connections is limited only by your Redpanda usage tier. PrivateLink does not add extra connection limits. However, VPC peering is limited to 125 connections. See [How scalable is AWS PrivateLink?](https://aws.amazon.com/privatelink/faqs/) > > - You control which AWS principals are allowed to connect to the endpoint service. After [getting an access token](#get-a-cloud-api-access-token), you can [enable PrivateLink when creating a new cluster](#create-new-cluster-with-privatelink-endpoint-service-enabled), or you can [enable PrivateLink for existing clusters](#enable-privatelink-endpoint-service-for-existing-clusters). ## [](#prerequisites)Prerequisites - Install `rpk`. - Your Redpanda cluster and [VPC](#set-up-the-client-vpc) must be in the same region, unless you configure [cross-region PrivateLink](#cross-region-privatelink). - In this guide, you use the [Redpanda Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) to enable the Redpanda endpoint service for your clusters. Follow the steps below to [get an access token](#get-a-cloud-api-access-token). - Use the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) to create a new client VPC or modify an existing one to use the PrivateLink endpoint. > 💡 **TIP** > > In Kafka clients, set `connections.max.idle.ms` to a value less than 350 seconds (350000 ms), so that clients close idle connections before the 350-second idle timeout on AWS PrivateLink connections drops them. > 📝 **NOTE** > > Enabling PrivateLink changes private DNS behavior for your cluster. Before configuring connections, review [DNS resolution with PrivateLink](#dns-resolution-with-privatelink). ## [](#get-a-cloud-api-access-token)Get a Cloud API access token 1.
Save the base URL of the Redpanda Cloud API in an environment variable: ```bash export PUBLIC_API_ENDPOINT="https://api.cloud.redpanda.com" ``` 2. In the Redpanda Cloud UI, go to the [**Organization IAM**](https://cloud.redpanda.com/organization-iam) page, and select the **Service account** tab. If you don’t have an existing service account, you can create a new one. Copy and store the client ID and secret. ```bash export CLOUD_CLIENT_ID= export CLOUD_CLIENT_SECRET= ``` 3. Get an API token using the client ID and secret. You can click the **Request an API token** link to see code examples to generate the token. ```bash export AUTH_TOKEN=`curl -s --request POST \ --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \ --header 'content-type: application/x-www-form-urlencoded' \ --data grant_type=client_credentials \ --data client_id="$CLOUD_CLIENT_ID" \ --data client_secret="$CLOUD_CLIENT_SECRET" \ --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token` ``` You must send the API token in the `Authorization` header when making requests to the Cloud API. ## [](#create-new-cluster-with-privatelink-endpoint-service-enabled)Create new cluster with PrivateLink endpoint service enabled 1. In the [Redpanda Cloud Console](https://cloud.redpanda.com/), go to **Resource groups** and select the resource group in which you want to create a cluster. Copy and store the resource group ID (UUID) from the URL in the browser. ```bash export RESOURCE_GROUP_ID= ``` 2. Call [`POST /v1/networks`](/api/doc/cloud-controlplane/operation/operation-networkservice_createnetwork) to create a network. Make sure to supply your own values in the following example request. The example uses a BYOC cluster. For a Dedicated cluster, set `"cluster_type": "TYPE_DEDICATED"`. Store the network ID (`network_id`) after the network is created to check whether you can proceed to cluster creation. 
- `name` - `cidr_block` - `region` ```bash REGION= NETWORK_POST_BODY=`cat << EOF { "network": { "cloud_provider": "CLOUD_PROVIDER_AWS", "cluster_type": "TYPE_BYOC", "name": "", "cidr_block": "<10.0.0.0/20>", "resource_group_id": "$RESOURCE_GROUP_ID", "region": "$REGION" } } EOF` # Use jq -r so the ID is stored without surrounding quotes and can be embedded in the cluster request body NETWORK_ID=`curl -vv -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$NETWORK_POST_BODY" $PUBLIC_API_ENDPOINT/v1/networks | jq -r .metadata.network_id` echo $NETWORK_ID ``` Wait for the network to be ready before creating the cluster in the next step. You can check the state of the network creation by calling [`GET /v1/networks/{id}`](/api/doc/cloud-controlplane/operation/operation-networkservice_getnetwork). You can create the cluster when the state is `STATE_READY`. 3. Create a new cluster with the endpoint service enabled by calling [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster). In the example below, make sure to set your own values for the following fields: - `zones`: for example, `"us-west-2a","us-west-2b","us-west-2c"` - `type`: `"TYPE_BYOC"` or `"TYPE_DEDICATED"` - `tier`: for example, `"tier-1-aws-v2-arm"` - `name` - `connect_console`: Whether to enable connections to Redpanda Console (boolean) - `allowed_principals`: Amazon Resource Names (ARNs) for the AWS principals allowed to access the endpoint service. For example, for all principals in an account, use `"arn:aws:iam::account_id:root"`. See [Configure an endpoint service](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#add-remove-permission) for details. - `supported_regions`: (Optional) List of AWS regions from which PrivateLink endpoints can connect to Redpanda. Required only for [cross-region PrivateLink](#cross-region-privatelink). For example, `["us-east-1", "us-west-2"]`.
```bash CLUSTER_POST_BODY=`cat << EOF { "cluster": { "cloud_provider": "CLOUD_PROVIDER_AWS", "connection_type": "CONNECTION_TYPE_PRIVATE", "name": "", "resource_group_id": "$RESOURCE_GROUP_ID", "network_id": "$NETWORK_ID", "region": "$REGION", "zones": [ ], "throughput_tier": "", "type": "", "aws_private_link": { "enabled": true, "connect_console": true, "allowed_principals": ["",""], "supported_regions": ["",""] } } } EOF` CLUSTER_ID=`curl -vv -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_POST_BODY" $PUBLIC_API_ENDPOINT/v1/clusters | jq -r .operation.metadata.cluster_id` echo $CLUSTER_ID ``` **BYOC clusters only:** Check that the cluster operation is completed by calling [`GET /v1/operations/{id}`](/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation), and passing the operation ID returned from the Create Cluster call. When the Create Cluster operation is completed (`STATE_COMPLETED`), run the following `rpk cloud` command to finish setting up your BYOC cluster: ```bash rpk cloud byoc aws apply --redpanda-id=$CLUSTER_ID ``` ## [](#enable-privatelink-endpoint-service-for-existing-clusters)Enable PrivateLink endpoint service for existing clusters > ⚠️ **CAUTION** > > Enabling PrivateLink on your VPC interrupts all communication on existing Redpanda bootstrap server and broker ports due to the change of private DNS resolution. > > To avoid disruption, consider using a staged approach to enable PrivateLink. See: [Switch from VPC peering to PrivateLink](../byoc/aws/vpc-peering-aws/#switch-from-vpc-peering-to-privatelink). 1. In the Redpanda Cloud Console, go to the cluster overview and copy the cluster ID from the **Details** section. ```bash CLUSTER_ID= ``` 2. Make a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request to update the cluster with the Redpanda Private Link Endpoint Service enabled. 
In the example below, make sure to set your own values for the following fields: - `connect_console`: Whether to enable connections to Redpanda Console (boolean) - `allowed_principals`: Amazon Resource Names (ARNs) for the AWS principals allowed to access the endpoint service. For example, for all principals in an account, use `"arn:aws:iam::account_id:root"`. See [Configure an endpoint service](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#add-remove-permission) for details. - `supported_regions`: (Optional) List of AWS regions from which PrivateLink endpoints can connect to Redpanda. Required only for [cross-region PrivateLink](#cross-region-privatelink). For example, `["us-east-1", "us-west-2"]`. ```bash CLUSTER_PATCH_BODY=`cat << EOF { "aws_private_link": { "enabled": true, "connect_console": true, "allowed_principals": ["",""], "supported_regions": ["",""] } } EOF` curl -vv -X PATCH \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID ``` 3. Before proceeding, check the state of the Update Cluster operation by calling [`GET /v1/operations/{id}`](/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation), and passing the operation ID returned from the Update Cluster call. When the operation is completed (`STATE_COMPLETED`), proceed to the next step. 4. Check the service state by calling [`GET /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_getcluster). The `service_state` in the `aws_private_link.status` response object must be `Available` for you to [connect to the service](#access-redpanda-services-through-vpc-endpoint).
```bash curl -X GET \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID | jq '.cluster.aws_private_link.status | {service_name, service_state}' ``` ## [](#dns-resolution-with-privatelink)DNS resolution with PrivateLink PrivateLink changes how DNS resolution works for your cluster. When you query cluster hostnames from outside the VPC that contains your PrivateLink endpoint, DNS may return private IP addresses that aren’t reachable from your location. To resolve cluster hostnames from other VPCs or on-premises networks, set up DNS forwarding using [Route 53 Resolver](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver.html): 1. In the VPC that contains your PrivateLink endpoint, create a Route 53 Resolver inbound endpoint. Ensure that the inbound endpoint’s security group allows inbound UDP/TCP port 53 from each VPC or on-premises network that will forward queries. 2. In each other VPC that must resolve the cluster domain, create a Resolver outbound endpoint and a forwarding rule for `<cluster_domain>` that targets the inbound endpoint IPs from the previous step. Associate the rule with those VPCs. The cluster domain is the suffix after the seed hostname. For example, if your bootstrap server URL is: `seed-3da65a4a.cki01qgth38kk81ard3g.byoc.dev.cloud.redpanda.com:9092`, then `cluster_domain` is: `cki01qgth38kk81ard3g.byoc.dev.cloud.redpanda.com`. 3. For on-premises DNS, create a conditional forwarder for `<cluster_domain>` that forwards to the inbound endpoint IPs from the earlier step (over VPN/Direct Connect). > ❗ **IMPORTANT** > > Do not configure forwarding rules to target the VPC’s Amazon-provided DNS resolver (VPC base CIDR + 2). Rules must target the IP addresses of Route 53 Resolver endpoints. ## [](#configure-privatelink-connection-to-redpanda-cloud)Configure PrivateLink connection to Redpanda Cloud When you have a PrivateLink-enabled cluster, you can create an endpoint to connect your VPC and your cluster.
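The cluster domain used in the DNS forwarding rules is the bootstrap server hostname without its first label (the seed hostname) and without the port. As a minimal sketch, assuming you already have the bootstrap server URL, shell parameter expansion can derive it. `BOOTSTRAP_SERVER` and `HOST` are illustrative variable names, and the URL is the example value from the DNS section.

```bash
# Example bootstrap server URL from the DNS section; replace it with your own.
BOOTSTRAP_SERVER="seed-3da65a4a.cki01qgth38kk81ard3g.byoc.dev.cloud.redpanda.com:9092"

# Drop the port (the shortest suffix starting at a colon).
HOST="${BOOTSTRAP_SERVER%:*}"

# Drop the first DNS label (the seed hostname) to leave the cluster domain.
CLUSTER_DOMAIN="${HOST#*.}"

echo "$CLUSTER_DOMAIN"
# prints cki01qgth38kk81ard3g.byoc.dev.cloud.redpanda.com
```

This keeps the value in the same `CLUSTER_DOMAIN` variable that the following steps use.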
### [](#get-cluster-domain)Get cluster domain Get the domain (`cluster_domain`) of the cluster from the cluster details in the Redpanda Cloud Console. For example, if the bootstrap server URL is: `seed-3da65a4a.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com:9092`, then `cluster_domain` is: `cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com`. ```bash CLUSTER_DOMAIN= ``` > 📝 **NOTE** > > Use `<cluster_domain>` as the domain you target with your DNS conditional forwarder (optionally also `*.<cluster_domain>` if your DNS platform requires a wildcard). ### [](#get-name-of-privatelink-endpoint-service)Get name of PrivateLink endpoint service The service name is required to [create VPC private endpoints](#create-vpc-endpoint). Run the following command to get the service name: ```bash PL_SERVICE_NAME=`curl -X GET \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID | jq -r .cluster.aws_private_link.status.service_name` ``` With the service name stored, set up your client VPC to connect to the endpoint service. ### [](#set-up-the-client-vpc)Set up the client VPC If you are not using an existing VPC, you must create a new one. > ⚠️ **CAUTION** > > [VPC peering](../byoc/aws/vpc-peering-aws/) and PrivateLink do not work at the same time if you set them up on the same VPC where your Kafka clients run. PrivateLink endpoints take priority. > > VPC peering and PrivateLink can both be used at the same time if Kafka clients connect from distinct VPCs. For example, in a private Redpanda cluster, you can connect your internal Kafka clients over VPC peering, and enable PrivateLink for external services. The client VPC must be in the same region as your Redpanda cluster, unless you have configured [cross-region PrivateLink](#cross-region-privatelink).
To create the VPC, run: ```bash # See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html for # information on profiles and credential files REGION= PROFILE= aws ec2 create-vpc --region $REGION --profile $PROFILE --cidr-block 10.0.0.0/20 # Store the client VPC ID from the command output CLIENT_VPC_ID= ``` You can also use an existing VPC. You need the VPC ID to [modify its DNS attributes](#modify-vpc-dns-attributes). ### [](#modify-vpc-dns-attributes)Modify VPC DNS attributes To modify the VPC attributes, run: ```bash aws ec2 modify-vpc-attribute --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \ --enable-dns-hostnames "{\"Value\":true}" aws ec2 modify-vpc-attribute --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \ --enable-dns-support "{\"Value\":true}" ``` These commands enable DNS hostnames and resolution for instances in the VPC. ### [](#create-security-group)Create security group You need the security group ID `security_group_id` from the command output to [add security group rules](#add-security-group-rules). To create a security group, run: ```bash aws ec2 create-security-group --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \ --description "Redpanda endpoint service client security group" \ --group-name "redpanda-privatelink-sg" SECURITY_GROUP_ID= ``` ### [](#add-security-group-rules)Add security group rules The following example adds security group rules that work for any broker count by opening the documented per-broker port ranges. For PrivateLink, clients connect to individual ports for each broker in ranges 32000-32500 (Kafka API) and 35000-35500 (HTTP Proxy). Opening only a few ports by broker count can break producers/consumers for topics with many partitions. See [Private service connectivity network ports](../cloud-security-network/#private-service-connectivity-network-ports). > ⚠️ **CAUTION** > > The following example uses `0.0.0.0/0` as the CIDR range for illustration. 
In production, replace `0.0.0.0/0` with the specific CIDR range of your client VPC or on-premises network to limit exposure. ```bash # Allow Kafka API bootstrap (seed) aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 30292 --cidr 0.0.0.0/0 # Allow Schema Registry aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 30081 --cidr 0.0.0.0/0 # Allow HTTP Proxy bootstrap aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 30282 --cidr 0.0.0.0/0 # Allow Redpanda Cloud Data Plane API / Prometheus (if needed) aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 443 --cidr 0.0.0.0/0 # Private service connectivity broker port pools # Kafka API per-broker ports aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID \ --ip-permissions 'IpProtocol=tcp,FromPort=32000,ToPort=32500,IpRanges=[{CidrIp=0.0.0.0/0}]' # HTTP Proxy per-broker ports aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID \ --ip-permissions 'IpProtocol=tcp,FromPort=35000,ToPort=35500,IpRanges=[{CidrIp=0.0.0.0/0}]' ``` ### [](#create-vpc-subnet)Create VPC subnet You need the subnet ID `subnet_id` from the command output to [create a VPC endpoint](#create-vpc-endpoint). Run the following command, specifying the subnet availability zone name (for example, `us-west-2a`): ```bash aws ec2 create-subnet --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \ --availability-zone \ --cidr-block 10.0.1.0/24 SUBNET_ID= ``` You can also use an existing subnet from your VPC. You need the subnet ID to [create a VPC endpoint](#create-vpc-endpoint). 
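The `create-vpc`, `create-security-group`, and `create-subnet` steps above leave the resource IDs to be copied from the command output by hand. As a minimal sketch, the AWS CLI's global `--query` and `--output text` options can return only the ID, so it can be assigned directly to a variable. The function name `create_client_subnet` is hypothetical and is not part of any Redpanda or AWS tooling. The sketch assumes that `REGION`, `PROFILE`, and `CLIENT_VPC_ID` are already set as in the earlier steps.

```bash
# Hypothetical helper (not part of any Redpanda or AWS tooling).
# Creates a subnet in the client VPC and prints only the new subnet ID.
# Arguments: $1 = availability zone name (for example, us-west-2a)
#            $2 = subnet CIDR block (for example, 10.0.1.0/24)
# Assumes REGION, PROFILE, and CLIENT_VPC_ID are set by the earlier steps.
create_client_subnet() {
  local az="$1" cidr="$2"
  aws ec2 create-subnet --region "$REGION" --profile "$PROFILE" \
    --vpc-id "$CLIENT_VPC_ID" \
    --availability-zone "$az" \
    --cidr-block "$cidr" \
    --query 'Subnet.SubnetId' --output text
}
```

For example, `SUBNET_ID=$(create_client_subnet us-west-2a 10.0.1.0/24)` stores the new subnet ID without a manual copy step. The same pattern should apply to the earlier steps, with `--query 'Vpc.VpcId'` for `create-vpc` and `--query 'GroupId'` for `create-security-group`.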
### [](#create-vpc-endpoint)Create VPC endpoint Create the interface VPC endpoint using the service name and subnet ID from the previous steps: ```bash aws ec2 create-vpc-endpoint \ --region $REGION --profile $PROFILE \ --vpc-id $CLIENT_VPC_ID \ --vpc-endpoint-type "Interface" \ --ip-address-type "ipv4" \ --service-name $PL_SERVICE_NAME \ --subnet-ids $SUBNET_ID \ --security-group-ids $SECURITY_GROUP_ID \ --private-dns-enabled ``` ## [](#access-redpanda-services-through-vpc-endpoint)Access Redpanda services through VPC endpoint After you have enabled PrivateLink for your cluster, your connection URLs are available in the **How to Connect** section of the cluster overview in the Redpanda Cloud Console. You can access Redpanda services such as Schema Registry and HTTP Proxy from the client VPC or virtual network; for example, from a compute instance in the VPC or network. The bootstrap server hostname is unique to each cluster. The service attachment exposes a set of bootstrap ports for access to Redpanda services. These ports load balance requests among brokers. Make sure you use the following ports for initiating a connection from a consumer: | Redpanda service | Default bootstrap port | | --- | --- | | Kafka API | 30292 | | HTTP Proxy | 30282 | | Schema Registry | 30081 | ### [](#access-kafka-api-seed-service)Access Kafka API seed service Use port `30292` to access the Kafka API seed service. 
```bash export RPK_BROKERS=':30292' rpk cluster info -X tls.enabled=true -X user= -X pass= ``` When successful, the `rpk` output should look like the following: ```bash CLUSTER ======= redpanda.rp-cki01qgth38kk81ard3g BROKERS ======= ID HOST PORT RACK 0* 0-3da65a4a-0532364.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32092 use2-az1 1 1-3da65a4a-63b320c.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32093 use2-az1 2 2-3da65a4a-36068dc.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32094 use2-az1 ``` ### [](#access-schema-registry-seed-service)Access Schema Registry seed service Use port `30081` to access the Schema Registry seed service. ```bash curl -vv -u : -H "Content-Type: application/vnd.schemaregistry.v1+json" --sslv2 --http2 :30081/subjects ``` ### [](#access-http-proxy-seed-service)Access HTTP Proxy seed service Use port `30282` to access the Redpanda HTTP Proxy seed service. ```bash curl -vv -u : -H "Content-Type: application/vnd.kafka.json.v2+json" --sslv2 --http2 :30282/topics ``` ## [](#cross-region-privatelink)Cross-region PrivateLink By default, AWS PrivateLink only allows connections from VPCs in the same region as the endpoint service. Cross-region PrivateLink enables clients in different AWS regions to connect to your Redpanda cluster through PrivateLink. For more information about AWS cross-region PrivateLink support, see the [AWS documentation](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html#endpoint-service-cross-region). ### [](#requirements)Requirements - The Redpanda cluster must be deployed across multiple [availability zones](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#availability-zone-az) (multi-AZ). This is an AWS limitation for cross-region PrivateLink. - Cross-region PrivateLink is configured through the `supported_regions` field in the `aws_private_link` configuration. This field only appears in the API response for multi-AZ clusters. 
- For BYOC clusters, the Redpanda agent IAM role must have `vpce:AllowMultiRegion` and `elasticloadbalancing:DescribeListenerAttributes` permissions. ### [](#configure-cross-region-privatelink)Configure cross-region PrivateLink To enable cross-region PrivateLink, add the `supported_regions` field to your `aws_private_link` configuration when [creating a new cluster](#create-new-cluster-with-privatelink-endpoint-service-enabled) or [enabling PrivateLink on an existing cluster](#enable-privatelink-endpoint-service-for-existing-clusters). The `supported_regions` field accepts a list of AWS region identifiers where you want to allow PrivateLink connections from. For example: ```json "aws_private_link": { "enabled": true, "connect_console": true, "allowed_principals": ["arn:aws:iam::123456789012:root"], "supported_regions": ["us-east-1", "us-west-2", "eu-west-1"] } ``` With this configuration, clients in VPCs located in `us-east-1`, `us-west-2`, and `eu-west-1` can create PrivateLink endpoints that connect to your Redpanda cluster, regardless of which region the cluster is deployed in. ### [](#create-a-cross-region-vpc-endpoint)Create a cross-region VPC endpoint When creating a VPC endpoint in a different region than your Redpanda cluster, use the same process as [creating a standard VPC endpoint](#create-vpc-endpoint), but specify both the client VPC’s region and the service region where your Redpanda cluster is deployed. > 📝 **NOTE** > > The `--service-region` option requires AWS CLI version 2.22.0 or later. Run `aws --version` to check your version and [update if necessary](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html). 
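The version requirement in the note can also be checked in a script. The following is a minimal sketch: the function names `aws_cli_version` and `version_ge` are hypothetical helpers, and the comparison assumes a `sort` that supports the `-V` (version sort) option, as GNU coreutils does.

```bash
# Hypothetical helpers for checking the AWS CLI version in a script.

# Prints the version number from `aws --version` output passed as $1,
# for example "2.22.0" from "aws-cli/2.22.0 Python/3.12.6 Linux/6.1 exe/x86_64".
aws_cli_version() {
  local out="$1"
  out="${out#aws-cli/}"        # drop the leading "aws-cli/"
  printf '%s\n' "${out%% *}"   # keep everything up to the first space
}

# Returns 0 if version $1 is greater than or equal to version $2.
# Assumption: `sort -V` is available (GNU coreutils and recent BSD sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A script could then run `version_ge "$(aws_cli_version "$(aws --version 2>&1)")" 2.22.0 || echo "Update the AWS CLI before using --service-region"`. With a suitable CLI version in place, the cross-region endpoint is created as follows: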
```bash # CLIENT_REGION is the region where your client VPC is located # SERVICE_REGION is the region where your Redpanda cluster is deployed CLIENT_REGION= SERVICE_REGION= aws ec2 create-vpc-endpoint \ --region $CLIENT_REGION --profile $PROFILE \ --service-region $SERVICE_REGION \ --vpc-id $CLIENT_VPC_ID \ --vpc-endpoint-type "Interface" \ --ip-address-type "ipv4" \ --service-name $PL_SERVICE_NAME \ --subnet-ids $SUBNET_ID \ --security-group-ids $SECURITY_GROUP_ID \ --private-dns-enabled ``` ## [](#test-the-connection)Test the connection You can test the PrivateLink connection from any VM or container in the client VPC. If configuring a client isn’t possible right away, you can do these checks using `rpk` or cURL: 1. Set the following environment variables. ```bash export RPK_BROKERS=':30292' export RPK_TLS_ENABLED=true export RPK_SASL_MECHANISM="" export RPK_USER= export RPK_PASS= ``` 2. Create a test topic. ```bash rpk topic create test-topic ``` 3. Produce to the test topic. ### rpk ```bash echo 'hello world' | rpk topic produce test-topic ``` ### curl ```bash curl -s \ -X POST \ "/topics/test-topic" \ -H "Content-Type: application/vnd.kafka.json.v2+json" \ -d '{ "records":[ { "value":"hello world" } ] }' ``` 4. Consume from the test topic. 
### rpk ```bash rpk topic consume test-topic -n 1 ``` ### curl ```bash curl -s \ "/topics/test-topic/partitions/0/records?offset=0&timeout=1000&max_bytes=100000"\ -H "Accept: application/vnd.kafka.json.v2+json" ``` ## [](#suggested-reading)Suggested reading - [Cloud API Overview](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) - [Add a BYOC VPC Peering Connection](../byoc/aws/vpc-peering-aws/) - [Add a Dedicated VPC Peering Connection](../dedicated/aws/vpc-peering/) --- # Page 460: Configure Azure Private Link in the Cloud Console **URL**: https://docs.redpanda.com/redpanda-cloud/networking/azure-private-link-in-ui.md --- # Configure Azure Private Link in the Cloud Console --- title: Configure Azure Private Link in the Cloud Console latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: azure-private-link-in-ui page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: azure-private-link-in-ui.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/azure-private-link-in-ui.adoc description: Set up Azure Private Link in the Redpanda Cloud Console. page-git-created-date: "2025-07-17" page-git-modified-date: "2026-02-02" --- > 📝 **NOTE** > > This guide is for configuring new clusters with Azure Private Link using the Redpanda Cloud Console. To configure and manage Private Link on an existing cluster, you must use the [Cloud API](../azure-private-link/). The Redpanda Azure Private Link service provides secure access to Redpanda Cloud from your own VNet. Traffic over Private Link does not go through the public internet because these connections are treated as their own private Azure service. While your VNet has access to the Redpanda virtual network, Redpanda cannot access your VNet. 
Consider using the endpoint service if you have multiple VNets and could benefit from a simpler approach to network management:

- Azure Private Link allows overlapping [CIDR ranges](../cidr-ranges/).

- You control which Azure subscriptions are allowed to connect to the endpoint service.

## [](#requirements)Requirements

- Your Redpanda cluster and VNet must be in the same region.

- Use the [Azure command-line interface (CLI)](https://learn.microsoft.com/en-us/cli/azure/get-started-with-azure-cli?view=azure-cli-latest) to create a new client VNet or modify an existing one to use the Private Link endpoint.

> 💡 **TIP**
>
> In Kafka clients, set `connections.max.idle.ms` to a value less than 240 seconds.

## [](#enable-endpoint-service-for-new-clusters)Enable endpoint service for new clusters

1. In the Redpanda Cloud Console, create a new cluster.

2. On the **Networking** page:

    1. For **Connection type**, select **Private**.

    2. For **Azure Private Link**, select **Enabled**.

    3. For **Allowed subscriptions**, click **Add subscription**, and enter the Azure subscription ID that can access the cluster. You can add multiple subscriptions.

## [](#access-redpanda-services-through-vnet-endpoint)Access Redpanda services through VNet endpoint

To access Redpanda services, follow the steps on the cluster’s **Overview** page. In the **How to connect** section, click **Private Link**.

![Private Link tab in Overview page](../../shared/_images/private-link-tab.png)

You can access Redpanda services such as Schema Registry and HTTP Proxy from the client VPC or virtual network; for example, from a compute instance in the VPC or network. The bootstrap server hostname is unique to each cluster. The service attachment exposes a set of bootstrap ports for access to Redpanda services. These ports load balance requests among brokers.
Make sure you use the following ports for initiating a connection from a consumer: | Redpanda service | Default bootstrap port | | --- | --- | | Kafka API | 30292 | | HTTP Proxy | 30282 | | Schema Registry | 30081 | ### [](#access-kafka-api-seed-service)Access Kafka API seed service Use port `30292` to access the Kafka API seed service. ```bash export RPK_BROKERS=':30292' rpk cluster info -X tls.enabled=true -X user= -X pass= ``` When successful, the `rpk` output should look like the following: ```bash CLUSTER ======= redpanda.rp-cki01qgth38kk81ard3g BROKERS ======= ID HOST PORT RACK 0* 0-3da65a4a-0532364.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32092 use2-az1 1 1-3da65a4a-63b320c.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32093 use2-az1 2 2-3da65a4a-36068dc.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32094 use2-az1 ``` ### [](#access-schema-registry-seed-service)Access Schema Registry seed service Use port `30081` to access the Schema Registry seed service. ```bash curl -vv -u : -H "Content-Type: application/vnd.schemaregistry.v1+json" --sslv2 --http2 :30081/subjects ``` ### [](#access-http-proxy-seed-service)Access HTTP Proxy seed service Use port `30282` to access the Redpanda HTTP Proxy seed service. ```bash curl -vv -u : -H "Content-Type: application/vnd.kafka.json.v2+json" --sslv2 --http2 :30282/topics ``` ## [](#test-the-connection)Test the connection You can test the connection to the endpoint service from any VM or container in the consumer VNet. If configuring a client isn’t possible right away, you can do these checks using `rpk` or cURL: 1. Set the following environment variables. ```bash export RPK_BROKERS=':30292' export RPK_TLS_ENABLED=true export RPK_SASL_MECHANISM="" export RPK_USER= export RPK_PASS= ``` 2. Create a test topic. ```bash rpk topic create test-topic ``` 3. Produce to the test topic. 
### rpk ```bash echo 'hello world' | rpk topic produce test-topic ``` ### curl ```bash curl -s \ -X POST \ "/topics/test-topic" \ -H "Content-Type: application/vnd.kafka.json.v2+json" \ -d '{ "records":[ { "value":"hello world" } ] }' ``` 4. Consume from the test topic. ### rpk ```bash rpk topic consume test-topic -n 1 ``` ### curl ```bash curl -s \ "/topics/test-topic/partitions/0/records?offset=0&timeout=1000&max_bytes=100000"\ -H "Accept: application/vnd.kafka.json.v2+json" ``` --- # Page 461: Configure Azure Private Link with the Cloud API **URL**: https://docs.redpanda.com/redpanda-cloud/networking/azure-private-link.md --- # Configure Azure Private Link with the Cloud API --- title: Configure Azure Private Link with the Cloud API latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: azure-private-link page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: azure-private-link.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/azure-private-link.adoc description: Set up Azure Private Link with the Cloud API. page-git-created-date: "2024-08-15" page-git-modified-date: "2026-02-02" --- > 📝 **NOTE** > > For UI-based configuration of Azure Private Link on new clusters, see [Configure Azure Private Link in the Cloud Console](../azure-private-link-in-ui/). The Redpanda Azure Private Link service provides secure access to Redpanda Cloud from your own virtual network. Traffic over Azure Private Link does not go through the public internet, but instead through Microsoft’s backbone network. While clients can initiate connections against the Redpanda Cloud cluster endpoints, Redpanda Cloud services cannot access your virtual networks directly. Consider using Private Link if you have multiple virtual networks and require more secure network management. 
To learn more, see the [Azure documentation](https://learn.microsoft.com/en-us/azure/private-link/private-link-service-overview). > 📝 **NOTE** > > - Each client VNet can have one endpoint connected to the Private Link service. > > - Private Link allows overlapping [CIDR ranges](../cidr-ranges/) in virtual networks. > > - The number of connections is limited only by your Redpanda usage tier. Private Link does not add extra connection limits. After [getting an access token](#get-a-cloud-api-access-token), you can [enable Private Link when creating a new cluster](#create-new-cluster-with-private-link-service-enabled), or you can [enable Private Link for existing clusters](#enable-private-link-service-for-existing-clusters). ## [](#requirements)Requirements - Install [`rpk`](../../manage/rpk/rpk-install/). - Install [`jq`](https://jqlang.org/download/), which is used to parse JSON values from API responses. - You will use the [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/) to authenticate with Azure and configure resources in your Azure account. - You will use the [Redpanda Cloud API](/api/doc/cloud-controlplane/) to enable the Redpanda Private Link service for your clusters. Follow the steps on this page to [get an access token](#get-a-cloud-api-access-token). > 💡 **TIP** > > In Kafka clients, set `connections.max.idle.ms` to a value less than 240 seconds. ## [](#set-up-redpanda-private-link-service)Set up Redpanda Private Link Service ### [](#get-a-cloud-api-access-token)Get a Cloud API access token 1. Save the base URL of the Redpanda Cloud API in an environment variable: ```bash export PUBLIC_API_ENDPOINT="https://api.cloud.redpanda.com" ``` 2. In the Redpanda Cloud UI, go to the [**Organization IAM**](https://cloud.redpanda.com/organization-iam) page, and select the **Service account** tab. If you don’t have an existing service account, you can create a new one. Copy and store the client ID and secret. 
```bash export CLOUD_CLIENT_ID= export CLOUD_CLIENT_SECRET= ``` 3. Get an API token using the client ID and secret. You can click the **Request an API token** link to see code examples to generate the token. ```bash export AUTH_TOKEN=`curl -s --request POST \ --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \ --header 'content-type: application/x-www-form-urlencoded' \ --data grant_type=client_credentials \ --data client_id="$CLOUD_CLIENT_ID" \ --data client_secret="$CLOUD_CLIENT_SECRET" \ --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token` ``` You must send the API token in the `Authorization` header when making requests to the Cloud API. ### [](#specify-azure-subscriptions)Specify Azure subscriptions Set the Azure subscriptions you want to use for the Private Link connection. Replace these placeholder variables: - ``: The ID of the subscription where the Redpanda cluster is provisioned. - ``: The ID of the subscription from where you initiate connections to the Private Link service. You may use the same subscription for both. ```bash export REDPANDA_CLUSTER_SUBSCRIPTION_ID= export SOURCE_CONNECTION_SUBSCRIPTION_ID= ``` If you have not yet created a cluster in Redpanda Cloud, [create a Private Link-enabled cluster](#create-new-cluster-with-private-link-service-enabled). If you already have a cluster where you want to use Private Link, see the steps to [enable Private Link for existing clusters](#enable-private-link-service-for-existing-clusters). ### [](#create-new-cluster-with-private-link-service-enabled)Create new cluster with Private Link service enabled 1. In the Redpanda Cloud Console, go to [**Resource groups**](https://cloud.redpanda.com/resource-groups) and select the Redpanda Cloud resource group in which you want to create a cluster. > 📝 **NOTE** > > Redpanda Cloud resource groups exist in your Redpanda Cloud account only. They do not correspond to Azure resource groups and do not appear in your Azure tenant. 
    Copy and store the resource group ID (UUID) from the URL in the browser.

    ```bash
    export RESOURCE_GROUP_ID=
    ```

2. Call [`POST /v1/networks`](/api/doc/cloud-controlplane/operation/operation-networkservice_createnetwork) to create a Redpanda Cloud network for the cluster. Supply your own values for the following fields in the example request. Store the network ID (`network_id`) that the request returns. You use it to check when the network is ready and to create the cluster.

    - `cluster-type`: `TYPE_BYOC` or `TYPE_DEDICATED`

    - `network-name`

    - `cidr_block`

    - `azure-region`

    ```bash
    REGION=
    NETWORK_POST_BODY=`cat << EOF
    {
        "network": {
            "cloud_provider": "CLOUD_PROVIDER_AZURE",
            "cluster_type": "",
            "name": "",
            "cidr_block": "<10.0.0.0/20>",
            "resource_group_id": "$RESOURCE_GROUP_ID",
            "region": "$REGION"
        }
    }
    EOF`
    # -r prints the raw ID without surrounding quotes, so it can be
    # embedded in the cluster request body in the next step.
    NETWORK_ID=`curl -vv -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -d "$NETWORK_POST_BODY" $PUBLIC_API_ENDPOINT/v1/networks | jq -r .operation.metadata.network_id`
    echo $NETWORK_ID
    ```

    Wait for the network to be ready before creating the cluster in the next step. Check the state of the network creation by calling [`GET /v1/networks/{id}`](/api/doc/cloud-controlplane/operation/operation-networkservice_getnetwork). You can create the cluster when the state is `STATE_READY`.

3. Create a new cluster with the Private Link service enabled by calling [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster). In the following example, set your own values for the following fields:

    - `name`

    - `type`: `TYPE_BYOC` or `TYPE_DEDICATED`

    - `tier`: For example, `tier-1-azure`. See available Azure tiers in the [Control Plane API reference](/api/doc/cloud-controlplane/topic/topic-regions-and-usage-tiers). To learn more about tiers, see [BYOC Tiers and Regions](../../reference/tiers/byoc-tiers/) or [Dedicated Tiers and Regions](../../reference/tiers/dedicated-tiers/).
- `zones`: For example, `"uksouth-az1", "uksouth-az2", "uksouth-az3"` ```bash CLUSTER_POST_BODY=`cat << EOF { "cluster": { "cloud_provider": "CLOUD_PROVIDER_AZURE", "connection_type": "CONNECTION_TYPE_PRIVATE", "name": "", "resource_group_id": "$RESOURCE_GROUP_ID", "network_id": "$NETWORK_ID", "region": "$REGION", "throughput_tier": "", "type": "", "zones": [ ], "azure_private_link": { "allowed_subscriptions": ["$SOURCE_CONNECTION_SUBSCRIPTION_ID"], "enabled": true, "connect_console": true } } } EOF` CLUSTER_ID=`curl -vv -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_POST_BODY" $PUBLIC_API_ENDPOINT/v1/clusters | jq -r .operation.metadata.cluster_id` echo $CLUSTER_ID ``` 4. **BYOC clusters only:** Check that the cluster operation is completed by calling [`GET /v1/operations/{id}`](/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation), and passing the operation ID returned from the Create Cluster call. When the Create Cluster operation is completed (`STATE_COMPLETED`), run the following `rpk cloud` command to finish setting up your BYOC cluster with Private Link enabled: ```bash rpk cloud byoc azure apply --redpanda-id=$CLUSTER_ID --subscription-id=$REDPANDA_CLUSTER_SUBSCRIPTION_ID ``` 5. Continue to [configure the Private Link connection to Redpanda](#configure-azure-private-link-connection-to-redpanda-cloud). ### [](#enable-private-link-service-for-existing-clusters)Enable Private Link service for existing clusters > ⚠️ **CAUTION** > > Enabling Private Link on your VNet interrupts all communication on existing Redpanda bootstrap server and broker ports due to the change of private DNS resolution. Make sure all applications running in your virtual network are ready to start using the corresponding Private Link ports. 1. In the Redpanda Cloud Console, go to the cluster overview and copy the cluster ID from the **Details** section. ```bash CLUSTER_ID= ``` 2. 
    Make a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request to update the cluster with the service enabled.

    ```bash
    CLUSTER_PATCH_BODY=`cat << EOF
    {
        "azure_private_link": {
            "allowed_subscriptions": ["$SOURCE_CONNECTION_SUBSCRIPTION_ID"],
            "enabled": true,
            "connect_console": true
        }
    }
    EOF`
    curl -vv -X PATCH \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -d "$CLUSTER_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID
    ```

3. Before proceeding, check the state of the Update Cluster operation by calling [`GET /v1/operations/{id}`](/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation), and passing the operation ID returned from the Update Cluster call. When the operation state is `STATE_COMPLETED`, continue to [configure the Private Link connection to Redpanda](#configure-azure-private-link-connection-to-redpanda-cloud).

## [](#configure-azure-private-link-connection-to-redpanda-cloud)Configure Azure Private Link connection to Redpanda Cloud

1. In the Redpanda Cloud Console, go to [**Users**](https://cloud.redpanda.com/users?tab=users) and create a new user to authenticate the Private Link endpoint connections with the service. You will need the username and password to [access Redpanda services](#connect-to-redpanda-services-through-private-link-endpoints) or [test the connection](#test-the-connection) using `rpk` or cURL.

2. Call the [`GET /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_getcluster) endpoint to check the service status and retrieve the service ID, DNS name, and Redpanda Console URL to use.
```bash DNS_RECORD=`curl -s -X GET \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID | jq -r ".cluster.azure_private_link.status.dns_a_record"` PRIVATE_SERVICE_ID=`curl -s -X GET \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID | jq -r ".cluster.azure_private_link.status.service_id"` CONSOLE_URL=`curl -s -X GET \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID | jq -r ".cluster.redpanda_console.url"` echo $DNS_RECORD echo $PRIVATE_SERVICE_ID echo $CONSOLE_URL ``` 3. Log in to Azure and set the subscription ID to the value you set for `SOURCE_CONNECTION_SUBSCRIPTION_ID`: ```bash az login az account set --subscription $SOURCE_CONNECTION_SUBSCRIPTION_ID ``` ### [](#set-up-azure-private-link-endpoint-in-your-virtual-network)Set up Azure Private Link endpoint in your virtual network 1. If you have not already done so, create the Azure resource group and virtual network for your Private Link source connections. ```none az group create --name --location $REGION ``` ```none az network vnet create \ --resource-group \ --location $REGION \ --name \ --address-prefixes 10.0.0.0/16 \ --subnet-name \ --subnet-prefixes 10.0.0.0/24 ``` 2. Create the private endpoint. ```none az network private-endpoint create \ --location $REGION \ --connection-name \ --name redpanda-$CLUSTER_ID \ --manual-request true \ --private-connection-resource-id $PRIVATE_SERVICE_ID \ --resource-group \ --subnet \ --vnet-name ``` 3. Create a private DNS zone using the outputted DNS record above (`echo $DNS_RECORD`) ```none az network private-dns zone create \ --resource-group \ --name "$DNS_RECORD" ``` 4. Link the private DNS zone to the virtual network you created earlier, so virtual machines (VMs) and containers can resolve the Redpanda cluster domain. 
    ```none
    az network private-dns link vnet create \
    --resource-group \
    --zone-name "$DNS_RECORD" \
    --name redpanda-$CLUSTER_ID-dns-zone-link \
    --virtual-network \
    --registration-enabled false
    ```

5. Create a wildcard record in the private DNS zone that points to the private IP address of the private endpoint. Set `PRIVATE_ENDPOINT_IP` to that address before you run the command. The `--zone-name` value must match the name of the private DNS zone created in step 3.

    ```none
    az network private-dns record-set a add-record \
    --resource-group \
    --zone-name "$DNS_RECORD" \
    --record-set-name "*" \
    --ipv4-address "$PRIVATE_ENDPOINT_IP"
    ```

## [](#connect-to-redpanda-services-through-private-link-endpoints)Connect to Redpanda services through Private Link endpoints

After you enable Private Link for your cluster, your connection URLs are available in the **How to Connect** section of the cluster overview in the Redpanda Cloud Console.

You can access Redpanda services such as Schema Registry and HTTP Proxy from the client VPC or virtual network; for example, from a compute instance in the VPC or network. The bootstrap server hostname is unique to each cluster. The service attachment exposes a set of bootstrap ports for access to Redpanda services. These ports load balance requests among brokers. Make sure you use the following ports for initiating a connection from a consumer:

| Redpanda service | Default bootstrap port |
| --- | --- |
| Kafka API | 30292 |
| HTTP Proxy | 30282 |
| Schema Registry | 30081 |

### [](#access-kafka-api-seed-service)Access Kafka API seed service

Use port `30292` to access the Kafka API seed service.
```bash export RPK_BROKERS=':30292' rpk cluster info -X tls.enabled=true -X user= -X pass= ``` When successful, the `rpk` output should look like the following: ```bash CLUSTER ======= redpanda.rp-cki01qgth38kk81ard3g BROKERS ======= ID HOST PORT RACK 0* 0-3da65a4a-0532364.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32092 use2-az1 1 1-3da65a4a-63b320c.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32093 use2-az1 2 2-3da65a4a-36068dc.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32094 use2-az1 ``` ### [](#access-schema-registry-seed-service)Access Schema Registry seed service Use port `30081` to access the Schema Registry seed service. ```bash curl -vv -u : -H "Content-Type: application/vnd.schemaregistry.v1+json" --sslv2 --http2 :30081/subjects ``` ### [](#access-http-proxy-seed-service)Access HTTP Proxy seed service Use port `30282` to access the Redpanda HTTP Proxy seed service. ```bash curl -vv -u : -H "Content-Type: application/vnd.kafka.json.v2+json" --sslv2 --http2 :30282/topics ``` ### [](#test-the-connection)Test the connection You can test the Private Link connection from any VM or container in the subscription where the endpoint is created. If configuring a Kafka client isn’t possible right away, you can do these checks using [`rpk`](../../../current/get-started/rpk-install/) or cURL: 1. Set the following environment variables. ```bash export RPK_BROKERS=':30292' export RPK_TLS_ENABLED=true export RPK_SASL_MECHANISM="" export RPK_USER= export RPK_PASS= ``` 2. Create a test topic. ```bash rpk topic create test-topic ``` 3. Produce to the test topic. #### rpk ```bash echo 'hello world' | rpk topic produce test-topic ``` #### curl ```bash curl -s \ -X POST \ "/topics/test-topic" \ -H "Content-Type: application/vnd.kafka.json.v2+json" \ -d '{ "records":[ { "value":"hello world" } ] }' ``` 4. Consume from the test topic. 
#### rpk ```bash rpk topic consume test-topic -n 1 ``` #### curl ```bash curl -s \ "/topics/test-topic/partitions/0/records?offset=0&timeout=1000&max_bytes=100000"\ -H "Accept: application/vnd.kafka.json.v2+json" ``` --- # Page 462: Networking: BYOC **URL**: https://docs.redpanda.com/redpanda-cloud/networking/byoc.md --- # Networking: BYOC --- title: "Networking: BYOC" latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: byoc/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: byoc/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/byoc/index.adoc description: Learn how to create a VPC peering connection and how to configure private networking with AWS PrivateLink, Azure Private Link, and GCP Private Service Connect. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-07-17" --- - [AWS](aws/) Learn how to configure private networking for BYOC clusters on AWS. - [Azure](azure/) Learn how to configure private networking for BYOC clusters on Azure. - [GCP](gcp/) Learn how to configure private networking for BYOC clusters on GCP. --- # Page 463: AWS **URL**: https://docs.redpanda.com/redpanda-cloud/networking/byoc/aws.md --- # AWS --- title: AWS latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: byoc/aws/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: byoc/aws/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/byoc/aws/index.adoc description: Learn how to configure private networking for BYOC clusters on AWS. 
page-git-created-date: "2024-06-06" page-git-modified-date: "2025-12-04" --- - [Add a BYOC VPC Peering Connection on AWS](vpc-peering-aws/) Use the Redpanda UI and AWS CLI to create a VPC peering connection for a BYOC cluster. - [Configure AWS PrivateLink in the Cloud Console](../../configure-privatelink-in-cloud-ui/) Set up AWS PrivateLink in the Redpanda Cloud Console. - [Configure AWS PrivateLink with the Cloud API](../../aws-privatelink/) Set up AWS PrivateLink with the Cloud API. - [Add Amazon VPC Transit Gateway](transit-gateway/) Use a transit gateway to connect your BYOC cluster to AWS VPCs or on-premises networks. --- # Page 464: Add Amazon VPC Transit Gateway **URL**: https://docs.redpanda.com/redpanda-cloud/networking/byoc/aws/transit-gateway.md --- # Add Amazon VPC Transit Gateway --- title: Add Amazon VPC Transit Gateway latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: byoc/aws/transit-gateway page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: byoc/aws/transit-gateway.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/byoc/aws/transit-gateway.adoc description: Use a transit gateway to connect your BYOC cluster to AWS VPCs or on-premises networks. page-git-created-date: "2025-06-12" page-git-modified-date: "2025-06-12" --- You can set up an [Amazon VPC Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw/what-is-transit-gateway.html) to connect your internal VPCs to Redpanda services while maintaining full control over network traffic. The transit gateway acts as a central hub for routing traffic between VPCs, enabling communication between a Redpanda cluster and client applications hosted in different VPCs that can be in different AWS accounts. AWS Transit Gateway is available for BYOC and BYOVPC clusters. 
## [](#set-up-amazon-vpc-transit-gateway)Set up Amazon VPC Transit Gateway To set up Amazon VPC Transit Gateway for Redpanda: 1. Create a transit gateway in your AWS account. 2. Create transit gateway attachments to the VPC hosting Redpanda and the VPC that will communicate to Redpanda (where the producer or consumer resides). 3. Update the transit gateway route table with the new routes for transit gateway attachments. For detailed instructions, see the [AWS Transit Gateways documentation](https://docs.aws.amazon.com/vpc/latest/tgw/tgw-transit-gateways.html). ## [](#example)Example The [Redpanda Cloud Examples repository](https://github.com/redpanda-data/cloud-examples/blob/9e2083e4bd8392e288ab6991b2a5a9b77a5fb0c5/aws-transit-gateway/README.md) provides sample Terraform code to set up and manage an Amazon VPC Transit Gateway for accessing Redpanda services across multiple VPCs. It includes steps for when the Redpanda cluster and client applications are hosted in the same AWS account and in different AWS accounts. > 📝 **NOTE** > > Your implementation may differ depending on the networking configuration within your VPCs. --- # Page 465: Add a BYOC VPC Peering Connection on AWS **URL**: https://docs.redpanda.com/redpanda-cloud/networking/byoc/aws/vpc-peering-aws.md --- # Add a BYOC VPC Peering Connection on AWS --- title: Add a BYOC VPC Peering Connection on AWS latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: byoc/aws/vpc-peering-aws page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: byoc/aws/vpc-peering-aws.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/byoc/aws/vpc-peering-aws.adoc description: Use the Redpanda UI and AWS CLI to create a VPC peering connection for a BYOC cluster. 
page-git-created-date: "2024-06-06" page-git-modified-date: "2025-09-05" --- A VPC peering connection is a networking connection between two VPCs. This connection allows the VPCs to communicate with each other as if they were within the same network. A route table routes traffic between the two VPCs using private IPv4 addresses. To start sending data to the Redpanda cluster, you must configure the VPC network connection by connecting your Redpanda VPC to your existing AWS VPC. ## [](#prerequisites)Prerequisites - An AWS account - A running BYOC cluster in AWS. See [Create a BYOC Cluster on AWS](../../../../get-started/cluster-types/byoc/aws/create-byoc-cluster-aws/). - Your Redpanda cluster and VPC must be in the same region. ## [](#create-a-peering-connection)Create a peering connection 1. In the AWS management console or the CLI, create a new peering connection between your AWS VPC and your Redpanda network using the following: - VPC Requester: Your Redpanda VPC. This looks something like `network-ch2c2ntioepec6ilaoog`. - VPC Accepter: Your existing AWS VPC ID. 2. After the VPC peering connection is created, make note of your peering connection ID. It has a `pcx-` prefix. ## [](#create-routes-from-redpanda-to-aws)Create routes from Redpanda to AWS The following command routes traffic from Redpanda to AWS by finding the route tables for each associated subnet and creating a route: ```bash aws ec2 describe-route-tables --filter "Name=tag:Name,Values=network-" "Name=tag:purpose,Values=private" | jq -r '.RouteTables[].RouteTableId' | \ while read -r route_table_id; do \ aws ec2 create-route --route-table-id $route_table_id --destination-cidr-block --vpc-peering-connection-id ; \ done; ``` Replace the following placeholder values: - Redpanda network ID: This ID appears after clicking on the name of the **Redpanda network** in the **Details** section of the **Overview** page of your cluster. 
  This network ID may look similar to your cluster ID, but it is a distinct value.
- AWS CIDR block: This is listed in the AWS UI **Details** for your VPC.
- Peering connection ID: This is the `pcx-` ID that you noted after creating the peering connection.

## [](#create-routes-from-aws-to-redpanda)Create routes from AWS to Redpanda

Now you must route your AWS subnet(s) to your Redpanda CIDR. The base command:

```bash
aws ec2 --region create-route \
  --route-table-id \
  --destination-cidr-block \
  --vpc-peering-connection-id
```

Your VPC may have multiple subnets, which may have multiple route table associations. Add the route to the route table of every subnet that needs to reach Redpanda.

## [](#test-your-connection)Test your connection

There are two ways to test your connection:

- Return to your cluster overview, and follow the directions in the **How to connect** panel.
- Use the AWS [Reachability Analyzer](https://docs.aws.amazon.com/vpc/latest/reachability/what-is-reachability-analyzer.html). Select your VM instance and a Redpanda instance as the source and destination, and test the connection between them.

## [](#switch-from-vpc-peering-to-privatelink)Switch from VPC peering to PrivateLink

VPC peering and PrivateLink use the same DNS hostnames (connection URLs) to connect to the Redpanda cluster. When you configure the PrivateLink DNS, those hostnames resolve to PrivateLink endpoints, which can interrupt existing VPC peering-based connections if clients aren’t ready.

To enable PrivateLink without disrupting VPC peering connections, do a controlled DNS switchover:

1. Enable PrivateLink on the existing cluster and configure the PrivateLink connection to Redpanda Cloud, but **do not modify VPC DNS attributes yet**. See: [Enable PrivateLink on an existing cluster](../../../aws-privatelink/#enable-privatelink-endpoint-service-for-existing-clusters).
2. During a planned window, modify the VPC DNS attributes to switch the shared hostnames over to PrivateLink.
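
When a VPC has several route tables, the "Create routes from AWS to Redpanda" step on this page has to be repeated once per table. The following is a minimal sketch of that loop, not an official Redpanda or AWS tool: `print_peering_routes` is a hypothetical helper name, and the region, CIDR, peering connection ID, and route table IDs in the example call are made-up placeholders. It only prints the `aws ec2 create-route` commands so that you can review them before running anything.

```bash
#!/usr/bin/env bash

# Hypothetical helper (an assumption for illustration, not part of the AWS CLI or rpk).
# Prints one `aws ec2 create-route` command per route table ID (dry run only).
# Arguments: <region> <redpanda-cidr> <peering-connection-id> <route-table-id>...
print_peering_routes() {
  local region="$1" redpanda_cidr="$2" pcx_id="$3"
  shift 3
  local rtb_id
  for rtb_id in "$@"; do
    printf 'aws ec2 --region %s create-route --route-table-id %s --destination-cidr-block %s --vpc-peering-connection-id %s\n' \
      "$region" "$rtb_id" "$redpanda_cidr" "$pcx_id"
  done
}

# Example call with placeholder values; substitute your own.
print_peering_routes us-east-2 10.8.0.0/20 pcx-0123456789abcdef0 rtb-0aaa1111 rtb-0bbb2222
```

One way to collect the route table IDs for a VPC is `aws ec2 describe-route-tables --filters "Name=vpc-id,Values=<your-vpc-id>" --query 'RouteTables[].RouteTableId' --output text`. After reviewing the printed commands, you could pipe them to `bash` to apply them.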
--- # Page 466: Azure **URL**: https://docs.redpanda.com/redpanda-cloud/networking/byoc/azure.md --- # Azure --- title: Azure latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: byoc/azure/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: byoc/azure/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/byoc/azure/index.adoc description: Learn how to configure private networking for BYOC clusters on Azure. page-git-created-date: "2025-02-07" page-git-modified-date: "2025-05-07" --- - [Configure Azure Private Link in the Cloud Console](../../azure-private-link-in-ui/) Set up Azure Private Link in the Redpanda Cloud Console. - [Configure Azure Private Link with the Cloud API](../../azure-private-link/) Set up Azure Private Link with the Cloud API. --- # Page 467: GCP **URL**: https://docs.redpanda.com/redpanda-cloud/networking/byoc/gcp.md --- # GCP --- title: GCP latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: byoc/gcp/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: byoc/gcp/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/byoc/gcp/index.adoc description: Learn how to configure private networking for BYOC clusters on GCP. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-05-07" --- - [Add a BYOC VPC Peering Connection on GCP](vpc-peering-gcp/) Use the Redpanda and GCP UIs to create a VPC peering connection for a BYOC cluster. - [Configure GCP Private Service Connect in the Cloud UI](../../configure-private-service-connect-in-cloud-ui/) Set up GCP Private Service Connect in the Redpanda Cloud UI. 
- [Configure GCP Private Service Connect with the Cloud API](../../gcp-private-service-connect/) Set up GCP Private Service Connect to securely access Redpanda Cloud. - [Enable Global Access](enable-global-access/) Learn how to enable global access for new BYOC and BYOVPC clusters on GCP. --- # Page 468: Enable Global Access **URL**: https://docs.redpanda.com/redpanda-cloud/networking/byoc/gcp/enable-global-access.md --- # Enable Global Access --- title: Enable Global Access latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: byoc/gcp/enable-global-access page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: byoc/gcp/enable-global-access.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/byoc/gcp/enable-global-access.adoc description: Learn how to enable global access for new BYOC and BYOVPC clusters on GCP. page-git-created-date: "2025-08-13" page-git-modified-date: "2025-08-20" --- By default, the seed load balancer for a cluster on GCP only accepts connections from the same region where the cluster is deployed. In Redpanda Cloud, the seed load balancer is the bootstrap server address you configure in your clients. If your Redpanda Cloud clients and BYOC or BYOVPC cluster are not all in the same GCP region, you must enable [global access](https://cloud.google.com/load-balancing/docs/internal/setting-up-internal#ilb-global-access). Global access lets the seed load balancer accept connections from clients outside your cluster’s region, then route them to the appropriate broker addresses for producing and consuming data. You can enable global access when you create a new BYOC or BYOVPC cluster on GCP. 
In this guide, you use the [Redpanda Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) to create a resource group, network, and cluster with global access enabled on GCP. ## [](#limitations)Limitations You can only use the Cloud API to enable global access as part of cluster creation, and not on existing clusters. Enabling global access on a running cluster requires recreating the GCP forwarding rule, which may cause some downtime. To enable global access on an existing cluster, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). ## [](#get-a-cloud-api-access-token)Get a Cloud API access token 1. Save the base URL of the Redpanda Cloud API in an environment variable: ```bash export PUBLIC_API_ENDPOINT="https://api.cloud.redpanda.com" ``` 2. In the Redpanda Cloud UI, go to the [**Organization IAM**](https://cloud.redpanda.com/organization-iam) page, and select the **Service account** tab. If you don’t have an existing service account, you can create a new one. Copy and store the client ID and secret. ```bash export CLOUD_CLIENT_ID= export CLOUD_CLIENT_SECRET= ``` 3. Get an API token using the client ID and secret. You can click the **Request an API token** link to see code examples to generate the token. ```bash export AUTH_TOKEN=`curl -s --request POST \ --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \ --header 'content-type: application/x-www-form-urlencoded' \ --data grant_type=client_credentials \ --data client_id="$CLOUD_CLIENT_ID" \ --data client_secret="$CLOUD_CLIENT_SECRET" \ --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token` ``` You must send the API token in the `Authorization` header when making requests to the Cloud API. ## [](#create-a-cluster-with-global-access)Create a cluster with global access ### [](#create-a-resource-group)Create a resource group Make a request to the `POST /v1/resource-groups` endpoint and store the ID of the resource group you create. 
```bash export RESOURCE_GROUP_ID=$(curl -X POST \ https://api.redpanda.com/v1/resource-groups \ -H "Authorization: Bearer $AUTH_TOKEN" \ -H 'content-type: application/json' \ -d '{ "resource_group": { "name": "" } }' | jq -r '.resource_group.id') ``` If you’re creating a BYOVPC cluster, continue to the next section. Otherwise, if you’re creating a standard BYOC cluster, skip ahead to [Create a network](#create-a-network). ### [](#byovpc-only-configure-customer-managed-resources)BYOVPC only: Configure customer-managed resources 1. Before you proceed, check the [prerequisites and limitations](../../../../get-started/cluster-types/byoc/gcp/vpc-byo-gcp/#prerequisites) for new BYOVPC clusters on GCP. 2. Follow the steps to [configure your VPC](../../../../get-started/cluster-types/byoc/gcp/vpc-byo-gcp/#configure-your-vpc) with the required permissions and firewall rules. 3. Follow the next steps to [configure the service project](../../../../get-started/cluster-types/byoc/gcp/vpc-byo-gcp/#configure-the-service-project) and service account bindings. ### [](#create-a-network)Create a network Make a request to the `POST /v1/networks` endpoint and store the ID of the network you create. 
- For standard BYOC clusters, run:

  ```bash
  # Build the request body for a standard BYOC network.
  NETWORK_POST_BODY=`cat << EOF
  {
    "network": {
      "name": "",
      "resource_group_id": "$RESOURCE_GROUP_ID",
      "cloud_provider": "CLOUD_PROVIDER_GCP",
      "cluster_type": "TYPE_BYOC",
      "region": "",
      "cidr_block": "10.0.0.0/20"
    }
  }
  EOF`

  # Create the network and store the ID from the returned operation metadata.
  export NETWORK_ID=$(curl -vv -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -d "$NETWORK_POST_BODY" https://api.redpanda.com/v1/networks | jq -r '.operation.metadata.network_id')
  ```

- For BYOVPC clusters, you also make a request to the `POST /v1/networks` endpoint, with a different request body:

  ```bash
  # Build the request body for a BYOVPC network that uses customer-managed resources.
  NETWORK_POST_BODY=`cat << EOF
  {
    "network": {
      "name": "",
      "resource_group_id": "$RESOURCE_GROUP_ID",
      "cloud_provider": "CLOUD_PROVIDER_GCP",
      "cluster_type": "TYPE_BYOC",
      "region": "",
      "customer_managed_resources": {
        "gcp": {
          "network_name": "",
          "network_project_id": "",
          "management_bucket": { "name" : "" }
        }
      }
    }
  }
  EOF`

  # Create the network and store the ID from the returned operation metadata.
  export NETWORK_ID=$(curl -vv -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -d "$NETWORK_POST_BODY" https://api.redpanda.com/v1/networks | jq -r '.operation.metadata.network_id')
  ```

  Replace the following placeholder variables for the request body:

  - ``: The name for the Redpanda network.
  - ``: The GCP region where the network will be created.
  - ``: The ID of the GCP project where your VPC is created.
  - ``: The name of your VPC.
  - ``: The name of the Google Storage bucket you created for the cluster.

Note that this endpoint returns a long-running operation. To check the operation state, use the `GET /v1/operations/{operation_id}` endpoint.

### [](#enable-global-access)Enable global access

1. Make a request to the `POST /v1/clusters` endpoint to create a new cluster with global access enabled (`"gcp_enable_global_access": true`).
- For BYOC clusters, run: Show BYOC cluster creation command ```bash CLUSTER_POST_BODY=`cat << EOF { "cluster": { "name": "", "resource_group_id": "$RESOURCE_GROUP_ID", "network_id": "$NETWORK_ID", "cloud_provider": "CLOUD_PROVIDER_GCP", "type": "TYPE_BYOC", "region": "", "zones": , "throughput_tier": "", "gcp_enable_global_access": true } } EOF` export CLUSTER_ID=$(curl -X POST \ https://api.redpanda.com/v1/clusters \ -H "Authorization: Bearer $AUTH_TOKEN" \ -H 'content-type: application/json' \ -d "$CLUSTER_POST_BODY" | jq -r '.operation.metadata.cluster_id') ``` Replace the following placeholder variables for the request body: - ``: The name for the Redpanda cluster. - ``: The GCP region where the cluster will be created. - ``: Provide the list of GCP zones where the brokers will be deployed. Format: `["", "", ""]` - ``: Choose a Redpanda Cloud cluster tier. For example, `tier-1-gcp-v2-x86`. - For BYOVPC clusters, you also make a request to the `POST /v1/clusters` endpoint, with a different request body: Show BYOVPC cluster creation command ```bash CLUSTER_POST_BODY=`cat << EOF { "cluster": { "cloud_provider": "CLOUD_PROVIDER_GCP", "connection_type": "CONNECTION_TYPE_PRIVATE", "type": "TYPE_BYOC", "name": "", "resource_group_id": "$RESOURCE_GROUP_ID", "network_id": "$NETWORK_ID", "region": "", "zones": , "throughput_tier": "", "redpanda_version": "", "gcp_enable_global_access": true, "customer_managed_resources": { "gcp": { "subnet": { "name":"", "secondary_ipv4_range_pods": { "name": "" }, "secondary_ipv4_range_services": { "name": "" }, "k8s_master_ipv4_range": "" }, "agent_service_account": { "email": "" }, "connector_service_account": { "email": "" }, "console_service_account": { "email": "" }, "redpanda_cluster_service_account": { "email": "" }, "gke_service_account": { "email": "" }, "tiered_storage_bucket": { "name" : "" } } } } } EOF` export CLUSTER_ID=$(curl -vv -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d 
"$CLUSTER_POST_BODY" https://api.redpanda.com/v1/clusters | jq -r '.operation.metadata.cluster_id') ``` Replace the following placeholders for the request body. Variables with a `byovpc_` prefix represent the customer-managed resources that you set up previously: - ``: Provide a name for the new cluster. - ``: Choose a GCP region where the cluster will be created. - ``: Provide the list of GCP zones where the brokers will be deployed. Format: `["", "", ""]` - ``: Choose a Redpanda Cloud cluster tier. For example, `tier-1-gcp-v2-x86`. - ``: Choose the Redpanda Cloud version. - ``: The name of the GCP subnet that was created for the cluster. - ``: The name of the IPv4 range designated for K8s pods. - ``: The name of the IPv4 range designated for services. - ``: The master IPv4 range. - ``: The email for the agent service account. - ``: The email for the connectors service account. - ``: The email for the Console service account. - ``: The email for the Redpanda service account. - ``: The email for the GKE service account. - ``: The name of the Google Storage bucket to use for Tiered Storage. 2. Run `rpk cloud byoc gcp apply`: ```bash rpk cloud byoc gcp apply --redpanda-id="${CLUSTER_ID}" --project-id='' ``` ## [](#test-global-access)Test global access To test if global access is successfully enabled, see the [GCP documentation](https://cloud.google.com/load-balancing/docs/internal/setting-up-internal#gcloud_17). 
--- # Page 469: Add a BYOC VPC Peering Connection on GCP **URL**: https://docs.redpanda.com/redpanda-cloud/networking/byoc/gcp/vpc-peering-gcp.md --- # Add a BYOC VPC Peering Connection on GCP --- title: Add a BYOC VPC Peering Connection on GCP latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: byoc/gcp/vpc-peering-gcp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: byoc/gcp/vpc-peering-gcp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/byoc/gcp/vpc-peering-gcp.adoc description: Use the Redpanda and GCP UIs to create a VPC peering connection for a BYOC cluster. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-09-05" --- A VPC peering connection is a networking connection between two VPCs. This connection allows the VPCs to communicate with each other as if they were within the same network. A route table routes traffic between the two VPCs using private IPv4 addresses. To start sending data to the Redpanda cluster, you must configure the VPC network connection by connecting your Redpanda VPC to your existing GCP VPC. ## [](#prerequisites)Prerequisites - A GCP account. - A running BYOC cluster in GCP. See [Create a BYOC Cluster on GCP](../../../../get-started/cluster-types/byoc/gcp/create-byoc-cluster-gcp/). - Your Redpanda cluster and VPC must be in the same region. ## [](#create-vpcs)Create VPCs 1. Go to the **VPC** section in your GCP project UI. 2. You should see an existing VPC. This has an ID with a `redpanda-` prefix. 3. If you don’t already have a second VPC to connect your Redpanda network to, create one. - This is your Redpanda client. Ensure that its CIDR does not overlap with the Redpanda network from step 1. - The following example uses the name `rp-client`. 
## [](#create-a-new-peering-connection)Create a new peering connection

1. In the GCP project UI, go to **Peering Connections**.
2. Create a new peering connection with the following values:
   - Your VPC network: `rp-client`
   - Peered VPC network: `redpanda-`
3. Save changes.
4. Create another peering connection with the values reversed:
   - Your VPC network: `redpanda-`
   - Peered VPC network: `rp-client`
5. Save changes.

GCP should set up routing automatically.

## [](#connect-to-redpanda)Connect to Redpanda

The cluster **Overview** page lists several ways to connect and start sending data. To quickly test the connection in GCP:

- Create a virtual machine on your GCP network that has a firewall rule allowing ingress traffic from your IP (for example, `/32`).
- Activate the Cloud Shell in your project, install `rpk` in the Cloud Shell, and run `rpk cluster info`.
- If the command returns output from Redpanda, your connection is successful.

## [](#switch-from-vpc-peering-to-private-service-connect)Switch from VPC peering to Private Service Connect

VPC peering and Private Service Connect use the same DNS hostnames (connection URLs) to connect to the Redpanda cluster. When you configure the Private Service Connect DNS, those hostnames resolve to Private Service Connect endpoints, which can interrupt existing VPC peering-based connections if clients aren’t ready.

To enable Private Service Connect without disrupting VPC peering connections, do a controlled DNS switchover:

1. Enable Private Service Connect on the existing cluster and deploy consumer-side resources, but **do not create private DNS yet**. See: [Enable Private Service Connect on an existing cluster](../../../gcp-private-service-connect/#enable-private-service-connect-on-an-existing-byoc-or-byovpc-cluster).
2. During a planned window, create the private DNS zone and records in your VPC to switch the shared hostnames over to Private Service Connect.
--- # Page 470: Choose CIDR Ranges **URL**: https://docs.redpanda.com/redpanda-cloud/networking/cidr-ranges.md --- # Choose CIDR Ranges --- title: Choose CIDR Ranges latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cidr-ranges page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cidr-ranges.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/cidr-ranges.adoc description: Guidelines for choosing CIDR ranges when VPC peering. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-20" --- Choosing appropriate Classless Inter-Domain Routing (CIDR) ranges is essential for successful VPC peering between Redpanda and your cloud network. > 📝 **NOTE** > > These guidelines provide general recommendations for choosing non-conflicting CIDR ranges. If you have a complex networking setup, work with a networking engineer to identify Redpanda CIDRs that won’t conflict with your existing VPCs. ## [](#prerequisites)Prerequisites - **VPC or virtual network (VNet)**: Before setting up a peering connection in Redpanda Cloud, you must have another VPC or VNet to which Redpanda can connect. If you do not already have a network, create one in your cloud provider. - **Matching region**: VPC peering connections can only be established between networks created in the _same region_. Redpanda Cloud does not support inter-region VPC peering connections. > 💡 **TIP** > > Consider adding an `rp-` prefix to the VPC or VNet name to indicate that it is for deploying a Redpanda cluster. ## [](#supported-ip-address-ranges)Supported IP address ranges Redpanda Cloud uses private IPv4 address spaces for cluster CIDRs. These ranges are designed for internal networks and cannot be accessed directly from the internet. 
Choose a CIDR from one of the following RFC 1918 ranges:

- **10.0.0.0/8** - Provides addresses from 10.0.0.0 through 10.255.255.255
- **172.16.0.0/12** - Provides addresses from 172.16.0.0 through 172.31.255.255
- **192.168.0.0/16** - Provides addresses from 192.168.0.0 through 192.168.255.255

For BYOC (Bring Your Own Cloud) clusters, Redpanda also supports the RFC 6598 Carrier-Grade NAT (CGNAT) address space:

- **100.64.0.0/10** - Provides addresses from 100.64.0.0 through 100.127.255.255

> ❗ **IMPORTANT**
>
> Redpanda’s network infrastructure only routes traffic within these RFC 1918 and RFC 6598 address spaces. Redpanda does not route packets to other IP spaces. Traffic from public IP addresses or from private ranges outside these specifications is blocked by design.

## [](#what-are-cidrs)What are CIDRs?

The following CIDR ranges are a critical part of Redpanda’s BYOC configuration:

- Your existing (client) VPC/VNet CIDR
- Your Redpanda cluster CIDR

These two ranges must not overlap when you set up VPC peering.

## [](#choose-the-cidr-ranges)Choose the CIDR ranges

To choose a range for Redpanda, you must know your VPC/VNet CIDR:

- In AWS, find it in the VPC area of the AWS Management Console, labeled **IPv4 CIDRs**.
- In Azure, find it in the Essentials view of your virtual network, labeled **Address space**.
- In GCP, find it in the Details view of your VPC, labeled **Internal IP Ranges**.

You can check which IPs this range encompasses by using either the [ipcalc](https://www.linux.com/topic/networking/how-calculate-network-addresses-ipcalc/) command in your terminal or the [CIDR calculation tool](https://www.ipaddressguide.com/cidr).
For example, if your client’s CIDR range is 10.0.0.0/20, run: `ipcalc 10.0.0.0/20` The output should look similar to the following: ```bash Address: 10.0.0.0 00001010.00000000.0000 0000.00000000 Netmask: 255.255.240.0 = 20 11111111.11111111.1111 0000.00000000 Wildcard: 0.0.15.255 00000000.00000000.0000 1111.11111111 => Network: 10.0.0.0/20 00001010.00000000.0000 0000.00000000 HostMin: 10.0.0.1 00001010.00000000.0000 0000.00000001 HostMax: 10.0.15.254 00001010.00000000.0000 1111.11111110 Broadcast: 10.0.15.255 00001010.00000000.0000 1111.11111111 Hosts/Net: 4094 Class A, Private Internet ``` Note the values for `HostMin` (10.0.0.1) and `HostMax` (10.0.15.254). These are the minimum and maximum values of the range of 4,094 IPs that this CIDR covers. The number of IPs is governed by the suffix: /16 contains 65534 IPs, /21 contains 2046, /24 contains 254, and so on. For private networks, this number can range from 8 (which contains 16777214 IPs) to 30 (which contains 2). > 📝 **NOTE** > > The Redpanda CIDR requires a block size between /16 and /20. ## [](#example)Example Assume that your client’s CIDR range is `10.0.0.0/20`. Your Redpanda range cannot overlap with it; if it does, VPC peering will not work. A limited set of examples that work with `10.0.0.0/20` are `10.8.0.0/20`, `10.0.16.0/20`, or `10.1.0.0/20`. Ranges like `10.0.0.6/20`, `10.0.8.0/20`, or `10.0.1.7/20` would not work. You can use [ipcalc](http://trk.free.fr/ipcalc/tools.html) to check for overlapping IPs. ## [](#multi-vpcvnet-example)Multi-VPC/VNet example If you have many IP ranges allocated in a complex system, work with a network engineer who can help with IP allocation. Your Redpanda CIDR cannot overlap with any of your existing VPCs/VNets, nor can it overlap with the VPC/VNet you want to peer with. 
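
The overlap rule from the Example section can also be checked mechanically. The following is a minimal sketch in plain Bash, assuming well-formed IPv4 CIDR input with no leading zeros in the octets. The function names `ip_to_int`, `cidr_bounds`, and `cidrs_overlap` are hypothetical helpers written for this illustration; they are not part of `ipcalc` or `rpk`.

```bash
#!/usr/bin/env bash

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the first and last address (as integers) covered by a CIDR block.
# Host bits are masked off, so 10.0.0.6/20 is treated as 10.0.0.0/20.
cidr_bounds() {
  local ip="${1%/*}" prefix="${1#*/}"
  local addr mask start end
  addr=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  start=$(( addr & mask ))
  end=$(( start | (~mask & 0xFFFFFFFF) ))
  echo "$start $end"
}

# Exit status 0 when the two CIDR blocks share at least one address.
cidrs_overlap() {
  local s1 e1 s2 e2
  read -r s1 e1 <<< "$(cidr_bounds "$1")"
  read -r s2 e2 <<< "$(cidr_bounds "$2")"
  (( s1 <= e2 && s2 <= e1 ))
}

# Compare candidate Redpanda CIDRs against the client range from the example.
for candidate in 10.0.16.0/20 10.0.8.0/20 10.8.0.0/20; do
  if cidrs_overlap 10.0.0.0/20 "$candidate"; then
    echo "$candidate overlaps 10.0.0.0/20"
  else
    echo "$candidate does not overlap 10.0.0.0/20"
  fi
done
```

Two ranges overlap exactly when each one starts at or before the other ends, which is why a single comparison of the start and end addresses is enough. To check a candidate against several existing VPCs/VNets, call `cidrs_overlap` once per existing range.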
Assume that the following example ranges are in use: - `10.0.0.0/20` - `10.8.0.0/20` - `10.0.35.8/20` - `10.0.16.8/20` A Redpanda CIDR that would work (and not overlap) with any of them is `10.8.48.8/20` --- # Page 471: Network Design and Ports **URL**: https://docs.redpanda.com/redpanda-cloud/networking/cloud-security-network.md --- # Network Design and Ports --- title: Network Design and Ports latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cloud-security-network page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cloud-security-network.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/cloud-security-network.adoc description: Learn how Redpanda Cloud manages network security and connectivity. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-29" --- Redpanda Cloud deploys different types of networks for public Redpanda clusters and for private Redpanda clusters. By default, networks are always laid out across multiple availability zones (AZs) to enable the creation of one or many single and multi-AZ Redpanda clusters within them. 
## [](#public-vs-private-network-designs)Public vs private network designs The following table compares public and private Redpanda clusters: | Feature | Public clusters | Private clusters | | --- | --- | --- | | Access | Internet-accessible endpoints | Access only through VPC peering or private service connectivity (AWS PrivateLink, Azure Private Link, or GCP Private Service Connect) | | Security | SASL/SCRAM authentication + TLS encryption | SASL/SCRAM authentication + TLS encryption + network isolation | | Use case | Development, testing, or scenarios where public access is needed | Production environments requiring heightened security | The Redpanda Cloud agent (sometimes called the [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) agent) provisions, configures, and maintains cluster resources, including the network. Each agent has a dedicated operations queue in the [control plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#control-plane) through which it pulls and materializes cluster definition documents into cloud infrastructure resources. For BYOC clusters, agents are provisioned by the user with `rpk`. For more information, see [BYOC Architecture](../../get-started/byoc-arch/). ### [](#public-redpanda-clusters)Public Redpanda clusters Public Redpanda clusters deploy networks segmented by workload type. Public clusters deploy brokers in public subnets. Redpanda ports are protected by SASL/SCRAM authentication (SCRAM-SHA-256, SCRAM-SHA-512) and encrypted in transit using TLS 1.2. Everything else is deployed on private subnets. ### [](#private-redpanda-clusters)Private Redpanda clusters Private Redpanda clusters also deploy networks segmented by workload type. Brokers are placed on private subnets, accessible from within the same VPC or from VPC peerings or private connectivity. 
The Redpanda Cloud agent and Redpanda Connect nodes are placed in distinct subnets, segmented away from Redpanda services by routing and firewall rules. The private link service (AWS PrivateLink, Azure Private Link, or GCP Private Service Connect) and VPC peering connections are used to connect to the Redpanda cluster. #### [](#private-network-data-flows)Private network data flows Data flows are the network traffic that carries data, such as messages produced to a topic or consumed from a topic. The following diagram shows the data flows from private Redpanda clusters. ![Redpanda Cloud private cluster data flows](../../shared/_images/data-flows.png) #### [](#private-network-metadata-flows)Private network metadata flows Metadata flows are the network traffic that carries metadata, such as telemetry and cluster configuration. The Redpanda Cloud agent uses metadata flows to share with the control plane connection endpoints, cluster readiness, and status. The following diagram shows the metadata flows from private Redpanda clusters. ![Redpanda Cloud private cluster metadata flows](../../shared/_images/metadata-flows.png) #### [](#private-network-control-flows)Private network control flows Control flows are the network traffic that carries control messages, such as cluster upgrades and configuration updates. The Redpanda Cloud agent uses control flows to manage the cluster. Occasionally, incident responders use control flows to mitigate incidents when automated controls are insufficient. The following diagram shows the control flows from private Redpanda clusters. ![Redpanda Cloud private cluster control flows](../../shared/_images/control-flows.png) ## [](#network-ports)Network ports This section lists the external ports on which Redpanda Cloud components communicate. Redpanda manages security group and firewall configurations, but if you need to add to your own rule sets, these are the available network ports. 
The following table provides a quick reference of network ports: | Direction | Purpose | Ports | | --- | --- | --- | | North-south | External client access | 30092, 9092, 30081, 30082, 443 | | East-west | Internal cluster communication | 30092, 9092, 8081, 8082, 33145, 30644, 8083 | | South-north | Outgoing connections | 443, 80 | > 📝 **NOTE** > > Redpanda also uses some ports for internal communication inside the cluster, including ports 80 and 9644. ### [](#north-south)North-south The following table lists the network ports available to external clients within each data plane. For private clusters, access to these ports is only possible through Redpanda Cloud network connections such as [VPC peering](../dedicated/aws/vpc-peering/), transit gateway attachments, or private service connectivity. | Service | Port | | --- | --- | | Kafka API | 30092/tcp | | Kafka API bootstrap | 9092/tcp | | Schema Registry | 30081/tcp | | Kafka HTTP Proxy and Kafka HTTP Proxy bootstrap | 30082/tcp | | Redpanda Console, Data Plane API, Prometheus metrics | 443/tcp | ### [](#east-west)East-west The following table lists the network ports available within each data plane for internal communication only. | Service | Port | | --- | --- | | Kafka API | 30092/tcp | | Kafka API bootstrap | 9092/tcp | | Schema Registry | 8081/tcp | | Kafka HTTP Proxy | 8082/tcp | | Redpanda RPC | 33145/tcp | | Redpanda Admin API | 30644/tcp | | Kafka Connect API | 8083/tcp | ### [](#south-north)South-north The following network port is used for outgoing network connections outside the VPC. DNS and NTP ports are not included because those network flows do not leave the cloud provider’s network, and they reach the internal cloud provider services within the VPC. 
| Service | Port | | --- | --- | | Control plane, breakglass, artifact repository, and telemetry | 443/tcp, 80/tcp | ## [](#private-service-connectivity-network-ports)Private service connectivity network ports ### [](#north-south-2)North-south When private service connectivity is enabled (AWS PrivateLink, Azure Private Link, or GCP Private Service Connect), the following network ports are made available to external clients: | Service | Port | | --- | --- | | Kafka API | 32000-32500/tcp | | Kafka API bootstrap | 30292/tcp | | Schema Registry | 30081/tcp | | Kafka HTTP Proxy | 35000-35500/tcp | | Kafka HTTP Proxy bootstrap | 30282/tcp | | Redpanda Console, Data Plane API, Prometheus metrics | 443/tcp | ## [](#nat-gateways)NAT gateways A NAT (Network Address Translation) gateway allows resources in a private network to access the internet, while blocking inbound connections. Redpanda Cloud clusters require outbound-only internet access for control plane connectivity, upgrades, and telemetry. The way NAT gateways are provisioned depends on your cloud provider and deployment type: - **BYOVPC/BYOVNet:** You are responsible for providing internet access, as you fully manage the network. - **BYOC/Dedicated** on **AWS:** Redpanda provisions one NAT gateway and one internet gateway for outbound-only access. - **BYOC/Dedicated** on **Azure:** Redpanda provisions one NAT gateway and a `/31` public IP prefix (two usable IPs) for outbound-only access. - **BYOC/Dedicated** on **GCP:** Redpanda provisions one NAT gateway and one internet gateway for outbound-only access. The following table summarizes when a NAT gateway is required: | Traffic type | NAT gateway required? | Notes | | --- | --- | --- | | Redpanda streaming traffic | No | | | Redpanda Tiered Storage traffic | No | AWS: All connections are done through a VPC gateway endpoint in the VPC. 
BYOVPC customers must ensure that this VPC endpoint exists in the VPC and that routing rules are configured appropriately. Azure: Three Private Link endpoints are used by Redpanda brokers to access Azure Blob Storage. GCP: Tiered Storage data transfer is free within the same region. | | Redpanda provisioning and telemetry | Yes | Usage is minimal: artifact downloads and metrics. | | Internet-facing connectors | Yes | Internet-facing connectors incur NAT data transfer charges. | > 📝 **NOTE** > > GCP public clusters use multiple NAT gateways with dynamic IP allocation. For GCP public clusters, do not use specific NAT gateway IP addresses for allowlisting or firewall rules. ### [](#allowlist-the-nat-gateway)Allowlist the NAT gateway Redpanda Connect and Kafka Connect connectors that egress to the internet can incur NAT data transfer costs. You can add the NAT gateway IP address to your data source allowlist, if needed. Redpanda Data does not guarantee that the NAT gateway IP will remain static, but it is unlikely to change. For BYOC and Dedicated clusters, you can find the NAT gateway IP on the cluster **Overview** page or in the response body of the [`GET /v1/clusters/{id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_getcluster) API request. ## [](#cloud-provider-network-services)Cloud provider network services Each cloud provider offers specific network services integrated with Redpanda Cloud: ### AWS - **Time synchronization** Redpanda Cloud uses the [Amazon Time Sync Service](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html), a fleet of redundant satellite-connected and atomic reference clocks in AWS regions. - **Domain name system (DNS)** Redpanda Cloud creates a new DNS zone for each cluster in the control plane and delegates its management exclusively to each cluster’s data plane. In turn, the data plane creates a hosted zone in Route 53, managing DNS records for Redpanda services as needed.
All interactions with Route 53 are controlled by IAM policies targeted to the specific Route 53 resources managed by each data plane, following the principle of least privilege. The Route 53-hosted DNS zone in the data plane has the following naming convention: - BYOC/BYOVPC/BYOVNet: `[cluster_id].byoc.prd.cloud.redpanda.com` - Dedicated: `[cluster_id].fmc.prd.cloud.redpanda.com` - **Distributed denial of service (DDoS) protection** All Redpanda Cloud services publicly exposed in the control plane and data plane are protected against the most common layer 3 and 4 DDoS attacks by [AWS Shield Standard](https://aws.amazon.com/shield/features/#AWS_Shield_Standard), with no latency impact. - **VPC peering** VPC peering against Redpanda Cloud networks allows users to connect to private clusters without traversing the public internet. You can establish VPC peering connections between two VPCs with non-overlapping network addresses. When creating a network intended for peering, ensure that the specified network address range does not overlap with the network address range of the destination VPC. _Security best practice:_ When using VPC peering, always reject all network traffic initiated from a Redpanda Cloud network and only accept traffic from trusted connectors. - **AWS PrivateLink** AWS PrivateLink lets you connect to cluster services using unidirectional TCP connections that only client applications can initiate. These applications can run from multiple customer-managed VPCs, even if their CIDR ranges overlap with the Redpanda cluster VPC. AWS PrivateLink is configured against the Redpanda cluster’s network load balancer. All client connections to cluster services pass through this load balancer. You configure PrivateLink with the Redpanda Cloud UI or Cloud API, and it is protected by an allowlist of principal ARNs during creation. Only those principals can create VPC endpoint attachments to the PrivateLink service.
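The VPC peering requirement above (non-overlapping network address ranges) can be checked before you create a network. The following is a minimal sketch in plain Bash, not a Redpanda or AWS tool. The helper names `ip_to_int`, `cidr_bounds`, and `cidrs_overlap` are hypothetical and introduced here only for illustration, and the sketch assumes well-formed IPv4 CIDR input without leading zeros in the octets.

```bash
# Hypothetical helper: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Hypothetical helper: print the first and last address of a CIDR block as integers.
cidr_bounds() {
  local ip=${1%/*} prefix=${1#*/}
  local addr mask start end
  addr=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  start=$(( addr & mask ))
  end=$(( start | (~mask & 0xFFFFFFFF) ))
  echo "$start $end"
}

# Hypothetical helper: exit status 0 when the two CIDR blocks share at least one address.
cidrs_overlap() {
  local s1 e1 s2 e2
  read -r s1 e1 <<< "$(cidr_bounds "$1")"
  read -r s2 e2 <<< "$(cidr_bounds "$2")"
  (( s1 <= e2 && s2 <= e1 ))
}

# Example: compare a candidate Redpanda network range with a destination VPC range.
if cidrs_overlap "10.0.0.0/20" "10.0.16.0/20"; then
  echo "overlap"
else
  echo "no overlap"
fi
```

Two ranges overlap when each one starts at or before the other ends, which is the comparison `cidrs_overlap` performs on the integer bounds.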
### Azure - **Time synchronization** Redpanda Cloud synchronizes time through the underlying Azure host, which uses internal Microsoft time servers that get their time from Microsoft-owned Stratum 1 devices with GPS antennas. - **Domain name system (DNS)** Redpanda Cloud creates a new DNS zone for each cluster in the control plane and delegates its management exclusively to each cluster’s agent. In turn, the agent creates an Azure DNS zone and manages the DNS records for Redpanda services, as needed. All Azure API interactions with Azure DNS are done through a user-assigned managed identity, with constrained Azure RBAC permissions, following the principle of least privilege. The DNS zone in the data plane has the following naming convention: - BYOC: `[cluster_id].byoc.prd.cloud.redpanda.com` - Dedicated: `[cluster_id].fmc.prd.cloud.redpanda.com` - **Distributed denial of service (DDoS) protection** All Redpanda Cloud services publicly exposed in the control plane are protected against the most common layer 3 and 4 DDoS attacks by AWS. Data plane services in Azure are not protected by default against common network-level DDoS attacks. Azure customers are fully responsible for enabling this protection, because it has an added cost. - **VNet peering** VNet peering against Redpanda Cloud networks allows users to connect to private clusters without traversing the public internet. > 📝 **NOTE** > > VNet peering in Azure is in limited availability. VNet peering connections can only be established between two or more VNets with non-overlapping network addresses. When creating a Redpanda Cloud network for peering, make sure the Redpanda network address range does not overlap with the network address range of the destination VNet. _Security best practice:_ When using VNet peering, always reject all network traffic initiated from a Redpanda Cloud network and only accept traffic from trusted connectors. 
Unlike AWS and GCP, Azure charges $0.01 per GB transferred over a VNet peering, in either direction. For high-throughput use cases, consider using BYOVPC clusters. With BYOVPC, client application workloads are deployed on the same VNet as the Redpanda brokers, avoiding additional data transfer costs. - **Azure Private Link** Azure Private Link lets you connect to cluster services using a unidirectional TCP connection that can only be initiated by client applications. These applications can run from multiple customer-managed VNets, even if their CIDR ranges overlap with the Redpanda cluster VNet. Redpanda configures Private Link against the cluster’s Azure load balancer. All client connections to the Redpanda cluster services pass through this load balancer. You configure Private Link with the Redpanda Cloud UI or the Cloud API, and it is protected during creation by an allowlist of Azure subscription IDs. Only allowlisted subscriptions can create private endpoint attachments to the cluster’s Private Link service. ### GCP - **Time synchronization** Redpanda Cloud uses [Google NTP Servers](https://cloud.google.com/compute/docs/instances/configure-ntp#linux-chrony), a fleet of satellite-connected and atomic reference clocks. - **Domain name system (DNS)** Redpanda Cloud creates a new DNS zone for each cluster in the control plane and delegates its management exclusively to each cluster’s data plane. In turn, the data plane creates a managed zone in Cloud DNS, managing DNS records for Redpanda services, as needed. All interactions with Cloud DNS are controlled by IAM policies targeted to the specific Cloud DNS resources managed by each data plane, following the principle of least privilege.
- **Distributed denial of service (DDoS) protection** All Redpanda Cloud services publicly exposed in the control plane and data plane are protected against the most common layer 3 and 4 DDoS attacks by [Google Cloud Armor Standard](https://cloud.google.com/armor/docs/advanced-network-ddos), with no latency impact. - **VPC peering** VPC peering against Redpanda Cloud networks allows users to connect to private clusters without traversing the public internet. You can establish VPC peering connections between two VPCs with non-overlapping network addresses. When creating a network intended for peering, ensure that the specified network address range does not overlap with the network address range of the destination VPC. _Security best practice:_ When using VPC peering, always reject all network traffic initiated from a Redpanda Cloud network and only accept traffic from trusted connectors. - **GCP Private Service Connect** GCP Private Service Connect lets you connect to cluster services using a unidirectional TCP connection that can only be initiated by client applications. These applications can run from multiple customer-managed VPCs, even if their CIDR ranges overlap with the Redpanda cluster VPC. Redpanda configures a Private Service Connect producer against the cluster’s network load balancer. All client connections to the Redpanda cluster services pass through this load balancer. You configure a Private Service Connect publisher with the Redpanda Cloud UI or the Cloud API. It is protected during creation by a consumer accept list of GCP networks or project IDs. Only those consumers can create consumer endpoints to the Redpanda cluster’s Private Service Connect published service. 
## [](#suggested-reading)Suggested reading - [Redpanda Cloud overview](../../get-started/cloud-overview/) - [BYOC architecture](../../get-started/byoc-arch/) - [BYOC networking](../byoc/) - [Dedicated networking](../dedicated/) --- # Page 472: Configure GCP Private Service Connect in the Cloud UI **URL**: https://docs.redpanda.com/redpanda-cloud/networking/configure-private-service-connect-in-cloud-ui.md --- # Configure GCP Private Service Connect in the Cloud UI --- title: Configure GCP Private Service Connect in the Cloud UI latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: configure-private-service-connect-in-cloud-ui page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: configure-private-service-connect-in-cloud-ui.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/configure-private-service-connect-in-cloud-ui.adoc description: Set up GCP Private Service Connect in the Redpanda Cloud UI. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-07" --- > 📝 **NOTE** > > - This guide is for configuring GCP Private Service Connect using the Redpanda Cloud UI. To configure and manage Private Service Connect on an existing cluster with **public** networking, you must use the [Cloud API for BYOC](../gcp-private-service-connect/) or the [Cloud API for Dedicated](../dedicated/gcp/configure-psc-in-api/). > > - The latest version of Redpanda GCP Private Service Connect (available March, 2025) supports zone affinity. This allows requests from Private Service Connect endpoints to stay within the same availability zone, avoiding additional networking costs. > > - DEPRECATION: The original Redpanda GCP Private Service Connect is deprecated and will be removed in a future release. 
For more information, see [Deprecated features](../../manage/maintenance/#deprecated-features). The Redpanda GCP Private Service Connect service provides secure access to Redpanda Cloud from your own VPC network. Traffic over Private Service Connect does not go through the public internet because these connections are treated as their own private GCP service. While your VPC network has access to the Redpanda VPC network, Redpanda cannot access your VPC network. Consider using Private Service Connect if you have multiple VPC networks and could benefit from a more simplified approach to network management. > 📝 **NOTE** > > - Each consumer VPC network can have one Private Service Connect endpoint connected to the Redpanda service attachment. > > - Private Service Connect allows overlapping [CIDR ranges](../cidr-ranges/) in VPC networks. > > - The number of connections is limited only by your Redpanda [usage tier](../../reference/tiers/). Private Service Connect does not add extra connection limits. > > - You control from which GCP projects connections are allowed. ## [](#requirements)Requirements - Use the [gcloud](https://cloud.google.com/sdk/docs/install) command-line interface (CLI) to create the consumer-side resources, such as a consumer VPC network and forwarding rule, or to modify existing resources to use the Private Service Connect service attachment created for your cluster. - The consumer VPC network must be in the same region as your Redpanda cluster. ## [](#enable-private-service-connect-for-existing-clusters)Enable Private Service Connect for existing clusters 1. In the Redpanda Cloud UI, open your [cluster](https://cloud.redpanda.com/clusters), and click **Cluster settings**. 2. Under Private Service Connect, click **Enable**. 3. For [BYOVPC clusters](../../get-started/cluster-types/byoc/gcp/vpc-byo-gcp/), you need a PSC NAT subnet with `purpose` set to `PRIVATE_SERVICE_CONNECT`. 
You also need to create VPC network firewall rules to allow Private Service Connect traffic. You can use the `gcloud` CLI: > 📝 **NOTE** > > The firewall rules support up to 20 Redpanda brokers. If you have more than 20 brokers, or for help enabling Private Service Connect, contact [Redpanda support](https://support.redpanda.com/hc/en-us/requests/new). ```bash gcloud compute networks subnets create \ --project= \ --network= \ --region= \ --range= \ --purpose=PRIVATE_SERVICE_CONNECT ``` ```bash gcloud compute firewall-rules create redpanda-psc-ingress \ --description="Allow access to Redpanda PSC endpoints" \ --network="" \ --project="" \ --direction="INGRESS" \ --target-tags="redpanda-node" \ --source-ranges="10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,100.64.0.0/10" \ --allow="tcp:30181,tcp:30282,tcp:30292,tcp:31004,tcp:31082-31101,tcp:31182-31201,tcp:31282-31301,tcp:32092-32111,tcp:32192-32211,tcp:32292-32311" ``` Provide your values for the following placeholders: - ``: The name of the PSC NAT subnet. - ``: The host GCP project ID. - ``: The name of the VPC network being used for your Redpanda Cloud cluster. - ``: The region of the Redpanda Cloud cluster. - ``: The CIDR range of the subnet. The mask should be at least `/29`. Each Private Service Connect connection takes up one IP address from the PSC NAT subnet, so the CIDR must be able to accommodate all projects from which connections to the service attachment will be issued. See the GCP documentation for [creating a subnet for Private Service Connect](https://cloud.google.com/vpc/docs/configure-private-service-connect-producer#add-subnet-psc). 4. For the accepted consumers list, you need the GCP project IDs from which incoming connections will be accepted. 5. It may take several minutes for your cluster to update. When the update is complete, the Private Service Connect status in **Cluster settings** changes from **In progress** to **Enabled**. 
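The note in step 3 says the firewall rule supports up to 20 Redpanda brokers. One way to see where that number appears is to count the ports in each entry of the `--allow` value. The sketch below is an illustration only: `count_allowed_ports` is a hypothetical helper, not part of `gcloud`, and it assumes entries of the form `tcp:PORT` or `tcp:FIRST-LAST`.

```bash
# Hypothetical helper: print "<entry> <port count>" for each entry of an --allow value.
count_allowed_ports() {
  local entry range first last
  local -a entries
  IFS=, read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    range=${entry#*:}    # drop the "tcp:" protocol prefix
    first=${range%-*}    # start of the range (or the single port)
    last=${range#*-}     # end of the range (or the single port)
    echo "$entry $(( last - first + 1 ))"
  done
}

# Two entries taken from the ingress rule shown in step 3.
count_allowed_ports "tcp:30292,tcp:32092-32111"
```

Each ranged entry in the ingress rule, such as `tcp:32092-32111`, spans 20 ports, which appears consistent with the 20-broker limit stated in the note.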
## [](#deploy-consumer-side-resources)Deploy consumer-side resources For each consumer VPC network, you must complete the following steps to successfully connect to the service attachment and use the Kafka API and other Redpanda services, such as HTTP Proxy. 1. In **Cluster settings**, copy the **DNS zone** and **Service attachment URL** under **Private Service Connect**. Use this URL to create the Private Service Connect endpoint in GCP. 2. Get the name of the consumer VPC network and the subnet ``, where the Private Service Connect endpoint forwarding rule will be created. 3. Create a Private Service Connect IP address for the endpoint: ```bash gcloud compute addresses create --subnet= --addresses= --region= ``` 4. Create the Private Service Connect endpoint forwarding rule: > 📝 **NOTE** > > If you enabled global access when creating the cluster, you must include the `--allow-psc-global-access` flag to configure the endpoint to accept client connections from different regions. ```bash gcloud compute forwarding-rules create --region= --network= --address= --target-service-attachment= ``` 5. Create firewall rules allowing egress traffic to the Private Service Connect endpoint: ```bash gcloud compute firewall-rules create redpanda-psc-egress \ --description="Allow access to Redpanda PSC endpoint" \ --network="" \ --direction="EGRESS" \ --destination-ranges= \ --allow="tcp:443,tcp:30081,tcp:30282,tcp:30292,tcp:32092-32141,tcp:35082-35131,tcp:32192-32241,tcp:35182-35231,tcp:32292-32341,tcp:35282-35331" ``` 6. Create a private DNS zone. Use the cluster **DNS zone** value as the DNS name: ```bash gcloud dns managed-zones create \ --project= \ --description="Redpanda Private Service Connect DNS zone" \ --dns-name="" \ --visibility="private" \ --networks="" ``` 7. In the newly-created DNS zone, create a wildcard DNS record using the cluster **DNS record** value: ```bash gcloud dns record-sets create '*.' 
\ --project= \ --zone="" \ --type="A" \ --ttl="300" \ --rrdatas="" ``` ## [](#access-redpanda-services-through-private-service-connect-endpoint)Access Redpanda services through Private Service Connect endpoint After you have enabled Private Service Connect for your cluster, your connection URLs are available in the **How to Connect** section of the cluster overview in the Redpanda Cloud UI. You can access Redpanda services such as Schema Registry and HTTP Proxy from the client VPC or virtual network; for example, from a compute instance in the VPC or network. The bootstrap server hostname is unique to each cluster. The service attachment exposes a set of bootstrap ports for access to Redpanda services. These ports load balance requests among brokers. Make sure you use the following ports for initiating a connection from a consumer: | Redpanda service | Default bootstrap port | | --- | --- | | Kafka API | 30292 | | HTTP Proxy | 30282 | | Schema Registry | 30081 | ### [](#access-kafka-api-seed-service)Access Kafka API seed service Use port `30292` to access the Kafka API seed service. ```bash export RPK_BROKERS=':30292' rpk cluster info -X tls.enabled=true -X user= -X pass= ``` When successful, the `rpk` output should look like the following: ```bash CLUSTER ======= redpanda.rp-cki01qgth38kk81ard3g BROKERS ======= ID HOST PORT RACK 0* 0-3da65a4a-0532364.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32092 use2-az1 1 1-3da65a4a-63b320c.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32093 use2-az1 2 2-3da65a4a-36068dc.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com 32094 use2-az1 ``` ### [](#access-schema-registry-seed-service)Access Schema Registry seed service Use port `30081` to access the Schema Registry seed service. 
```bash curl -vv -u : -H "Content-Type: application/vnd.schemaregistry.v1+json" --tlsv1.2 --http2 :30081/subjects ``` ### [](#access-http-proxy-seed-service)Access HTTP Proxy seed service Use port `30282` to access the Redpanda HTTP Proxy seed service. ```bash curl -vv -u : -H "Content-Type: application/vnd.kafka.json.v2+json" --tlsv1.2 --http2 :30282/topics ``` ## [](#test-the-connection)Test the connection You can test the Private Service Connect connection from any VM or container in the consumer VPC. If configuring a client isn’t possible right away, you can do these checks using `rpk` or curl: 1. Set the following environment variables. ```bash export RPK_BROKERS=':30292' export RPK_TLS_ENABLED=true export RPK_SASL_MECHANISM="" export RPK_USER= export RPK_PASS= ``` 2. Create a test topic. ```bash rpk topic create test-topic ``` 3. Produce to the test topic. ### rpk ```bash echo 'hello world' | rpk topic produce test-topic ``` ### curl ```bash curl -s \ -X POST \ "/topics/test-topic" \ -H "Content-Type: application/vnd.kafka.json.v2+json" \ -d '{ "records":[ { "value":"hello world" } ] }' ``` 4. Consume from the test topic. ### rpk ```bash rpk topic consume test-topic -n 1 ``` ### curl ```bash curl -s \ "/topics/test-topic/partitions/0/records?offset=0&timeout=1000&max_bytes=100000"\ -H "Accept: application/vnd.kafka.json.v2+json" ``` ## [](#disable-private-service-connect)Disable Private Service Connect In **Cluster settings**, click **Disable**. Existing connections are closed after Private Service Connect is disabled. To connect using Private Service Connect again, you must re-enable it.
--- # Page 473: Configure AWS PrivateLink in the Cloud Console **URL**: https://docs.redpanda.com/redpanda-cloud/networking/configure-privatelink-in-cloud-ui.md --- # Configure AWS PrivateLink in the Cloud Console --- title: Configure AWS PrivateLink in the Cloud Console latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: configure-privatelink-in-cloud-ui page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: configure-privatelink-in-cloud-ui.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/configure-privatelink-in-cloud-ui.adoc description: Set up AWS PrivateLink in the Redpanda Cloud Console. page-git-created-date: "2024-06-06" page-git-modified-date: "2026-03-02" --- > 📝 **NOTE** > > This guide is for configuring AWS PrivateLink using the Redpanda Cloud Console. To configure and manage PrivateLink on an existing public cluster, you must use the [Redpanda Cloud API](../aws-privatelink/). The Redpanda AWS PrivateLink endpoint service provides secure access to Redpanda Cloud from your own VPC. Traffic over PrivateLink does not go through the public internet because these connections are treated as their own private AWS service. While your VPC has access to the Redpanda VPC, Redpanda cannot access your VPC. Consider using the endpoint service if you have multiple VPCs and could benefit from a more simplified approach to network management. > 📝 **NOTE** > > - Each client VPC can have one endpoint connected to the PrivateLink service. > > - PrivateLink allows overlapping [CIDR ranges](../cidr-ranges/) in VPC networks. > > - The number of connections is limited only by your Redpanda usage tier. PrivateLink does not add extra connection limits. However, VPC peering is limited to 125 connections. 
See [How scalable is AWS PrivateLink?](https://aws.amazon.com/privatelink/faqs/) > > - You control which AWS principals are allowed to connect to the endpoint service. ## [](#requirements)Requirements - Your Redpanda cluster and VPC must be in the same region, unless you configure [cross-region PrivateLink](#cross-region-privatelink). - Use the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) to create a new client VPC or modify an existing one to use the PrivateLink endpoint. > 💡 **TIP** > > In Kafka clients, set `connections.max.idle.ms` to a value less than 350 seconds (350000 ms). ## [](#dns-resolution-with-privatelink)DNS resolution with PrivateLink PrivateLink changes how DNS resolution works for your cluster. When you query cluster hostnames outside the VPC that contains your PrivateLink endpoint, DNS may return private IP addresses that aren’t reachable from your location. To resolve cluster hostnames from other VPCs or on-premises networks, set up DNS forwarding using [Route 53 Resolver](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver.html): 1. In the VPC that contains your PrivateLink endpoint, create a Route 53 Resolver inbound endpoint. Ensure that the inbound endpoint’s security group allows inbound UDP/TCP port 53 from each VPC or on-premises network that will forward queries. 2. In each other VPC that must resolve the cluster domain, create a Resolver outbound endpoint and a forwarding rule for `` that targets the inbound endpoint IPs from the previous step. Associate the rule with those VPCs. The cluster domain is the suffix after the seed hostname. For example, if your bootstrap server URL is: `seed-3da65a4a.cki01qgth38kk81ard3g.byoc.dev.cloud.redpanda.com:9092`, then `cluster_domain` is: `cki01qgth38kk81ard3g.byoc.dev.cloud.redpanda.com`. 3. For on-premises DNS, create a conditional forwarder for `` that forwards to the inbound endpoint IPs from step 1 (over VPN/Direct Connect).
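Step 2 describes the cluster domain as the suffix after the seed hostname. As a rough sketch, you could derive it from the bootstrap server URL with Bash parameter expansion instead of copying it by hand. The function name `cluster_domain_from_bootstrap` is hypothetical, and the sketch assumes the URL has the form shown in the example (a seed label, then the domain, then an optional port).

```bash
# Hypothetical helper: derive the cluster domain from a bootstrap server URL.
cluster_domain_from_bootstrap() {
  local host=${1%:*}   # strip the trailing ":<port>", if present
  echo "${host#*.}"    # strip the first DNS label (the seed hostname)
}

CLUSTER_DOMAIN=$(cluster_domain_from_bootstrap \
  "seed-3da65a4a.cki01qgth38kk81ard3g.byoc.dev.cloud.redpanda.com:9092")
echo "$CLUSTER_DOMAIN"
# prints cki01qgth38kk81ard3g.byoc.dev.cloud.redpanda.com
```

The resulting value is the domain to use in the Resolver forwarding rule and in any on-premises conditional forwarder.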
> ❗ **IMPORTANT** > > Do not configure forwarding rules to target the VPC’s Amazon-provided DNS resolver (VPC base CIDR + 2). Rules must target the IP addresses of Route 53 Resolver endpoints. ## [](#enable-endpoint-service-for-existing-clusters)Enable endpoint service for existing clusters 1. In the Redpanda Cloud Console, select your [cluster](https://cloud.redpanda.com/clusters), and go to the **Cluster settings** page. 2. For AWS PrivateLink, click **Enable**. 3. On the Enable PrivateLink page, for Allowed principal ARNs, click **Add**, and enter the Amazon Resource Names (ARNs) for each AWS principal allowed to access the endpoint service. For example, for all principals in a specific account, use `arn:aws:iam:::root`. See the AWS documentation on [configuring an endpoint service](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#add-remove-permission) for details. 4. Click **Add** after entering each ARN, and when finished, click **Enable**. 5. (Optional) To enable cross-region PrivateLink, add supported regions. See [Cross-region PrivateLink](#cross-region-privatelink). 6. It may take several minutes for your cluster to update. When the update is complete, the AWS PrivateLink status on the Cluster settings page changes from **In progress** to **Enabled**. > 📝 **NOTE** > > For help with issues when enabling PrivateLink, contact [Redpanda support](https://support.redpanda.com/hc/en-us/requests/new). ## [](#configure-privatelink-connection-to-redpanda-cloud)Configure PrivateLink connection to Redpanda Cloud When you have a PrivateLink-enabled cluster, create a VPC endpoint to connect your client VPC to your cluster. ### [](#get-cluster-domain)Get cluster domain Get the domain (`cluster_domain`) of the cluster from the bootstrap server URL in the **How to Connect** section of the cluster overview in the Redpanda Cloud Console. 
For example, if the bootstrap server URL is: `seed-3da65a4a.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com:9092`, then `cluster_domain` is: `cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com`. ```bash CLUSTER_DOMAIN= ``` > 📝 **NOTE** > > Use `` as the domain you target with your DNS conditional forward (optionally also `*.` if your DNS platform requires a wildcard). ### [](#get-name-of-privatelink-endpoint-service)Get name of PrivateLink endpoint service You need the service name to [create a VPC endpoint](#create-vpc-endpoint). You can find the service name on the **Cluster settings** page after PrivateLink is enabled, or in the **How to Connect** section of the cluster overview. ```bash PL_SERVICE_NAME= ``` With the service name stored, set up your client VPC to connect to the endpoint service. ### [](#set-up-the-client-vpc)Set up the client VPC If you are not using an existing VPC, you must create a new one. > ⚠️ **CAUTION** > > [VPC peering](../byoc/aws/vpc-peering-aws/) and PrivateLink will not work at the same time if you set them up on the same VPC where your Kafka clients run. PrivateLink endpoints take priority. > > VPC peering and PrivateLink can both be used at the same time if Kafka clients are connecting from distinct VPCs. For example, in a private Redpanda cluster, you can connect your internal Kafka clients over VPC peering, and enable PrivateLink for external services. The client VPC must be in the same region as your Redpanda cluster, unless you have configured [cross-region PrivateLink](#cross-region-privatelink). To create the VPC, run: ```bash # See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html for # information on profiles and credential files REGION= PROFILE= aws ec2 create-vpc --region $REGION --profile $PROFILE --cidr-block 10.0.0.0/20 # Store the client VPC ID from the command output CLIENT_VPC_ID= ``` You can also use an existing VPC. 
You need the VPC ID to [modify its DNS attributes](#modify-vpc-dns-attributes).

### [](#modify-vpc-dns-attributes)Modify VPC DNS attributes

To modify the VPC attributes, run:

```bash
aws ec2 modify-vpc-attribute --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \
    --enable-dns-hostnames "{\"Value\":true}"

aws ec2 modify-vpc-attribute --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \
    --enable-dns-support "{\"Value\":true}"
```

These commands enable DNS hostnames and resolution for instances in the VPC.

### [](#create-security-group)Create security group

You need the security group ID `security_group_id` from the command output to [add security group rules](#add-security-group-rules). To create a security group, run:

```bash
aws ec2 create-security-group --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \
    --description "Redpanda endpoint service client security group" \
    --group-name "redpanda-privatelink-sg"

SECURITY_GROUP_ID=<security_group_id>
```

### [](#add-security-group-rules)Add security group rules

The following example adds security group rules that work for any broker count by opening the documented per-broker port ranges.

For PrivateLink, clients connect to individual ports for each broker in ranges 32000-32500 (Kafka API) and 35000-35500 (HTTP Proxy). Opening only a few ports by broker count can break producers/consumers for topics with many partitions. See [Private service connectivity network ports](../cloud-security-network/#private-service-connectivity-network-ports).

> ⚠️ **CAUTION**
>
> The following example uses `0.0.0.0/0` as the CIDR range for illustration. In production, replace `0.0.0.0/0` with the specific CIDR range of your client VPC or on-premises network to limit exposure.

```bash
# Allow Kafka API bootstrap (seed)
aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \
    --group-id $SECURITY_GROUP_ID --protocol tcp --port 30292 --cidr 0.0.0.0/0

# Allow Schema Registry
aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \
    --group-id $SECURITY_GROUP_ID --protocol tcp --port 30081 --cidr 0.0.0.0/0

# Allow HTTP Proxy bootstrap
aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \
    --group-id $SECURITY_GROUP_ID --protocol tcp --port 30282 --cidr 0.0.0.0/0

# Allow Redpanda Cloud Data Plane API / Prometheus (if needed)
aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \
    --group-id $SECURITY_GROUP_ID --protocol tcp --port 443 --cidr 0.0.0.0/0

# Private service connectivity broker port pools
# Kafka API per-broker ports
aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \
    --group-id $SECURITY_GROUP_ID \
    --ip-permissions 'IpProtocol=tcp,FromPort=32000,ToPort=32500,IpRanges=[{CidrIp=0.0.0.0/0}]'

# HTTP Proxy per-broker ports
aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \
    --group-id $SECURITY_GROUP_ID \
    --ip-permissions 'IpProtocol=tcp,FromPort=35000,ToPort=35500,IpRanges=[{CidrIp=0.0.0.0/0}]'
```

### [](#create-vpc-subnet)Create VPC subnet

You need the subnet ID `subnet_id` from the command output to [create a VPC endpoint](#create-vpc-endpoint). Run the following command, specifying the subnet availability zone name (for example, `us-west-2a`):

```bash
aws ec2 create-subnet --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \
    --availability-zone <availability-zone> \
    --cidr-block 10.0.1.0/24

SUBNET_ID=<subnet_id>
```

You can also use an existing subnet from your VPC. Either way, you need its subnet ID to [create a VPC endpoint](#create-vpc-endpoint).

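The endpoint creation step that follows passes every value collected so far to a single command, so a variable left empty only surfaces as an AWS CLI error. As a minimal sketch, the following pre-flight check reports any variable that is still unset. `require_vars` is a hypothetical helper written for this example (it is not part of the AWS CLI or `rpk`), and the IDs assigned below are illustrative placeholders, not real resources.

```bash
# Hypothetical helper: print the name of each unset or empty variable
# to stderr and return non-zero if any are missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Illustrative values only; use the values gathered in the previous steps.
REGION="us-east-2"
PROFILE="default"
CLIENT_VPC_ID="vpc-0123456789abcdef0"
PL_SERVICE_NAME="com.amazonaws.vpce.us-east-2.vpce-svc-0123456789abcdef0"
SUBNET_ID="subnet-0123456789abcdef0"
SECURITY_GROUP_ID="sg-0123456789abcdef0"

if require_vars REGION PROFILE CLIENT_VPC_ID PL_SERVICE_NAME SUBNET_ID SECURITY_GROUP_ID; then
  echo "all values set"
else
  echo "set the missing values before creating the VPC endpoint" >&2
fi
```

The check only confirms that each variable is non-empty; it does not validate that the IDs exist in your AWS account.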
### [](#create-vpc-endpoint)Create VPC endpoint

Create the interface VPC endpoint using the service name and subnet ID from the previous steps:

```bash
aws ec2 create-vpc-endpoint \
    --region $REGION --profile $PROFILE \
    --vpc-id $CLIENT_VPC_ID \
    --vpc-endpoint-type "Interface" \
    --ip-address-type "ipv4" \
    --service-name $PL_SERVICE_NAME \
    --subnet-ids $SUBNET_ID \
    --security-group-ids $SECURITY_GROUP_ID \
    --private-dns-enabled
```

## [](#access-redpanda-services-through-vpc-endpoint)Access Redpanda services through VPC endpoint

After you have enabled PrivateLink for your cluster, your connection URLs are available in the **How to Connect** section of the cluster overview in the Redpanda Cloud Console.

You can access Redpanda services such as Schema Registry and HTTP Proxy from the client VPC or virtual network; for example, from a compute instance in the VPC or network.

The bootstrap server hostname is unique to each cluster. The endpoint service exposes a set of bootstrap ports for access to Redpanda services. These ports load balance requests among brokers. Make sure you use the following ports for initiating a connection from a consumer:

| Redpanda service | Default bootstrap port |
| --- | --- |
| Kafka API | 30292 |
| HTTP Proxy | 30282 |
| Schema Registry | 30081 |

### [](#access-kafka-api-seed-service)Access Kafka API seed service

Use port `30292` to access the Kafka API seed service.

```bash
export RPK_BROKERS='<kafka-api-bootstrap-server-hostname>:30292'
rpk cluster info -X tls.enabled=true -X user=<user> -X pass=<password>
```

When successful, the `rpk` output should look like the following:

```bash
CLUSTER
=======
redpanda.rp-cki01qgth38kk81ard3g

BROKERS
=======
ID    HOST                                                                 PORT   RACK
0*    0-3da65a4a-0532364.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com   32092  use2-az1
1     1-3da65a4a-63b320c.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com   32093  use2-az1
2     2-3da65a4a-36068dc.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com   32094  use2-az1
```

### [](#access-schema-registry-seed-service)Access Schema Registry seed service

Use port `30081` to access the Schema Registry seed service.

```bash
curl -vv -u <user>:<password> -H "Content-Type: application/vnd.schemaregistry.v1+json" --http2 https://<schema-registry-bootstrap-server-hostname>:30081/subjects
```

### [](#access-http-proxy-seed-service)Access HTTP Proxy seed service

Use port `30282` to access the Redpanda HTTP Proxy seed service.

```bash
curl -vv -u <user>:<password> -H "Content-Type: application/vnd.kafka.json.v2+json" --http2 https://<http-proxy-bootstrap-server-hostname>:30282/topics
```

## [](#test-the-connection)Test the connection

You can test the connection to the endpoint service from any VM or container in the client VPC. If configuring a client isn’t possible right away, you can do these checks using `rpk` or cURL:

1. Set the following environment variables.

   ```bash
   export RPK_BROKERS='<kafka-api-bootstrap-server-hostname>:30292'
   export RPK_TLS_ENABLED=true
   export RPK_SASL_MECHANISM="<sasl-mechanism>"
   export RPK_USER=<user>
   export RPK_PASS=<password>
   ```

2. Create a test topic.

   ```bash
   rpk topic create test-topic
   ```

3. Produce to the test topic.

   ### rpk

   ```bash
   echo 'hello world' | rpk topic produce test-topic
   ```

   ### curl

   ```bash
   curl -s \
     -X POST \
     "<http-proxy-bootstrap-server-url>/topics/test-topic" \
     -H "Content-Type: application/vnd.kafka.json.v2+json" \
     -d '{
       "records":[
         {
           "value":"hello world"
         }
       ]
     }'
   ```

4. Consume from the test topic.
   ### rpk

   ```bash
   rpk topic consume test-topic -n 1
   ```

   ### curl

   ```bash
   curl -s \
     "<http-proxy-bootstrap-server-url>/topics/test-topic/partitions/0/records?offset=0&timeout=1000&max_bytes=100000" \
     -H "Accept: application/vnd.kafka.json.v2+json"
   ```

## [](#cross-region-privatelink)Cross-region PrivateLink

By default, AWS PrivateLink only allows connections from VPCs in the same region as the endpoint service. Cross-region PrivateLink enables clients in different AWS regions to connect to your Redpanda cluster through PrivateLink.

For more information about AWS cross-region PrivateLink support, see the [AWS documentation](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html#endpoint-service-cross-region).

### [](#prerequisites)Prerequisites

- The Redpanda cluster must be deployed across multiple [availability zones](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#availability-zone-az) (multi-AZ). This is an AWS limitation for cross-region PrivateLink.

### [](#configure-supported-regions)Configure supported regions

> 📝 **NOTE**
>
> The **Supported regions** option only appears in the UI for multi-AZ clusters.

1. In the Redpanda Cloud Console, select your [cluster](https://cloud.redpanda.com/clusters), and go to the **Cluster settings** page.
2. In the AWS PrivateLink section, click **Edit** (or **Enable** if PrivateLink is not yet enabled).
3. In the **Supported regions** section, click **Add** to add a region from which PrivateLink endpoints can connect to your cluster.
4. Select an AWS region from the dropdown. The cluster’s home region is automatically included and not shown in the list.
5. Repeat to add additional regions as needed.
6. Click **Save** (or **Enable**) to apply the changes.

After saving, the **Supported regions** row on the Cluster settings page displays your configured regions. Clients in VPCs located in the supported regions can now create PrivateLink endpoints that connect to your Redpanda cluster.

## [](#disable-endpoint-service)Disable endpoint service On the Cluster settings page for the cluster, click **Disable** for PrivateLink. Existing connections are closed after the AWS PrivateLink service is disabled. To connect using PrivateLink again, you must re-enable the service. ## [](#suggested-reading)Suggested reading - [Configure AWS PrivateLink with the Cloud API](../aws-privatelink/) --- # Page 474: Networking: Dedicated **URL**: https://docs.redpanda.com/redpanda-cloud/networking/dedicated.md --- # Networking: Dedicated --- title: "Networking: Dedicated" latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: dedicated/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: dedicated/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/dedicated/index.adoc description: Learn how to create a VPC peering connection and how to configure private networking with AWS PrivateLink, Azure Private Link, and GCP Private Service Connect. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-07-17" --- - [AWS](aws/) Learn how to configure private networking for Dedicated clusters on AWS. - [Azure](azure/) Learn how to configure private networking for Dedicated clusters on Azure. - [GCP](gcp/) Learn how to configure private networking for Dedicated clusters on GCP. 
--- # Page 475: AWS **URL**: https://docs.redpanda.com/redpanda-cloud/networking/dedicated/aws.md --- # AWS --- title: AWS latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: dedicated/aws/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: dedicated/aws/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/dedicated/aws/index.adoc description: Learn how to configure private networking for Dedicated clusters on AWS. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-05-07" --- - [Add a Dedicated VPC Peering Connection](vpc-peering/) Use the Redpanda Cloud UI to set up VPC peering. - [Configure AWS PrivateLink in the Cloud Console](../../configure-privatelink-in-cloud-ui/) Set up AWS PrivateLink in the Redpanda Cloud Console. - [Configure AWS PrivateLink with the Cloud API](../../aws-privatelink/) Set up AWS PrivateLink with the Cloud API. --- # Page 476: Add a Dedicated VPC Peering Connection **URL**: https://docs.redpanda.com/redpanda-cloud/networking/dedicated/aws/vpc-peering.md --- # Add a Dedicated VPC Peering Connection --- title: Add a Dedicated VPC Peering Connection latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: dedicated/aws/vpc-peering page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: dedicated/aws/vpc-peering.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/dedicated/aws/vpc-peering.adoc description: Use the Redpanda Cloud UI to set up VPC peering. page-git-created-date: "2024-12-13" page-git-modified-date: "2026-02-02" --- A VPC peering connection is a networking connection between two VPCs. 
This connection allows the VPCs to communicate with each other as if they were within the same network. A route table routes traffic between the two VPCs using private IPv4 addresses. > 📝 **NOTE** > > Traffic is _not_ routed over the public internet. When you select a network for deploying your Redpanda Dedicated cluster, you have the option to select a private connection with VPC peering. The VPC peering connection connects your VPC to the Redpanda Cloud VPC. ## [](#prerequisites)Prerequisites - **VPC network**: Before you set up a peering connection in the Redpanda Cloud UI, you must have a VPC in your own account for Redpanda’s VPC to connect to. If you do not already have a VPC, log in to the AWS VPC Console and create one. - **Matching region**: VPC peering connections can only be established between networks created in the **same region**. Redpanda Cloud does not support inter-region VPC peering connections. - **Non-overlapping CIDR blocks**: The CIDR block for your VPC network cannot match or overlap with the CIDR block for the Redpanda Cloud VPC. > 💡 **TIP** > > Consider adding `rp` at the beginning of the VPC name to indicate that this VPC is for deploying a Redpanda cluster. ## [](#create-a-peering-connection)Create a peering connection To create a peering connection between your VPC and Redpanda’s VPC: 1. In the Redpanda Cloud UI, go to the **Overview** page for your cluster. 2. In the Details section, click the name of the Redpanda network. 3. On the Networking page, click **VPC peering walkthrough**. 4. For **Connection name**, enter a name. For example, the name might refer to the VPC ID of the VPC you created in AWS. 5. For **AWS account number**, enter the account number associated with the VPC you want to connect to. 6. For **AWS VPC ID**, enter the VPC ID by copying it from the AWS VPC Console. 7. Click **Create peering connection**. 
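
The non-overlapping CIDR prerequisite can be checked before you create the peering connection. The following is a minimal sketch in POSIX shell, assuming IPv4 CIDR blocks and 64-bit shell arithmetic. `ip_to_int`, `cidr_bounds`, and `cidrs_overlap` are hypothetical helper names introduced for this example, and both CIDR values are illustrative rather than taken from a real Redpanda network.

```bash
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address of a CIDR block as integers.
cidr_bounds() {
  base=$(ip_to_int "${1%/*}")
  prefix=${1#*/}
  size=$(( 1 << (32 - prefix) ))
  start=$(( base / size * size ))   # align the base address to the block size
  echo "$start $(( start + size - 1 ))"
}

# Return 0 if the two CIDR blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  # $1/$2 = first/last address of block A, $3/$4 = first/last address of block B
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Illustrative values only: substitute your VPC CIDR and the Redpanda network CIDR.
YOUR_VPC_CIDR="10.0.0.0/20"
REDPANDA_VPC_CIDR="10.0.16.0/20"

if cidrs_overlap "$YOUR_VPC_CIDR" "$REDPANDA_VPC_CIDR"; then
  echo "overlap: choose a different CIDR block"
else
  echo "no overlap"
fi
```

Two ranges overlap exactly when each one starts at or before the other ends, which is why the check needs only the first and last address of each block.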
## [](#accept-the-peering-connection-request)Accept the peering connection request Redpanda sends a peering request to the AWS VPC console. You must accept the request from the Redpanda VPC to set up the peering connection. 1. Log in to the Amazon VPC console. 2. Select the region where the VPC was created. 3. From the navigation menu, select **Peering Connections**. 4. Under **Requester VPC**, select the VPC you created for use with Redpanda. The status should say "Pending acceptance". 5. Open the **Actions** menu and select **Accept Request**. 6. In the confirmation dialog box, verify that the requester owner ID corresponds to the Redpanda account, and select **Yes, Accept**. 7. In the next confirmation dialog box, select **Modify my route tables now**. Follow the steps in the dialog box to add routes to your route tables in the AWS console. This enables traffic to flow between the two VPCs. ## [](#switch-from-vpc-peering-to-privatelink)Switch from VPC peering to PrivateLink VPC peering and PrivateLink use the same DNS hostnames (connection URLs) to connect to the Redpanda cluster. When you configure the PrivateLink DNS, those hostnames resolve to PrivateLink endpoints, which can interrupt existing VPC peering-based connections if clients aren’t ready. To enable PrivateLink without disrupting VPC peering connections, do a controlled DNS switchover: 1. Enable PrivateLink on the existing cluster and configure the PrivateLink connection to Redpanda Cloud, but **do not modify VPC DNS attributes yet**. See: [Enable PrivateLink on an existing cluster](../../../aws-privatelink/#enable-privatelink-endpoint-service-for-existing-clusters). 2. During a planned window, modify the VPC DNS attributes to switch the shared hostnames over to PrivateLink. 
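
After the switchover, one way to confirm which path a hostname uses is to check where it resolves. Interface endpoints receive addresses from the client VPC's own subnets, while peered addresses belong to the Redpanda VPC, whose CIDR cannot overlap with yours. The sketch below is a heuristic built on that assumption, not an official Redpanda check. `ip_to_int` and `ip_in_cidr` are hypothetical helpers, and the address and CIDR shown are illustrative.

```bash
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if the IPv4 address $1 falls inside the CIDR block $2.
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  base=$(ip_to_int "${2%/*}")
  size=$(( 1 << (32 - ${2#*/}) ))
  start=$(( base / size * size ))   # align the base address to the block size
  [ "$ip" -ge "$start" ] && [ "$ip" -lt $(( start + size )) ]
}

# Illustrative values only. In practice, RESOLVED_IP would come from a DNS
# lookup of a broker hostname run from inside the client VPC.
CLIENT_VPC_CIDR="10.0.0.0/20"
RESOLVED_IP="10.0.1.25"

if ip_in_cidr "$RESOLVED_IP" "$CLIENT_VPC_CIDR"; then
  echo "resolves inside the client VPC: PrivateLink endpoint"
else
  echo "resolves outside the client VPC: VPC peering path"
fi
```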
--- # Page 477: Azure **URL**: https://docs.redpanda.com/redpanda-cloud/networking/dedicated/azure.md --- # Azure --- title: Azure latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: dedicated/azure/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: dedicated/azure/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/dedicated/azure/index.adoc description: Learn how to configure private networking for Dedicated clusters on Azure. page-git-created-date: "2024-12-13" page-git-modified-date: "2025-05-07" --- - [Configure Azure Private Link in the Cloud Console](../../azure-private-link-in-ui/) Set up Azure Private Link in the Redpanda Cloud Console. - [Configure Azure Private Link with the Cloud API](../../azure-private-link/) Set up Azure Private Link with the Cloud API. --- # Page 478: GCP **URL**: https://docs.redpanda.com/redpanda-cloud/networking/dedicated/gcp.md --- # GCP --- title: GCP latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: dedicated/gcp/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: dedicated/gcp/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/dedicated/gcp/index.adoc description: Learn how to configure private networking for Dedicated clusters on GCP. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-05-07" --- - [Add a Dedicated VPC Peering Connection](vpc-peering-gcp/) Use the Redpanda Cloud UI to set up VPC peering. - [Configure GCP Private Service Connect in the Cloud Console](configure-psc-in-ui/) Set up GCP Private Service Connect in the Redpanda Cloud Console. 
- [Configure GCP Private Service Connect with the Cloud API](configure-psc-in-api/) Set up GCP Private Service Connect to securely access Redpanda Cloud. --- # Page 479: Configure GCP Private Service Connect with the Cloud API **URL**: https://docs.redpanda.com/redpanda-cloud/networking/dedicated/gcp/configure-psc-in-api.md --- # Configure GCP Private Service Connect with the Cloud API --- title: Configure GCP Private Service Connect with the Cloud API latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: dedicated/gcp/configure-psc-in-api page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: dedicated/gcp/configure-psc-in-api.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/dedicated/gcp/configure-psc-in-api.adoc description: Set up GCP Private Service Connect to securely access Redpanda Cloud. page-git-created-date: "2025-06-23" page-git-modified-date: "2026-02-02" --- > 📝 **NOTE** > > - This guide is for configuring GCP Private Service Connect using the Redpanda Cloud API. To configure and manage Private Service Connect on an existing cluster with **public** networking, you must use the Cloud API. See [Configure Private Service Connect in the Cloud UI](../../../configure-private-service-connect-in-cloud-ui/) to set up the endpoint service using the Redpanda Cloud UI. > > - The latest version of Redpanda GCP Private Service Connect (available March, 2025) supports AZ affinity. This allows requests from Private Service Connect endpoints to stay within the same availability zone, avoiding additional networking costs. > > - DEPRECATION: The original Redpanda GCP Private Service Connect is deprecated and will be removed in a future release. For more information, see [Deprecated features](../../../../manage/maintenance/#deprecated-features). 
The Redpanda GCP Private Service Connect service provides secure access to Redpanda Cloud from your VPC network. Traffic over Private Service Connect remains within GCP’s private network, avoiding the public internet. Your VPC network can access the Redpanda VPC network, but Redpanda cannot access your VPC network. Consider using Private Service Connect if you have multiple VPC networks and could benefit from a more simplified approach to network management. > 📝 **NOTE** > > - Each consumer VPC network can have one Private Service Connect endpoint connected to the Redpanda service attachment. > > - Private Service Connect allows overlapping [CIDR ranges](../../../cidr-ranges/) in VPC networks. > > - The number of connections is limited only by your Redpanda [usage tier](../../../../reference/tiers/). Private Service Connect does not add extra connection limits. > > - You control from which GCP projects connections are allowed. ## [](#prerequisites)Prerequisites - In this guide, you use the [Redpanda Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) to enable the Redpanda endpoint service for your clusters. Follow the steps on this page to [get an access token](#get-a-cloud-api-access-token). - Use the [gcloud](https://cloud.google.com/sdk/docs/install) command-line interface (CLI) to create the consumer-side resources, such as a VPC and forwarding rule, or to modify existing resources to use the Private Service Connect attachment created for your cluster. - The consumer VPC network must be in the same region as your Redpanda cluster. ## [](#get-a-cloud-api-access-token)Get a Cloud API access token 1. Save the base URL of the Redpanda Cloud API in an environment variable: ```bash export PUBLIC_API_ENDPOINT="https://api.cloud.redpanda.com" ``` 2. In the Redpanda Cloud UI, go to the [**Organization IAM**](https://cloud.redpanda.com/organization-iam) page, and select the **Service account** tab. 
If you don’t have an existing service account, you can create a new one. Copy and store the client ID and secret. ```bash export CLOUD_CLIENT_ID= export CLOUD_CLIENT_SECRET= ``` 3. Get an API token using the client ID and secret. You can click the **Request an API token** link to see code examples to generate the token. ```bash export AUTH_TOKEN=`curl -s --request POST \ --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \ --header 'content-type: application/x-www-form-urlencoded' \ --data grant_type=client_credentials \ --data client_id="$CLOUD_CLIENT_ID" \ --data client_secret="$CLOUD_CLIENT_SECRET" \ --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token` ``` You must send the API token in the `Authorization` header when making requests to the Cloud API. ## [](#create-a-new-cluster-with-private-service-connect)Create a new cluster with Private Service Connect 1. In the [Redpanda Cloud Console](https://cloud.redpanda.com/), go to **Resource groups** and select the resource group in which you want to create a cluster. Copy and store the resource group ID (UUID) from the URL in the browser. ```bash export RESOURCE_GROUP_ID= ``` 2. Make a request to the [`POST /v1/networks`](/api/doc/cloud-controlplane/operation/operation-networkservice_createnetwork) endpoint to create a network. ```bash NETWORK_POST_BODY=`cat << EOF { "network": { "cloud_provider": "CLOUD_PROVIDER_GCP", "cluster_type": "TYPE_DEDICATED", "name": "", "resource_group_id": "$RESOURCE_GROUP_ID", "region": "" } } EOF` curl -vv -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$NETWORK_POST_BODY" $PUBLIC_API_ENDPOINT/v1/networks ``` Replace the following placeholder variables for the request body: - ``: The name for the network. - ``: The GCP region where the network will be created. - ``: The ID of the GCP project where your VPC is created. - ``: The name of your VPC. - ``: The name of the Google Storage bucket you created for the cluster. 
3. Store the network ID (`metadata.network_id`) returned in the response to the Create Network request. ```bash export NETWORK_ID= ``` 4. Make a request to the [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster) endpoint to create a Redpanda Cloud cluster with Private Service Connect enabled. ```bash export CLUSTER_POST_BODY=`cat << EOF { "cluster": { "cloud_provider": "CLOUD_PROVIDER_GCP", "connection_type": "CONNECTION_TYPE_PRIVATE", "type": "TYPE_DEDICATED", "name": "", "resource_group_id": "$RESOURCE_GROUP_ID", "network_id": "$NETWORK_ID", "region": "", "zones": , "throughput_tier": "", "redpanda_version": "", "gcp_private_service_connect": { "enabled": true, "consumer_accept_list": } } } EOF` curl -vv -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_POST_BODY" $PUBLIC_API_ENDPOINT/v1/clusters ``` - ``: Provide a name for the new cluster. - ``: Choose a GCP region where the network will be created. - ``: Provide the list of GCP zones where the brokers will be deployed. Format: `["", "", ""]` - ``: Choose a Redpanda Cloud cluster tier. For example, `tier-1-gcp-v2-x86`. - ``: Choose the Redpanda Cloud version. - ``: The list of IDs of GCP projects from which Private Service Connect connection requests are accepted. Format: `[{"source": ""}, {"source": ""}, {"source": ""}]` ## [](#enable-private-service-connect-on-an-existing-cluster)Enable Private Service Connect on an existing cluster > ⚠️ **CAUTION** > > Enabling Private Service Connect on your VPC interrupts all communication on existing Redpanda bootstrap server and broker ports due to the change of private DNS resolution. > > To avoid disruption, consider using a staged approach. See: [Switch from VPC peering to Private Service Connect](../vpc-peering-gcp/#switch-from-vpc-peering-to-private-service-connect). 1. 
In the Redpanda Cloud Console, go to the cluster overview and copy the cluster ID from the **Details** section. ```bash export CLUSTER_ID= ``` 2. Make a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request to update the cluster to enable Private Service Connect. ```bash CLUSTER_PATCH_BODY=`cat << EOF { "gcp_private_service_connect": { "enabled": true, "consumer_accept_list": } } EOF` curl -v -X PATCH \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID ``` Replace the following placeholder: ``: A JSON list specifying the projects from which incoming connections will be accepted. All other sources are rejected. For example, `[{"source": "consumer-project-ID-1"},{"source": "consumer-project-ID-2"}]`. Wait for the cluster to apply the new configuration (around 15 minutes). The Private Service Connect attachment is available when the cluster update is complete. To monitor the service attachment creation, run the following `gcloud` command with the project ID: ```bash gcloud compute service-attachments list --project '' ``` ## [](#deploy-consumer-side-resources)Deploy consumer-side resources For each consumer VPC network, you must complete the following steps to successfully connect to the service attachment and use the Kafka API and other Redpanda services, such as HTTP Proxy. 1. In **Cluster settings**, copy the **DNS zone** and **Service attachment URL** under **Private Service Connect**. Use this URL to create the Private Service Connect endpoint in GCP. 2. Get the name of the consumer VPC network and the subnet ``, where the Private Service Connect endpoint forwarding rule will be created. 3. Create a Private Service Connect IP address for the endpoint: ```bash gcloud compute addresses create --subnet= --addresses= --region= ``` 4. 
Create the Private Service Connect endpoint forwarding rule: > 📝 **NOTE** > > If you enabled global access when creating the cluster, you must include the `--allow-psc-global-access` flag to configure the endpoint to accept client connections from different regions. ```bash gcloud compute forwarding-rules create --region= --network= --address= --target-service-attachment= ``` 5. Create firewall rules allowing egress traffic to the Private Service Connect endpoint: ```bash gcloud compute firewall-rules create redpanda-psc-egress \ --description="Allow access to Redpanda PSC endpoint" \ --network="" \ --direction="EGRESS" \ --destination-ranges= \ --allow="tcp:443,tcp:30081,tcp:30282,tcp:30292,tcp:32092-32141,tcp:35082-35131,tcp:32192-32241,tcp:35182-35231,tcp:32292-32341,tcp:35282-35331" ``` 6. Create a private DNS zone. Use the cluster **DNS zone** value as the DNS name: ```bash gcloud dns managed-zones create \ --project= \ --description="Redpanda Private Service Connect DNS zone" \ --dns-name="" \ --visibility="private" \ --networks="" ``` 7. In the newly-created DNS zone, create a wildcard DNS record using the cluster **DNS record** value: ```bash gcloud dns record-sets create '*.' \ --project= \ --zone="" \ --type="A" \ --ttl="300" \ --rrdatas="" ``` ## [](#access-redpanda-services-through-private-service-connect-endpoint)Access Redpanda services through Private Service Connect endpoint After you have enabled Private Service Connect for your cluster, your connection URLs are available in the **How to Connect** section of the cluster overview in the Redpanda Cloud UI. You can access Redpanda services such as Schema Registry and HTTP Proxy from the client VPC or virtual network; for example, from a compute instance in the VPC or network. The bootstrap server hostname is unique to each cluster. The service attachment exposes a set of bootstrap ports for access to Redpanda services. These ports load balance requests among brokers. 
Make sure you use the following ports for initiating a connection from a consumer:

| Redpanda service | Default bootstrap port |
| --- | --- |
| Kafka API | 30292 |
| HTTP Proxy | 30282 |
| Schema Registry | 30081 |

### [](#access-kafka-api-seed-service)Access Kafka API seed service

Use port `30292` to access the Kafka API seed service.

```bash
export RPK_BROKERS='<kafka-api-bootstrap-server-hostname>:30292'
rpk cluster info -X tls.enabled=true -X user=<user> -X pass=<password>
```

When successful, the `rpk` output should look like the following:

```bash
CLUSTER
=======
redpanda.rp-cki01qgth38kk81ard3g

BROKERS
=======
ID    HOST                                                                 PORT   RACK
0*    0-3da65a4a-0532364.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com   32092  use2-az1
1     1-3da65a4a-63b320c.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com   32093  use2-az1
2     2-3da65a4a-36068dc.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com   32094  use2-az1
```

### [](#access-schema-registry-seed-service)Access Schema Registry seed service

Use port `30081` to access the Schema Registry seed service.

```bash
curl -vv -u <user>:<password> -H "Content-Type: application/vnd.schemaregistry.v1+json" --http2 https://<schema-registry-bootstrap-server-hostname>:30081/subjects
```

### [](#access-http-proxy-seed-service)Access HTTP Proxy seed service

Use port `30282` to access the Redpanda HTTP Proxy seed service.

```bash
curl -vv -u <user>:<password> -H "Content-Type: application/vnd.kafka.json.v2+json" --http2 https://<http-proxy-bootstrap-server-hostname>:30282/topics
```

## [](#test-the-connection)Test the connection

You can test the Private Service Connect connection from any VM or container in the consumer VPC. If configuring a client isn’t possible right away, you can do these checks using `rpk` or curl:

1. Set the following environment variables.

   ```bash
   export RPK_BROKERS='<kafka-api-bootstrap-server-hostname>:30292'
   export RPK_TLS_ENABLED=true
   export RPK_SASL_MECHANISM="<sasl-mechanism>"
   export RPK_USER=<user>
   export RPK_PASS=<password>
   ```

2. Create a test topic.

   ```bash
   rpk topic create test-topic
   ```

3. Produce to the test topic.
### rpk ```bash echo 'hello world' | rpk topic produce test-topic ``` ### curl ```bash curl -s \ -X POST \ "/topics/test-topic" \ -H "Content-Type: application/vnd.kafka.json.v2+json" \ -d '{ "records":[ { "value":"hello world" } ] }' ``` 4. Consume from the test topic. ### rpk ```bash rpk topic consume test-topic -n 1 ``` ### curl ```bash curl -s \ "/topics/test-topic/partitions/0/records?offset=0&timeout=1000&max_bytes=100000"\ -H "Accept: application/vnd.kafka.json.v2+json" ``` ## [](#disable-private-service-connect)Disable Private Service Connect Make a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request to update the cluster to disable Private Service Connect. ```bash CLUSTER_PATCH_BODY=`cat << EOF { "gcp_private_service_connect": { "enabled": false } } EOF` curl -v -X PATCH \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID ``` --- # Page 480: Configure GCP Private Service Connect in the Cloud Console **URL**: https://docs.redpanda.com/redpanda-cloud/networking/dedicated/gcp/configure-psc-in-ui.md --- # Configure GCP Private Service Connect in the Cloud Console --- title: Configure GCP Private Service Connect in the Cloud Console latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: dedicated/gcp/configure-psc-in-ui page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: dedicated/gcp/configure-psc-in-ui.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/dedicated/gcp/configure-psc-in-ui.adoc description: Set up GCP Private Service Connect in the Redpanda Cloud Console. 
page-git-created-date: "2025-06-23" page-git-modified-date: "2026-02-02" --- > 📝 **NOTE** > > - This guide is for configuring GCP Private Service Connect using the Redpanda Cloud Console. To configure and manage Private Service Connect on an existing cluster with **public** networking, you must use the [Cloud API for BYOC](../../../gcp-private-service-connect/) or the [Cloud API for Dedicated](../configure-psc-in-api/). > > - The latest version of Redpanda GCP Private Service Connect (available March, 2025) supports AZ affinity. This allows requests from Private Service Connect endpoints to stay within the same availability zone, avoiding additional networking costs. > > - DEPRECATION: The original Redpanda GCP Private Service Connect is deprecated and will be removed in a future release. For more information, see [Deprecated features](../../../../manage/maintenance/#deprecated-features). The Redpanda GCP Private Service Connect service provides secure access to Redpanda Cloud from your VPC network. Traffic over Private Service Connect remains within GCP’s private network, avoiding the public internet. Your VPC network can access the Redpanda VPC network, but Redpanda cannot access your VPC network. Consider using Private Service Connect if you have multiple VPC networks and could benefit from a more simplified approach to network management. > 📝 **NOTE** > > - Each consumer VPC network can have one Private Service Connect endpoint connected to the Redpanda service attachment. > > - Private Service Connect allows overlapping [CIDR ranges](../../../cidr-ranges/) in VPC networks. > > - The number of connections is limited only by your Redpanda usage tier. Private Service Connect does not add extra connection limits. > > - You control from which GCP projects connections are allowed. 
## [](#prerequisites)Prerequisites - Use the [gcloud](https://cloud.google.com/sdk/docs/install) command-line interface (CLI) to create the consumer-side resources, such as a consumer VPC network and forwarding rule, or to modify existing resources to use the Private Service Connect service attachment created for your cluster. - The consumer VPC network must be in the same region as your Redpanda cluster. ## [](#enable-private-service-connect-for-existing-clusters)Enable Private Service Connect for existing clusters 1. In the Redpanda Cloud Console, open your [cluster](https://cloud.redpanda.com/clusters), and click **Cluster settings**. 2. Under Private Service Connect, click **Enable**. 3. For the accepted consumers list, you need the GCP project IDs from which incoming connections will be accepted. 4. It may take several minutes for your cluster to update. When the update is complete, the Private Service Connect status in **Cluster settings** changes from **In progress** to **Enabled**. ## [](#deploy-consumer-side-resources)Deploy consumer-side resources For each consumer VPC network, you must complete the following steps to successfully connect to the service attachment and use the Kafka API and other Redpanda services, such as HTTP Proxy. 1. In **Cluster settings**, copy the **DNS zone** and **Service attachment URL** under **Private Service Connect**. Use this URL to create the Private Service Connect endpoint in GCP. 2. Get the name of the consumer VPC network and the subnet ``, where the Private Service Connect endpoint forwarding rule will be created. 3. Create a Private Service Connect IP address for the endpoint: ```bash gcloud compute addresses create --subnet= --addresses= --region= ``` 4. Create the Private Service Connect endpoint forwarding rule: > 📝 **NOTE** > > If you enabled global access when creating the cluster, you must include the `--allow-psc-global-access` flag to configure the endpoint to accept client connections from different regions. 
```bash gcloud compute forwarding-rules create --region= --network= --address= --target-service-attachment= ``` 5. Create firewall rules allowing egress traffic to the Private Service Connect endpoint: ```bash gcloud compute firewall-rules create redpanda-psc-egress \ --description="Allow access to Redpanda PSC endpoint" \ --network="" \ --direction="EGRESS" \ --destination-ranges= \ --allow="tcp:443,tcp:30081,tcp:30282,tcp:30292,tcp:32092-32141,tcp:35082-35131,tcp:32192-32241,tcp:35182-35231,tcp:32292-32341,tcp:35282-35331" ``` 6. Create a private DNS zone. Use the cluster **DNS zone** value as the DNS name: ```bash gcloud dns managed-zones create \ --project= \ --description="Redpanda Private Service Connect DNS zone" \ --dns-name="" \ --visibility="private" \ --networks="" ``` 7. In the newly-created DNS zone, create a wildcard DNS record using the cluster **DNS record** value: ```bash gcloud dns record-sets create '*.' \ --project= \ --zone="" \ --type="A" \ --ttl="300" \ --rrdatas="" ``` ## [](#access-redpanda-services-through-private-service-connect-endpoint)Access Redpanda services through Private Service Connect endpoint After you have enabled Private Service Connect for your cluster, your connection URLs are available in the **How to Connect** section of the cluster overview in the Redpanda Cloud UI. You can access Redpanda services such as Schema Registry and HTTP Proxy from the client VPC or virtual network; for example, from a compute instance in the VPC or network. The bootstrap server hostname is unique to each cluster. The service attachment exposes a set of bootstrap ports for access to Redpanda services. These ports load balance requests among brokers. 
Make sure you use the following ports for initiating a connection from a consumer:

| Redpanda service | Default bootstrap port |
| --- | --- |
| Kafka API | 30292 |
| HTTP Proxy | 30282 |
| Schema Registry | 30081 |

### [](#access-kafka-api-seed-service)Access Kafka API seed service

Use port `30292` to access the Kafka API seed service.

```bash
export RPK_BROKERS=':30292'
rpk cluster info -X tls.enabled=true -X user= -X pass=
```

When successful, the `rpk` output should look like the following:

```bash
CLUSTER
=======
redpanda.rp-cki01qgth38kk81ard3g

BROKERS
=======
ID    HOST                                                                PORT   RACK
0*    0-3da65a4a-0532364.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com  32092  use2-az1
1     1-3da65a4a-63b320c.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com  32093  use2-az1
2     2-3da65a4a-36068dc.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com  32094  use2-az1
```

### [](#access-schema-registry-seed-service)Access Schema Registry seed service

Use port `30081` to access the Schema Registry seed service.

```bash
curl -vv -u : -H "Content-Type: application/vnd.schemaregistry.v1+json" --http2 :30081/subjects
```

### [](#access-http-proxy-seed-service)Access HTTP Proxy seed service

Use port `30282` to access the Redpanda HTTP Proxy seed service.

```bash
curl -vv -u : -H "Content-Type: application/vnd.kafka.json.v2+json" --http2 :30282/topics
```

## [](#test-the-connection)Test the connection

You can test the Private Service Connect connection from any VM or container in the consumer VPC. If configuring a client isn’t possible right away, you can run these checks using `rpk` or curl:

1. Set the following environment variables.

    ```bash
    export RPK_BROKERS=':30292'
    export RPK_TLS_ENABLED=true
    export RPK_SASL_MECHANISM=""
    export RPK_USER=
    export RPK_PASS=
    ```

2. Create a test topic.

    ```bash
    rpk topic create test-topic
    ```

3. Produce to the test topic.
### rpk

```bash
echo 'hello world' | rpk topic produce test-topic
```

### curl

```bash
curl -s \
  -X POST \
  "/topics/test-topic" \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{
  "records":[
      {
          "value":"hello world"
      }
  ]
}'
```

4. Consume from the test topic.

### rpk

```bash
rpk topic consume test-topic -n 1
```

### curl

```bash
curl -s \
  "/topics/test-topic/partitions/0/records?offset=0&timeout=1000&max_bytes=100000" \
  -H "Accept: application/vnd.kafka.json.v2+json"
```

## [](#disable-private-service-connect)Disable Private Service Connect

In **Cluster settings**, click **Disable**. Existing connections are closed after GCP Private Service Connect is disabled. To connect using Private Service Connect again, you must re-enable the service.

---

# Page 481: Add a Dedicated VPC Peering Connection

**URL**: https://docs.redpanda.com/redpanda-cloud/networking/dedicated/gcp/vpc-peering-gcp.md

---

# Add a Dedicated VPC Peering Connection

---
title: Add a Dedicated VPC Peering Connection
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: dedicated/gcp/vpc-peering-gcp
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: dedicated/gcp/vpc-peering-gcp.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/dedicated/gcp/vpc-peering-gcp.adoc
description: Use the Redpanda Cloud UI to set up VPC peering.
page-git-created-date: "2024-12-13"
page-git-modified-date: "2026-02-02"
---

A VPC peering connection is a networking connection between two VPCs. This connection allows the VPCs to communicate with each other as if they were within the same network. A route table routes traffic between the two VPCs using private IPv4 addresses.

> 📝 **NOTE**
>
> Traffic is _not_ routed over the public internet.
When you select a network for deploying your Redpanda Dedicated cluster, you have the option to select a private connection with VPC peering. The VPC peering connection connects your VPC to the Redpanda Cloud VPC.

## [](#prerequisites)Prerequisites

- **VPC network**: Before setting up a peering connection in the Redpanda Cloud UI, you must have a VPC in your own account for Redpanda’s VPC to connect to.

- **Matching region**: VPC peering connections can only be established between networks created in the **same region**. Redpanda Cloud does not support inter-region VPC peering connections.

- **Non-overlapping CIDR blocks**: The CIDR block for your VPC network cannot match or overlap with the CIDR block for the Redpanda Cloud VPC.

> 💡 **TIP**
>
> Consider adding `rp` at the beginning of the VPC name to indicate that this VPC is for deploying a Redpanda cluster.

## [](#create-a-peering-connection)Create a peering connection

A peering connection becomes active only after both sides, Redpanda and your GCP project, create a peering that targets the other side's project and network.

1. In the Redpanda Cloud UI, go to the **Overview** page for your cluster.

2. In the Details section, click the name of the Redpanda network.

3. On the Networking page for your cluster, click **VPC peering walkthrough**.

4. For **Connection name**, enter a name for the connection. For example, the name might refer to the VPC ID of the VPC you created in GCP.

5. For **GCP project ID**, enter the ID of the project that contains the VPC network you want to connect to.

6. For **VPC network name**, enter the name of the VPC network.

7. Click **Create peering connection**.

## [](#create-the-reciprocal-peering-connection)Create the reciprocal peering connection

1. In the Google Cloud console, go to VPC network peering - Create peering connection.

2. For **Name**, enter a name for the connection (for example, `rp-peering`).

3. Select your VPC network, project, and VPC network name.

4. Click **Create**.
## [](#switch-from-vpc-peering-to-private-service-connect)Switch from VPC peering to Private Service Connect VPC peering and Private Service Connect use the same DNS hostnames (connection URLs) to connect to the Redpanda cluster. When you configure the Private Service Connect DNS, those hostnames resolve to Private Service Connect endpoints, which can interrupt existing VPC peering-based connections if clients aren’t ready. To enable Private Service Connect without disrupting VPC peering connections, do a controlled DNS switchover: 1. Enable Private Service Connect on the existing cluster and deploy consumer-side resources, but **do not create private DNS yet**. See: [Enable Private Service Connect on an existing cluster](../configure-psc-in-api/#enable-private-service-connect-on-an-existing-cluster). 2. During a planned window, create the private DNS zone and records in your VPC to switch the shared hostnames over to Private Service Connect. --- # Page 482: Configure GCP Private Service Connect with the Cloud API **URL**: https://docs.redpanda.com/redpanda-cloud/networking/gcp-private-service-connect.md --- # Configure GCP Private Service Connect with the Cloud API --- title: Configure GCP Private Service Connect with the Cloud API latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: gcp-private-service-connect page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: gcp-private-service-connect.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/gcp-private-service-connect.adoc description: Set up GCP Private Service Connect to securely access Redpanda Cloud. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-09-05" --- > 📝 **NOTE** > > - This guide is for configuring GCP Private Service Connect using the Redpanda Cloud API. 
To configure and manage Private Service Connect on an existing cluster with **public** networking, you must use the Cloud API. See [Configure Private Service Connect in the Cloud UI](../configure-private-service-connect-in-cloud-ui/) to set up the endpoint service using the Redpanda Cloud UI. > > - The latest version of Redpanda GCP Private Service Connect (available March, 2025) supports AZ affinity. This allows requests from Private Service Connect endpoints to stay within the same availability zone, avoiding additional networking costs. > > - DEPRECATION: The original Redpanda GCP Private Service Connect is deprecated and will be removed in a future release. For more information, see [Deprecated features](../../manage/maintenance/#deprecated-features). The Redpanda GCP Private Service Connect service provides secure access to Redpanda Cloud from your VPC network. Traffic over Private Service Connect remains within GCP’s private network, avoiding the public internet. Your VPC network can access the Redpanda VPC network, but Redpanda cannot access your VPC network. Consider using Private Service Connect if you have multiple VPC networks and could benefit from a more simplified approach to network management. > 📝 **NOTE** > > - Each consumer VPC network can have one Private Service Connect endpoint connected to the Redpanda service attachment. > > - Private Service Connect allows overlapping [CIDR ranges](../cidr-ranges/) in VPC networks. > > - The number of connections is limited only by your Redpanda [usage tier](../../reference/tiers/). Private Service Connect does not add extra connection limits. > > - You control from which GCP projects connections are allowed. ## [](#prerequisites)Prerequisites - In this guide, you use the [Redpanda Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) to enable the Redpanda endpoint service for your clusters. Follow the steps on this page to [get an access token](#get-a-cloud-api-access-token). 
- Use the [gcloud](https://cloud.google.com/sdk/docs/install) command-line interface (CLI) to create the consumer-side resources, such as a VPC and forwarding rule, or to modify existing resources to use the Private Service Connect attachment created for your cluster. - The consumer VPC network must be in the same region as your Redpanda cluster. ## [](#get-a-cloud-api-access-token)Get a Cloud API access token 1. Save the base URL of the Redpanda Cloud API in an environment variable: ```bash export PUBLIC_API_ENDPOINT="https://api.cloud.redpanda.com" ``` 2. In the Redpanda Cloud UI, go to the [**Organization IAM**](https://cloud.redpanda.com/organization-iam) page, and select the **Service account** tab. If you don’t have an existing service account, you can create a new one. Copy and store the client ID and secret. ```bash export CLOUD_CLIENT_ID= export CLOUD_CLIENT_SECRET= ``` 3. Get an API token using the client ID and secret. You can click the **Request an API token** link to see code examples to generate the token. ```bash export AUTH_TOKEN=`curl -s --request POST \ --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \ --header 'content-type: application/x-www-form-urlencoded' \ --data grant_type=client_credentials \ --data client_id="$CLOUD_CLIENT_ID" \ --data client_secret="$CLOUD_CLIENT_SECRET" \ --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token` ``` You must send the API token in the `Authorization` header when making requests to the Cloud API. ## [](#create-a-new-byovpc-cluster-with-private-service-connect)Create a new BYOVPC cluster with Private Service Connect 1. In the [Redpanda Cloud UI](https://cloud.redpanda.com/), go to **Resource groups** and select the resource group in which you want to create a cluster. Copy and store the resource group ID (UUID) from the URL in the browser. ```bash export RESOURCE_GROUP_ID= ``` 2. 
Follow the BYOVPC steps to [configure the service project](../../get-started/cluster-types/byoc/gcp/vpc-byo-gcp/#configure-the-service-project) to configure IAM role, permissions, and firewall rules. 3. BYOVPC clusters need a NAT subnet with `purpose` set to `PRIVATE_SERVICE_CONNECT`. You can create the subnet using the `gcloud` CLI: ```bash gcloud compute networks subnets create \ --project= \ --network= \ --region= \ --range= \ --purpose=PRIVATE_SERVICE_CONNECT ``` Provide your values for the following placeholders: - ``: The name of the NAT subnet. - ``: The host GCP project ID. - ``: The name of the VPC being used for your Redpanda Cloud cluster. The name is used to identify this network in the Cloud UI. - ``: The GCP region of the Redpanda Cloud cluster. - ``: The CIDR range of the subnet. The mask should be at least `/29`. Each Private Service Connect connection takes up one IP address from the NAT subnet, so the CIDR must be able to accommodate all projects from which connections to the service attachment will be issued. See the GCP documentation for [creating a subnet for Private Service Connect](https://cloud.google.com/vpc/docs/configure-private-service-connect-producer#add-subnet-psc). 4. Create VPC firewall rules to allow Private Service Connect traffic. Use the `gcloud` CLI to create the firewall rules: > 📝 **NOTE** > > The firewall rules support up to 20 Redpanda brokers. If you have more than 20 brokers, or for help enabling Private Service Connect, contact [Redpanda support](https://support.redpanda.com/hc/en-us/requests/new). ```none gcloud compute firewall-rules create redpanda-psc \ --description="Allow access to Redpanda PSC endpoints" \ --network="" \ --project="" \ --direction="INGRESS" \ --target-tags="redpanda-node" \ --source-ranges="10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,100.64.0.0/10" \ --allow="tcp:30181,tcp:30282,tcp:30292,tcp:31004,tcp:31082-31101,tcp:31182-31201,tcp:31282-31301,tcp:32092-32111,tcp:32192-32211,tcp:32292-32311" ``` 5. 
Make a request to the [`POST /v1/networks`](/api/doc/cloud-controlplane/operation/operation-networkservice_createnetwork) endpoint to create a network. ```bash NETWORK_POST_BODY=`cat << EOF { "network": { "cloud_provider": "CLOUD_PROVIDER_GCP", "cluster_type": "TYPE_BYOC", "name": "", "resource_group_id": "$RESOURCE_GROUP_ID", "region": "", "customer_managed_resources": { "gcp": { "network_name": "", "network_project_id": "", "management_bucket": { "name" : "" } } } } } EOF` curl -vv -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$NETWORK_POST_BODY" $PUBLIC_API_ENDPOINT/v1/networks ``` Replace the following placeholder variables for the request body: - ``: The name for the network. - ``: The GCP region where the network will be created. - ``: The ID of the GCP project where your VPC is created. - ``: The name of your VPC. - ``: The name of the Google Storage bucket you created for the cluster. 6. Store the network ID (`operation.metadata.network_id`) returned in the response to the Create Network request. ```bash export NETWORK_ID= ``` 7. Make a request to the [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster) endpoint to create a Redpanda Cloud cluster with Private Service Connect enabled. 
```bash export CLUSTER_POST_BODY=`cat << EOF { "cluster": { "cloud_provider": "CLOUD_PROVIDER_GCP", "connection_type": "CONNECTION_TYPE_PRIVATE", "type": "TYPE_BYOC", "name": "", "resource_group_id": "$RESOURCE_GROUP_ID", "network_id": "$NETWORK_ID", "region": "", "zones": , "throughput_tier": "", "redpanda_version": "", "gcp_private_service_connect": { "enabled": true, "consumer_accept_list": }, "customer_managed_resources": { "gcp": { "subnet": { "name":"", "secondary_ipv4_range_pods": { "name": "" }, "secondary_ipv4_range_services": { "name": "" }, "k8s_master_ipv4_range": "" }, "psc_nat_subnet_name": "", "agent_service_account": { "email": "" }, "connector_service_account": { "email": "" }, "console_service_account": { "email": "" }, "redpanda_cluster_service_account": { "email": "" }, "gke_service_account": { "email": "" }, "tiered_storage_bucket": { "name" : "" } } } } } EOF` curl -vv -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_POST_BODY" $PUBLIC_API_ENDPOINT/v1/clusters ``` > 📝 **NOTE** > > To also enable global access on the seed load balancer for the Private Service Connect endpoint, you must set `"gcp_private_service_connect.global_access_enabled": true` during cluster creation: > > ```none > "cluster": { > "gcp_private_service_connect": { > "enabled": true, > "consumer_accept_list": , > "global_access_enabled": true > } > } > ``` > > See [Enable Global Access](../byoc/gcp/enable-global-access/) for more information. Replace the following placeholders for the request body. Variables with a `byovpc_` prefix represent [customer-managed resources](../../get-started/cluster-types/byoc/gcp/vpc-byo-gcp/) that should have been created previously: - ``: Provide a name for the new cluster. - ``: Choose a GCP region where the network will be created. - ``: Provide the list of GCP zones where the brokers will be deployed. Format: `["", "", ""]` - ``: Choose a Redpanda Cloud cluster tier. 
For example, `tier-1-gcp-v2-x86`. - ``: Choose the Redpanda Cloud version. - ``: The list of IDs of GCP projects from which Private Service Connect connection requests are accepted. Format: `[{"source": ""}, {"source": ""}, {"source": ""}]` - ``: The name of the GCP subnet that was created for the cluster. - ``: The name of the IPv4 range designated for K8s pods. - ``: The name of the IPv4 range designated for services. - ``: The master IPv4 range. - ``: The name of the GCP subnet that was created for Private Service Connect NAT. - ``: The email for the agent service account. - ``: The email for the connectors service account. - ``: The email for the console service account. - ``: The email for the Redpanda service account. - ``: The email for the GKE service account. - ``: The name of the Google Storage bucket to use for Tiered Storage. ## [](#enable-private-service-connect-on-an-existing-byoc-or-byovpc-cluster)Enable Private Service Connect on an existing BYOC or BYOVPC cluster > ⚠️ **CAUTION** > > Enabling Private Service Connect on your VPC interrupts all communication on existing Redpanda bootstrap server and broker ports due to the change of private DNS resolution. > > To avoid disruption, consider using a staged approach to enable Private Service Connect. See: [Switch from VPC peering to Private Service Connect](../byoc/gcp/vpc-peering-gcp/#switch-from-vpc-peering-to-private-service-connect). 1. In the Redpanda Cloud UI, go to the cluster overview and copy the cluster ID from the **Details** section. ```bash export CLUSTER_ID= ``` 2. For a **BYOC cluster**: - Run `rpk cloud byoc gcp apply` to ensure that the PSC NAT subnets are created in your BYOC cluster. ```bash rpk cloud byoc gcp apply --redpanda-id="${CLUSTER_ID}" --project-id='' ``` - Run `gcloud compute networks subnets list` to find the newly-created Private Service Connect NAT subnet name. 
```bash gcloud compute networks subnets list --filter psc2-nat --format="value(name)" ``` For a **BYOVPC cluster**: - [Configure the service project](../../get-started/cluster-types/byoc/gcp/vpc-byo-gcp/#configure-the-service-project) to configure the IAM role, permissions, and firewall rules. - Create a NAT subnet and firewall rules to allow Private Service Connect traffic. To do this, follow steps 3 and 4 in [Create a new BYOVPC cluster with Private Service Connect](#create-a-new-byovpc-cluster-with-private-service-connect). - Run `rpk cloud byoc gcp apply`: ```bash rpk cloud byoc gcp apply --redpanda-id="${CLUSTER_ID}" --project-id='' ``` - Make a request to the [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) endpoint to update the cluster to include the newly-created Private Service Connect NAT subnet. ```bash export PSC_NAT_SUBNET_NAME='' export CLUSTER_PATCH_BODY=`cat << EOF { "customer_managed_resources": { "gcp": { "psc_nat_subnet_name": "${PSC_NAT_SUBNET_NAME}" } } } EOF` curl -v -X PATCH \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID ``` Replace the following placeholder: ``: The name of the Private Service Connect NAT subnet. Use the fully-qualified name, for example `"projects//regions//subnetworks/"`. 3. Make a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request to update the cluster to enable Private Service Connect. 
```bash CLUSTER_PATCH_BODY=`cat << EOF { "gcp_private_service_connect": { "enabled": true, "consumer_accept_list": } } EOF` curl -v -X PATCH \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID ``` Replace the following placeholder: ``: A JSON list specifying the projects from which incoming connections will be accepted. All other sources are rejected. For example, `[{"source": "consumer-project-ID-1"},{"source": "consumer-project-ID-2"}]`. Wait for the cluster to apply the new configuration (around 15 minutes). The Private Service Connect attachment is available when the cluster update is complete. To monitor the service attachment creation, run the following `gcloud` command with the project ID: ```bash gcloud compute service-attachments list --project '' ``` ## [](#deploy-consumer-side-resources)Deploy consumer-side resources For each consumer VPC network, you must complete the following steps to successfully connect to the service attachment and use the Kafka API and other Redpanda services, such as HTTP Proxy. 1. In **Cluster settings**, copy the **DNS zone** and **Service attachment URL** under **Private Service Connect**. Use this URL to create the Private Service Connect endpoint in GCP. 2. Get the name of the consumer VPC network and the subnet ``, where the Private Service Connect endpoint forwarding rule will be created. 3. Create a Private Service Connect IP address for the endpoint: ```bash gcloud compute addresses create --subnet= --addresses= --region= ``` 4. Create the Private Service Connect endpoint forwarding rule: > 📝 **NOTE** > > If you enabled global access when creating the cluster, you must include the `--allow-psc-global-access` flag to configure the endpoint to accept client connections from different regions. ```bash gcloud compute forwarding-rules create --region= --network= --address= --target-service-attachment= ``` 5. 
Create firewall rules allowing egress traffic to the Private Service Connect endpoint: ```bash gcloud compute firewall-rules create redpanda-psc-egress \ --description="Allow access to Redpanda PSC endpoint" \ --network="" \ --direction="EGRESS" \ --destination-ranges= \ --allow="tcp:443,tcp:30081,tcp:30282,tcp:30292,tcp:32092-32141,tcp:35082-35131,tcp:32192-32241,tcp:35182-35231,tcp:32292-32341,tcp:35282-35331" ``` 6. Create a private DNS zone. Use the cluster **DNS zone** value as the DNS name: ```bash gcloud dns managed-zones create \ --project= \ --description="Redpanda Private Service Connect DNS zone" \ --dns-name="" \ --visibility="private" \ --networks="" ``` 7. In the newly-created DNS zone, create a wildcard DNS record using the cluster **DNS record** value: ```bash gcloud dns record-sets create '*.' \ --project= \ --zone="" \ --type="A" \ --ttl="300" \ --rrdatas="" ``` ## [](#access-redpanda-services-through-private-service-connect-endpoint)Access Redpanda services through Private Service Connect endpoint After you have enabled Private Service Connect for your cluster, your connection URLs are available in the **How to Connect** section of the cluster overview in the Redpanda Cloud UI. You can access Redpanda services such as Schema Registry and HTTP Proxy from the client VPC or virtual network; for example, from a compute instance in the VPC or network. The bootstrap server hostname is unique to each cluster. The service attachment exposes a set of bootstrap ports for access to Redpanda services. These ports load balance requests among brokers. Make sure you use the following ports for initiating a connection from a consumer: | Redpanda service | Default bootstrap port | | --- | --- | | Kafka API | 30292 | | HTTP Proxy | 30282 | | Schema Registry | 30081 | ### [](#access-kafka-api-seed-service)Access Kafka API seed service Use port `30292` to access the Kafka API seed service. 
```bash
export RPK_BROKERS=':30292'
rpk cluster info -X tls.enabled=true -X user= -X pass=
```

When successful, the `rpk` output should look like the following:

```bash
CLUSTER
=======
redpanda.rp-cki01qgth38kk81ard3g

BROKERS
=======
ID    HOST                                                                PORT   RACK
0*    0-3da65a4a-0532364.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com  32092  use2-az1
1     1-3da65a4a-63b320c.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com  32093  use2-az1
2     2-3da65a4a-36068dc.cki01qgth38kk81ard3g.fmc.dev.cloud.redpanda.com  32094  use2-az1
```

### [](#access-schema-registry-seed-service)Access Schema Registry seed service

Use port `30081` to access the Schema Registry seed service.

```bash
curl -vv -u : -H "Content-Type: application/vnd.schemaregistry.v1+json" --http2 :30081/subjects
```

### [](#access-http-proxy-seed-service)Access HTTP Proxy seed service

Use port `30282` to access the Redpanda HTTP Proxy seed service.

```bash
curl -vv -u : -H "Content-Type: application/vnd.kafka.json.v2+json" --http2 :30282/topics
```

## [](#test-the-connection)Test the connection

You can test the Private Service Connect connection from any VM or container in the consumer VPC. If configuring a client isn’t possible right away, you can run these checks using `rpk` or curl:

1. Set the following environment variables.

    ```bash
    export RPK_BROKERS=':30292'
    export RPK_TLS_ENABLED=true
    export RPK_SASL_MECHANISM=""
    export RPK_USER=
    export RPK_PASS=
    ```

2. Create a test topic.

    ```bash
    rpk topic create test-topic
    ```

3. Produce to the test topic.

### rpk

```bash
echo 'hello world' | rpk topic produce test-topic
```

### curl

```bash
curl -s \
  -X POST \
  "/topics/test-topic" \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{
  "records":[
      {
          "value":"hello world"
      }
  ]
}'
```

4. Consume from the test topic.
### rpk

```bash
rpk topic consume test-topic -n 1
```

### curl

```bash
curl -s \
  "/topics/test-topic/partitions/0/records?offset=0&timeout=1000&max_bytes=100000" \
  -H "Accept: application/vnd.kafka.json.v2+json"
```

## [](#disable-private-service-connect)Disable Private Service Connect

Make a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request to update the cluster to disable Private Service Connect.

```bash
CLUSTER_PATCH_BODY=`cat << EOF
{
  "gcp_private_service_connect": {
    "enabled": false
  }
}
EOF`
curl -v -X PATCH \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d "$CLUSTER_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/clusters/$CLUSTER_ID
```

---

# Page 483: Networking: Serverless

**URL**: https://docs.redpanda.com/redpanda-cloud/networking/serverless.md

---

# Networking: Serverless

---
title: "Networking: Serverless"
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: serverless/index
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: serverless/index.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/serverless/index.adoc
description: Learn how to configure private networking with AWS PrivateLink.
page-git-created-date: "2026-02-02"
page-git-modified-date: "2026-02-02"
---

- [AWS](aws/)

    Learn how to configure private networking for Serverless clusters on AWS.
--- # Page 484: AWS **URL**: https://docs.redpanda.com/redpanda-cloud/networking/serverless/aws.md --- # AWS --- title: AWS latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: serverless/aws/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: serverless/aws/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/serverless/aws/index.adoc description: Learn how to configure private networking for Serverless clusters on AWS. page-git-created-date: "2026-02-02" page-git-modified-date: "2026-02-02" --- - [Configure AWS PrivateLink in the Cloud Console](privatelink-ui/) Set up AWS PrivateLink in the Redpanda Cloud Console for Serverless clusters. - [Configure AWS PrivateLink with the Cloud API](privatelink-api/) Set up AWS PrivateLink with the Cloud API for Serverless clusters. --- # Page 485: Configure AWS PrivateLink with the Cloud API **URL**: https://docs.redpanda.com/redpanda-cloud/networking/serverless/aws/privatelink-api.md --- # Configure AWS PrivateLink with the Cloud API --- title: Configure AWS PrivateLink with the Cloud API latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: serverless/aws/privatelink-api page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: serverless/aws/privatelink-api.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/serverless/aws/privatelink-api.adoc description: Set up AWS PrivateLink with the Cloud API for Serverless clusters. page-git-created-date: "2026-02-02" page-git-modified-date: "2026-03-02" --- The Redpanda AWS PrivateLink endpoint service provides secure access to Redpanda Cloud from your own VPC. 
Traffic over PrivateLink does not go through the public internet because a PrivateLink connection is treated as its own private AWS service. While your VPC has access to the Redpanda VPC, Redpanda cannot access your VPC.

Consider using the PrivateLink endpoint service if you have multiple VPCs and could benefit from a simpler approach to network management.

You can create a new Serverless cluster with PrivateLink enabled, or enable PrivateLink for existing clusters using either the Console or the API.

> 📝 **NOTE**
>
> - Each client VPC can have one endpoint connected to the PrivateLink service.
>
> - PrivateLink allows overlapping [CIDR ranges](../../../cidr-ranges/) in VPC networks.
>
> - PrivateLink does not add extra connection limits. However, VPC peering is limited to 125 connections. See [How scalable is AWS PrivateLink?](https://aws.amazon.com/privatelink/faqs/)
>
> - You control which AWS principals are allowed to connect to the endpoint service.

After [getting an access token](#get-a-cloud-api-access-token), you can [enable PrivateLink when creating a new Serverless cluster](#create-new-cluster-with-privatelink-endpoint-service-enabled), or you can [enable PrivateLink for existing Serverless clusters](#enable-privatelink-endpoint-service-for-existing-clusters).

## [](#requirements)Requirements

- Install `rpk`.
- Your Redpanda Serverless cluster and [VPC](#create-client-vpc) must be in the same region.
- This guide uses the [Redpanda Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) to enable the Redpanda endpoint service for your Serverless clusters. Follow the steps below to [get an access token](#get-a-cloud-api-access-token).
- Use the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) to create a new client VPC or modify an existing one to use the PrivateLink endpoint.

> 💡 **TIP**
>
> In Kafka clients, set `connections.max.idle.ms` to a value less than 350 seconds (350000 ms).
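
As a rough illustration of the tip above, the following sketch renders the `connections.max.idle.ms` setting as a properties line and refuses values at or above the 350000 ms ceiling. The function name `render_idle_config` is hypothetical and not part of `rpk` or any Redpanda tooling; how the setting is actually applied depends on your Kafka client library.

```bash
# Hypothetical helper: print a client property line for the idle timeout,
# rejecting values that are not below the 350000 ms limit from the tip.
# $1: idle timeout in milliseconds
render_idle_config() {
  idle_ms="$1"
  if [ "$idle_ms" -ge 350000 ]; then
    echo "connections.max.idle.ms must be below 350000 ms, got $idle_ms" >&2
    return 1
  fi
  echo "connections.max.idle.ms=$idle_ms"
}
```

For example, `render_idle_config 300000 >> client.properties` would append a 300-second idle timeout to a properties file. The file name `client.properties` is only an assumption for illustration.
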
> 📝 **NOTE** > > Enabling PrivateLink changes private DNS behavior for your cluster. Before configuring connections, review [DNS resolution with PrivateLink](#dns-resolution-with-privatelink). ## [](#get-a-cloud-api-access-token)Get a Cloud API access token 1. Save the base URL of the Redpanda Cloud API in an environment variable: ```bash export PUBLIC_API_ENDPOINT="https://api.cloud.redpanda.com" ``` 2. In the Redpanda Cloud UI, go to the [**Organization IAM**](https://cloud.redpanda.com/organization-iam) page, and select the **Service account** tab. If you don’t have an existing service account, you can create a new one. Copy and store the client ID and secret. ```bash export CLOUD_CLIENT_ID= export CLOUD_CLIENT_SECRET= ``` 3. Get an API token using the client ID and secret. You can click the **Request an API token** link to see code examples to generate the token. ```bash export AUTH_TOKEN=`curl -s --request POST \ --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \ --header 'content-type: application/x-www-form-urlencoded' \ --data grant_type=client_credentials \ --data client_id="$CLOUD_CLIENT_ID" \ --data client_secret="$CLOUD_CLIENT_SECRET" \ --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token` ``` You must send the API token in the `Authorization` header when making requests to the Cloud API. ## [](#create-a-privatelink-resource)Create a PrivateLink resource Before you can create a Serverless cluster with PrivateLink enabled, you must first create a PrivateLink resource in your resource group. 1. In the [Redpanda Cloud Console](https://cloud.redpanda.com/), go to **Resource groups** and select the resource group in which you want to create a PrivateLink resource. Copy and store the resource group ID (UUID) from the URL in the browser. ```bash export RESOURCE_GROUP_ID= ``` 2. Set the Serverless region where you want to create the PrivateLink resource (for example, `us-east-1`). ```bash export SERVERLESS_REGION= ``` 3. 
Create a new PrivateLink resource by calling [`POST /v1/serverless/private-links`](/api/doc/cloud-controlplane/operation/operation-serverlessprivatelinkservice_createserverlessprivatelink).

```bash
PL_POST_BODY=`cat << EOF
{
  "serverless_private_link": {
    "name": "",
    "resource_group_id": "$RESOURCE_GROUP_ID",
    "serverless_region": "$SERVERLESS_REGION",
    "cloudprovider": "CLOUD_PROVIDER_AWS",
    "aws_config": {
      "allowed_principals": [
        "arn:aws:iam:::root"
      ]
    }
  }
}
EOF`

# Capture the new PrivateLink ID under the name used by the later steps
SERVERLESS_PRIVATE_LINK_ID=`curl -vv -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d "$PL_POST_BODY" $PUBLIC_API_ENDPOINT/v1/serverless/private-links | jq -r .operation.metadata.serverless_private_link_id`

echo $SERVERLESS_PRIVATE_LINK_ID
```

Store the PrivateLink ID for use in the following steps.

You can also update private links to add or remove allowed principals.

```bash
PL_PATCH_BODY=`cat << EOF
{
  "aws_config": {
    "allowed_principals": [
      "arn:aws:iam:::root"
    ]
  }
}
EOF`

curl -vv -X PATCH \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d "$PL_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/serverless/private-links/$SERVERLESS_PRIVATE_LINK_ID
```

## [](#create-new-cluster-with-privatelink-endpoint-service-enabled)Create new cluster with PrivateLink endpoint service enabled

Using the `RESOURCE_GROUP_ID` and `SERVERLESS_PRIVATE_LINK_ID` from the previous step, create a new Serverless cluster with the endpoint service enabled by calling [`POST /v1/serverless/clusters`](/api/doc/cloud-controlplane/operation/operation-serverlessclusterservice_createserverlesscluster).

In the following example, make sure to set your own values for the following fields:

- `name`
- `serverless_region`: for example, `"us-east-1"`
- `private_link_id`: The ID of the PrivateLink resource created in the previous step
- `networking_config.private` and `networking_config.public`: Valid values are `STATE_ENABLED` or `STATE_DISABLED`. At least one must be enabled. If neither is specified, `public` defaults to `STATE_ENABLED`.
```bash CLUSTER_POST_BODY=`cat << EOF { "serverless_cluster": { "name": "", "resource_group_id": "$RESOURCE_GROUP_ID", "serverless_region": "$SERVERLESS_REGION", "private_link_id": "$SERVERLESS_PRIVATE_LINK_ID", "networking_config": { "private": "STATE_ENABLED", "public": "STATE_ENABLED" } } } EOF` CLUSTER_ID=`curl -vv -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_POST_BODY" $PUBLIC_API_ENDPOINT/v1/serverless/clusters | jq -r .operation.metadata.cluster_id` echo $CLUSTER_ID ``` ## [](#enable-privatelink-endpoint-service-for-existing-clusters)Enable PrivateLink endpoint service for existing clusters 1. In the Redpanda Cloud Console, go to the cluster Overview and copy the cluster ID from the **Details** section. ```bash CLUSTER_ID= ``` 2. Get the PrivateLink ID from the cluster Overview page in the Redpanda Cloud Console. ```bash SERVERLESS_PRIVATE_LINK_ID= ``` 3. Make a [`PATCH /v1/serverless/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-serverlessclusterservice_updateserverlesscluster) request to update the cluster with the Redpanda PrivateLink Endpoint Service enabled. In the following example, make sure to set your own value for the following fields: - `private_link_id`: The ID of an existing PrivateLink resource in the same resource group - `networking_config.private`: Set to `STATE_ENABLED` to enable private access ```bash CLUSTER_PATCH_BODY=`cat << EOF { "networking_config": { "private": "STATE_ENABLED" }, "private_link_id": "$SERVERLESS_PRIVATE_LINK_ID" } EOF` curl -vv -X PATCH \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_PATCH_BODY" $PUBLIC_API_ENDPOINT/v1/serverless/clusters/$CLUSTER_ID ``` ## [](#dns-resolution-with-privatelink)DNS resolution with PrivateLink PrivateLink changes how DNS resolution works for your cluster. 
When you query cluster hostnames outside the VPC that contains your PrivateLink endpoint, DNS may return private IP addresses that aren’t reachable from your location.

To resolve cluster hostnames from other VPCs or on-premises networks, set up DNS forwarding using [Route 53 Resolver](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver.html):

1. In the VPC that contains your PrivateLink endpoint, create a Route 53 Resolver inbound endpoint. Ensure that the inbound endpoint’s security group allows inbound UDP/TCP port 53 from each VPC or on-premises network that will forward queries.

2. In each other VPC that must resolve the cluster domain, create a Resolver outbound endpoint and a forwarding rule for the cluster domain that targets the inbound endpoint IPs from the previous step. Associate the rule with those VPCs.

   The cluster domain is the suffix after the seed hostname. For example, if your bootstrap server URL is: `cki01qgth38kk81ard3g.any.us-east-1.aw.priv.prd.cloud.redpanda.com:9092`, then `cluster_domain` is: `cki01qgth38kk81ard3g.any.us-east-1.aw.priv.prd.cloud.redpanda.com`.

3. For on-premises DNS, create a conditional forwarder for the cluster domain that forwards to the inbound endpoint IPs from step 1 (over VPN/Direct Connect).

> ❗ **IMPORTANT**
>
> Do not configure forwarding rules to target the VPC’s Amazon-provided DNS resolver (VPC base CIDR + 2). Rules must target the IP addresses of Route 53 Resolver endpoints.

## [](#configure-privatelink-connection-to-redpanda-cloud)Configure PrivateLink connection to Redpanda Cloud

When you have a PrivateLink-enabled cluster, you can create an endpoint to connect your VPC and your cluster.

### [](#get-cluster-domain)Get cluster domain

Get the domain (`cluster_domain`) of the cluster from the cluster details in the Redpanda Cloud Console.
For example, if the bootstrap server URL is: `cki01qgth38kk81ard3g.any.us-east-1.aw.priv.prd.cloud.redpanda.com:9092`, then `cluster_domain` is: `cki01qgth38kk81ard3g.any.us-east-1.aw.priv.prd.cloud.redpanda.com`. ```bash CLUSTER_DOMAIN= ``` > 📝 **NOTE** > > Use `` as the domain you target with your DNS conditional forward (optionally also `*.` if your DNS platform requires a wildcard). ### [](#get-name-of-privatelink-endpoint-service)Get name of PrivateLink endpoint service The service name is required to [create VPC private endpoints](#create-vpc-endpoint). Run the following command to get the service name: ```bash PL_SERVICE_NAME=`curl -X GET \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ $PUBLIC_API_ENDPOINT/v1/serverless/private-links/$SERVERLESS_PRIVATE_LINK_ID | jq -r .serverless_private_link.status.aws.vpc_endpoint_service_name` ``` ### [](#create-client-vpc)Create client VPC If you are not using an existing VPC, you must create a new one. The VPC region must be the same region where the Redpanda cluster is deployed. To create the VPC, run: ```bash # See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html for # information on profiles and credential files REGION= PROFILE= aws ec2 create-vpc --region $REGION --profile $PROFILE --cidr-block 10.0.0.0/20 # Store the client VPC ID from the command output CLIENT_VPC_ID= ``` You can also use an existing VPC. You need the VPC ID to [modify its DNS attributes](#modify-vpc-dns-attributes). ### [](#modify-vpc-dns-attributes)Modify VPC DNS attributes To modify the VPC attributes, run: ```bash aws ec2 modify-vpc-attribute --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \ --enable-dns-hostnames "{\"Value\":true}" aws ec2 modify-vpc-attribute --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \ --enable-dns-support "{\"Value\":true}" ``` These commands enable DNS hostnames and resolution for instances in the VPC. 
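
To confirm that the two attributes took effect, you can read them back with `aws ec2 describe-vpc-attribute`. The following is a minimal sketch, assuming the `REGION`, `PROFILE`, and `CLIENT_VPC_ID` variables set earlier in this guide. The wrapper function `vpc_dns_attribute` is a hypothetical helper, not part of the AWS CLI.

```bash
# Hypothetical helper: print the current value of one VPC DNS attribute.
# $1: attribute name, either enableDnsHostnames or enableDnsSupport
vpc_dns_attribute() {
  # The response nests the value under the attribute name with a capital first letter
  case "$1" in
    enableDnsHostnames) dns_key="EnableDnsHostnames" ;;
    enableDnsSupport)   dns_key="EnableDnsSupport" ;;
    *) echo "unknown attribute: $1" >&2; return 1 ;;
  esac
  aws ec2 describe-vpc-attribute --region "$REGION" --profile "$PROFILE" \
    --vpc-id "$CLIENT_VPC_ID" --attribute "$1" \
    --query "${dns_key}.Value" --output text
}
```

Call it once per attribute, for example `vpc_dns_attribute enableDnsHostnames` and `vpc_dns_attribute enableDnsSupport`. Both should report that the attribute is enabled before you create the VPC endpoint with private DNS.
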
### [](#create-security-group)Create security group You need the security group ID `security_group_id` from the command output to [add security group rules](#add-security-group-rules). To create a security group, run: ```bash aws ec2 create-security-group --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \ --description "Redpanda endpoint service client security group" \ --group-name "${CLUSTER_ID}-sg" SECURITY_GROUP_ID= ``` ### [](#add-security-group-rules)Add security group rules The following example shows how to add security group rules to allow access to Redpanda services. ```bash # Allow Kafka API bootstrap (seed) aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 9092 --cidr 0.0.0.0/0 # Allow Kafka API broker 1 aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 9093 --cidr 0.0.0.0/0 # Allow Kafka API broker 2 aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 9094 --cidr 0.0.0.0/0 # Allow Kafka API broker 3 aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 9095 --cidr 0.0.0.0/0 # Allow Schema Registry aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 8081 --cidr 0.0.0.0/0 # Allow Redpanda Cloud Data Plane API / Prometheus (if needed) aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 443 --cidr 0.0.0.0/0 ``` ### [](#create-vpc-subnet)Create VPC subnet You need the subnet ID `subnet_id` from the command output to [create a VPC endpoint](#create-vpc-endpoint). 
Run the following command, specifying the subnet Availability Zone name (for example, `us-west-2a`):

```bash
aws ec2 create-subnet --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \
  --availability-zone \
  --cidr-block 10.0.1.0/24
SUBNET_ID=
```

### [](#create-vpc-endpoint)Create VPC endpoint

```bash
aws ec2 create-vpc-endpoint \
  --region $REGION --profile $PROFILE \
  --vpc-id $CLIENT_VPC_ID \
  --vpc-endpoint-type "Interface" \
  --ip-address-type "ipv4" \
  --service-name $PL_SERVICE_NAME \
  --subnet-ids $SUBNET_ID \
  --security-group-ids $SECURITY_GROUP_ID \
  --private-dns-enabled
```

## [](#access-redpanda-services-through-vpc-endpoint)Access Redpanda services through VPC endpoint

After you have enabled PrivateLink for your cluster, your connection URLs are available in the **How to Connect** section of the cluster overview in the Redpanda Cloud Console.

You can access Redpanda services such as the Kafka API and Schema Registry from the client VPC or virtual network; for example, from a compute instance in the VPC or network.

The bootstrap server hostname is unique to each cluster. The endpoint service exposes a set of bootstrap ports for access to Redpanda services. These ports load balance requests among brokers. Make sure you use the following ports when initiating a connection from a client:

| Redpanda service | Default bootstrap port |
| --- | --- |
| Kafka API | 9092 |
| Schema Registry | 8081 |

### [](#access-kafka-api-seed-service)Access Kafka API seed service

Use port `9092` to access the Kafka API seed service.
```bash
export RPK_BROKERS=':9092'
rpk cluster info -X tls.enabled=true -X user= -X pass=
```

When successful, the `rpk` output should look like the following:

```bash
CLUSTER
redpanda.rp-cki01qgth38kk81ard3g

BROKERS
ID    HOST                                                                  PORT  RACK
0*    cki01qgth38kk81ard3g-0.any.us-east-1.aw.priv.prd.cloud.redpanda.com   9093  use1-az1
1     cki01qgth38kk81ard3g-1.any.us-east-1.aw.priv.prd.cloud.redpanda.com   9094  use1-az1
2     cki01qgth38kk81ard3g-2.any.us-east-1.aw.priv.prd.cloud.redpanda.com   9095  use1-az1
```

### [](#access-schema-registry-seed-service)Access Schema Registry seed service

Use port `8081` to access the Schema Registry seed service.

```bash
# --sslv2 is omitted: SSLv2 is obsolete and current curl releases ignore the flag
curl -vv -u : -H "Content-Type: application/vnd.schemaregistry.v1+json" --http2 :8081/subjects
```

## [](#test-the-connection)Test the connection

You can test the PrivateLink connection from any VM or container in the client VPC. If configuring a client isn’t possible right away, you can do these checks using `rpk`:

1. Set the following environment variables.

   ```bash
   export RPK_BROKERS=':9092'
   export RPK_TLS_ENABLED=true
   export RPK_SASL_MECHANISM=""
   export RPK_USER=
   export RPK_PASS=
   ```

2. Create a test topic.

   ```bash
   rpk topic create test-topic
   ```

3. Produce to the test topic.

   ```bash
   echo 'hello world' | rpk topic produce test-topic
   ```

4. Consume from the test topic.

   ```bash
   rpk topic consume test-topic -n 1
   ```

> 📝 **NOTE**
>
> If both public and private access are enabled for your cluster, `rpk cloud cluster select` will prompt you to choose between public or private connectivity when you select the cluster.
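
If the `rpk` checks fail, one thing worth ruling out is DNS: from inside the client VPC, the cluster hostnames would normally be expected to resolve to private addresses of the VPC endpoint. The following sketch classifies an IPv4 address as private (RFC 1918) or not. The function name `is_private_ipv4` is hypothetical, and the check is a plain string match rather than a full address validator.

```bash
# Hypothetical helper: return 0 if $1 falls in an RFC 1918 private IPv4 range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), and 1 otherwise.
is_private_ipv4() {
  case "$1" in
    10.*) return 0 ;;
    192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
    *) return 1 ;;
  esac
}
```

You could feed it the addresses returned by your resolver of choice, for example each line of `dig +short "$CLUSTER_DOMAIN"`, assuming `dig` is installed on the instance. A public address here would suggest that private DNS is not in effect for the endpoint or that the query is being answered outside the client VPC.
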
## [](#suggested-reading)Suggested reading - [Configure AWS PrivateLink in the Cloud Console](../privatelink-ui/) - [Manage Redpanda Cloud with Terraform](../../../../manage/terraform-provider/) --- # Page 486: Configure AWS PrivateLink in the Cloud Console **URL**: https://docs.redpanda.com/redpanda-cloud/networking/serverless/aws/privatelink-ui.md --- # Configure AWS PrivateLink in the Cloud Console --- title: Configure AWS PrivateLink in the Cloud Console latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: serverless/aws/privatelink-ui page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: serverless/aws/privatelink-ui.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/networking/pages/serverless/aws/privatelink-ui.adoc description: Set up AWS PrivateLink in the Redpanda Cloud Console for Serverless clusters. page-git-created-date: "2026-02-02" page-git-modified-date: "2026-03-02" --- The Redpanda AWS PrivateLink endpoint service provides secure access to Redpanda Cloud from your own VPC. Traffic over PrivateLink does not go through the public internet because these connections are treated as their own private AWS service. While your VPC has access to the Redpanda VPC, Redpanda cannot access your VPC. Consider using the PrivateLink endpoint service if you have multiple VPCs and could benefit from a more simplified approach to network management. You can create a new Serverless cluster with PrivateLink enabled, or enable PrivateLink for existing clusters using either the Console or the API. > 📝 **NOTE** > > - Each client VPC can have one endpoint connected to the PrivateLink service. > > - PrivateLink allows overlapping [CIDR ranges](../../../cidr-ranges/) in VPC networks. > > - PrivateLink does not add extra connection limits. 
However, VPC peering is limited to 125 connections. See [How scalable is AWS PrivateLink?](https://aws.amazon.com/privatelink/faqs/)
>
> - You control which AWS principals are allowed to connect to the endpoint service.

## [](#requirements)Requirements

- Your Redpanda Serverless cluster and VPC must be in the same region.
- Use the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) to create a new client VPC or modify an existing one to use the PrivateLink endpoint.

> 💡 **TIP**
>
> In Kafka clients, set `connections.max.idle.ms` to a value less than 350 seconds (350000 ms).

## [](#dns-resolution-with-privatelink)DNS resolution with PrivateLink

PrivateLink changes how DNS resolution works for your cluster. When you query cluster hostnames outside the VPC that contains your PrivateLink endpoint, DNS may return private IP addresses that aren’t reachable from your location.

To resolve cluster hostnames from other VPCs or on-premises networks, set up DNS forwarding using [Route 53 Resolver](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver.html):

1. In the VPC that contains your PrivateLink endpoint, create a Route 53 Resolver inbound endpoint. Ensure that the inbound endpoint’s security group allows inbound UDP/TCP port 53 from each VPC or on-premises network that will forward queries.

2. In each other VPC that must resolve the cluster domain, create a Resolver outbound endpoint and a forwarding rule for the cluster domain that targets the inbound endpoint IPs from the previous step. Associate the rule with those VPCs.

   The cluster domain is the suffix after the seed hostname. For example, if your bootstrap server URL is: `cki01qgth38kk81ard3g.any.us-east-1.aw.priv.prd.cloud.redpanda.com:9092`, then `cluster_domain` is: `cki01qgth38kk81ard3g.any.us-east-1.aw.priv.prd.cloud.redpanda.com`.

3. For on-premises DNS, create a conditional forwarder for the cluster domain that forwards to the inbound endpoint IPs from step 1 (over VPN/Direct Connect).
> ❗ **IMPORTANT** > > Do not configure forwarding rules to target the VPC’s Amazon-provided DNS resolver (VPC base CIDR + 2). Rules must target the IP addresses of Route 53 Resolver endpoints. ## [](#enable-endpoint-service-for-existing-clusters)Enable endpoint service for existing clusters If you do not already have a PrivateLink resource for your cluster’s resource group and region, create one at the organization level on the Networking page. For Serverless clusters, click **Create PrivateLink**. 1. Select your [cluster](https://cloud.redpanda.com/clusters), and go to the **Cluster settings** page. 2. Under Networking, select **Private Access** and then select an existing PrivateLink. > 📝 **NOTE** > > For help with issues enabling PrivateLink, contact [Redpanda support](https://support.redpanda.com/hc/en-us/requests/new). ## [](#configure-privatelink-connection-to-redpanda-cloud)Configure PrivateLink connection to Redpanda Cloud When you have a PrivateLink-enabled cluster, you can create an endpoint to connect your VPC and your cluster. ### [](#get-cluster-domain)Get cluster domain Get the domain (`cluster_domain`) of the cluster from the cluster details in the Redpanda Cloud Console. For example, if the bootstrap server URL is: `cki01qgth38kk81ard3g.any.us-east-1.aw.priv.prd.cloud.redpanda.com:9092`, then `cluster_domain` is: `cki01qgth38kk81ard3g.any.us-east-1.aw.priv.prd.cloud.redpanda.com`. ```bash CLUSTER_DOMAIN= ``` > 📝 **NOTE** > > Use `` as the domain you target with your DNS conditional forward (optionally also `*.` if your DNS platform requires a wildcard). ### [](#get-name-of-privatelink-endpoint-service)Get name of PrivateLink endpoint service The service name is required to [create VPC private endpoints](#create-vpc-endpoint). You can find the service name in the Redpanda Cloud Console on the Networking page, or by using the Redpanda Cloud API. 
```bash PL_SERVICE_NAME= ``` ### [](#create-client-vpc)Create client VPC If you are not using an existing VPC, you must create a new one. The VPC region must be the same region where the Redpanda cluster is deployed. To create the VPC, run: ```bash # See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html for # information on profiles and credential files REGION= PROFILE= aws ec2 create-vpc --region $REGION --profile $PROFILE --cidr-block 10.0.0.0/20 # Store the client VPC ID from the command output CLIENT_VPC_ID= ``` You can also use an existing VPC. You need the VPC ID to [modify its DNS attributes](#modify-vpc-dns-attributes). ### [](#modify-vpc-dns-attributes)Modify VPC DNS attributes To modify the VPC attributes, run: ```bash aws ec2 modify-vpc-attribute --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \ --enable-dns-hostnames "{\"Value\":true}" aws ec2 modify-vpc-attribute --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \ --enable-dns-support "{\"Value\":true}" ``` These commands enable DNS hostnames and resolution for instances in the VPC. ### [](#create-security-group)Create security group You need the security group ID `security_group_id` from the command output to [add security group rules](#add-security-group-rules). 
To create a security group, run: ```bash aws ec2 create-security-group --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \ --description "Redpanda endpoint service client security group" \ --group-name "redpanda-privatelink-sg" SECURITY_GROUP_ID= ``` ### [](#add-security-group-rules)Add security group rules The following example shows how to add security group rules to allow access to Redpanda services: ```bash # Allow Kafka API bootstrap (seed) aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 9092 --cidr 0.0.0.0/0 # Allow Kafka API broker 1 aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 9093 --cidr 0.0.0.0/0 # Allow Kafka API broker 2 aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 9094 --cidr 0.0.0.0/0 # Allow Kafka API broker 3 aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 9095 --cidr 0.0.0.0/0 # Allow Schema Registry aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 8081 --cidr 0.0.0.0/0 # Allow Redpanda Cloud Data Plane API / Prometheus (if needed) aws ec2 authorize-security-group-ingress --region $REGION --profile $PROFILE \ --group-id $SECURITY_GROUP_ID --protocol tcp --port 443 --cidr 0.0.0.0/0 ``` ### [](#create-vpc-subnet)Create VPC subnet You need the subnet ID `subnet_id` from the command output to [create a VPC endpoint](#create-vpc-endpoint). 
Run the following command, specifying the subnet Availability Zone name (for example, `us-west-2a`):

```bash
aws ec2 create-subnet --region $REGION --profile $PROFILE --vpc-id $CLIENT_VPC_ID \
  --availability-zone \
  --cidr-block 10.0.1.0/24
SUBNET_ID=
```

### [](#create-vpc-endpoint)Create VPC endpoint

The following example shows how to create the VPC endpoint:

```bash
aws ec2 create-vpc-endpoint \
  --region $REGION --profile $PROFILE \
  --vpc-id $CLIENT_VPC_ID \
  --vpc-endpoint-type "Interface" \
  --ip-address-type "ipv4" \
  --service-name $PL_SERVICE_NAME \
  --subnet-ids $SUBNET_ID \
  --security-group-ids $SECURITY_GROUP_ID \
  --private-dns-enabled
```

## [](#access-redpanda-services-through-vpc-endpoint)Access Redpanda services through VPC endpoint

After you have enabled PrivateLink for your cluster, your connection URLs are available in the **How to Connect** section of the cluster overview in the Redpanda Cloud Console.

You can access Redpanda services such as the Kafka API and Schema Registry from the client VPC or virtual network; for example, from a compute instance in the VPC or network.

The bootstrap server hostname is unique to each cluster. The endpoint service exposes a set of bootstrap ports for access to Redpanda services. These ports load balance requests among brokers. Make sure you use the following ports when initiating a connection from a client:

| Redpanda service | Default bootstrap port |
| --- | --- |
| Kafka API | 9092 |
| Schema Registry | 8081 |

### [](#access-kafka-api-seed-service)Access Kafka API seed service

Use port `9092` to access the Kafka API seed service.
```bash
export RPK_BROKERS=':9092'
rpk cluster info -X tls.enabled=true -X user= -X pass=
```

When successful, the `rpk` output should look like the following:

```bash
CLUSTER
redpanda.rp-cki01qgth38kk81ard3g

BROKERS
ID    HOST                                                                  PORT  RACK
0*    cki01qgth38kk81ard3g-0.any.us-east-1.aw.priv.prd.cloud.redpanda.com   9093  use1-az1
1     cki01qgth38kk81ard3g-1.any.us-east-1.aw.priv.prd.cloud.redpanda.com   9094  use1-az1
2     cki01qgth38kk81ard3g-2.any.us-east-1.aw.priv.prd.cloud.redpanda.com   9095  use1-az1
```

### [](#access-schema-registry-seed-service)Access Schema Registry seed service

Use port `8081` to access the Schema Registry seed service.

```bash
# --sslv2 is omitted: SSLv2 is obsolete and current curl releases ignore the flag
curl -vv -u : -H "Content-Type: application/vnd.schemaregistry.v1+json" --http2 :8081/subjects
```

## [](#test-the-connection)Test the connection

You can test the connection to the endpoint service from any VM or container in the client VPC. If configuring a client isn’t possible right away, you can do these checks using `rpk`:

1. Set the following environment variables.

   ```bash
   export RPK_BROKERS=':9092'
   export RPK_TLS_ENABLED=true
   export RPK_SASL_MECHANISM=""
   export RPK_USER=
   export RPK_PASS=
   ```

2. Create a test topic.

   ```bash
   rpk topic create test-topic
   ```

3. Produce to the test topic.

   ```bash
   echo 'hello world' | rpk topic produce test-topic
   ```

4. Consume from the test topic.

   ```bash
   rpk topic consume test-topic -n 1
   ```

## [](#disable-endpoint-service)Disable endpoint service

On the Cluster Settings page, deselect **Private Access**. Existing connections are closed after the AWS PrivateLink service is disabled.

> 📝 **NOTE**
>
> Disabling private access in Redpanda Cloud does not delete the PrivateLink endpoint in your AWS account or the PrivateLink resource in Redpanda Cloud. Both remain provisioned and continue to incur charges until you explicitly delete them.
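
For the AWS side of that cleanup, the following sketch looks up the VPC endpoints in the client VPC that point at the Redpanda endpoint service and then deletes them by ID. It assumes the `REGION`, `PROFILE`, `CLIENT_VPC_ID`, and `PL_SERVICE_NAME` variables from earlier in this guide. The two function names are hypothetical wrappers around the AWS CLI, and the sketch does not remove the PrivateLink resource in Redpanda Cloud.

```bash
# Hypothetical helper: print the IDs of VPC endpoints in the client VPC
# that are attached to the Redpanda endpoint service.
find_privatelink_endpoints() {
  aws ec2 describe-vpc-endpoints --region "$REGION" --profile "$PROFILE" \
    --filters "Name=service-name,Values=$PL_SERVICE_NAME" "Name=vpc-id,Values=$CLIENT_VPC_ID" \
    --query 'VpcEndpoints[].VpcEndpointId' --output text
}

# Hypothetical helper: delete one or more VPC endpoints.
# $@: VPC endpoint IDs, as printed by find_privatelink_endpoints
delete_privatelink_endpoints() {
  aws ec2 delete-vpc-endpoints --region "$REGION" --profile "$PROFILE" \
    --vpc-endpoint-ids "$@"
}
```

Run `find_privatelink_endpoints` first and review the IDs it prints before passing them to `delete_privatelink_endpoints`, because deletion cannot be undone.
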
## [](#suggested-reading)Suggested reading - [Configure AWS PrivateLink with the Cloud API](../privatelink-api/) - [Manage Redpanda Cloud with Terraform](../../../../manage/terraform-provider/) --- # Page 487: Reference **URL**: https://docs.redpanda.com/redpanda-cloud/reference.md --- # Reference --- title: Reference latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/index.adoc description: Reference index page. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-06-07" --- - [Tiers and Regions](tiers/) When you create a cluster, you select your region. For BYOC and Dedicated clusters, you also select a usage tier, which provides tested workload configurations for throughput, partitions (pre-replication), and connections. - [API Reference](api-reference/) Use Redpanda API reference documentation to learn about and interact with API endpoints. - [Properties](properties/) Learn about the Redpanda properties you can configure. - [Data Transforms SDKs](data-transforms/sdks/) This page provides a link to all SDK reference docs for data transforms. - [rpk Commands](rpk/) Index page of Redpanda Cloud `rpk` commands in alphabetical order. - [Metrics Reference](public-metrics-reference/) Metrics to create your system dashboard. 
- [Glossary](glossary/) --- # Page 488: API Reference **URL**: https://docs.redpanda.com/redpanda-cloud/reference/api-reference.md --- # API Reference --- title: API Reference latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: api-reference page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: api-reference.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/api-reference.adoc description: Use Redpanda API reference documentation to learn about and interact with API endpoints. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-08-20" --- - [Redpanda Cloud Control Plane API Reference](/api/doc/cloud-controlplane/) Use the Control Plane API to manage resources in your Redpanda Cloud organization such as clusters and networks. - [Redpanda Cloud Data Plane API Reference](/api/doc/cloud-dataplane/) Use the Data Plane API to manage topics, ACLs, and connectors within each cluster. - [Schema Registry API Reference](/api/doc/schema-registry/) Manage schemas within a Redpanda cluster. See also: [Schema Registry documentation](../../manage/schema-reg/). - [HTTP Proxy API Reference](/api/doc/http-proxy/) HTTP Proxy is an HTTP server that exposes operations you can perform directly on a Redpanda cluster. Use the Redpanda HTTP Proxy API to perform a subset of actions that are also available through the Kafka API, but using simpler REST operations. See also: [Use Redpanda with the HTTP Proxy API](../../develop/http-proxy/). 
--- # Page 489: Golang SDK for Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/reference/data-transforms/golang-sdk.md --- # Golang SDK for Data Transforms --- title: Golang SDK for Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/golang-sdk page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/golang-sdk.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/data-transforms/golang-sdk.adoc description: Work with data transform APIs in Redpanda using Go. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-05-07" --- The API reference is in the Go package documentation: - [Data transforms client library](https://pkg.go.dev/github.com/redpanda-data/redpanda/src/transform-sdk/go/transform#section-documentation): This library provides a framework for writing transforms. - [Schema Registry client library](https://pkg.go.dev/github.com/redpanda-data/redpanda/src/transform-sdk/go/transform/sr): This library provides data transforms with access to the Schema Registry built into Redpanda. 
--- # Page 490: JavaScript SDK for Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/reference/data-transforms/js.md --- # JavaScript SDK for Data Transforms --- title: JavaScript SDK for Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/js/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/js/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/data-transforms/js/index.adoc description: This page provides a list of API packages available in the JavaScript SDK for data transforms. Explore the functionalities and methods offered by each package to implement data transforms in your applications. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-04-08" --- - [JavaScript API for Data Transforms](js-sdk/) Work with data transforms using JavaScript. - [JavaScript Schema Registry API for Data Transforms](js-sdk-sr/) Work with Schema Registry in data transforms using JavaScript. 
--- # Page 491: JavaScript Schema Registry API for Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/reference/data-transforms/js/js-sdk-sr.md --- # JavaScript Schema Registry API for Data Transforms --- title: JavaScript Schema Registry API for Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/js/js-sdk-sr page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/js/js-sdk-sr.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/data-transforms/js/js-sdk-sr.adoc description: Work with Schema Registry in data transforms using JavaScript. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-04-08" ---

This page contains the API reference for the Schema Registry client library of the data transforms JavaScript SDK.

## [](#functions)Functions

### [](#newClient)newClient()

newClient (): [`SchemaRegistryClient`](#SchemaRegistryClient)

Returns a client interface for interacting with Redpanda Schema Registry.

#### [](#returns)Returns

[`SchemaRegistryClient`](#SchemaRegistryClient)

#### [](#example)Example

```js
import { newClient, SchemaFormat } from "@redpanda-data/sr";

// Create a Schema Registry client.
var sr_client = newClient();

// Define an Avro record schema with two fields.
const schema = {
  type: "record",
  name: "Example",
  fields: [
    { "name": "a", "type": "long", "default": 0 },
    { "name": "b", "type": "string", "default": "" }
  ]
};

// Register the schema under the "avro-value" subject.
const subj_schema = sr_client.createSchema(
  "avro-value",
  {
    schema: JSON.stringify(schema),
    format: SchemaFormat.Avro,
    references: [],
  }
);
```

### [](#decodeSchemaID)decodeSchemaID()

decodeSchemaID (`buf`): [`DecodeResult`](#DecodeResult)

#### [](#parameters)Parameters

- `buf`: `string`, `ArrayBuffer`, or `Uint8Array`

#### [](#returns-2)Returns

[`DecodeResult`](#DecodeResult) in the same type as the given argument.

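The sketch below illustrates what a decode step of this kind typically does. It assumes the payload uses the common Schema Registry wire format, a zero magic byte followed by a 4-byte big-endian schema ID, which this page does not state explicitly. `decodeSchemaIdSketch` is a hypothetical stand-in written for illustration, not the SDK's implementation, and it handles only `Uint8Array` input. Inside a transform, call `decodeSchemaID` from the SDK instead.

```js
// Hypothetical illustration only; not part of @redpanda-data/sr.
// Assumed framing: [magic byte 0][4-byte big-endian schema ID][payload].
function decodeSchemaIdSketch(buf) {
  if (!(buf instanceof Uint8Array) || buf.length < 5) {
    throw new Error("buffer too short to contain a schema ID header");
  }
  if (buf[0] !== 0) {
    throw new Error("unexpected magic byte: " + buf[0]);
  }
  // Read the ID from bytes 1..4 as a big-endian 32-bit integer.
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const id = view.getInt32(1, false);
  // Return the ID and the remaining bytes, mirroring the `id` and `rest`
  // properties that DecodeResult documents.
  return { id, rest: buf.subarray(5) };
}

// Frame the two-byte payload "hi" with schema ID 42.
const framed = new Uint8Array([0, 0, 0, 0, 42, 104, 105]);
const decoded = decodeSchemaIdSketch(framed);
console.log(decoded.id); // 42
console.log(new TextDecoder().decode(decoded.rest)); // hi
```

The returned object uses the same two property names as `DecodeResult`, so code written against this sketch reads the same way as code that uses the SDK function.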
## [](#interfaces)Interfaces

### [](#DecodeResult)DecodeResult

The result of a call to [`decodeSchemaID`](#decodeSchemaID).

#### [](#properties)Properties

- `id` (read only): The decoded schema ID.
- `rest` (read only): The remainder of the input buffer after stripping the encoded ID.

### [](#reference)Reference

#### [](#properties-2)Properties

- `name`: `string`
- `subject`: `string`
- `version`: `number`

### [](#schema)Schema

#### [](#properties-3)Properties

- `format` (read only): [`SchemaFormat`](#SchemaFormat)
- `references` (read only): [`Reference`](#reference)
- `schema` (read only): `string`

### [](#SchemaRegistryClient)SchemaRegistryClient

Client interface for interacting with Redpanda Schema Registry.

#### [](#methods)Methods

- `createSchema(subject (string), [schema](#schema))`: [`SubjectSchema`](#SubjectSchema)
- `lookupLatestSchema(subject (string))`: [`SubjectSchema`](#SubjectSchema)
- `lookupSchemaById(id (number))`: [`Schema`](#schema)
- `lookupSchemaByVersion(subject (string), version (number))`: [`SubjectSchema`](#SubjectSchema)

### [](#SubjectSchema)SubjectSchema

#### [](#properties-4)Properties

- `id` (read only): `number`
- `schema` (read only): [`Schema`](#schema)
- `subject` (read only): `string`
- `version` (read only): `number`

## [](#enumerations)Enumerations

### [](#SchemaFormat)SchemaFormat

#### [](#enumeration-members)Enumeration members

- Avro: `0`
- Protobuf: `1`
- JSON: `2`

## [](#suggested-reading)Suggested reading

[JavaScript API for Data Transforms](../js-sdk/)

--- # Page 492: JavaScript API for Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/reference/data-transforms/js/js-sdk.md --- # JavaScript API for Data Transforms --- title: JavaScript API for Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/js/js-sdk page-component-name: redpanda-cloud page-version: master page-component-version: master 
page-component-title: Cloud page-relative-src-path: data-transforms/js/js-sdk.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/data-transforms/js/js-sdk.adoc description: Work with data transforms using JavaScript. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-04-08" ---

This page contains the API reference for the data transforms client library of the JavaScript SDK.

## [](#functions)Functions

### [](#OnRecordWritten)onRecordWritten()

onRecordWritten (`cb`): `void`

Registers a callback to be fired when a record is written to the input topic. This callback is triggered after the record has been written, fsynced to disk, and acknowledged to the producer. This method should be called in your script’s entry point.

#### [](#parameters)Parameters

- [`cb`](#OnRecordWrittenCallback)

#### [](#returns)Returns

`void`

#### [](#example)Example

```ts
import {onRecordWritten} from "@redpanda-data/transform-sdk";

// Copy the input data to the output topic.
onRecordWritten((event, writer) => {
  writer.write(event.record);
});
```

## [](#interfaces)Interfaces

### [](#OnRecordWrittenCallback)OnRecordWrittenCallback()

OnRecordWrittenCallback : (`event`, `writer`) => `void`

The callback type for [`onRecordWritten`](#OnRecordWritten).

#### [](#parameters-2)Parameters

- [`event`](#OnRecordWrittenEvent): The event object representing the written record.
- [`writer`](#RecordWriter): The writer object used to write transformed records to the output topics.

#### [](#returns-2)Returns

`void`

### [](#OnRecordWrittenEvent)OnRecordWrittenEvent

An event generated after a write event within the broker.

#### [](#properties)Properties

- [`record`](#WrittenRecord) (read only): The record that was written as part of this event.

### [](#Record)Record

A record within Redpanda, generated as a result of any transforms acting upon a written record.

#### [](#properties-2)Properties

- [`headers`](#RecordHeader) (optional, read only): The headers attached to this record.
- `key` (optional, read only): The key for this record. The key can be `string`, `ArrayBuffer`, `Uint8Array`, or [`RecordData`](#RecordData).
- `value` (optional, read only): The value for this record. The value can be `string`, `ArrayBuffer`, `Uint8Array`, or [`RecordData`](#RecordData).

### [](#RecordData)RecordData

A wrapper around the underlying raw data in a record, similar to a JavaScript response object.

#### [](#methods)Methods

- `array()`: Returns the data as a raw byte array (`Uint8Array`).
- `json()`: Parses the data as JSON. This is a more efficient version of `JSON.parse(text())`. Returns the parsed JSON. Throws an error if the payload is not valid JSON.
- `text()`: Parses the data as a UTF-8 string. Returns the parsed string. Throws an error if the payload is not valid UTF-8.

### [](#RecordHeader)RecordHeader

Records may have a collection of headers attached to them. Headers are opaque to the broker and are only a mechanism for the producer and consumers to pass information.

#### [](#properties-3)Properties

- `key` (optional, read only): The key for this header. The key can be `string`, `ArrayBuffer`, `Uint8Array`, or [`RecordData`](#RecordData).
- `value` (optional, read only): The value for this header. The value can be `string`, `ArrayBuffer`, `Uint8Array`, or [`RecordData`](#RecordData).

### [](#RecordWriter)RecordWriter

A writer for transformed records that are written to the output topics.

#### [](#methods-2)Methods

- `write([record](#Record))`: Write a record to the output topic. Returns `void`. Throws an error if there are errors writing the record.

### [](#WrittenRecord)WrittenRecord

A persisted record written to a topic within Redpanda. It is similar to a `Record`, except that it only contains `RecordData` or `null`.

#### [](#properties-4)Properties - [`headers`](#RecordHeader) (read only): The headers attached to this record. - `key` (read only): The key for this record. - [`value`](#RecordData) (optional, read only): The value for this record. ### [](#WrittenRecordHeader)WrittenRecordHeader Records may have a collection of headers attached to them. Headers are opaque to the broker and are only a mechanism for the producer and consumers to pass information. This interface is similar to a [`RecordHeader`](#RecordHeader), except that it only contains `RecordData` or `null`. #### [](#properties-5)Properties - `key` (optional, read only): The key for this header. - `value` (optional, read only): The value for this header. ## [](#suggested-reading)Suggested reading [JavaScript Schema Registry API for Data Transforms](../js-sdk-sr/) --- # Page 493: Rust SDK for Data Transforms **URL**: https://docs.redpanda.com/redpanda-cloud/reference/data-transforms/rust-sdk.md --- # Rust SDK for Data Transforms --- title: Rust SDK for Data Transforms latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/rust-sdk page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/rust-sdk.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/data-transforms/rust-sdk.adoc description: Work with data transforms using Rust. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-04-08" --- The API reference is in the crate documentation: - [Data transforms client library](https://docs.rs/redpanda-transform-sdk/latest/redpanda_transform_sdk/): This crate provides a framework for writing transforms. 
- [Schema Registry client library](https://docs.rs/redpanda-transform-sdk-sr/latest/redpanda_transform_sdk_sr/): This crate provides data transforms with access to the Schema Registry built into Redpanda. --- # Page 494: Data Transforms SDKs **URL**: https://docs.redpanda.com/redpanda-cloud/reference/data-transforms/sdks.md --- # Data Transforms SDKs --- title: Data Transforms SDKs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: data-transforms/sdks page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: data-transforms/sdks.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/data-transforms/sdks.adoc description: This page provides a link to all SDK reference docs for data transforms. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-04-08" --- - [Golang SDK for Data Transforms](../golang-sdk/) Work with data transform APIs in Redpanda using Go. - [Rust SDK for Data Transforms](../rust-sdk/) Work with data transforms using Rust. - [JavaScript SDK for Data Transforms](../js/) This page provides a list of API packages available in the JavaScript SDK for data transforms. Explore the functionalities and methods offered by each package to implement data transforms in your applications. 
--- # Page 495: Glossary **URL**: https://docs.redpanda.com/redpanda-cloud/reference/glossary.md --- # Glossary --- title: Glossary latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: glossary page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: glossary.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/glossary.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2024-07-25" --- ## [](#agentic-data-plane)Agentic Data Plane ### [](#agent2agent-a2a-protocol)Agent2Agent (A2A) protocol Communication protocol that enables AI agents to discover, coordinate with, and delegate tasks to other agents in a distributed system. The A2A protocol allows agents to work together by sharing capabilities, coordinating workflows, and distributing complex tasks across multiple specialized agents. It provides standardized messaging, capability discovery, and task delegation mechanisms for multi-agent systems. ### [](#agentic-data-plane-adp)Agentic Data Plane (ADP) Infrastructure layer that enables AI agents to discover, connect to, and interact with data sources and tools through standardized protocols. The Agentic Data Plane provides the underlying infrastructure for AI agents to access streaming data, invoke tools, and coordinate operations across distributed systems using protocols like MCP and A2A. ### [](#ai-agent)AI agent An autonomous program that uses AI models to interpret requests, make decisions, and interact with tools and data sources. AI agents can understand natural language instructions, reason about tasks, invoke tools through MCP servers, and coordinate multiple operations to accomplish complex workflows. 
### [](#ai-token)AI token A credential used specifically for authenticating AI agents and authorizing their access to resources in agentic systems. AI tokens are specialized authentication credentials for AI agents, distinct from bearer tokens used in traditional API authentication. They enable agents to authenticate with MCP servers and access data plane resources while maintaining audit trails of agent operations. ### [](#context-window)context window The maximum amount of text (measured in tokens) that an LLM can process in a single request. The context window determines how much information an agent can consider at once, including the system prompt, conversation history, tool outputs, and retrieved documents. Larger context windows enable more sophisticated reasoning but may increase latency and cost. Common sizes range from 8K to 200K+ tokens. ### [](#frontier-model)frontier model The most advanced and capable AI models available, representing the current state-of-the-art in language understanding and reasoning. Frontier models are cutting-edge large language models with exceptional reasoning, planning, and problem-solving capabilities. Examples include GPT-4, Claude 3, and Gemini Ultra. These models are commonly used to power sophisticated AI agents that require advanced decision-making and tool orchestration. ### [](#large-language-model-llm)large language model (LLM) An AI model trained on vast amounts of text data that can understand and generate human-like text, reason about tasks, and follow instructions. Large language models power AI agents by providing natural language understanding, reasoning capabilities, and the ability to plan and execute complex tasks. LLMs interpret user requests, decide which tools to invoke, and synthesize responses based on retrieved data. ### [](#mcp-client)MCP client An AI application or agent that connects to MCP servers to discover and invoke tools. 
MCP clients use the Model Context Protocol to communicate with MCP servers, discovering available tools, understanding their capabilities, and invoking them with appropriate parameters. The client handles authentication, request formatting, and response processing. ### [](#mcp-server)MCP server A service that exposes tools and resources using the Model Context Protocol, allowing AI agents to discover and invoke them. MCP servers act as bridges between AI agents and external systems, providing standardized interfaces for tool discovery, invocation, and resource access. ### [](#model-context-protocol-mcp)Model Context Protocol (MCP) A standardized protocol that enables AI agents to connect with external data sources and tools in Redpanda. MCP provides a consistent interface for AI applications to discover and interact with data sources, services, and computational tools through Redpanda infrastructure. ### [](#observability-o11y)observability (o11y) The ability to understand a system’s internal state by examining its external outputs, such as traces, metrics, and logs. In Redpanda’s agentic systems, observability enables debugging agent behavior, monitoring performance, analyzing execution flow, and identifying bottlenecks through transcripts captured in the `redpanda.otel_traces` topic. ### [](#opentelemetry)OpenTelemetry Open-source observability framework that provides standardized APIs, libraries, and tools for capturing and exporting telemetry data. OpenTelemetry provides standardized APIs for capturing traces, metrics, and logs from applications. Redpanda agents and MCP servers automatically emit OpenTelemetry traces to the `redpanda.otel_traces` topic to provide complete observability into agentic system operations. ### [](#otlp-opentelemetry-protocol)OTLP (OpenTelemetry Protocol) Standard protocol for encoding and transmitting telemetry data defined by the OpenTelemetry project. 
Redpanda stores spans in the `redpanda.otel_traces` topic using a Protobuf schema that closely follows the OTLP specification. ### [](#prompt)prompt Natural language instructions or context provided to an LLM to guide its behavior and responses. Prompts are the primary way to communicate with LLMs and AI agents. They can include instructions, examples, context, and questions that guide the model’s reasoning and output. Effective prompt design is critical for agent performance and reliability. ### [](#span)span A single unit of work within a trace that represents one operation, such as a data processing operation or an external API call. Spans are organized in the Redpanda UI as parent-child relationships that show how operations flow through the system. Each span captures details about a specific operation, including timing, status, and metadata. ### [](#subagent)subagent A specialized AI agent that handles specific tasks or domains as part of a larger multi-agent system. Subagents are autonomous components within a multi-agent architecture that have focused expertise in particular domains or operations. They communicate with a parent agent or other subagents to accomplish complex workflows that require coordination across multiple specializations. ### [](#system-prompt)system prompt Initial instructions that define an agent’s role, capabilities, and behavioral guidelines. The system prompt is provided at the start of an agent session and establishes the agent’s identity, available tools, operating constraints, and response style. It remains active throughout the conversation and shapes all subsequent agent behavior and decision-making. ### [](#tool-invocation)tool invocation The process of an AI agent executing an MCP tool to perform a specific operation. 
Tool invocation occurs when an agent determines that it needs to use a tool, formats the request with appropriate parameters, sends it to the MCP server, and processes the response. Each invocation is captured in transcripts as spans for observability and debugging. ### [](#trace)trace The complete lifecycle of a request captured as a collection of spans, showing how operations relate to each other. For example, a trace can cover a tool invocation from start to finish. A trace contains one or more spans organized hierarchically. ### [](#transcript)transcript Complete observability record of agent or MCP server operations captured as OpenTelemetry traces and stored in the `redpanda.otel_traces` topic. Transcripts capture tool invocations, agent reasoning steps, data processing operations, external API calls, error conditions, and performance metrics. They provide a complete record of how agentic systems operate, enabling debugging, auditing, and performance analysis. ## [](#redpanda-cloud)Redpanda Cloud ### [](#beta)beta Features in beta are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments. ### [](#byoc)BYOC Bring Your Own Cloud (BYOC) is a fully managed Redpanda Cloud deployment where clusters run in your private cloud, so all data is contained in your own environment. Redpanda handles provisioning, operations, and maintenance. ### [](#byovnet)BYOVNet A Bring Your Own Virtual Network (BYOVNet) cluster allows you to deploy the Redpanda data plane into your existing Azure VNet so that you fully manage the networking lifecycle. Compared to standard BYOC, BYOVNet provides more security, but the configuration is more complex. ### [](#byovpc)BYOVPC A Bring Your Own Virtual Private Cloud (BYOVPC) cluster allows you to deploy the Redpanda data plane into your existing VPC on AWS or GCP so that you fully manage the networking lifecycle. 
Compared to standard BYOC, BYOVPC provides more security, but the configuration is more complex. ### [](#connector)connector Enables Redpanda to integrate with external systems, such as databases. ### [](#control-plane)control plane This part of Redpanda Cloud enforces rules in the data plane, including cluster management, operations, and maintenance. ### [](#data-plane)data plane This part of Redpanda Cloud contains Redpanda clusters and other components, such as Redpanda Console, Redpanda Operator, and `rpk`. It is managed by an agent that receives cluster specifications from the control plane. Sometimes used interchangeably with clusters. ### [](#data-sovereignty)data sovereignty Containing all your data in your environment. With BYOC, Redpanda handles provisioning, monitoring, and upgrades, but you manage your streaming data without Redpanda’s control plane ever seeing it. Additionally, with BYOVPC, the Redpanda Cloud agent doesn’t create any new resources or alter any settings in your account. ### [](#dedicated-cloud)Dedicated Cloud A fully-managed Redpanda Cloud deployment option where you host your data in Redpanda’s VPC, and Redpanda handles provisioning, operations, and maintenance. Dedicated clusters are single-tenant deployments that support private networking (for example, VPC peering to talk over private IPs) for better data isolation. ### [](#limited-availability)limited availability Features in limited availability (LA) are production-ready and are covered by Redpanda Support for early adopters. ### [](#pipeline)pipeline A single configuration file running in Redpanda Connect with an input connector, an output connector, and optional processors in between. A pipeline typically streams data into Redpanda from an operational source (like PostgreSQL) or streams data out of Redpanda into an analytical system (like Snowflake). ### [](#redpanda-cloud-2)Redpanda Cloud A fully-managed data streaming service deployed with Redpanda Console. 
It includes automated upgrades and patching, backup and recovery, data and partition balancing, and built-in connectors. Redpanda Cloud is available in Serverless, Dedicated, and Bring Your Own Cloud (BYOC) deployment options to suit different data sovereignty and infrastructure requirements. ### [](#redpanda-console)Redpanda Console The web-based UI for managing and monitoring Redpanda clusters and streaming workloads. You can also set up and manage connectors in Redpanda Console. Redpanda Console is an integral part of Redpanda Cloud, but it also can be used as a standalone program as part of a Redpanda Self-Managed deployment. ### [](#remote-mcp)Remote MCP An MCP server hosted in your Redpanda Cloud cluster. It exposes custom tools that AI assistants can call to access your data and workflows. ### [](#resource-group)resource group A container for Redpanda Cloud resources, including clusters and networks. You can rename your default resource group, and you can create more resource groups. For example, you may want different resource groups for production and testing. ### [](#serverless)Serverless Serverless is the fastest and easiest way to start data streaming. You host your data in Redpanda’s VPC, and Redpanda handles automatic scaling, provisioning, operations, and maintenance. ### [](#sink-connector)sink connector Exports data from a Redpanda cluster into a target system. ### [](#source-connector)source connector Imports data from a source system into a Redpanda cluster. ## [](#redpanda-connect)Redpanda Connect ### [](#mcp-tool)MCP tool A function that an AI assistant can call to perform a specific task, such as fetching data from an API, querying a database, or processing streaming data. Each tool is defined using Redpanda Connect components and annotated with MCP metadata. ### [](#processor)processor A Redpanda Connect component that transforms data, validates inputs, or calls external APIs within a processing pipeline. 
Processors are stateless components in Redpanda Connect that operate on individual messages or batches. When used as MCP tools, processors handle data transformations, validate parameters, and invoke external services. Each processor executes independently per request with no state maintained between invocations. ### [](#redpanda-connect-mcp-server)Redpanda Connect MCP server A process that exposes Redpanda Connect components to MCP clients. You write each tool’s logic using Redpanda Connect configurations and annotate them with MCP metadata so clients can discover and invoke them. ### [](#redpanda-connect-2)Redpanda Connect A framework for building data streaming applications using declarative YAML configurations. Redpanda Connect provides components such as inputs, processors, outputs, and caches to define data flows and transformations. ## [](#redpanda-core)Redpanda core ### [](#availability-zone-az)availability zone (AZ) One or more data centers served by high-bandwidth links with low latency, typically within a close distance of one another. ### [](#broker)broker An instance of Redpanda that stores and manages event streams. Multiple brokers join together to form a Redpanda cluster. Sometimes used interchangeably with node, but a node is typically a physical or virtual server. See also: node ### [](#client)client A producer application that writes events to Redpanda, or a consumer application that reads events from Redpanda. This could also be a client library, like librdkafka or franz-go. ### [](#cluster)cluster One or more brokers that work together to manage real-time data streaming, processing, and storage. ### [](#consumer-group)consumer group A set of consumers that cooperate to read data for better scalability. As group members arrive and leave, partitions are re-assigned so each member receives a proportional share. ### [](#consumer-offset)consumer offset The position of a consumer in a specific topic partition, to track which records they have read. 
A consumer offset of 3 means it has read messages 0-2 and will next read message 3. ### [](#consumer)consumer A client application that subscribes to Redpanda topics to asynchronously read events. ### [](#controller-broker)controller broker A broker that manages operational metadata for a Redpanda cluster and ensures replicas are distributed among brokers. At any given time, one active controller exists in a cluster. If the controller fails, another broker is automatically elected as the controller. ### [](#data-stream)data stream A continuous flow of events in real time that are produced and consumed by client applications. Redpanda is a data streaming platform. Also known as event stream. ### [](#event)event A record of something changing state at a specific time. Events can be generated by various sources, including sensors, applications, and devices. Producers write events to Redpanda, and consumers read events from Redpanda. ### [](#kafka-api)Kafka API Producers and consumers interact with Redpanda using the Kafka API. It uses the default port 9092. ### [](#learner)learner A broker that is a follower in a Raft group but is not part of quorum. In a Raft group, a broker can be in learner status. Learners are followers that cannot vote and so do not count towards quorum (the majority). They cannot be elected to leader nor can they trigger leader elections. Brokers can be promoted or demoted between learner and voter. New Raft group members start as learners. ### [](#listener)listener Configuration on a broker that defines how it should accept client or inter-broker connections. Each listener is associated with a specific protocol, hostname, and port combination. The listener defines where the broker should listen for incoming connections. ### [](#log)log An ordered, append-only, immutable sequence of records. The log is Redpanda’s core storage abstraction for event streams. At the conceptual level, topics represent replayable logs. 
Physically, each partition is implemented as a log file on disk, divided into segments. Redpanda uses the Raft consensus algorithm to coordinate writing data to log files and replicate them across brokers for fault tolerance. See also: topic, partition, segment ### [](#message)message One or more records representing individual events being transmitted. Redpanda transfers messages between producers and consumers. Sometimes used interchangeably with record. ### [](#node)node A machine, which could be a server, a virtual machine (instance), or a Docker container. Every node has its own disk. Partitions are stored locally on nodes. In Kubernetes, a Node is the machine that Redpanda runs on. Outside the context of Kubernetes, this term may be used interchangeably with broker, such as `node_id`. See also: broker ### [](#offset-commit)offset commit An acknowledgement that the event has been read. ### [](#offset)offset A unique integer assigned to each record to show its location in the partition. ### [](#pandaproxy)pandaproxy Original name for the subsystem of Redpanda that allows access to your data through a REST API. This name still appears in the HTTP Proxy API and the Schema Registry API. ### [](#partition-leader)partition leader Every Redpanda partition forms a Raft group with a single elected leader. This leader handles all writes, and it replicates data to followers to ensure that a majority of brokers store the data. ### [](#partition)partition A subset of events in a topic, like a log file. It is an ordered, immutable sequence of records. Partitions allow you to distribute a stream, which lets producers write messages in parallel and consumers read messages in parallel. Partitions are made up of segment files on disk. ### [](#producer)producer A client application that writes events to Redpanda. Redpanda stores these events in sequence and organizes them into topics. ### [](#rack)rack A failure zone that has one or more Redpanda brokers assigned to it. 
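
The partition leader entry above says that a majority of brokers in the Raft group must store the data. The following is a minimal sketch of that majority arithmetic, assuming only voting members count toward quorum (learners do not). The function names `quorum_size` and `tolerated_failures` are hypothetical and are not part of any Redpanda API.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of voters that forms a majority."""
    if voters < 1:
        raise ValueError("a Raft group needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Number of voters that can be lost while a majority remains."""
    return voters - quorum_size(voters)


# With three voters, two must acknowledge a write and one may fail.
print(quorum_size(3), tolerated_failures(3))  # prints "2 1"
```

Under this arithmetic, going from three voters to four does not increase the number of tolerated failures, which is one reason odd group sizes are common.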
### [](#raft)Raft The consensus algorithm Redpanda uses to coordinate writing data to log files and replicating that data across brokers. For more details, see [https://raft.github.io/](https://raft.github.io/) ### [](#record)record A self-contained data entity with a defined structure, representing a single event. Sometimes used interchangeably with message. ### [](#replicas)replicas Copies of partitions that are distributed across different brokers, so if one broker goes down, there is a copy of the data. ### [](#retention)retention The mechanism for determining how long Redpanda stores data on local disk or in object storage before purging it. ### [](#replication-factor)replication factor The number of partition copies in a cluster. This is set to 3 in Redpanda Cloud deployments and 1 (no replication) in Self-Managed deployments. A replication factor of at least 3 ensures that each partition has a copy of its data on at least one other broker. One replica acts as the leader, and the other replicas are followers. ### [](#schema)schema An external mechanism to describe the structure of data and its encoding. Schemas validate the structure and ensure that producers and consumers can connect with data in the same format. ### [](#seastar)Seastar An open-source thread-per-core C++ framework, which binds all work to physical cores. Redpanda is built on Seastar. For more details, see [https://seastar.io/](https://seastar.io/) ### [](#seed-server)seed server The initial set of brokers that a Redpanda broker contacts to join the cluster. Seed servers play a crucial role in cluster formation and recovery, acting as a point of reference for new or restarting brokers to understand the current topology of the cluster. ### [](#segment)segment Discrete part of a partition, used to break down a continuous stream into manageable chunks. You can set the maximum duration (`segment.ms`) or size (`segment.bytes`) for a segment to be open for writes. 
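
The segment entry above names two limits, `segment.ms` and `segment.bytes`. The following sketch illustrates one way to read them: a segment stops accepting writes once either limit is reached. The `SegmentLimits` class and `should_roll` function are hypothetical illustrations, not Redpanda internals, and the "whichever comes first" rule is an assumption based on the definition.

```python
from dataclasses import dataclass


@dataclass
class SegmentLimits:
    segment_bytes: int  # maximum size of an open segment, in bytes
    segment_ms: int     # maximum time a segment stays open, in milliseconds


def should_roll(size_bytes: int, age_ms: int, limits: SegmentLimits) -> bool:
    """Return True when the open segment has reached either limit."""
    return size_bytes >= limits.segment_bytes or age_ms >= limits.segment_ms


# Hypothetical limits: 128 MiB or one hour, whichever comes first.
limits = SegmentLimits(segment_bytes=128 * 1024 * 1024, segment_ms=60 * 60 * 1000)
print(should_roll(size_bytes=10_000, age_ms=3_600_000, limits=limits))  # prints "True"
```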
### [](#serialization)serialization The process of converting a record into a format that can be stored. Deserialization is the process of converting a record back to the original state. Redpanda Schema Registry supports Avro and Protobuf serialization formats. ### [](#shard)shard A CPU core. ### [](#subject)subject A logical grouping or category for schemas. When data formats are updated, a new version of the schema can be registered under the same subject, allowing for backward and forward compatibility. ### [](#thread-per-core)thread-per-core Programming model that allows Redpanda to pin each of its application threads to a CPU core to avoid context switching and blocking. ### [](#topic-partition)topic partition A topic may be partitioned across multiple brokers. A "topic partition" represents this logical separation in Redpanda, which is managed natively by Raft. ### [](#topic)topic A logical stream of related events that are written to the same log. It can be divided into multiple partitions. A topic can have various clients writing events to it and reading events from it. ## [](#redpanda-features)Redpanda features ### [](#admin-api)Admin API A REST API used to manage and monitor Redpanda Self-Managed clusters. It uses the default port 9644. Note: The Redpanda Admin API is different from the [Kafka Admin API](https://kafka.apache.org/documentation/#adminapi). ### [](#cloud-topic)Cloud Topic A Redpanda topic type that uses object storage (S3, GCS, or MinIO) as the primary data store, rather than replicating data across brokers. Unlike standard Redpanda topics, Cloud Topics allow users with flexible latency requirements to lower or eliminate costs associated with cross-AZ networking. ### [](#compaction)compaction Feature that retains the latest value for each key within a partition while discarding older values. ### [](#controller-snapshot)controller snapshot Snapshot of the current cluster metadata state saved to disk, so broker startup is fast.
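
The compaction entry above describes keeping the latest value for each key within a partition. The following is a minimal in-memory sketch of that idea, assuming records are `(offset, key, value)` tuples from a single partition. The `compact` function is a hypothetical illustration and does not reflect how Redpanda compacts segments on disk.

```python
def compact(records):
    """Keep only the highest-offset record for each key, in offset order."""
    latest = {}
    for offset, key, value in records:
        # Later records overwrite earlier ones that share a key.
        latest[key] = (offset, key, value)
    return sorted(latest.values())


partition = [(0, "a", "1"), (1, "b", "2"), (2, "a", "3")]
print(compact(partition))  # prints "[(1, 'b', '2'), (2, 'a', '3')]"
```

Surviving records keep their original offsets, so the compacted result has gaps in the offset sequence.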
### [](#data-transforms)data transforms Framework to manipulate or enrich data written to Redpanda topics. You can develop custom data functions, which run asynchronously using a WebAssembly (Wasm) engine inside a Redpanda broker. ### [](#http-proxy)HTTP Proxy Redpanda HTTP Proxy (pandaproxy) allows access to your data through a REST API. It is built into the Redpanda binary and uses the default port 8082. ### [](#leader-pinning)Leader Pinning Feature that places a topic’s partition leaders in a preferred location, such as a cloud availability zone, to reduce networking costs and latency for nearby clients. ### [](#maintenance-mode)maintenance mode A state where a Redpanda broker temporarily doesn’t take any partition leaderships. It continues to store data as a follower. This is usually done for system maintenance or a rolling upgrade. ### [](#rack-awareness)rack awareness Feature that lets you distribute replicas of the same partition across different racks to minimize data loss and improve fault tolerance in the event of a rack failure. ### [](#rebalancing)rebalancing Process of moving partition replicas and transferring partition leadership for improved performance. Redpanda provides various topic-aware tools to balance clusters for best performance. - Leadership balancing changes where data is written to first, but it does not involve any data transfer. The partition leader regularly sends heartbeats to its followers. If a follower does not receive a heartbeat within a timeout, it triggers a new leader election. Redpanda also provides leadership balancing when brokers are added or decommissioned. - Partition replica balancing moves partition replicas to alleviate disk pressure and to honor the configured replication factor across brokers and the additional redundancy across failure domains (such as racks). Redpanda provides partition replica rebalancing when brokers are added or decommissioned. 
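
The rack awareness and rebalancing entries above describe spreading replicas of a partition across failure domains such as racks. The following toy sketch shows one way to pick brokers so that consecutive replicas come from different racks. The `place_replicas` function and the rack and broker identifiers are hypothetical, and this is not Redpanda's placement algorithm.

```python
from itertools import zip_longest


def place_replicas(brokers_by_rack: dict[str, list[int]], replication_factor: int) -> list[int]:
    """Choose brokers for one partition, visiting racks in turn."""
    # Interleave brokers rack by rack: first broker of each rack, then second, and so on.
    columns = zip_longest(*(brokers_by_rack[rack] for rack in sorted(brokers_by_rack)))
    ordered = [broker for column in columns for broker in column if broker is not None]
    if replication_factor > len(ordered):
        raise ValueError("replication factor exceeds the number of brokers")
    return ordered[:replication_factor]


racks = {"rack-a": [0, 1], "rack-b": [2, 3], "rack-c": [4, 5]}
print(place_replicas(racks, 3))  # prints "[0, 2, 4]"
```

With three racks and a replication factor of 3, each replica lands in a different rack, so losing one rack leaves two replicas available.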
### [](#rolling-upgrade)rolling upgrade The process of upgrading each broker in a Redpanda cluster, one at a time, to minimize disruption and ensure continuous availability. ### [](#rpk)rpk Redpanda’s command-line interface tool for managing Redpanda clusters. ### [](#remote-read-replica)Remote Read Replica A read-only topic that mirrors a topic on a different cluster, using data from Tiered Storage. ### [](#schema-registry)Schema Registry Redpanda Schema Registry (pandaproxy) is the interface for storing and managing event schemas. Producers and consumers register and retrieve schemas they use from the registry. It is built into the Redpanda binary and uses the default port 8081. ### [](#tiered-storage)Tiered Storage Feature that lets you offload log segments to object storage in near real-time, providing long-term data retention and topic recovery. ## [](#redpanda-in-kubernetes)Redpanda in Kubernetes ### [](#cert-manager)cert-manager A Kubernetes controller that simplifies the process of obtaining, renewing, and using certificates. For more details, see [https://cert-manager.io/docs/](https://cert-manager.io/docs/) ### [](#redpanda-helm-chart)Redpanda Helm chart Generates and applies all the manifest files you need for deploying Redpanda in Kubernetes. ### [](#redpanda-operator)Redpanda Operator Extends Kubernetes with custom resource definitions (CRDs), which allow Redpanda clusters to be treated as native Kubernetes resources. ## [](#redpanda-licenses)Redpanda licenses ### [](#redpanda-community-edition)Redpanda Community Edition Redpanda software that is available under the Redpanda Business Source License (BSL). These core features are free and source-available. ### [](#redpanda-enterprise-edition)Redpanda Enterprise Edition Redpanda software that is available under the Redpanda Community License (RCL). 
It includes the free features licensed with the Redpanda Community Edition, as well as enterprise features, such as Tiered Storage, Remote Read Replicas, and Continuous Data Balancing. ### [](#self-managed)Self-Managed Redpanda Self-Managed refers to the product offering that includes both the Enterprise Edition and the Community Edition of Redpanda. Sometimes used interchangeably with self-hosted. ## [](#redpanda-security)Redpanda security ### [](#access-control-list-acl)access control list (ACL) A security feature used to define and enforce granular permissions to resources, ensuring only authorized users or applications can perform specific operations. ACLs act on principals. ### [](#advertised-listener)advertised listener The address a Redpanda broker broadcasts to producers, consumers, and other brokers. It specifies the hostname and port for connections to different listeners. Clients and other brokers use advertised listeners to connect to services such as the Admin API, Kafka API, and HTTP Proxy API. The advertised address might differ from the listener address in scenarios where brokers are behind a NAT, in a Docker container, or in Kubernetes. Advertised addresses ensure clients can reach the Redpanda brokers even in complex network setups. ### [](#authentication)authentication The process of verifying the identity of a principal, user, or service account. Also known as AuthN. ### [](#authorization)authorization The process of specifying access rights to resources. Access rights are enforced through roles or access control lists (ACLs). Also known as AuthZ. ### [](#bearer-token)bearer token An access token used for authentication and authorization in web applications and APIs. It holds user credentials, usually in the form of random strings of characters. ### [](#gbac)GBAC Group-based access control lets you manage Redpanda permissions at scale by assigning them to OIDC groups instead of individual users.
GBAC lets you manage Redpanda permissions at scale using the groups that already exist in your identity provider (IdP). You define access once for a group and your IdP controls who belongs to it. You can grant permissions to groups in two ways: create ACLs with `Group:` principals, or assign groups as members of RBAC roles. Both approaches can be used independently or together. ### [](#identity-provider-idp)identity provider (IdP) A service that creates, maintains, and manages identity information while providing authentication services to applications. Identity providers authenticate users and issue tokens that applications can use to verify identity and access permissions. Common IdPs include Okta, Auth0, Azure AD, and Google Identity Platform. ### [](#openid-connect-oidc)OpenID Connect (OIDC) Authentication layer built on OAuth 2.0 that allows clients to verify user identity and obtain basic profile information. OpenID Connect provides a standardized way for applications to authenticate users through identity providers. In Redpanda’s agentic systems, OIDC enables secure authentication for AI agents and MCP servers accessing cloud resources. ### [](#principal)principal An authenticated identity (user, service account, or group) that Redpanda evaluates when enforcing ACLs and role assignments. Redpanda supports `User:` and `Group:` principal types. Permissions are granted to principals through ACLs or RBAC role assignments. ### [](#rbac)RBAC Role-based access control lets you assign users access to specific resources. ### [](#service-account)service account An identity independent of the user who created it that can be used to authenticate and perform operations. This is especially useful for authentication of machines. 
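
The principal and GBAC entries above describe `User:` and `Group:` principals and ACLs that name them. The following sketch shows how a check against such principals could look, assuming a user's group memberships are supplied by the identity provider. The `parse_principal` and `is_allowed` functions, and the user and group names, are hypothetical and are not Redpanda APIs.

```python
def parse_principal(principal: str) -> tuple[str, str]:
    """Split a principal such as 'User:alice' into (type, name)."""
    kind, sep, name = principal.partition(":")
    if not sep or kind not in {"User", "Group"} or not name:
        raise ValueError(f"invalid principal: {principal!r}")
    return kind, name


def is_allowed(user: str, groups: list[str], acl_principals: set[str]) -> bool:
    """Return True if the user, or any group the user belongs to, appears in the ACL."""
    effective = {f"User:{user}"} | {f"Group:{group}" for group in groups}
    return bool(effective & acl_principals)


acl = {"Group:data-engineers", "User:ops-bot"}
print(parse_principal("Group:data-engineers"))      # prints "('Group', 'data-engineers')"
print(is_allowed("alice", ["data-engineers"], acl))  # prints "True"
```

Because the ACL names the group rather than each user, adding or removing members in the identity provider changes who is allowed without editing the ACL.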
--- # Page 496: Properties **URL**: https://docs.redpanda.com/redpanda-cloud/reference/properties.md --- # Properties --- title: Properties latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: properties/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: properties/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/properties/index.adoc description: Learn about the Redpanda properties you can configure. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-04-08" --- - [Cluster Configuration Properties](cluster-properties/) Reference of cluster configuration properties. - [Object Storage Properties](object-storage-properties/) Reference of object storage properties. --- # Page 497: Cluster Configuration Properties **URL**: https://docs.redpanda.com/redpanda-cloud/reference/properties/cluster-properties.md --- # Cluster Configuration Properties --- title: Cluster Configuration Properties latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: properties/cluster-properties page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: properties/cluster-properties.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/properties/cluster-properties.adoc description: Reference of cluster configuration properties. page-git-created-date: "2025-04-08" page-git-modified-date: "2025-11-25" --- Cluster properties are configuration settings that control the behavior of a Redpanda cluster at a global level. Configuring cluster properties allows you to adapt Redpanda to specific workloads, optimize resource usage, and enable or disable features. 
For information on how to edit cluster properties, see [Configure Cluster Properties](../../../manage/cluster-maintenance/config-cluster/). > 📝 **NOTE** > > Some properties require a cluster restart for updates to take effect. This triggers a [long-running operation](../../../manage/api/cloud-byoc-controlplane-api/#lro) that can take several minutes to complete. ## [](#cluster-configuration)Cluster configuration ### [](#audit_enabled)audit\_enabled Enables or disables audit logging. When you set this to true, Redpanda checks for an existing topic named `_redpanda.audit_log`. If none is found, Redpanda automatically creates one for you. | Property | Value | | --- | --- | | Type | boolean | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | ### [](#audit_excluded_principals)audit\_excluded\_principals List of user principals to exclude from auditing. | Property | Value | | --- | --- | | Type | array | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | | Example | ["User:principal1","User:principal2"] | ### [](#audit_excluded_topics)audit\_excluded\_topics List of topics to exclude from auditing. | Property | Value | | --- | --- | | Type | array | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | | Example | ["topic1","topic2"] | ### [](#audit_log_num_partitions)audit\_log\_num\_partitions Defines the number of partitions used by a newly-created audit topic. This configuration applies only to the audit log topic and may be different from the cluster or other topic configurations. This cannot be altered for existing audit log topics. 
| Property | Value | | --- | --- | | Type | integer | | Range | [-2147483648, 2147483647] | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Unit | Number of partitions per topic | | Requires restart | No | ### [](#auto_create_topics_enabled)auto\_create\_topics\_enabled Allow automatic topic creation. To prevent excess topics, this property is not supported on Redpanda Cloud BYOC and Dedicated clusters. You should explicitly manage topic creation for these Redpanda Cloud clusters. If you produce to a topic that doesn’t exist, the topic will be created with defaults if this property is enabled. | Property | Value | | --- | --- | | Type | boolean | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | ### [](#data_transforms_binary_max_size)data\_transforms\_binary\_max\_size The maximum size for a deployable WebAssembly binary that the broker can store. | Property | Value | | --- | --- | | Type | integer | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Requires restart | No | ### [](#data_transforms_enabled)data\_transforms\_enabled Enables WebAssembly-powered data transforms directly in the broker. When `data_transforms_enabled` is set to `true`, Redpanda reserves memory for data transforms, even if no transform functions are currently deployed. This memory reservation ensures that adequate resources are available for transform functions when they are needed, but it also means that some memory is allocated regardless of usage. | Property | Value | | --- | --- | | Type | boolean | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | Yes | ### [](#data_transforms_logging_line_max_bytes)data\_transforms\_logging\_line\_max\_bytes Transform log lines truncate to this length. Truncation occurs after any character escaping. 
| Property | Value | | --- | --- | | Type | integer | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Unit | Bytes | | Requires restart | No | ### [](#data_transforms_per_core_memory_reservation)data\_transforms\_per\_core\_memory\_reservation The amount of memory to reserve per core for data transform (Wasm) virtual machines. Memory is reserved on boot. The maximum number of functions that can be deployed to a cluster is equal to `data_transforms_per_core_memory_reservation` / `data_transforms_per_function_memory_limit`. | Property | Value | | --- | --- | | Type | integer | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Requires restart | Yes | | Example | 26214400 | ### [](#data_transforms_per_function_memory_limit)data\_transforms\_per\_function\_memory\_limit The amount of memory to give an instance of a data transform (Wasm) virtual machine. The maximum number of functions that can be deployed to a cluster is equal to `data_transforms_per_core_memory_reservation` / `data_transforms_per_function_memory_limit`. | Property | Value | | --- | --- | | Type | integer | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Requires restart | Yes | | Example | 5242880 | ### [](#default_redpanda_storage_mode)default\_redpanda\_storage\_mode Set the default storage mode for new topics. This value applies to any topic created without an explicit [`redpanda.storage.mode`](#redpandastoragemode) setting (that is, when the topic’s `redpanda.storage.mode` is `unset`). Accepted values: - `unset`: Defer to the legacy [`redpanda.remote.read`](#cloud_storage_enable_remote_read) and [`redpanda.remote.write`](#cloud_storage_enable_remote_write) topic properties for Tiered Storage configuration. - `local`: Store data only on local disks, with no object storage involvement. - `tiered`: Store data on local disks and replicate it to object storage using Tiered Storage. 
Equivalent to setting `redpanda.remote.read` and `redpanda.remote.write` to `true`. - `cloud`: Store data primarily in object storage using Cloud Topics. | Property | Value | | --- | --- | | Type | string (enum) | | Accepted values | local, tiered, cloud, unset | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | | Example | tiered | | Related topics | Manage Cloud Topics | ### [](#enable_consumer_group_metrics)enable\_consumer\_group\_metrics List of enabled consumer group metrics. Accepted values include: - `group`: Enables the [`redpanda_kafka_consumer_group_consumers`](../../public-metrics-reference/#redpanda_kafka_consumer_group_consumers) and [`redpanda_kafka_consumer_group_topics`](../../public-metrics-reference/#redpanda_kafka_consumer_group_topics) metrics. - `partition`: Enables the [`redpanda_kafka_consumer_group_committed_offset`](../../public-metrics-reference/#redpanda_kafka_consumer_group_committed_offset) metric. - `consumer_lag`: Enables the [`redpanda_kafka_consumer_group_lag_max`](../../public-metrics-reference/#redpanda_kafka_consumer_group_lag_max) and [`redpanda_kafka_consumer_group_lag_sum`](../../public-metrics-reference/#redpanda_kafka_consumer_group_lag_sum) metrics. Enabling `consumer_lag` may add a small amount of additional processing overhead to the brokers, especially in environments with a high number of consumer groups or partitions.
| Property | Value | | --- | --- | | Type | array | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | | Related topics | redpanda_kafka_consumer_group_consumers, redpanda_kafka_consumer_group_topics, redpanda_kafka_consumer_group_committed_offset, redpanda_kafka_consumer_group_lag_max, redpanda_kafka_consumer_group_lag_sum, consumer_group_lag_collection_interval_sec, Monitor consumer group lag | ### [](#enable_schema_id_validation)enable\_schema\_id\_validation Controls whether Redpanda validates schema IDs in records and which topic properties are enforced. Values: - `none`: Schema validation is disabled (no schema ID checks are done). Associated topic properties cannot be modified. - `redpanda`: Schema validation is enabled. Only Redpanda topic properties are accepted. - `compat`: Schema validation is enabled. Both Redpanda and compatible topic properties are accepted. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | | Related topics | Server-Side Schema ID Validation | ### [](#enable_shadow_linking)enable\_shadow\_linking Enable creating shadow links from this cluster to a remote source cluster for data replication. | Property | Value | | --- | --- | | Type | boolean | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | ### [](#group_offset_retention_sec)group\_offset\_retention\_sec Retention period for consumer group offsets, in seconds. To disable offset retention, set this to null. | Property | Value | | --- | --- | | Type | integer | | Range | [-17179869184, 17179869183] | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | Yes | | Unit | Seconds | | Requires restart | No | ### [](#http_authentication)http\_authentication A list of supported HTTP authentication mechanisms. Accepted values: `BASIC`, `OIDC`.
| Property | Value | | --- | --- | | Type | array | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Requires restart | No | ### [](#iceberg_catalog_base_location)iceberg\_catalog\_base\_location Base path for the object-storage-backed Iceberg catalog. After Iceberg is enabled, do not change this value. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Requires restart | Yes | ### [](#iceberg_catalog_type)iceberg\_catalog\_type Iceberg catalog type that Redpanda will use to commit table metadata updates. Supported types: `rest`, `object_storage`. NOTE: You must set [`iceberg_rest_catalog_endpoint`](#iceberg_rest_catalog_endpoint) at the same time that you set `iceberg_catalog_type` to `rest`. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string (enum) | | Accepted values | object_storage, rest | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | Yes | ### [](#iceberg_default_catalog_namespace)iceberg\_default\_catalog\_namespace The default namespace (database name) for Iceberg tables. All tables created by Redpanda will be placed in this namespace within the Iceberg catalog. Supports nested namespaces as an array of strings. > ❗ **IMPORTANT** > > This value must be configured before enabling Iceberg and must not be changed afterward. Changing it will cause Redpanda to lose track of existing tables. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. 
| Property | Value | | --- | --- | | Type | array | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | Yes | ### [](#iceberg_default_partition_spec)iceberg\_default\_partition\_spec Default value for the `redpanda.iceberg.partition.spec` topic property that determines the partition spec for the Iceberg table corresponding to the topic. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | | Related topics | Enable Iceberg integration | ### [](#iceberg_delete)iceberg\_delete Default value for the `redpanda.iceberg.delete` topic property that determines if the corresponding Iceberg table is deleted upon deleting the topic. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | boolean | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | ### [](#iceberg_disable_snapshot_tagging)iceberg\_disable\_snapshot\_tagging Whether to disable tagging of Iceberg snapshots. These tags are used to ensure that the snapshots that Redpanda writes are retained during snapshot removal, which in turn, helps Redpanda ensure exactly-once delivery of records. Disabling tags is therefore not recommended, but it may be useful if the Iceberg catalog does not support tags. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | boolean | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | ### [](#iceberg_enabled)iceberg\_enabled Enables the translation of topic data into Iceberg tables. 
Setting `iceberg_enabled` to `true` activates the feature at the cluster level, but each topic must also set the `redpanda.iceberg.enabled` topic-level property to `true` to use it. If `iceberg_enabled` is set to `false`, then the feature is disabled for all topics in the cluster, overriding any topic-level settings. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | boolean | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | Yes | ### [](#iceberg_invalid_record_action)iceberg\_invalid\_record\_action Default value for the `redpanda.iceberg.invalid.record.action` topic property. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string (enum) | | Accepted values | drop, dlq_table | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | ### [](#iceberg_rest_catalog_authentication_mode)iceberg\_rest\_catalog\_authentication\_mode The authentication mode for client requests made to the Iceberg catalog. Choose from: `none`, `bearer`, `oauth2`, and `aws_sigv4`. In `bearer` mode, the token specified in `iceberg_rest_catalog_token` is used unconditionally, and no attempt is made to refresh the token. In `oauth2` mode, the credentials specified in `iceberg_rest_catalog_client_id` and `iceberg_rest_catalog_client_secret` are used to obtain a bearer token from the URI defined by `iceberg_rest_catalog_oauth2_server_uri`. In `aws_sigv4` mode, the same AWS credentials used for cloud storage (see `cloud_storage_region`, `cloud_storage_access_key`, `cloud_storage_secret_key`, and `cloud_storage_credentials_source`) are used to sign requests to the AWS Glue catalog with SigV4. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments.
| Property | Value | | --- | --- | | Type | string (enum) | | Accepted values | none, bearer, oauth2, aws_sigv4, gcp | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | Yes | | Example | none | ### [](#iceberg_rest_catalog_aws_access_key)iceberg\_rest\_catalog\_aws\_access\_key AWS access key for Iceberg REST catalog SigV4 authentication. If not set, falls back to [`cloud_storage_access_key`](../object-storage-properties/#cloud_storage_access_key) when using aws\_sigv4 authentication mode. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | | Related topics | cloud_storage_access_key | ### [](#iceberg_rest_catalog_aws_region)iceberg\_rest\_catalog\_aws\_region AWS region for Iceberg REST catalog SigV4 authentication. If not set, falls back to [`cloud_storage_region`](../object-storage-properties/#cloud_storage_region) when using aws\_sigv4 authentication mode. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | | Related topics | cloud_storage_region | ### [](#iceberg_rest_catalog_aws_secret_key)iceberg\_rest\_catalog\_aws\_secret\_key AWS secret key for Iceberg REST catalog SigV4 authentication. If not set, falls back to [`cloud_storage_secret_key`](../object-storage-properties/#cloud_storage_secret_key) when using aws\_sigv4 authentication mode. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. 
| Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | | Related topics | cloud_storage_secret_key | ### [](#iceberg_rest_catalog_base_location)iceberg\_rest\_catalog\_base\_location Base URI for the Iceberg REST catalog. If unset, the REST catalog server determines the location. Some REST catalogs, like AWS Glue, require the client to set this. After Iceberg is enabled, do not change this value. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | ### [](#iceberg_rest_catalog_client_id)iceberg\_rest\_catalog\_client\_id Iceberg REST catalog user ID. This ID is used to query the catalog API for the OAuth token. Required if catalog type is set to `rest` and `iceberg_rest_catalog_authentication_mode` is set to `oauth2`. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | ### [](#iceberg_rest_catalog_client_secret)iceberg\_rest\_catalog\_client\_secret Secret used with the client ID to query the OAuth token endpoint for Iceberg REST catalog authentication. Required if catalog type is set to `rest` and `iceberg_rest_catalog_authentication_mode` is set to `oauth2`. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | ### [](#iceberg_rest_catalog_crl)iceberg\_rest\_catalog\_crl The contents of a certificate revocation list for `iceberg_rest_catalog_trust`. 
Takes precedence over `iceberg_rest_catalog_crl_file`. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | ### [](#iceberg_rest_catalog_endpoint)iceberg\_rest\_catalog\_endpoint URL of the Iceberg REST catalog endpoint. NOTE: If you set [`iceberg_catalog_type`](#iceberg_catalog_type) to `rest`, you must also set this property at the same time. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | | Example | http://hostname:8181 | ### [](#iceberg_rest_catalog_oauth2_scope)iceberg\_rest\_catalog\_oauth2\_scope The OAuth scope used to retrieve access tokens for Iceberg catalog authentication. Only meaningful when `iceberg_rest_catalog_authentication_mode` is set to `oauth2`. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | Yes | ### [](#iceberg_rest_catalog_oauth2_server_uri)iceberg\_rest\_catalog\_oauth2\_server\_uri The OAuth URI used to retrieve access tokens for Iceberg catalog authentication. If left undefined, the deprecated Iceberg catalog endpoint `/v1/oauth/tokens` is used instead. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments.
| Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | ### [](#iceberg_rest_catalog_request_timeout_ms)iceberg\_rest\_catalog\_request\_timeout\_ms Maximum length of time that Redpanda waits for a response from the REST catalog before aborting the request. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | integer | | Range | [-17592186044416, 17592186044415] | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Unit | Milliseconds | | Requires restart | No | ### [](#iceberg_rest_catalog_token)iceberg\_rest\_catalog\_token Token used to access the REST Iceberg catalog. If the token is present, Redpanda ignores credentials stored in the properties [`iceberg_rest_catalog_client_id`](#iceberg_rest_catalog_client_id) and [`iceberg_rest_catalog_client_secret`](#iceberg_rest_catalog_client_secret). Required if [`iceberg_rest_catalog_authentication_mode`](#iceberg_rest_catalog_authentication_mode) is set to `bearer`. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | ### [](#iceberg_rest_catalog_trust)iceberg\_rest\_catalog\_trust The contents of a certificate chain to trust for the REST Iceberg catalog. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | ### [](#iceberg_rest_catalog_warehouse)iceberg\_rest\_catalog\_warehouse Warehouse to use for the Iceberg REST catalog.
Redpanda queries the catalog to retrieve warehouse-specific configurations and automatically configures settings like the appropriate prefix. The prefix is appended to the catalog path (for example, `/v1/{prefix}/namespaces`). > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | Yes | ### [](#iceberg_target_lag_ms)iceberg\_target\_lag\_ms Default value for the `redpanda.iceberg.target.lag.ms` topic property, which controls how often the data in an Iceberg table is refreshed with new data from the corresponding Redpanda topic. Redpanda attempts to commit all data produced to the topic within the lag target, subject to resource availability. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | integer | | Range | [-17592186044416, 17592186044415] | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Unit | Milliseconds | | Requires restart | No | ### [](#iceberg_topic_name_dot_replacement)iceberg\_topic\_name\_dot\_replacement A replacement string for dots in topic names when creating Iceberg table names. Use this when your downstream systems don’t allow dots in table names. The replacement string cannot contain dots. Be careful to avoid table name collisions. Don’t change this value after creating any Iceberg topics with dots in their names. > 📝 **NOTE** > > This property is available only in Redpanda Cloud BYOC deployments. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | No | ### [](#kafka_connections_max_overrides)kafka\_connections\_max\_overrides A list of IP addresses for which Kafka client connection limits are overridden and don’t apply. 
For example, `['127.0.0.1:90', '50.20.1.1:40']`. | Property | Value | | --- | --- | | Type | array | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | | Example | ['127.0.0.1:90', '50.20.1.1:40'] | | Related topics | Limit client connections | ### [](#kafka_connections_max_per_ip)kafka\_connections\_max\_per\_ip Maximum number of Kafka client connections per IP address, per broker. If `null`, the property is disabled. | Property | Value | | --- | --- | | Type | integer | | Maximum | 4294967295 | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | Yes | | Requires restart | No | | Related topics | Limit client connections | ### [](#log_segment_ms)log\_segment\_ms Default lifetime of log segments. If `null`, the property is disabled, and no default lifetime is set. Any value under 60 seconds (60000 ms) is rejected. This property can also be set in the Kafka API using the Kafka-compatible alias, `log.roll.ms`. | Property | Value | | --- | --- | | Type | integer | | Range | [-17592186044416, 17592186044415] | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | Yes | | Unit | Milliseconds | | Requires restart | No | | Example | 3600000 | ### [](#oidc_discovery_url)oidc\_discovery\_url The URL pointing to the well-known discovery endpoint for the OIDC provider. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Requires restart | No | ### [](#oidc_principal_mapping)oidc\_principal\_mapping Rule for mapping JWT payload claim to a Redpanda user principal. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Requires restart | No | | Related topics | OpenID Connect authentication | ### [](#oidc_token_audience)oidc\_token\_audience A string representing the intended recipient of the token.
| Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Requires restart | No | ### [](#sasl_mechanisms)sasl\_mechanisms A list of supported SASL mechanisms. Accepted values: `SCRAM`, `GSSAPI`, `OAUTHBEARER`, `PLAIN`. To enable PLAIN, you must also enable SCRAM. | Property | Value | | --- | --- | | Type | array (enum) | | Accepted values | GSSAPI, SCRAM, OAUTHBEARER, PLAIN | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Requires restart | No | ### [](#schema_registry_enable_authorization)schema\_registry\_enable\_authorization Enables ACL-based authorization for Schema Registry requests. When `true`, Schema Registry uses ACL-based authorization instead of the default `public/user/superuser` authorization model. | Property | Value | | --- | --- | | Type | boolean | | Default | Available in the Redpanda Cloud Console (editable) | | Nullable | No | | Requires restart | No | ### [](#tls_min_version)tls\_min\_version The minimum TLS version that Redpanda clusters support. This property prevents client applications from negotiating a downgrade to an earlier TLS version when they connect to a Redpanda cluster.
| Property | Value | | --- | --- | | Type | string (enum) | | Accepted values | v1.0, v1.1, v1.2, v1.3 | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | No | | Requires restart | Yes | --- # Page 498: Object Storage Properties **URL**: https://docs.redpanda.com/redpanda-cloud/reference/properties/object-storage-properties.md --- # Object Storage Properties --- title: Object Storage Properties latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: properties/object-storage-properties page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: properties/object-storage-properties.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/properties/object-storage-properties.adoc description: Reference of object storage properties. page-git-created-date: "2025-05-21" page-git-modified-date: "2025-11-25" --- Object storage properties are a type of cluster property. Cluster properties are configuration settings that control the behavior of a Redpanda cluster at a global level. Configuring cluster properties allows you to adapt Redpanda to specific workloads, optimize resource usage, and enable or disable features. For information on how to edit cluster properties, see [Configure Cluster Properties](../../../manage/cluster-maintenance/config-cluster/). > 📝 **NOTE** > > Some properties require a cluster restart for updates to take effect. This triggers a [long-running operation](../../../manage/api/cloud-byoc-controlplane-api/#lro) that can take several minutes to complete. ## [](#cluster-configuration)Cluster configuration ### [](#cloud_storage_azure_container)cloud\_storage\_azure\_container The name of the Azure container to use with Tiered Storage. If `null`, the property is disabled. 
| Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | Yes | | Requires restart | Yes | ### [](#cloud_storage_azure_storage_account)cloud\_storage\_azure\_storage\_account The name of the Azure storage account to use with Tiered Storage. If `null`, the property is disabled. | Property | Value | | --- | --- | | Type | string | | Default | Available in the Redpanda Cloud Console (read-only) | | Nullable | Yes | | Requires restart | Yes | --- # Page 499: Metrics Reference **URL**: https://docs.redpanda.com/redpanda-cloud/reference/public-metrics-reference.md --- # Metrics Reference --- title: Metrics Reference latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: public-metrics-reference page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: public-metrics-reference.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/public-metrics-reference.adoc description: Metrics to create your system dashboard. page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- This section provides reference descriptions for the public metrics exported from Redpanda’s `/public_metrics` endpoint.

> ❗ **IMPORTANT**
>
> In a live system, Redpanda metrics are exported only for features that are in use. For example, Redpanda does not export metrics for consumer groups if no groups are registered.
>
> To see the available public metrics in your system, query the `/public_metrics` endpoint:
>
> ```bash
> curl http://<broker-address>:9644/public_metrics | grep -E "HELP|TYPE"
> ```

## [](#cluster-metrics)Cluster metrics ### [](#redpanda_cluster_brokers)redpanda\_cluster\_brokers Total number of fully commissioned brokers configured in the cluster.
**Type**: gauge **Usage**: Create an alert if this gauge falls below a steady-state threshold, which may indicate that a broker has become unresponsive. **Available in Serverless**: No * * * ### [](#redpanda_cluster_controller_log_limit_requests_available_rps)redpanda\_cluster\_controller\_log\_limit\_requests\_available\_rps The upper limit on the requests per second (RPS) that the cluster controller log is allowed to process, segmented by command group. **Type**: gauge **Labels**: - `redpanda_cmd_group=("move_operations" | "topic_operations" | "configuration_operations" | "node_management_operations" | "acls_and_users_operations")` **Available in Serverless**: No * * * ### [](#redpanda_cluster_controller_log_limit_requests_dropped)redpanda\_cluster\_controller\_log\_limit\_requests\_dropped The cumulative number of requests dropped by the controller log because the incoming rate exceeded the available RPS limit. **Type**: counter **Labels**: - `redpanda_cmd_group=("move_operations" | "topic_operations" | "configuration_operations" | "node_management_operations" | "acls_and_users_operations")` **Usage**: A rising counter indicates that requests are being dropped, which could signal overload or misconfiguration. **Available in Serverless**: No * * * ### [](#redpanda_cluster_features_enterprise_license_expiry_sec)redpanda\_cluster\_features\_enterprise\_license\_expiry\_sec Number of seconds remaining until the Enterprise Edition license expires. **Type**: gauge **Usage**: - A value of `-1` indicates that no license is present. - A value of `0` signifies an expired license. Use this metric to proactively monitor license status and trigger alerts for timely renewal. **Available in Serverless**: No * * * ### [](#redpanda_cluster_latest_cluster_metadata_manifest_age)redpanda\_cluster\_latest\_cluster\_metadata\_manifest\_age The amount of time in seconds since the last time Redpanda uploaded metadata files to Tiered Storage for your cluster. 
A value of `0` indicates metadata has not yet been uploaded. When performing a whole cluster restore operation, metadata for new clusters will not include any changes made to a source cluster that is newer than this age. **Type**: gauge **Usage**: On a healthy system, this should not exceed the value set for `cloud_storage_cluster_metadata_upload_interval_ms`. You may consider setting an alert if this remains `0` for longer than 1.5 \* `cloud_storage_cluster_metadata_upload_interval_ms` as that may indicate a configuration issue. **Available in Serverless**: No * * * ### [](#redpanda_cluster_members_backend_queued_node_operations)redpanda\_cluster\_members\_backend\_queued\_node\_operations The number of node operations queued per shard that are awaiting processing by the backend. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_cluster_non_homogenous_fips_mode)redpanda\_cluster\_non\_homogenous\_fips\_mode Count of brokers whose FIPS mode configuration differs from the rest of the cluster. **Type**: gauge **Usage**: Indicates inconsistencies in security configurations that might affect compliance or interoperability. **Available in Serverless**: No * * * ### [](#redpanda_cluster_partition_moving_from_node)redpanda\_cluster\_partition\_moving\_from\_node Number of partition replicas that are in the process of being removed from a broker. **Type**: gauge **Usage**: A non-zero value can indicate ongoing or unexpected partition reassignments. Investigate if this metric remains elevated. **Available in Serverless**: No * * * ### [](#redpanda_cluster_partition_moving_to_node)redpanda\_cluster\_partition\_moving\_to\_node Number of partition replicas in the cluster currently being added or moved to a broker. **Type**: gauge **Usage**: When this gauge is non-zero, determine whether there is an expected or unexpected reassignment of partitions causing partition replicas movement. 
**Available in Serverless**: No * * * ### [](#redpanda_cluster_partition_node_cancelling_movements)redpanda\_cluster\_partition\_node\_cancelling\_movements During a partition movement cancellation operation, the number of partition replicas that were being moved that now need to be canceled. **Type**: gauge **Usage**: Track this metric to verify that partition reassignments are proceeding as expected; persistent non-zero values may warrant further investigation. **Available in Serverless**: No * * * ### [](#redpanda_cluster_partition_num_with_broken_rack_constraint)redpanda\_cluster\_partition\_num\_with\_broken\_rack\_constraint Number of partitions that do not satisfy the rack awareness constraint. **Type**: gauge **Usage**: A non-zero value indicates that replica placement for some partitions violates rack awareness and may need attention. **Available in Serverless**: No * * * ### [](#redpanda_cluster_partitions)redpanda\_cluster\_partitions Total number of logical partitions managed by the cluster. This includes partitions for the controller topic but excludes replicas. **Type**: gauge **Available in Serverless**: Yes * * * ### [](#redpanda_cluster_topics)redpanda\_cluster\_topics The total number of topics configured within the cluster. **Type**: gauge **Available in Serverless**: Yes * * * ### [](#redpanda_cluster_unavailable_partitions)redpanda\_cluster\_unavailable\_partitions Number of partitions that are unavailable due to a lack of quorum among their replica set. **Type**: gauge **Usage**: A non-zero value indicates that some partitions do not have an active leader. Consider increasing the number of brokers or the replication factor if this persists.
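
The usage note above can be turned into a simple check. The following is a minimal sketch, assuming the `/public_metrics` output has already been fetched as text in the Prometheus exposition format. The sample text, the `example_label` label, and the helper names are hypothetical and exist only for illustration. The regular expression also assumes label values contain no `}` character.

```python
import re

# Hypothetical sample of /public_metrics output (Prometheus text format).
# The label and the values are illustrative, not captured from a real cluster.
SAMPLE = """\
# HELP redpanda_cluster_unavailable_partitions Example help text
# TYPE redpanda_cluster_unavailable_partitions gauge
redpanda_cluster_unavailable_partitions{example_label="a"} 2
redpanda_cluster_brokers 3
"""

# One sample line: metric name, optional {labels}, then a numeric value.
LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)')


def metric_values(text, name):
    """Return every sample value reported for the metric `name`."""
    values = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = LINE.match(line)
        if match and match.group(1) == name:
            values.append(float(match.group(2)))
    return values


def has_unavailable_partitions(text):
    """True if any sample of the unavailable-partitions gauge is non-zero."""
    name = "redpanda_cluster_unavailable_partitions"
    return any(value > 0 for value in metric_values(text, name))


print(has_unavailable_partitions(SAMPLE))
```

In practice a monitoring system such as Prometheus would usually evaluate this condition as an alert rule. The script only illustrates the logic.
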
**Available in Serverless**: No ## [](#debug-bundle-metrics)Debug bundle metrics ### [](#redpanda_debug_bundle_failed_generation_count)redpanda\_debug\_bundle\_failed\_generation\_count Running count of debug bundle generation failures, reported per shard. **Type**: counter **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_debug_bundle_last_failed_bundle_timestamp_seconds)redpanda\_debug\_bundle\_last\_failed\_bundle\_timestamp\_seconds Unix epoch timestamp of the last failed debug bundle generation, per shard. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_debug_bundle_last_successful_bundle_timestamp_seconds)redpanda\_debug\_bundle\_last\_successful\_bundle\_timestamp\_seconds Unix epoch timestamp of the last successfully generated debug bundle, per shard. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_debug_bundle_successful_generation_count)redpanda\_debug\_bundle\_successful\_generation\_count Running count of successfully generated debug bundles, reported per shard. **Type**: counter **Labels**: - `shard` **Available in Serverless**: No ## [](#iceberg-metrics)Iceberg metrics ### [](#redpanda_iceberg_rest_client_active_gets)redpanda\_iceberg\_rest\_client\_active\_gets Number of active GET requests. **Type**: gauge **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_active_puts)redpanda\_iceberg\_rest\_client\_active\_puts Number of active PUT requests. **Type**: gauge **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_active_requests)redpanda\_iceberg\_rest\_client\_active\_requests Number of active HTTP requests (includes PUT and GET). 
**Type**: gauge **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_commit_table_update_requests)redpanda\_iceberg\_rest\_client\_num\_commit\_table\_update\_requests Total number of requests sent to the `commit_table_update` endpoint. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_commit_table_update_requests_failed)redpanda\_iceberg\_rest\_client\_num\_commit\_table\_update\_requests\_failed Number of requests sent to the `commit_table_update` endpoint that failed. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_create_namespace_requests)redpanda\_iceberg\_rest\_client\_num\_create\_namespace\_requests Total number of requests sent to the `create_namespace` endpoint. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_create_namespace_requests_failed)redpanda\_iceberg\_rest\_client\_num\_create\_namespace\_requests\_failed Number of requests sent to the `create_namespace` endpoint that failed. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_create_table_requests)redpanda\_iceberg\_rest\_client\_num\_create\_table\_requests Total number of requests sent to the `create_table` endpoint. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_create_table_requests_failed)redpanda\_iceberg\_rest\_client\_num\_create\_table\_requests\_failed Number of requests sent to the `create_table` endpoint that failed. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_drop_table_requests)redpanda\_iceberg\_rest\_client\_num\_drop\_table\_requests Total number of requests sent to the `drop_table` endpoint. 
**Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_drop_table_requests_failed)redpanda\_iceberg\_rest\_client\_num\_drop\_table\_requests\_failed Number of requests sent to the `drop_table` endpoint that failed. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_get_config_requests)redpanda\_iceberg\_rest\_client\_num\_get\_config\_requests Total number of requests sent to the `config` endpoint. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_get_config_requests_failed)redpanda\_iceberg\_rest\_client\_num\_get\_config\_requests\_failed Number of requests sent to the `config` endpoint that failed. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_load_table_requests)redpanda\_iceberg\_rest\_client\_num\_load\_table\_requests Total number of requests sent to the `load_table` endpoint. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_load_table_requests_failed)redpanda\_iceberg\_rest\_client\_num\_load\_table\_requests\_failed Number of requests sent to the `load_table` endpoint that failed. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_oauth_token_requests)redpanda\_iceberg\_rest\_client\_num\_oauth\_token\_requests Total number of requests sent to the `oauth_token` endpoint. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_oauth_token_requests_failed)redpanda\_iceberg\_rest\_client\_num\_oauth\_token\_requests\_failed Number of requests sent to the `oauth_token` endpoint that failed. 
**Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_request_timeouts)redpanda\_iceberg\_rest\_client\_num\_request\_timeouts Total number of catalog requests that could no longer be retried because they timed out. This may occur if the catalog is not responding. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_num_transport_errors)redpanda\_iceberg\_rest\_client\_num\_transport\_errors Total number of transport errors (TCP and TLS). **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_total_gets)redpanda\_iceberg\_rest\_client\_total\_gets Number of completed GET requests. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_total_inbound_bytes)redpanda\_iceberg\_rest\_client\_total\_inbound\_bytes Total number of bytes received from the Iceberg REST catalog. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_total_outbound_bytes)redpanda\_iceberg\_rest\_client\_total\_outbound\_bytes Total number of bytes sent to the Iceberg REST catalog. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_total_puts)redpanda\_iceberg\_rest\_client\_total\_puts Number of completed PUT requests. **Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_rest_client_total_requests)redpanda\_iceberg\_rest\_client\_total\_requests Number of completed HTTP requests (includes PUT and GET). 
**Type**: counter **Labels**: - `role` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_translation_decompressed_bytes_processed)redpanda\_iceberg\_translation\_decompressed\_bytes\_processed Number of bytes consumed post-decompression for processing that may or may not succeed in being processed. For example, if Redpanda fails to communicate with the coordinator preventing processing of a batch, this metric still increases. **Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_translation_dlq_files_created)redpanda\_iceberg\_translation\_dlq\_files\_created Number of created Parquet files for the dead letter queue (DLQ) table. **Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_translation_files_created)redpanda\_iceberg\_translation\_files\_created Number of created Parquet files (not counting the DLQ table). **Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_translation_invalid_records)redpanda\_iceberg\_translation\_invalid\_records Number of invalid records handled by translation. **Type**: counter **Labels**: - `redpanda_cause` - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_translation_parquet_bytes_added)redpanda\_iceberg\_translation\_parquet\_bytes\_added Number of bytes in created Parquet files (not counting the DLQ table). **Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_translation_parquet_rows_added)redpanda\_iceberg\_translation\_parquet\_rows\_added Number of rows in created Parquet files (not counting the DLQ table). 
**Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_translation_raw_bytes_processed)redpanda\_iceberg\_translation\_raw\_bytes\_processed Number of raw, potentially compressed bytes, consumed for processing that may or may not succeed in being processed. For example, if Redpanda fails to communicate with the coordinator preventing processing of a batch, this metric still increases. **Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_iceberg_translation_translations_finished)redpanda\_iceberg\_translation\_translations\_finished Number of finished translator executions. **Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ## [](#infrastructure-metrics)Infrastructure metrics ### [](#redpanda_cpu_busy_seconds_total)redpanda\_cpu\_busy\_seconds\_total Total time (in seconds) the CPU has been actively processing tasks. **Type**: counter **Usage**: Useful for tracking overall CPU utilization. **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_io_queue_total_read_ops)redpanda\_io\_queue\_total\_read\_ops Cumulative count of read operations processed by the I/O queue. **Type**: counter **Labels**: - `class=("default" | "compaction" | "raft")` - `iogroup` - `mountpoint` - `shard` **Available in Serverless**: No * * * ### [](#redpanda_io_queue_total_write_ops)redpanda\_io\_queue\_total\_write\_ops Cumulative count of write operations processed by the I/O queue. **Type**: counter **Labels**: - `class=("default" | "compaction" | "raft")` - `iogroup` - `mountpoint` - `shard` **Available in Serverless**: No * * * ### [](#redpanda_memory_allocated_memory)redpanda\_memory\_allocated\_memory Total memory allocated (in bytes) per CPU shard. 
**Type**: gauge **Labels**: - `shard` **Usage**: This metric includes reclaimable memory from the batch cache. For monitoring memory pressure, consider using `redpanda_memory_available_memory` instead, which provides a more accurate picture of memory that can be immediately reallocated. **Available in Serverless**: No * * * ### [](#redpanda_memory_available_memory)redpanda\_memory\_available\_memory Total memory (in bytes) available to a CPU shard—including both free and reclaimable memory. **Type**: gauge **Labels**: - `shard` **Usage**: This metric is more useful than `redpanda_memory_allocated_memory` for monitoring memory pressure, as it accounts for reclaimable memory in the batch cache. A low value indicates the system is approaching memory exhaustion. **Available in Serverless**: No * * * ### [](#redpanda_memory_available_memory_low_water_mark)redpanda\_memory\_available\_memory\_low\_water\_mark The lowest recorded available memory (in bytes) per CPU shard since the process started. **Type**: gauge **Labels**: - `shard` **Usage**: This metric helps identify the closest the system has come to memory exhaustion. Useful for capacity planning and understanding historical memory pressure patterns. **Available in Serverless**: No * * * ### [](#redpanda_memory_free_memory)redpanda\_memory\_free\_memory Total unallocated (free) memory in bytes available for each CPU shard. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_rpc_active_connections)redpanda\_rpc\_active\_connections Current number of active RPC client connections on a shard. **Type**: gauge **Labels**: - `redpanda_server=("kafka" | "internal")` **Available in Serverless**: No * * * ### [](#redpanda_rpc_received_bytes)redpanda\_rpc\_received\_bytes Number of bytes received from the clients in valid requests. 
The `redpanda_server` label supports the following options for this metric: - `kafka`: Data sent over the Kafka API - `internal`: Inter-broker traffic **Type**: counter **Labels**: - `redpanda_server` **Available in Serverless**: No * * * ### [](#redpanda_rpc_request_errors_total)redpanda\_rpc\_request\_errors\_total Cumulative count of RPC errors encountered, segmented by server type. **Type**: counter **Labels**: - `redpanda_server=("kafka" | "internal")` **Usage**: Use this metric to diagnose potential issues in RPC communication. **Available in Serverless**: No * * * ### [](#redpanda_rpc_request_latency_seconds)redpanda\_rpc\_request\_latency\_seconds Histogram capturing the latency (in seconds) for RPC requests. **Type**: histogram **Labels**: - `redpanda_server=("kafka" | "internal")` **Available in Serverless**: No * * * ### [](#redpanda_rpc_sent_bytes)redpanda\_rpc\_sent\_bytes Number of bytes sent to clients. The `redpanda_server` label supports the following options for this metric: - `kafka`: Data sent over the Kafka API - `internal`: Inter-broker traffic **Type**: counter **Labels**: - `redpanda_server` **Available in Serverless**: No * * * ### [](#redpanda_scheduler_runtime_seconds_total)redpanda\_scheduler\_runtime\_seconds\_total Total accumulated runtime (in seconds) for the task queue associated with each scheduling group per shard. **Type**: counter **Labels**: - `redpanda_scheduling_group=("admin" | "archival_upload" | "cache_background_reclaim" | "cluster" | "coproc" | "kafka" | "log_compaction" | "main" | "node_status" | "raft" | "raft_learner_recovery")` - `shard` **Available in Serverless**: No * * * ### [](#redpanda_storage_cache_disk_free_bytes)redpanda\_storage\_cache\_disk\_free\_bytes Amount of free disk space (in bytes) available on the cache storage. **Type**: gauge **Usage**: Monitor this to ensure sufficient cache storage capacity. 
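
One way to act on this usage note is to compare free cache space against the total reported by `redpanda_storage_cache_disk_total_bytes`. The sketch below assumes the metrics text was already fetched from `/public_metrics`, that each gauge appears once, and that sample lines carry no trailing timestamp. The sample values, the `LOW_FREE_FRACTION` threshold, and the helper names are hypothetical.

```python
# Hypothetical sample of /public_metrics output; the byte counts are made up.
SAMPLE = """\
# TYPE redpanda_storage_cache_disk_free_bytes gauge
redpanda_storage_cache_disk_free_bytes 25000000000
# TYPE redpanda_storage_cache_disk_total_bytes gauge
redpanda_storage_cache_disk_total_bytes 100000000000
"""

LOW_FREE_FRACTION = 0.2  # hypothetical alert threshold, not a Redpanda default


def parse_gauges(text):
    """Map metric name to the last value seen, ignoring labels and comments."""
    gauges = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes "name{labels} value" with no trailing timestamp.
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        gauges[name] = float(value)
    return gauges


def cache_free_fraction(text):
    """Return free cache space as a fraction of total cache space."""
    gauges = parse_gauges(text)
    total = gauges["redpanda_storage_cache_disk_total_bytes"]
    free = gauges["redpanda_storage_cache_disk_free_bytes"]
    return free / total if total else 0.0


fraction = cache_free_fraction(SAMPLE)
print(f"cache free: {fraction:.0%}, low: {fraction < LOW_FREE_FRACTION}")
```
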
**Available in Serverless**: No * * * ### [](#redpanda_storage_cache_disk_free_space_alert)redpanda\_storage\_cache\_disk\_free\_space\_alert Alert indicator for cache storage free space, where: - `0` = OK - `1` = Low space - `2` = Degraded **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_storage_cache_disk_total_bytes)redpanda\_storage\_cache\_disk\_total\_bytes Total size of attached storage, in bytes. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_storage_disk_free_bytes)redpanda\_storage\_disk\_free\_bytes Amount of free disk space (in bytes) available on attached storage. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_storage_disk_free_space_alert)redpanda\_storage\_disk\_free\_space\_alert Alert indicator for overall disk storage free space, where: - `0` = OK - `1` = Low space - `2` = Degraded **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_storage_disk_total_bytes)redpanda\_storage\_disk\_total\_bytes Total capacity (in bytes) of the attached storage. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_uptime_seconds_total)redpanda\_uptime\_seconds\_total Total system uptime (in seconds) representing the overall CPU runtime. **Type**: gauge **Available in Serverless**: No ## [](#raft-metrics)Raft metrics ### [](#redpanda_node_status_rpcs_received)redpanda\_node\_status\_rpcs\_received Total count of node status RPCs received by a broker. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_node_status_rpcs_sent)redpanda\_node\_status\_rpcs\_sent Total count of node status RPCs sent by a broker. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_node_status_rpcs_timed_out)redpanda\_node\_status\_rpcs\_timed\_out Total count of node status RPCs that timed out on a broker.
**Type**: gauge **Available in Serverless**: No ## [](#redpanda-connect-metrics)Redpanda Connect metrics ### [](#input_connection_failed)input\_connection\_failed Number of input connections to the Redpanda Connect pipeline that have failed. **Type**: counter **Available in Serverless**: Yes * * * ### [](#input_connection_lost)input\_connection\_lost Number of times a connection to the upstream system is lost in a Redpanda Connect pipeline. **Type**: counter **Available in Serverless**: Yes * * * ### [](#input_connection_up)input\_connection\_up Number of active input connections to the Redpanda Connect pipeline. **Type**: counter **Available in Serverless**: Yes * * * ### [](#input_latency_ns)input\_latency\_ns Input latency for the Redpanda Connect pipeline. **Type**: summary **Available in Serverless**: Yes * * * ### [](#input_received)input\_received Number of records received by the Redpanda Connect pipeline. **Type**: counter **Available in Serverless**: Yes * * * ### [](#output_batch_sent)output\_batch\_sent Number of batches produced by the Redpanda Connect pipeline. **Type**: counter **Available in Serverless**: Yes * * * ### [](#output_connection_failed)output\_connection\_failed Number of output connections from the Redpanda Connect pipeline that have failed. **Type**: counter **Available in Serverless**: Yes * * * ### [](#output_connection_lost)output\_connection\_lost Number of times the connection to the downstream system is lost in a Redpanda Connect pipeline. **Type**: counter **Available in Serverless**: Yes * * * ### [](#output_connection_up)output\_connection\_up Number of active output connections from the Redpanda Connect pipeline. **Type**: counter **Available in Serverless**: Yes * * * ### [](#output_error)output\_error Number of errors encountered in the Redpanda Connect pipeline output. **Type**: counter **Available in Serverless**: Yes * * * ### [](#output_latency_ns)output\_latency\_ns Output latency for the Redpanda Connect pipeline. 
**Type**: summary **Available in Serverless**: Yes * * * ### [](#output_sent)output\_sent Records sent by the Redpanda Connect pipeline. **Type**: counter **Available in Serverless**: Yes * * * ### [](#processor_batch_received)processor\_batch\_received Number of record batches received as input in a Redpanda Connect pipeline processor. **Type**: counter **Available in Serverless**: Yes * * * ### [](#processor_batch_sent)processor\_batch\_sent Number of record batches produced as output by a Redpanda Connect pipeline processor. **Type**: counter **Available in Serverless**: Yes * * * ### [](#processor_error)processor\_error Number of errors encountered by a Redpanda Connect pipeline processor. **Type**: counter **Available in Serverless**: Yes * * * ### [](#processor_latency_ns)processor\_latency\_ns Processing time in nanoseconds of a Redpanda Connect pipeline processor. **Type**: summary **Available in Serverless**: Yes * * * ### [](#processor_received)processor\_received Number of records received as input in a Redpanda Connect pipeline processor. **Type**: counter **Available in Serverless**: Yes * * * ### [](#processor_sent)processor\_sent Number of records produced as output by a Redpanda Connect pipeline processor. **Type**: counter **Available in Serverless**: Yes ## [](#serverless-metrics)Serverless metrics ### [](#redpanda_serverless_ingress_bytes_total)redpanda\_serverless\_ingress\_bytes\_total Total raw bytes sent by clients to the Serverless cluster. **Type**: counter **Available in Serverless**: Yes * * * ### [](#redpanda_serverless_egress_bytes_total)redpanda\_serverless\_egress\_bytes\_total Total raw bytes sent by the Serverless cluster to clients. **Type**: counter **Available in Serverless**: Yes * * * ### [](#redpanda_serverless_connections_active)redpanda\_serverless\_connections\_active Number of active client connections. 
**Type**: gauge **Available in Serverless**: Yes * * * ### [](#redpanda_serverless_connections_created_total)redpanda\_serverless\_connections\_created\_total Total number of client connections created. **Type**: counter **Available in Serverless**: Yes * * * ### [](#redpanda_serverless_connections_duration_seconds)redpanda\_serverless\_connections\_duration\_seconds Total duration (in seconds) of client connections. **Type**: summary **Available in Serverless**: Yes * * * ### [](#redpanda_serverless_resource_limit)redpanda\_serverless\_resource\_limit Resource limits for the Serverless cluster: - Partition quota - Topic quota - Ingress quota - Egress quota - Connection quota To increase resource limits, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). **Type**: gauge **Available in Serverless**: Yes ## [](#service-metrics)Service metrics ### [](#redpanda_authorization_result)redpanda\_authorization\_result Cumulative count of authorization results, categorized by result type. **Type**: counter **Labels**: - `type` **Available in Serverless**: No * * * ### [](#redpanda_kafka_rpc_sasl_session_expiration_total)redpanda\_kafka\_rpc\_sasl\_session\_expiration\_total Total number of SASL session expirations observed. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_kafka_rpc_sasl_session_reauth_attempts_total)redpanda\_kafka\_rpc\_sasl\_session\_reauth\_attempts\_total Total number of SASL reauthentication attempts made by clients. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_kafka_rpc_sasl_session_revoked_total)redpanda\_kafka\_rpc\_sasl\_session\_revoked\_total Total number of SASL sessions that have been revoked. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_rest_proxy_request_latency_seconds)redpanda\_rest\_proxy\_request\_latency\_seconds Histogram capturing the latency (in seconds) for REST proxy requests. 
The measurement includes waiting for resource availability, processing, and response dispatch. **Type**: histogram **Available in Serverless**: No * * * ### [](#redpanda_schema_registry_cache_schema_count)redpanda\_schema\_registry\_cache\_schema\_count Total number of schemas currently stored in the Schema Registry cache. **Type**: gauge **Available in Serverless**: Yes * * * ### [](#redpanda_schema_registry_cache_schema_memory_bytes)redpanda\_schema\_registry\_cache\_schema\_memory\_bytes Memory usage (in bytes) by schemas stored in the Schema Registry cache. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_schema_registry_cache_subject_count)redpanda\_schema\_registry\_cache\_subject\_count Count of subjects stored in the Schema Registry cache. **Type**: gauge **Labels**: - `deleted` **Available in Serverless**: No * * * ### [](#redpanda_schema_registry_cache_subject_version_count)redpanda\_schema\_registry\_cache\_subject\_version\_count Count of versions available for each subject in the Schema Registry cache. **Type**: gauge **Labels**: - `deleted` - `subject` **Available in Serverless**: No * * * ### [](#redpanda_schema_registry_inflight_requests_memory_usage_ratio)redpanda\_schema\_registry\_inflight\_requests\_memory\_usage\_ratio Ratio of memory used by in-flight requests in the Schema Registry, reported per shard. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_schema_registry_inflight_requests_usage_ratio)redpanda\_schema\_registry\_inflight\_requests\_usage\_ratio Usage ratio for in-flight Schema Registry requests, reported per shard. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_schema_registry_queued_requests_memory_blocked)redpanda\_schema\_registry\_queued\_requests\_memory\_blocked Count of Schema Registry requests queued due to memory constraints, reported per shard. 
**Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_schema_registry_request_errors_total)redpanda\_schema\_registry\_request\_errors\_total Total number of errors encountered by the Schema Registry, categorized by status code. **Type**: counter **Labels**: - `redpanda_status=("5xx" | "4xx" | "3xx")` **Available in Serverless**: Yes * * * ### [](#redpanda_schema_registry_request_latency_seconds)redpanda\_schema\_registry\_request\_latency\_seconds Histogram capturing the latency (in seconds) for Schema Registry requests. **Type**: histogram **Available in Serverless**: Yes ## [](#partition-metrics)Partition metrics ### [](#redpanda_kafka_max_offset)redpanda\_kafka\_max\_offset High watermark offset for a partition, used to calculate consumer group lag. **Type**: gauge **Labels**: - `redpanda_namespace` - `redpanda_partition` - `redpanda_topic` **Related topics**: - [Consumer group lag](../../manage/monitor-cloud/#consumer-group-lag) **Available in Serverless**: No * * * ### [](#redpanda_kafka_request_bytes_total)redpanda\_kafka\_request\_bytes\_total Total number of bytes read from or written to the partitions of a topic. The total may include fetched bytes that are not returned to the client. **Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` - `redpanda_request=("produce" | "consume")` **Available in Serverless**: Yes * * * ### [](#redpanda_kafka_under_replicated_replicas)redpanda\_kafka\_under\_replicated\_replicas Number of partition replicas that are live yet lag behind the latest offset, [redpanda\_kafka\_max\_offset](#redpanda_kafka_max_offset). **Type**: gauge **Labels**: - `redpanda_namespace` - `redpanda_partition` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_raft_leadership_changes)redpanda\_raft\_leadership\_changes Total count of leadership changes (such as successful leader elections) across all partitions for a given topic. 
**Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_raft_learners_gap_bytes)redpanda\_raft\_learners\_gap\_bytes Total number of bytes that must be delivered to learner replicas to bring them up to date. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_raft_recovery_offsets_pending)redpanda\_raft\_recovery\_offsets\_pending Sum of offsets across partitions on a broker that still need to be recovered. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_raft_recovery_partition_movement_available_bandwidth)redpanda\_raft\_recovery\_partition\_movement\_available\_bandwidth Available network bandwidth (in bytes per second) for partition movement operations. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_raft_recovery_partition_movement_consumed_bandwidth)redpanda\_raft\_recovery\_partition\_movement\_consumed\_bandwidth Network bandwidth (in bytes per second) currently being consumed for partition movement. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_raft_recovery_partitions_active)redpanda\_raft\_recovery\_partitions\_active Number of partition replicas currently undergoing recovery on a broker. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_raft_recovery_partitions_to_recover)redpanda\_raft\_recovery\_partitions\_to\_recover Total count of partition replicas that are pending recovery on a broker. **Type**: gauge **Available in Serverless**: No ## [](#topic-metrics)Topic metrics ### [](#redpanda_cluster_partition_schema_id_validation_records_failed)redpanda\_cluster\_partition\_schema\_id\_validation\_records\_failed Count of records that failed schema ID validation during ingestion. 
**Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_kafka_partitions)redpanda\_kafka\_partitions Configured number of partitions for a topic. **Type**: gauge **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: Yes * * * ### [](#redpanda_kafka_records_fetched_total)redpanda\_kafka\_records\_fetched\_total Total number of records fetched from a topic. **Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: Yes * * * ### [](#redpanda_kafka_records_produced_total)redpanda\_kafka\_records\_produced\_total Total number of records produced to a topic. **Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: Yes * * * ### [](#redpanda_kafka_replicas)redpanda\_kafka\_replicas Configured number of replicas for a topic. **Type**: gauge **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: Yes * * * ### [](#redpanda_security_audit_errors_total)redpanda\_security\_audit\_errors\_total Cumulative count of errors encountered when creating or publishing audit event log entries. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_security_audit_last_event_timestamp_seconds)redpanda\_security\_audit\_last\_event\_timestamp\_seconds Unix epoch timestamp of the last successful audit log event publication. **Type**: counter **Available in Serverless**: No ## [](#broker-metrics)Broker metrics ### [](#redpanda_kafka_handler_latency_seconds)redpanda\_kafka\_handler\_latency\_seconds Histogram capturing the latency for processing Kafka requests at the broker level. **Type**: histogram **Available in Serverless**: No * * * ### [](#redpanda_kafka_request_latency_seconds)redpanda\_kafka\_request\_latency\_seconds Histogram capturing the latency (in seconds) for produce/consume requests at the broker. This duration spans from request initiation to response fulfillment. 
**Type**: histogram **Labels**: - `redpanda_request=("produce" | "consume")` **Available in Serverless**: No * * * ### [](#redpanda_kafka_quotas_client_quota_throttle_time)redpanda\_kafka\_quotas\_client\_quota\_throttle\_time Histogram of client quota throttling delays (in seconds) per quota rule and type. **Type**: histogram **Labels**: - `quota_rule=("not_applicable" | "kafka_client_default" | "cluster_client_default" | "kafka_client_prefix" | "cluster_client_prefix" | "kafka_client_id")` - `quota_type=("produce_quota" | "fetch_quota" | "partition_mutation_quota")` **Available in Serverless**: No * * * ### [](#redpanda_kafka_quotas_client_quota_throughput)redpanda\_kafka\_quotas\_client\_quota\_throughput Histogram of client quota throughput per quota rule and type. **Type**: histogram **Labels**: - `quota_rule=("not_applicable" | "kafka_client_default" | "cluster_client_default" | "kafka_client_prefix" | "cluster_client_prefix" | "kafka_client_id")` - `quota_type=("produce_quota" | "fetch_quota" | "partition_mutation_quota")` **Available in Serverless**: No ## [](#consumer-group-metrics)Consumer group metrics ### [](#redpanda_kafka_consumer_group_committed_offset)redpanda\_kafka\_consumer\_group\_committed\_offset Committed offset for a consumer group, segmented by topic and partition. To enable this metric, you must include the `partition` option in the [`enable_consumer_group_metrics`](../properties/cluster-properties/#enable_consumer_group_metrics) cluster property. **Type**: gauge **Labels**: - `redpanda_group` - `redpanda_partition` - `redpanda_topic` - `shard` **Available in Serverless**: No * * * ### [](#redpanda_kafka_consumer_group_consumers)redpanda\_kafka\_consumer\_group\_consumers Number of active consumers within a consumer group. To enable this metric, you must include the `group` option in the [`enable_consumer_group_metrics`](../properties/cluster-properties/#enable_consumer_group_metrics) cluster property. 
**Type**: gauge **Labels**: - `redpanda_group` - `shard` **Available in Serverless**: Yes * * * ### [](#redpanda_kafka_consumer_group_lag_max)redpanda\_kafka\_consumer\_group\_lag\_max Maximum consumer group lag across topic partitions. This metric is useful for identifying the most delayed partition in the consumer group. To enable this metric, you must include the `consumer_lag` option in the [`enable_consumer_group_metrics`](../properties/cluster-properties/#enable_consumer_group_metrics) cluster property. **Type**: gauge **Labels**: - `redpanda_group` **Available in Serverless**: Yes **Related topics**: - [Consumer group lag](../../manage/monitor-cloud/#consumer-group-lag) * * * ### [](#redpanda_kafka_consumer_group_lag_sum)redpanda\_kafka\_consumer\_group\_lag\_sum Sum of consumer group lag for all topic partitions. This metric is useful for tracking the total lag across all partitions. To enable this metric, you must include the `consumer_lag` option in the [`enable_consumer_group_metrics`](../properties/cluster-properties/#enable_consumer_group_metrics) cluster property. **Type**: gauge **Labels**: - `redpanda_group` **Available in Serverless**: Yes **Related topics**: - [Consumer group lag](../../manage/monitor-cloud/#consumer-group-lag) * * * ### [](#redpanda_kafka_consumer_group_topics)redpanda\_kafka\_consumer\_group\_topics Number of topics being consumed by a consumer group. To enable this metric, you must include the `group` option in the [`enable_consumer_group_metrics`](../properties/cluster-properties/#enable_consumer_group_metrics) cluster property. **Type**: gauge **Labels**: - `redpanda_group` - `shard` **Available in Serverless**: Yes ## [](#rest-proxy-metrics)REST proxy metrics ### [](#redpanda_rest_proxy_inflight_requests_memory_usage_ratio)redpanda\_rest\_proxy\_inflight\_requests\_memory\_usage\_ratio Ratio of memory used by in-flight REST proxy requests, measured per shard. 
**Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_rest_proxy_inflight_requests_usage_ratio)redpanda\_rest\_proxy\_inflight\_requests\_usage\_ratio Usage ratio for in-flight REST proxy requests, measured per shard. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_rest_proxy_queued_requests_memory_blocked)redpanda\_rest\_proxy\_queued\_requests\_memory\_blocked Count of REST proxy requests queued due to memory limitations, measured per shard. **Type**: gauge **Labels**: - `shard` **Available in Serverless**: No * * * ### [](#redpanda_rest_proxy_request_errors_total)redpanda\_rest\_proxy\_request\_errors\_total Cumulative count of REST proxy errors, categorized by HTTP status code. **Type**: counter **Labels**: - `redpanda_status=("5xx" | "4xx" | "3xx")` **Available in Serverless**: No * * * ### [](#redpanda_rest_proxy_request_latency_seconds_bucket)redpanda\_rest\_proxy\_request\_latency\_seconds\_bucket Histogram representing the internal latency buckets for REST proxy requests. **Type**: histogram **Available in Serverless**: No ## [](#application-metrics)Application metrics ### [](#redpanda_application_build)redpanda\_application\_build Build information for Redpanda, including the revision and version details. **Type**: gauge **Labels**: - `redpanda_revision` - `redpanda_version` **Available in Serverless**: Yes * * * ### [](#redpanda_application_fips_mode)redpanda\_application\_fips\_mode Indicates whether Redpanda is running in FIPS mode. Possible values: - `0` = disabled - `1` = permissive - `2` = enabled **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_application_uptime_seconds_total)redpanda\_application\_uptime\_seconds\_total Total runtime (in seconds) of the Redpanda application.
**Type**: gauge **Available in Serverless**: No ## [](#cloud-metrics)Cloud metrics ### [](#redpanda_cloud_client_backoff)redpanda\_cloud\_client\_backoff Total number of object storage requests that experienced backoff delays. **Type**: counter **Labels**: - For S3 and GCP: - `redpanda_endpoint` - `redpanda_region` - For Azure Blob Storage (ABS): - `redpanda_endpoint` - `redpanda_storage_account` **Available in Serverless**: No * * * ### [](#redpanda_cloud_client_client_pool_utilization)redpanda\_cloud\_client\_client\_pool\_utilization Utilization of the object storage client pool, as a percentage (`0` = unused, `100` = fully utilized). **Type**: gauge **Labels**: - `redpanda_endpoint` - `redpanda_region` - `shard` **Available in Serverless**: No * * * ### [](#redpanda_cloud_client_download_backoff)redpanda\_cloud\_client\_download\_backoff Total number of object storage download requests that experienced backoff delays. **Type**: counter **Labels**: - For S3 and GCP: - `redpanda_endpoint` - `redpanda_region` - For Azure Blob Storage (ABS): - `redpanda_endpoint` - `redpanda_storage_account` **Available in Serverless**: No * * * ### [](#redpanda_cloud_client_downloads)redpanda\_cloud\_client\_downloads Total number of successful download requests from object storage. **Type**: counter **Labels**: - For S3 and GCP: - `redpanda_endpoint` - `redpanda_region` - For Azure Blob Storage (ABS): - `redpanda_endpoint` - `redpanda_storage_account` **Available in Serverless**: No * * * ### [](#redpanda_cloud_client_lease_duration)redpanda\_cloud\_client\_lease\_duration Histogram representing the lease duration for object storage clients. **Type**: histogram **Available in Serverless**: No * * * ### [](#redpanda_cloud_client_not_found)redpanda\_cloud\_client\_not\_found Total number of object storage requests that resulted in a "not found" error.
**Type**: counter **Labels**: - For S3 and GCP: - `redpanda_endpoint` - `redpanda_region` - For Azure Blob Storage (ABS): - `redpanda_endpoint` - `redpanda_storage_account` **Available in Serverless**: No * * * ### [](#redpanda_cloud_client_num_borrows)redpanda\_cloud\_client\_num\_borrows Count of instances where a shard borrowed an object storage client from another shard. **Type**: counter **Labels**: - `redpanda_endpoint` - `redpanda_region` - `shard` **Available in Serverless**: No * * * ### [](#redpanda_cloud_client_upload_backoff)redpanda\_cloud\_client\_upload\_backoff Total number of object storage upload requests that experienced backoff delays. **Type**: counter **Labels**: - For S3 and GCP: - `redpanda_endpoint` - `redpanda_region` - For Azure Blob Storage (ABS): - `redpanda_endpoint` - `redpanda_storage_account` **Available in Serverless**: No * * * ### [](#redpanda_cloud_client_uploads)redpanda\_cloud\_client\_uploads Total number of successful upload requests to object storage. **Type**: counter **Labels**: - For S3 and GCP: - `redpanda_endpoint` - `redpanda_region` - For Azure Blob Storage (ABS): - `redpanda_endpoint` - `redpanda_storage_account` **Available in Serverless**: No * * * ## [](#tls_metrics)TLS metrics ### [](#redpanda_tls_certificate_expires_at_timestamp_seconds)redpanda\_tls\_certificate\_expires\_at\_timestamp\_seconds Unix epoch timestamp for the expiration of the shortest-lived installed TLS certificate. **Type**: gauge **Labels**: - `area` - `detail` **Usage**: Useful for proactive certificate renewal by indicating the next certificate set to expire. **Available in Serverless**: No * * * ### [](#redpanda_tls_certificate_serial)redpanda\_tls\_certificate\_serial The least significant 4 bytes of the serial number for the certificate that will expire next. **Type**: gauge **Labels**: - `area` - `detail` **Usage**: Provides a quick reference to identify the certificate in question.
**Available in Serverless**: No * * * ### [](#redpanda_tls_certificate_valid)redpanda\_tls\_certificate\_valid Indicator of whether a resource has at least one valid TLS certificate installed. Returns `1` if a valid certificate is present and `0` if not. **Type**: gauge **Labels**: - `area` - `detail` **Usage**: Aids in continuous monitoring of certificate validity across resources. **Available in Serverless**: No * * * ### [](#redpanda_tls_loaded_at_timestamp_seconds)redpanda\_tls\_loaded\_at\_timestamp\_seconds Unix epoch timestamp marking the last time a TLS certificate was loaded for a resource. **Type**: gauge **Labels**: - `area` - `detail` **Usage**: Indicates recent certificate updates across resources. **Available in Serverless**: No * * * ### [](#redpanda_tls_truststore_expires_at_timestamp_seconds)redpanda\_tls\_truststore\_expires\_at\_timestamp\_seconds Unix epoch timestamp representing the expiration time of the shortest-lived certificate authority (CA) in the installed truststore. **Type**: gauge **Labels**: - `area` - `detail` **Usage**: Helps identify when any CA in the chain is nearing expiration. **Available in Serverless**: No * * * ### [](#redpanda_trust_file_crc32c)redpanda\_trust\_file\_crc32c CRC32C checksum calculated from the contents of the trust file. This value is calculated when a valid certificate is loaded and a trust store is present. Otherwise, the value is zero. **Type**: gauge **Labels**: - `area` - `detail` - `shard` **Available in Serverless**: No * * * ### [](#redpanda_truststore_expires_at_timestamp_seconds)redpanda\_truststore\_expires\_at\_timestamp\_seconds Expiry time of the shortest-lived CA in the truststore, measured in seconds since epoch. 
**Type**: gauge **Labels**: - `area` - `detail` - `shard` **Available in Serverless**: No * * * ## [](#data_transform_metrics)Data transforms metrics ### [](#redpanda_transform_execution_errors)redpanda\_transform\_execution\_errors Counter for the number of errors encountered during the invocation of data transforms. **Type**: counter **Labels**: - `function_name` **Available in Serverless**: No * * * ### [](#redpanda_transform_execution_latency_sec)redpanda\_transform\_execution\_latency\_sec Histogram tracking the execution latency (in seconds) for processing a single record using data transforms. **Type**: histogram **Labels**: - `function_name` **Available in Serverless**: No * * * ### [](#redpanda_transform_failures)redpanda\_transform\_failures Counter for each failure encountered by a data transform processor. **Type**: counter **Labels**: - `function_name` **Available in Serverless**: No * * * ### [](#redpanda_transform_processor_lag)redpanda\_transform\_processor\_lag Number of records pending processing in the input topic for a data transform. **Type**: gauge **Labels**: - `function_name` **Available in Serverless**: No * * * ### [](#redpanda_transform_read_bytes)redpanda\_transform\_read\_bytes Cumulative count of bytes read as input to data transforms. **Type**: counter **Labels**: - `function_name` **Available in Serverless**: No * * * ### [](#redpanda_transform_state)redpanda\_transform\_state Current count of transform processors in a specific state (running, inactive, or errored). **Type**: gauge **Labels**: - `function_name` - `state=("running" | "inactive" | "errored")` **Available in Serverless**: No * * * ### [](#redpanda_transform_write_bytes)redpanda\_transform\_write\_bytes Cumulative count of bytes output from data transforms. 
**Type**: counter **Labels**: - `function_name` **Available in Serverless**: No * * * ### [](#redpanda_wasm_binary_executable_memory_usage)redpanda\_wasm\_binary\_executable\_memory\_usage Number of bytes (memory) used by executable WebAssembly binaries. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_wasm_engine_cpu_seconds_total)redpanda\_wasm\_engine\_cpu\_seconds\_total Total CPU time (in seconds) consumed by WebAssembly functions. **Type**: counter **Labels**: - `function_name` **Available in Serverless**: No * * * ### [](#redpanda_wasm_engine_max_memory)redpanda\_wasm\_engine\_max\_memory Maximum allowed memory (in bytes) allocated for a WebAssembly function. **Type**: gauge **Labels**: - `function_name` **Available in Serverless**: No * * * ### [](#redpanda_wasm_engine_memory_usage)redpanda\_wasm\_engine\_memory\_usage Current memory usage (in bytes) by a WebAssembly function. **Type**: gauge **Labels**: - `function_name` **Available in Serverless**: No ## [](#object-storage-metrics)Object storage metrics ### [](#redpanda_cloud_storage_active_segments)redpanda\_cloud\_storage\_active\_segments Number of remote log segments that are currently hydrated and available for read operations. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_anomalies)redpanda\_cloud\_storage\_anomalies Count of missing partition manifest anomalies detected for the topic. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_op_hit)redpanda\_cloud\_storage\_cache\_op\_hit Total number of successful get requests that found the requested object in the cache. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_op_in_progress_files)redpanda\_cloud\_storage\_cache\_op\_in\_progress\_files Number of files currently being written to the cache. 
**Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_op_miss)redpanda\_cloud\_storage\_cache\_op\_miss Total count of get requests that did not find the requested object in the cache. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_op_put)redpanda\_cloud\_storage\_cache\_op\_put Total number of objects successfully written into the cache. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_space_files)redpanda\_cloud\_storage\_cache\_space\_files Current number of objects stored in the cache. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_space_hwm_files)redpanda\_cloud\_storage\_cache\_space\_hwm\_files High watermark for the number of objects stored in the cache. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_space_hwm_size_bytes)redpanda\_cloud\_storage\_cache\_space\_hwm\_size\_bytes High watermark for the total size (in bytes) of cached objects. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_space_size_bytes)redpanda\_cloud\_storage\_cache\_space\_size\_bytes Total size (in bytes) of objects currently stored in the cache. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_space_tracker_size)redpanda\_cloud\_storage\_cache\_space\_tracker\_size Current count of entries in the cache access tracker. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_space_tracker_syncs)redpanda\_cloud\_storage\_cache\_space\_tracker\_syncs Total number of times the cache access tracker was synchronized with disk data. 
**Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_trim_carryover_trims)redpanda\_cloud\_storage\_cache\_trim\_carryover\_trims Count of times the cache trim operation was invoked using a carryover strategy. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_trim_exhaustive_trims)redpanda\_cloud\_storage\_cache\_trim\_exhaustive\_trims Count of instances where a fast cache trim was insufficient and an exhaustive trim was required. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_trim_failed_trims)redpanda\_cloud\_storage\_cache\_trim\_failed\_trims Count of cache trim operations that failed to free the expected amount of space, possibly indicating a bug or misconfiguration. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_trim_fast_trims)redpanda\_cloud\_storage\_cache\_trim\_fast\_trims Count of successful fast cache trim operations. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cache_trim_in_mem_trims)redpanda\_cloud\_storage\_cache\_trim\_in\_mem\_trims Count of cache trim operations performed using the in-memory access tracker. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_cloud_log_size)redpanda\_cloud\_storage\_cloud\_log\_size Total size (in bytes) of user-visible log data stored in Tiered Storage. This value increases with every segment offload and decreases when segments are deleted due to retention or compaction. **Type**: gauge **Usage**: Segmented by `redpanda_namespace` (e.g., `kafka`, `kafka_internal`, or `redpanda`), `redpanda_topic`, and `redpanda_partition`. **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_deleted_segments)redpanda\_cloud\_storage\_deleted\_segments Count of log segments that have been deleted from object storage due to retention policies or compaction processes. 
**Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_errors_total)redpanda\_cloud\_storage\_errors\_total Cumulative count of errors encountered during object storage operations, segmented by direction. **Type**: counter **Labels**: - `redpanda_direction` **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_housekeeping_drains)redpanda\_cloud\_storage\_housekeeping\_drains Count of times the object storage upload housekeeping queue was fully drained. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_housekeeping_jobs_completed)redpanda\_cloud\_storage\_housekeeping\_jobs\_completed Total number of successfully executed object storage housekeeping jobs. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_housekeeping_jobs_failed)redpanda\_cloud\_storage\_housekeeping\_jobs\_failed Total number of object storage housekeeping jobs that failed. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_housekeeping_jobs_skipped)redpanda\_cloud\_storage\_housekeeping\_jobs\_skipped Count of object storage housekeeping jobs that were skipped during execution. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_housekeeping_pauses)redpanda\_cloud\_storage\_housekeeping\_pauses Count of times object storage upload housekeeping was paused. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_housekeeping_requests_throttled_average_rate)redpanda\_cloud\_storage\_housekeeping\_requests\_throttled\_average\_rate Average rate (per shard) of requests that were throttled during object storage operations. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_housekeeping_resumes)redpanda\_cloud\_storage\_housekeeping\_resumes Count of instances when object storage upload housekeeping resumed after a pause. 
**Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_housekeeping_rounds)redpanda\_cloud\_storage\_housekeeping\_rounds Total number of rounds executed by the object storage upload housekeeping process. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_jobs_cloud_segment_reuploads)redpanda\_cloud\_storage\_jobs\_cloud\_segment\_reuploads Count of log segments reuploaded from object storage sources (either from the cache or via direct download). **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_jobs_local_segment_reuploads)redpanda\_cloud\_storage\_jobs\_local\_segment\_reuploads Count of log segments reuploaded from the local data directory. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_jobs_manifest_reuploads)redpanda\_cloud\_storage\_jobs\_manifest\_reuploads Total number of partition manifest reuploads performed by housekeeping jobs. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_jobs_metadata_syncs)redpanda\_cloud\_storage\_jobs\_metadata\_syncs Total number of archival configuration updates (metadata synchronizations) executed by housekeeping jobs. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_jobs_segment_deletions)redpanda\_cloud\_storage\_jobs\_segment\_deletions Total count of log segments deleted by housekeeping jobs. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_limits_downloads_throttled_sum)redpanda\_cloud\_storage\_limits\_downloads\_throttled\_sum Total cumulative time (in milliseconds) during which downloads were throttled. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_partition_manifest_uploads_total)redpanda\_cloud\_storage\_partition\_manifest\_uploads\_total Total number of successful partition manifest uploads to object storage. 
**Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_partition_readers)redpanda\_cloud\_storage\_partition\_readers Number of active partition reader instances (fetch/timequery operations) reading from Tiered Storage. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_partition_readers_delayed)redpanda\_cloud\_storage\_partition\_readers\_delayed Count of partition read operations delayed due to reaching the reader limit, suggesting potential saturation of Tiered Storage reads. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_paused_archivers)redpanda\_cloud\_storage\_paused\_archivers Number of paused archivers. **Type**: gauge **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_readers)redpanda\_cloud\_storage\_readers Total number of segment read cursors for hydrated remote log segments. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_segment_index_uploads_total)redpanda\_cloud\_storage\_segment\_index\_uploads\_total Total number of successful segment index uploads to object storage. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_segment_materializations_delayed)redpanda\_cloud\_storage\_segment\_materializations\_delayed Count of segment materialization operations that were delayed because of reader limit constraints. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_segment_readers_delayed)redpanda\_cloud\_storage\_segment\_readers\_delayed Count of segment reader operations delayed due to reaching the reader limit. This indicates a cluster is saturated with Tiered Storage reads. 
**Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_segment_uploads_total)redpanda\_cloud\_storage\_segment\_uploads\_total Total number of successful data segment uploads to object storage. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_segments)redpanda\_cloud\_storage\_segments Total number of log segments accounted for in object storage for the topic. **Type**: gauge **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_segments_pending_deletion)redpanda\_cloud\_storage\_segments\_pending\_deletion Total number of log segments pending deletion from object storage for the topic. **Type**: gauge **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_spillover_manifest_uploads_total)redpanda\_cloud\_storage\_spillover\_manifest\_uploads\_total Total number of successful spillover manifest uploads to object storage. **Type**: counter **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_spillover_manifests_materialized_bytes)redpanda\_cloud\_storage\_spillover\_manifests\_materialized\_bytes Total bytes of memory used by spilled manifests that are currently cached in memory. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_spillover_manifests_materialized_count)redpanda\_cloud\_storage\_spillover\_manifests\_materialized\_count Count of spilled manifests currently held in memory cache. **Type**: gauge **Available in Serverless**: No * * * ### [](#redpanda_cloud_storage_uploaded_bytes)redpanda\_cloud\_storage\_uploaded\_bytes Total number of bytes uploaded for the topic to object storage. 
**Type**: counter **Labels**: - `redpanda_namespace` - `redpanda_topic` **Available in Serverless**: No * * * ## [](#shadow-link-metrics)Shadow link metrics ### [](#redpanda_shadow_link_shadow_lag)redpanda\_shadow\_link\_shadow\_lag The lag of the shadow partition against the source partition, calculated as source partition last stable offset (LSO) minus shadow partition high watermark (HWM). Monitor this metric to understand replication lag for each partition and ensure your recovery point objective (RPO) requirements are being met. **Type**: gauge **Labels**: - `shadow_link_name` - Name of the shadow link - `topic` - Topic name - `partition` - Partition identifier * * * ### [](#redpanda_shadow_link_shadow_topic_state)redpanda\_shadow\_link\_shadow\_topic\_state Number of shadow topics in the respective states. Monitor this metric to track the health and status distribution of shadow topics across your shadow links. **Type**: gauge **Labels**: - `shadow_link_name` - Name of the shadow link - `state` - Topic state (active, failed, paused, failing\_over, failed\_over, promoting, promoted) * * * ### [](#redpanda_shadow_link_client_errors)redpanda\_shadow\_link\_client\_errors Total number of errors encountered by the Kafka client during shadow link operations. Monitor this metric to identify connection issues, authentication failures, or other client-side problems affecting shadow link replication. **Type**: counter **Labels**: - `shadow_link_name` - Name of the shadow link * * * ### [](#redpanda_shadow_link_total_bytes_fetched)redpanda\_shadow\_link\_total\_bytes\_fetched Total number of bytes fetched by a sharded replicator (bytes received by the client). Use this metric to track data transfer volume from the source cluster. 
**Type**: counter **Labels**: - `shadow_link_name` - Name of the shadow link - `shard` - Shard identifier * * * ### [](#redpanda_shadow_link_total_bytes_written)redpanda\_shadow\_link\_total\_bytes\_written Total number of bytes written by a sharded replicator (bytes written to the write\_at\_offset\_stm). Use this metric to monitor data written to the shadow cluster. **Type**: counter **Labels**: - `shadow_link_name` - Name of the shadow link - `shard` - Shard identifier * * * ### [](#redpanda_shadow_link_total_records_fetched)redpanda\_shadow\_link\_total\_records\_fetched Total number of records fetched by the sharded replicator (records received by the client). Monitor this metric to track message throughput from the source cluster. **Type**: counter **Labels**: - `shadow_link_name` - Name of the shadow link - `shard` - Shard identifier * * * ### [](#redpanda_shadow_link_total_records_written)redpanda\_shadow\_link\_total\_records\_written Total number of records written by a sharded replicator (records written to the write\_at\_offset\_stm). Use this metric to monitor message throughput to the shadow cluster. **Type**: counter **Labels**: - `shadow_link_name` - Name of the shadow link - `shard` - Shard identifier ## [](#related-topics)Related topics - [Learn how to monitor Redpanda](../../manage/monitor-cloud/) --- # Page 500: rpk Commands **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk.md --- # rpk Commands --- title: rpk Commands latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/index.adoc description: Index page of Redpanda Cloud rpk commands in alphabetical order. 
page-git-created-date: "2024-06-06" page-git-modified-date: "2024-07-25" --- This page contains an alphabetized list of `rpk` commands. Each command includes a table of flags and their descriptions. You can also get descriptions for each flag by running `rpk --help` in your locally-installed Redpanda, and you can get descriptions of all rpk-specific options by running `rpk -X help`. > 📝 **NOTE** > > All `rpk` commands feature autocompletion. To use the feature, press tab. See [`rpk generate shell-completion`](rpk-generate/rpk-generate-shell-completion/). - [rpk](rpk-commands/) - [rpk -X](rpk-x-options/) - [rpk cloud](rpk-cloud/rpk-cloud/) - [rpk cluster](rpk-cluster/rpk-cluster/) - [rpk generate](rpk-generate/rpk-generate/) - [rpk group](rpk-group/rpk-group/) - [rpk help](rpk-help/) - [rpk plugin](rpk-plugin/rpk-plugin/) - [rpk profile](rpk-profile/rpk-profile/) - [rpk registry](rpk-registry/rpk-registry/) - [rpk security](rpk-security/rpk-security/) - [rpk shadow](rpk-shadow/rpk-shadow/) - [rpk topic](rpk-topic/rpk-topic/) - [rpk transform](rpk-transform/rpk-transform/) - [rpk version](rpk-version/) --- # Page 501: rpk cloud auth delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-auth-delete.md --- # rpk cloud auth delete --- title: rpk cloud auth delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-auth-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-auth-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-auth-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Delete an `rpk` cloud authentication (auth). Deleting a cloud authentication removes it from the `rpk.yaml` file. 
If the deleted authentication was the current authentication, `rpk` uses a default SSO authentication the next time you try to log in and, if the login succeeds, saves that authentication. If you delete an authentication that is used by profiles, the affected profiles have their authentication cleared, and you can access each profile’s cluster only with SASL credentials. ## [](#usage)Usage ```bash rpk cloud auth delete [NAME] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for delete. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 502: rpk cloud auth list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-auth-list.md --- # rpk cloud auth list --- title: rpk cloud auth list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-auth-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-auth-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-auth-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List `rpk` cloud authentications (auths).
## [](#usage)Usage ```bash rpk cloud auth list [flags] ``` ## [](#aliases)Aliases ```bash list, ls ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for list. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 503: rpk cloud auth use **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-auth-use.md --- # rpk cloud auth use --- title: rpk cloud auth use latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-auth-use page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-auth-use.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-auth-use.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Select the `rpk` cloud authentication (auth) to use. This swaps the current cloud authentication to the specified cloud authentication. If your current profile is a cloud profile, this unsets the current profile (because the authorization is now different). If your current profile is for a Redpanda Self-Managed cluster, the profile is kept. ## [](#usage)Usage ```bash rpk cloud auth use [NAME] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for use. 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 504: rpk cloud auth **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-auth.md --- # rpk cloud auth --- title: rpk cloud auth latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-auth page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-auth.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-auth.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Manage `rpk` cloud authentications (auths). An `rpk` cloud authentication allows you to talk to Redpanda Cloud. Most likely, you will only ever need a single SSO-based login, and you will not need this command space. Multiple authentications can be useful if you have multiple Redpanda Cloud accounts for different organizations and you want to swap between them, or if you use both SSO and client credentials. Redpanda Data recommends using only a single SSO-based login. ## [](#usage)Usage ```bash rpk cloud auth [command] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for auth. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml.
| | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 505: rpk cloud byoc install **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-byoc-install.md --- # rpk cloud byoc install --- title: rpk cloud byoc install latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-byoc-install page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-byoc-install.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-byoc-install.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Install the BYOC plugin. Redpanda installs an agent service in your BYOC cluster. The agent then provisions infrastructure and, eventually, a full Redpanda cluster. The command downloads the `byoc` plugin from Redpanda Cloud. The BYOC command runs Terraform to create and start the agent. You first need a `redpanda-id` (or cluster ID); this is used to get the details of how your agent should be provisioned. > 📝 **NOTE** > > To create a BYOC cluster, use the [Cloud API](../../../../manage/api/cloud-byoc-controlplane-api/#create-a-new-cluster) or the Redpanda Cloud UI. The UI contains the parameters necessary to run `rpk cloud byoc apply` with your cloud provider. This command downloads the BYOC managed plugin, if necessary. The plugin is installed by default if you run a non-install command. This command exists if you want to download the plugin ahead of time. To define your `client_id` and `client_secret` use the `-X` flag. 
## [](#example)Example ```bash rpk cloud byoc install -X cloud.client_id= -X cloud.client_secret= ``` ## [](#usage)Usage ```bash rpk cloud byoc install [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for install. | | --redpanda-id | string | The redpanda ID of the cluster you are creating. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 506: rpk cloud byoc uninstall **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-byoc-uninstall.md --- # rpk cloud byoc uninstall --- title: rpk cloud byoc uninstall latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-byoc-uninstall page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-byoc-uninstall.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-byoc-uninstall.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Uninstall the BYOC plugin. Redpanda installs an agent service in your BYOC cluster. The agent then provisions infrastructure and, eventually, a full Redpanda cluster. The command downloads the `byoc` plugin from Redpanda Cloud. The BYOC command runs Terraform to create and start the agent. You first need a `redpanda-id` (or cluster ID); this is used to get the details of how your agent should be provisioned. 
> 📝 **NOTE** > > To create a BYOC cluster, use the [Cloud API](../../../../manage/api/cloud-byoc-controlplane-api/#create-a-new-cluster) or the Redpanda Cloud UI. The UI contains the parameters necessary to run `rpk cloud byoc apply` with your cloud provider. This command deletes your locally-downloaded BYOC managed plugin, if it exists. You generally only need to download the plugin one time to create your cluster, and then you never need the plugin again. You can uninstall it to save a small bit of disk space. ## [](#usage)Usage ```bash rpk cloud byoc uninstall [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for uninstall. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 507: rpk cloud byoc **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-byoc.md --- # rpk cloud byoc --- title: rpk cloud byoc latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-byoc page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-byoc.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-byoc.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Manage a Redpanda Cloud BYOC agent. Redpanda installs an agent service in your BYOC cluster. 
The agent then provisions infrastructure and, eventually, a full Redpanda cluster. The command downloads the `byoc` plugin from Redpanda Cloud. The BYOC command runs Terraform to create and start the agent. You first need a `redpanda-id` (or cluster ID); this is used to get the details of how your agent should be provisioned. > 📝 **NOTE** > > To create a BYOC cluster, use the [Cloud API](../../../../manage/api/cloud-byoc-controlplane-api/#create-a-new-cluster) or the Redpanda Cloud UI. The UI contains the parameters necessary to run `rpk cloud byoc apply` with your cloud provider. ## [](#usage)Usage ```bash rpk cloud byoc [command] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --client-id | string | The client ID of the organization in Redpanda Cloud. | | --client-secret | string | The client secret of the organization in Redpanda Cloud. | | -h, --help | - | Help for byoc. | | --redpanda-id | string | The redpanda ID of the cluster you are creating. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 508: rpk cloud cluster select **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-cluster-select.md --- # rpk cloud cluster select --- title: rpk cloud cluster select latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-cluster-select page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-cluster-select.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-cluster-select.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Update your rpk profile to communicate with the requested cluster. This command is essentially an alias for the following command: ```bash rpk profile create --from-cloud=${NAME} ``` If you want to name this profile rather than creating or updating values in the default cloud-dedicated profile, you can use the `--profile` flag. For Serverless clusters that support both public and private networking, you are prompted to select a network type unless you specify `--serverless-network`. To avoid prompts in automation, explicitly set `--serverless-network` to `public` or `private`. ## [](#usage)Usage ```bash rpk cloud cluster select [NAME] [flags] ``` ## [](#aliases)Aliases ```bash select, use ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for select. | | --profile | string | Name of a profile to create or update (avoids updating "rpk-cloud"). | | --serverless-network | string | Networking type for Serverless clusters: public or private (if not specified, will prompt if both are available). 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings; '-X help' for detail or '-X list' for terser detail. | | -v, --verbose | - | Enable verbose logging. | --- # Page 509: rpk cloud cluster **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-cluster.md --- # rpk cloud cluster --- title: rpk cloud cluster latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-cluster page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-cluster.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-cluster.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Manage rpk cloud clusters. This command allows you to manage cloud clusters, and to easily switch between the clusters you are communicating with. ## [](#usage)Usage ```bash rpk cloud cluster [command] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for cluster. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings; '-X help' for detail or '-X list' for terser detail. | | --profile | string | rpk profile to use. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 510: rpk cloud login **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-login.md --- # rpk cloud login --- title: rpk cloud login latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-login page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-login.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-login.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Log in to Redpanda Cloud. This command checks for an existing Redpanda Cloud API token and, if present, ensures it is still valid. If no token is found or the token is no longer valid, this command logs you in and saves your token along with the client ID used to request the token. ## [](#login-credentials)Login credentials You may use either SSO or client credentials to log in. ### [](#sso)SSO The SSO method automatically launches your default web browser and prompts you to authenticate through the Redpanda Cloud page. Once you have successfully authenticated, you are ready to use `rpk cloud` commands. ### [](#client-credentials)Client credentials You can use cloud client credentials to log in to Redpanda Cloud. Create them in the Clients tab of the Users section in the Redpanda Cloud UI. Client credentials can be provided in three ways, in order of preference: - In your `rpk cloud auth`, `client_id` and `client_secret` fields - Through `RPK_CLOUD_CLIENT_ID` and `RPK_CLOUD_CLIENT_SECRET` environment variables - Through the `--client-id` and `--client-secret` flags If none of these are provided, `rpk` uses the SSO method to log in.
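As a rough sketch of the environment-variable route listed above, a non-interactive job can check that both variables are present before it calls `rpk cloud login`, so that it fails early instead of falling through to the browser-based SSO flow. The helper name `require_client_credentials` is hypothetical and not part of `rpk`. The sketch also assumes a fresh environment with no credentials already stored in `rpk.yaml`.

```bash
#!/usr/bin/env bash
# Sketch only: require_client_credentials is a hypothetical helper, not part of rpk.
# It verifies that both environment variables read by `rpk cloud login` are set.

require_client_credentials() {
  if [ -z "${RPK_CLOUD_CLIENT_ID:-}" ] || [ -z "${RPK_CLOUD_CLIENT_SECRET:-}" ]; then
    # Report the problem on stderr and signal failure to the caller.
    echo "RPK_CLOUD_CLIENT_ID and RPK_CLOUD_CLIENT_SECRET must both be set" >&2
    return 1
  fi
}

# Usage in a job script (assumes rpk is installed and on PATH):
#   require_client_credentials && rpk cloud login --no-profile
```

The `--no-profile` flag in the usage comment skips automatic profile creation and its prompts, which is usually what an unattended job wants.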
If you specify environment variables or flags, they are not synced to the `rpk.yaml` file unless you pass the `--save` flag. The cloud authorization token and client ID are always synced. ## [](#usage)Usage ```bash rpk cloud login [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --client-id | string | The client ID of the organization in Redpanda Cloud. | | --client-secret | string | The client secret of the organization in Redpanda Cloud. | | -h, --help | - | Help for login. | | --no-profile | - | Skip automatic profile creation and any associated prompts. | | --save | - | Save environment or flag specified client ID and client secret to the configuration file. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 511: rpk cloud logout **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-logout.md --- # rpk cloud logout --- title: rpk cloud logout latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-logout page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-logout.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-logout.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Log out from Redpanda Cloud. This command deletes your cloud authentication token. 
If you want to log out entirely and switch to a different organization, you can use the `--clear-credentials` flag to additionally clear your client ID and client secret. You can use the --all flag to log out of all organizations you may be logged into. ## [](#usage)Usage ```bash rpk cloud logout [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -a, --all | - | Log out of all organizations you may be logged into, rather than just the current authentication’s organization. | | -c, --clear-credentials | - | Clear the client ID and client secret in addition to the authentication token. | | -h, --help | - | Help for logout. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 512: rpk cloud mcp install **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-mcp-install.md --- # rpk cloud mcp install --- title: rpk cloud mcp install latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-mcp-install page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-mcp-install.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-mcp-install.adoc description: Install the Redpanda Cloud Management MCP Server. 
page-git-created-date: "2025-09-08" page-git-modified-date: "2026-01-14" --- Install the MCP client configuration to connect your AI assistant to the local MCP server for Redpanda Cloud. This command generates and installs the necessary configuration files for your MCP client (like Claude Code) to automatically connect to the local MCP server for Redpanda Cloud. The local MCP server provides your AI assistant with tools to manage your Redpanda Cloud account and clusters. Supports Claude Desktop and Claude Code. ## [](#usage)Usage ```bash rpk cloud mcp install [flags] ``` ## [](#examples)Examples Install configuration for Claude Code: ```bash rpk cloud mcp install --client claude-code ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --allow-delete | - | Allow delete operations (RPCs). Off by default. | | --client | string | Name of the MCP client to configure. Supported values: claude or claude-code. | | -h, --help | - | Help for install. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| ## [](#suggested-reading)Suggested reading - [Redpanda Cloud Management MCP Server Quickstart](../../../../ai-agents/mcp/local/quickstart/) - [rpk cloud mcp stdio](../rpk-cloud-mcp-stdio/) --- # Page 513: rpk cloud mcp proxy **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-mcp-proxy.md --- # rpk cloud mcp proxy --- title: rpk cloud mcp proxy latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-mcp-proxy page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-mcp-proxy.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-mcp-proxy.adoc description: Proxy MCP requests to remote MCP servers in Redpanda Cloud. page-git-created-date: "2025-10-21" page-git-modified-date: "2025-10-21" --- **Introduced in version 25.2.3**. Proxy MCP requests from your local AI client to a remote MCP server running in your Redpanda Cloud cluster. This command acts as a bridge between your AI assistant (like Claude) and your remote MCP server. It handles connection management, and request proxying so your AI client can use tools hosted in your cluster. ## [](#modes-of-operation)Modes of operation **Install mode** (recommended): Generates and installs MCP client configuration files that tell your AI client how to connect. This is a one-time setup operation. **Proxy mode** (default): Serves stdio and proxies requests in real-time. Your AI client connects to this command’s stdio interface, and requests are forwarded to the remote MCP server. 
## [](#usage)Usage ```bash rpk cloud mcp proxy [flags] ``` ## [](#examples)Examples Proxy requests to a specific MCP server: ```bash rpk cloud mcp proxy --cluster-id <cluster-id> --mcp-server-id <mcp-server-id> ``` Install Claude Code configuration for connecting to your MCP server in Redpanda Cloud BYOC or Dedicated: ```bash rpk cloud mcp proxy --install --client claude-code --cluster-id <cluster-id> --mcp-server-id <mcp-server-id> ``` Install Claude Code configuration for connecting to your MCP server in Redpanda Cloud Serverless: ```bash rpk cloud mcp proxy --install --client claude-code --serverless-cluster-id <serverless-cluster-id> --mcp-server-id <mcp-server-id> ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --client | string | Name of the MCP client to configure. Required when using --install. Supported values: claude and claude-code. | | --cluster-id | string | Cluster ID where your Remote MCP server is running. Find this in the Redpanda Cloud Console. | | --serverless-cluster-id | string | Serverless cluster ID where your Remote MCP server is running. Find this in the Redpanda Cloud Console. | | -h, --help | - | Help for proxy. | | --install | - | Install MCP client configuration instead of serving stdio. Use this for one-time setup. | | --mcp-server-id | string | ID of the Remote MCP server to connect to. Find this in your cluster’s Remote MCP page. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| ## [](#see-also)See also - [Remote MCP Server Quickstart](../../../../ai-agents/mcp/remote/quickstart/) - [rpk cloud mcp install](../rpk-cloud-mcp-install/) --- # Page 514: rpk cloud mcp stdio **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-mcp-stdio.md --- # rpk cloud mcp stdio --- title: rpk cloud mcp stdio latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-mcp-stdio page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-mcp-stdio.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-mcp-stdio.adoc description: Communicate with the Redpanda Cloud Management MCP Server using the stdio protocol. page-git-created-date: "2025-09-08" page-git-modified-date: "2026-01-14" --- Communicate with the local MCP server for Redpanda Cloud using the stdio protocol. This command provides a direct stdio interface for communicating with the local MCP server for Redpanda Cloud. The local MCP server runs on your machine and provides tools for managing your Redpanda Cloud account and clusters. It’s typically used as the transport mechanism when your MCP client is configured to use `rpk` as the stdio server process. Most users should use [`rpk cloud mcp install`](../rpk-cloud-mcp-install/) instead, which automatically configures your MCP client. ## [](#usage)Usage ```bash rpk cloud mcp stdio [flags] ``` ## [](#examples)Examples Start the local MCP server for Redpanda Cloud using the stdio protocol: ```bash rpk cloud mcp stdio ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --allow-delete | - | Allow delete operations (RPCs). Off by default. | | -h, --help | - | Help for stdio. 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | ## [](#suggested-reading)Suggested reading - [rpk cloud mcp install](../rpk-cloud-mcp-install/) - [Redpanda Cloud Management MCP Server](../../../../ai-agents/mcp/local/overview/) --- # Page 515: rpk cloud mcp **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud-mcp.md --- # rpk cloud mcp --- title: rpk cloud mcp latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud-mcp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud-mcp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud-mcp.adoc description: Manage connections to MCP servers in Redpanda Cloud. page-git-created-date: "2025-10-21" page-git-modified-date: "2025-10-21" --- Manage connections to MCP servers in Redpanda Cloud. This includes both the local MCP server for Redpanda Cloud and remote MCP servers (managed). 
These commands help you connect AI assistants like Claude to different types of MCP servers: - **Local MCP server for Redpanda Cloud**: Runs on your local machine and provides access to your Redpanda Cloud account and clusters - **Remote MCP servers**: Custom servers you build and deploy that run inside your Redpanda Cloud clusters ## [](#usage)Usage ```bash rpk cloud mcp [flags] rpk cloud mcp [command] ``` ## [](#subcommands)Subcommands | Command | Description | | --- | --- | | install | Install the local MCP server for Redpanda Cloud configuration. | | proxy | Proxy requests to Remote MCP servers. | | stdio | Communicate with the local MCP server for Redpanda Cloud using the stdio protocol. | ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for mcp. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| ## [](#suggested-reading)Suggested reading - [Redpanda Cloud Management MCP Server Quickstart](../../../../ai-agents/mcp/local/quickstart/) - [Remote MCP Server Quickstart](../../../../ai-agents/mcp/remote/quickstart/) - [MCP Servers for Redpanda Cloud Overview](../../../../ai-agents/mcp/overview/) --- # Page 516: rpk cloud **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cloud/rpk-cloud.md --- # rpk cloud --- title: rpk cloud latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cloud/rpk-cloud page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cloud/rpk-cloud.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cloud/rpk-cloud.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Interact with Redpanda Cloud. ## [](#usage)Usage ```bash rpk cloud [flags] [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for cloud. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 517: rpk cluster config get **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-config-get.md --- # rpk cluster config get --- title: rpk cluster config get latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-config-get page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-config-get.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-config-get.adoc page-git-created-date: "2025-06-13" page-git-modified-date: "2025-06-13" --- Get a cluster configuration property. ## [](#usage)Usage ```bash rpk cluster config get [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for get. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 518: rpk cluster config list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-config-list.md --- # rpk cluster config list --- title: rpk cluster config list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-config-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-config-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-config-list.adoc page-git-created-date: "2025-08-01" page-git-modified-date: "2025-08-01" --- This command lists all available cluster configuration properties. Use [`rpk cluster config get`](../rpk-cluster-config-get/) to retrieve specific property values. Use the `--filter` flag with a regular expression to filter configuration keys. This is useful for exploring related configuration properties or finding specific settings. ## [](#usage)Usage ```bash rpk cluster config list [flags] ``` ## [](#examples)Examples List all cluster configuration properties: ```bash rpk cluster config list ``` List configuration properties matching a filter: ```bash rpk cluster config list --filter="kafka.*" ``` Filter properties containing "log": ```bash rpk cluster config list --filter=".*log.*" ``` Filter with case-insensitive matching: ```bash rpk cluster config list --filter="(?i)batch.*" ``` List configuration properties in JSON format: ```bash rpk cluster config list --format=json ``` List configuration properties in YAML format: ```bash rpk cluster config list --format=yaml ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --filter | string | Filter configuration keys using regular expression. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. 
Default: text. | | -h, --help | - | Help for list. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 519: rpk cluster config set **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-config-set.md --- # rpk cluster config set --- title: rpk cluster config set latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-config-set page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-config-set.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-config-set.adoc page-git-created-date: "2025-05-09" page-git-modified-date: "2025-05-09" --- Set a cluster configuration property. You can set a single property or multiple properties at once, for example: ```bash rpk cluster config set audit_enabled true ``` ```bash rpk cluster config set iceberg_enabled=true iceberg_catalog_type=rest ``` You must use `=` notation to set multiple properties. The output returns an operation ID. Use the [`status`](../rpk-cluster-config-status/) command to check the progress of the configuration change. For a list of available properties, see [Cluster Configuration Properties](../../../properties/cluster-properties/). 
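The set-then-check workflow described above could be scripted roughly as follows. This is a sketch under assumptions: `apply_and_check` is a hypothetical helper name, the `30s` timeout is an arbitrary choice, and the properties are the ones from the example above. The `--no-confirm` and `--timeout` flags are the ones listed for this command, and the `command -v` guard only keeps the script harmless where `rpk` is not installed.

```bash
# Hypothetical helper: apply one or more key=value properties without the
# confirmation prompt, then list the long-running operations and their status.
apply_and_check() {
  rpk cluster config set --no-confirm --timeout 30s "$@" &&
    rpk cluster config status
}

if command -v rpk >/dev/null 2>&1; then
  # The '=' notation is required when setting multiple properties at once.
  apply_and_check iceberg_enabled=true iceberg_catalog_type=rest
else
  echo "rpk is not installed; skipping configuration change" >&2
fi
```

Chaining with `&&` means the status check runs only when the `set` command itself succeeds.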
## [](#usage)Usage ```bash rpk cluster config set [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for set. | | --no-confirm | - | Disable confirmation prompt. | | --timeout | duration | Maximum time to poll for operation completion before displaying operation ID for manual status checking (for example 300ms, 1.5s, 30s). Default 10s. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | > 📝 **NOTE** > > Setting properties to non-number values (such as setting string values with `-`) can be problematic for some terminals due to how POSIX flags are parsed. For example, the following command may not work from some terminals: > > ```none > rpk cluster config set log_retention_ms -1 > ``` > > Workaround: Use `--` to disable parsing for all subsequent characters. 
For example: > > ```none > rpk cluster config set -- log_retention_ms -1 > ``` --- # Page 520: rpk cluster config status **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-config-status.md --- # rpk cluster config status --- title: rpk cluster config status latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-config-status page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-config-status.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-config-status.adoc page-git-created-date: "2025-05-09" page-git-modified-date: "2025-05-09" --- Check the progress of a cluster configuration change. Some cluster properties require a rolling restart when updated, and it can take several minutes for the update to complete. This command lists the long-running operations run by the update and their status: - In progress (running) - Completed - Failed ```bash OPERATION-ID STATUS STARTED COMPLETED d0ec1obmpnr7lv17bfpg RUNNING 2025-05-08 14:34:09 d0ec0sor49uba166af3g RUNNING 2025-05-08 14:32:20 ``` ## [](#usage)Usage ```bash rpk cluster config status [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for status. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 521: rpk cluster config **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-config.md --- # rpk cluster config --- title: rpk cluster config latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-config page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-config.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-config.adoc page-git-created-date: "2025-05-09" page-git-modified-date: "2025-05-09" --- Interact with cluster configuration properties. Cluster properties are Redpanda settings that apply to all brokers in the cluster. Modified properties are propagated immediately to all brokers. Use the `status` subcommand to verify that all brokers are up to date and identify any settings which were rejected by a broker; for example, if the broker is running a different Redpanda version that does not recognize certain properties. ## [](#usage)Usage ```bash rpk cluster config [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for config. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 522: rpk cluster connections list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-connections-list.md --- # rpk cluster connections list --- title: rpk cluster connections list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-connections-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-connections-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-connections-list.adoc page-git-created-date: "2025-11-19" page-git-modified-date: "2025-11-19" --- Display statistics about current Kafka connections. This command displays a table of active and recently closed connections within the cluster. Use filtering and sorting to identify the connections of the client applications that you are interested in. See `--help` for the list of filtering arguments and sorting arguments. In addition to filtering shorthand CLI arguments (For example, `--client-id`, `--state`), you can also use the `--filter-raw` and `--order-by` arguments that take string expressions. To understand the syntax of these arguments, refer to the Admin API docs of the filter and order-by fields of the [`GET /v1/monitoring/kafka/connections`](/api/doc/cloud-dataplane/operation/operation-monitoringservice_listkafkaconnections) Data Plane API endpoint. By default only a subset of the per-connection data is printed. To see all of the available data, use `--format=json`. 
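As an illustration of combining the shorthand filters with the full JSON output, the sketch below lists open connections for a single client. The client ID `example-consumer` and the limit of 50 are hypothetical values chosen for the example, while the flags themselves are the ones listed for this command. The `command -v` guard only keeps the script harmless where `rpk` is not installed.

```bash
# Hypothetical client ID: replace it with the client ID of your application.
CLIENT_ID="example-consumer"

if command -v rpk >/dev/null 2>&1; then
  # Filter by client ID and connection state, raise the record limit from
  # the default of 20, and print all per-connection data as JSON.
  rpk cluster connections list \
    --client-id "$CLIENT_ID" \
    --state OPEN \
    --limit 50 \
    --format=json
else
  echo "rpk is not installed; skipping connection listing" >&2
fi
```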
## [](#usage)Usage ```bash rpk cluster connections list [flags] ``` ## [](#examples)Examples List connections ordered by their recent produce throughput: ```bash rpk cluster connections list --order-by="recent_request_statistics.produce_bytes desc" ``` List connections ordered by their recent fetch throughput: ```bash rpk cluster connections list --order-by="recent_request_statistics.fetch_bytes desc" ``` List connections ordered by the time that they’ve been idle: ```bash rpk cluster connections list --order-by="idle_duration desc" ``` List connections ordered by those that have made the least requests: ```bash rpk cluster connections list --order-by="total_request_statistics.request_count asc" ``` List extended output for open connections in JSON format: ```bash rpk cluster connections list --format=json --state="OPEN" ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --client-id | string | Filter results by the client ID. | | --client-software-name | string | Filter results by the client software name. | | --client-software-version | string | Filter results by the client software version. | | --filter-raw | string | Filter connections based on a raw query (overrides other filters). | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | -g, --group-id | string | Filter by client group ID. | | -h, --help | - | Help for connections list. | | -i, --idle-ms | int | Show connections idle for more than i milliseconds. | | --ip-address | string | Filter results by the client IP address. | | --limit | int32 | Limit how many records can be returned (default 20). | | --order-by | string | Order the results by their values. See Examples. | | -s, --state | string | Filter results by state. Acceptable values: OPEN, CLOSED. | | -u, --user | string | Filter results by a specific user principal. 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 523: rpk cluster connections **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-connections.md --- # rpk cluster connections --- title: rpk cluster connections latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-connections page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-connections.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-connections.adoc page-git-created-date: "2025-11-19" page-git-modified-date: "2025-11-19" --- Manage and monitor cluster connections. ## [](#usage)Usage ```bash rpk cluster connections [command] [flags] ``` ## [](#aliases)Aliases ```bash connections, connection ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for connections. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 524: rpk cluster info **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-info.md --- # rpk cluster info --- title: rpk cluster info latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-info page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-info.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-info.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Request broker metadata information. The Kafka protocol’s metadata contains information about brokers, topics, and the cluster as a whole. This command prints only the metadata sections that you request. There are currently three sections: the cluster, the list of brokers, and the topics. If no section is specified, all sections are printed. If the topic section is requested, all topics are requested by default unless some are manually specified as arguments. Expanded per-partition information can be printed with the `-d` flag, and internal topics can be printed with the `-i` flag. In the broker section, the controller node is suffixed with `*`. ## [](#usage)Usage ```bash rpk cluster info [flags] ``` ## [](#aliases)Aliases ```bash metadata, status, info ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for metadata. | | -b, --print-brokers | - | Print brokers section. | | -c, --print-cluster | - | Print cluster section. | | -d, --print-detailed-topics | - | Print per-partition information for topics (implies -t). | | -i, --print-internal-topics | - | Print internal topics (if all topics requested, implies -t). 
| | -t, --print-topics | - | Print topics section (implied if any topics are specified). | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 525: rpk cluster logdirs describe **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-logdirs-describe.md --- # rpk cluster logdirs describe --- title: rpk cluster logdirs describe latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-logdirs-describe page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-logdirs-describe.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-logdirs-describe.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Describe log directories on Redpanda brokers. This command prints information about log directories on brokers, particularly, the base directory that topics and partitions are located in, and the size of data that has been written to the partitions. The size you see may not exactly match the size on disk as reported by du: Redpanda allocates files in chunks. The chunks will show up in du, while the actual bytes so far written to the file will show up in this command. The directory returned is the root directory for partitions. 
Within Redpanda, the partition data lives underneath the returned root directory in `kafka/{topic}/{partition}_{revision}/`, where `revision` is a Redpanda internal concept. ## [](#usage)Usage ```bash rpk cluster logdirs describe [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --aggregate-into | string | If non-empty, what column to aggregate into starting from the partition column (broker, dir, topic). | | -b, --broker | int32 | If non-negative, the specific broker to describe (default -1). | | -h, --help | - | Help for describe. | | -H, --human-readable | - | Print the logdirs size in a human-readable form. | | --sort-by-size | - | If true, sort by size. | | --topics | strings | Specific topics to describe. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 526: rpk cluster logdirs **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-logdirs.md --- # rpk cluster logdirs --- title: rpk cluster logdirs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-logdirs page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-logdirs.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-logdirs.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Describe log directories on Redpanda brokers. ## [](#usage)Usage ```bash rpk cluster logdirs [flags] [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for logdirs. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 527: rpk cluster quotas alter **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-quotas-alter.md --- # rpk cluster quotas alter --- title: rpk cluster quotas alter latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-quotas-alter page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-quotas-alter.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-quotas-alter.adoc page-git-created-date: "2025-08-19" page-git-modified-date: "2025-08-19" --- Add or delete a client quota. A client quota consists of an entity (to which the quota is applied) and a quota type (what is being applied). There are two entity types supported by Redpanda: client ID and client ID prefix. Use the `--default` flag to assign quotas to default entity types. You can perform a dry run using the `--dry` flag. ## [](#usage)Usage ```bash rpk cluster quotas alter [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --add | strings | Key=value quota to add, where the value is a float number (repeatable). | | --default | strings | Entity type for default matching, where type is client-id or client-id-prefix (repeatable). | | --delete | strings | Key of the quota to delete (repeatable). | | --dry | - | Perform a dry run. Validate the request without altering the quotas. Show what would be done, but do not execute the command. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | -h, --help | - | Help for alter. | | --name | strings | Entity for exact matching. Format type=name where type is the client-id or client-id-prefix (repeatable). 
| | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | ## [](#examples)Examples Add quota (consumer\_byte\_rate) to client ID ``: ```bash rpk cluster quotas alter --add consumer_byte_rate=200000 --name client-id= ``` Add quota (consumer\_byte\_rate) to client ID starting with `-`: ```bash rpk cluster quotas alter --add consumer_byte_rate=200000 --name client-id-prefix=- ``` Add quota (producer\_byte\_rate) to default client ID: ```bash rpk cluster quotas alter --add producer_byte_rate=180000 --default client-id ``` Remove quota (producer\_byte\_rate) from client ID `foo`: ```bash rpk cluster quotas alter --delete producer_byte_rate --name client-id= ``` --- # Page 528: rpk cluster quotas describe **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-quotas-describe.md --- # rpk cluster quotas describe --- title: rpk cluster quotas describe latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-quotas-describe page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-quotas-describe.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-quotas-describe.adoc page-git-created-date: "2025-08-19" page-git-modified-date: "2025-08-19" --- Describe client quotas. This command describes client quotas that match the provided filtering criteria. 
Running the command without filters returns all client quotas. Use the `--strict` flag for strict matching, which returns only the quotas that exactly match the filters. You can specify filters in terms of entities. An entity consists of either a client ID or a client ID prefix. ## [](#usage)Usage ```bash rpk cluster quotas describe [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --any | strings | Type for any matching (names or default), where type is client-id or client-id-prefix (repeatable). | | --default | strings | Type for default matching, where type is client-id or client-id-prefix (repeatable). | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | -h, --help | - | Help for describe. | | --name | strings | The type=name pair for exact name matching, where type is client-id or client-id-prefix (repeatable). | | --strict | - | Specifies whether matches are strict. If true, entities with unspecified entity types are excluded. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | ## [](#examples)Examples Describe all client quotas: ```bash rpk cluster quotas describe ``` Describe all client quotas with client ID ``: ```bash rpk cluster quotas describe --name client-id= ``` Describe client quotas for a given client ID prefix `.`: ```bash rpk cluster quotas describe --name client-id-prefix=. 
``` --- # Page 529: rpk cluster quotas import **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-quotas-import.md --- # rpk cluster quotas import --- title: rpk cluster quotas import latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-quotas-import page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-quotas-import.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-quotas-import.adoc page-git-created-date: "2025-08-19" page-git-modified-date: "2025-08-19" --- Use this command to import client quotas in the format produced by `rpk cluster quotas describe --format json/yaml`. The schema of the import string matches the schema from `rpk cluster quotas describe --format help`: #### YAML ```yaml quotas: - entity: - name: string type: string values: - key: string value: string ``` #### JSON ```json { "quotas": [ { "entity": [ { "name": "string", "type": "string" } ], "values": [ { "key": "string", "value": "string" } ] } ] } ``` Use the `--no-confirm` flag to skip the confirmation prompt. ## [](#usage)Usage ```bash rpk cluster quotas import [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --from | string | Either the quotas or a path to a file containing the quotas to import; check help text for more information. | | -h, --help | - | Help for import. | | --no-confirm | - | Disable confirmation prompt. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. 
| | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | ## [](#examples)Examples Import client quotas from a file: ```bash rpk cluster quotas import --from /path/to/file ``` Import client quotas from a string: ```bash rpk cluster quotas import --from '{"quotas":...}' ``` Import client quotas from a JSON string: ```bash rpk cluster quotas import --from ' { "quotas": [ { "entity": [ { "name": "retrievals-", "type": "client-id-prefix" } ], "values": [ { "key": "consumer_byte_rate", "value": "140000" } ] }, { "entity": [ { "name": "consumer-1", "type": "client-id" } ], "values": [ { "key": "producer_byte_rate", "value": "140000" } ] } ] } ' ``` Import client quotas from a YAML string: ```bash rpk cluster quotas import --from ' quotas: - entity: - name: retrievals- type: client-id-prefix values: - key: consumer_byte_rate value: "140000" - entity: - name: consumer-1 type: client-id values: - key: producer_byte_rate value: "140000" ' ``` --- # Page 530: rpk cluster quotas **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-quotas.md --- # rpk cluster quotas --- title: rpk cluster quotas latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-quotas page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-quotas.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-quotas.adoc page-git-created-date: "2025-08-19" page-git-modified-date: "2025-08-19" --- Manage Redpanda client quotas. 
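The subcommands can be combined into a simple export-and-import workflow. The following is a minimal sketch, not an official procedure: the function names and the `RPK` variable are assumptions introduced here for illustration, while the flags (`--format yaml`, `--from`, `--no-confirm`) are the ones documented for `rpk cluster quotas describe` and `rpk cluster quotas import`.

```bash
# RPK is a hypothetical override for the rpk binary (an assumption of this sketch).
RPK="${RPK:-rpk}"

# export_quotas FILE
# Write all client quotas to FILE as YAML, the format that
# `rpk cluster quotas import` accepts.
export_quotas() {
  "$RPK" cluster quotas describe --format yaml > "$1"
}

# import_quotas FILE
# Import the quotas stored in FILE and skip the confirmation prompt.
import_quotas() {
  "$RPK" cluster quotas import --from "$1" --no-confirm
}
```

For example, `export_quotas quotas.yaml` saves the current quotas, and `import_quotas quotas.yaml` applies them again later.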
## [](#usage)Usage ```bash rpk cluster quotas [command] [flags] ``` ## [](#aliases)Aliases ```bash quotas, quota ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | -h, --help | - | Help for quotas. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 531: rpk cluster storage cancel mount **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-storage-cancel-mount.md --- # rpk cluster storage cancel mount --- title: rpk cluster storage cancel mount latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-storage-cancel-mount page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-storage-cancel-mount.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-storage-cancel-mount.adoc page-git-created-date: "2024-12-03" page-git-modified-date: "2025-05-07" --- > 📝 **NOTE** > > This command is only supported in BYOC and Dedicated clusters. Cancels a mount/unmount operation on a topic. Use the migration ID that is emitted when the mount or unmount operation is executed. You can also get the migration ID by listing the mount/unmount operations. 
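The lookup-then-cancel flow described above could be wrapped as follows. This is a sketch under stated assumptions: the function names and the `RPK` variable are hypothetical helpers, and only the documented `list-mount --filter` and `cancel-mount` commands are invoked.

```bash
# RPK is a hypothetical override for the rpk binary (an assumption of this sketch).
RPK="${RPK:-rpk}"

# list_migrations STATE
# List mount/unmount operations in the given state, for example "planned",
# so that you can read the migration ID from the output.
list_migrations() {
  "$RPK" cluster storage list-mount --filter "$1"
}

# cancel_migration MIGRATION_ID
# Cancel the mount/unmount operation with the given migration ID.
cancel_migration() {
  if [ -z "${1:-}" ]; then
    echo "usage: cancel_migration MIGRATION_ID" >&2
    return 1
  fi
  "$RPK" cluster storage cancel-mount "$1"
}
```

For example, run `list_migrations planned`, note the ID of the operation you want to stop, and pass it to `cancel_migration`.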
## [](#usage)Usage ```bash rpk cluster storage cancel-mount [MIGRATION ID] [flags] ``` ## [](#aliases)Aliases ```bash cancel-mount, cancel-unmount ``` ## [](#examples)Examples Cancel a mount/unmount operation: ```bash rpk cluster storage cancel-mount 123 ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for cancel-mount. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 532: rpk cluster storage list mount **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-storage-list-mount.md --- # rpk cluster storage list mount --- title: rpk cluster storage list mount latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-storage-list-mount page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-storage-list-mount.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-storage-list-mount.adoc page-git-created-date: "2024-12-03" page-git-modified-date: "2025-05-07" --- > 📝 **NOTE** > > This command is only supported in BYOC and Dedicated clusters. List mount/unmount operations on a topic in the Redpanda cluster from [Tiered Storage](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#tiered-storage). You can also filter the list by state using the `--filter` flag. 
The possible states are: - `planned` - `prepared` - `executed` - `finished` If no filter is provided, all migrations are listed. ## [](#usage)Usage ```bash rpk cluster storage list-mount [flags] ``` ## [](#aliases)Aliases ```bash list-mount, list-unmount ``` ## [](#examples)Examples Lists mount/unmount operations: ```bash rpk cluster storage list-mount ``` Use a filter to list only migrations in a specific state: ```bash rpk cluster storage list-mount --filter planned ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -f, --filter | string | Filter the list of migrations by state. Only valid for text (default all). | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 533: rpk cluster storage list-mountable **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-storage-list-mountable.md --- # rpk cluster storage list-mountable --- title: rpk cluster storage list-mountable latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-storage-list-mountable page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-storage-list-mountable.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-storage-list-mountable.adoc page-git-created-date: "2024-12-03" page-git-modified-date: "2025-05-07" --- > 📝 **NOTE** > > This command is only supported in BYOC and Dedicated clusters. List topics that are available to mount from object storage. This command displays topics that exist in object storage and can be mounted to your Redpanda cluster. Each topic includes its location in object storage and namespace information if applicable. ## [](#usage)Usage ```bash rpk cluster storage list-mountable [flags] ``` ## [](#examples)Examples List all mountable topics: ```bash rpk cluster storage list-mountable ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for list-mountable. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. 
| | -v, --verbose | - | Enable verbose logging. | --- # Page 534: rpk cluster storage mount **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-storage-mount.md --- # rpk cluster storage mount --- title: rpk cluster storage mount latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-storage-mount page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-storage-mount.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-storage-mount.adoc page-git-created-date: "2024-12-03" page-git-modified-date: "2025-05-07" --- > 📝 **NOTE** > > This command is only supported in BYOC and Dedicated clusters. Mount a topic to the Redpanda cluster from [Tiered Storage](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#tiered-storage). This command mounts a topic in the Redpanda cluster using log segments stored in Tiered Storage. You can optionally rename the topic using the `--to` flag. Requirements: - Log segments for the topic must be available in Tiered Storage. - A topic with the same name must not already exist in the cluster. ## [](#usage)Usage ```bash rpk cluster storage mount [TOPIC] [flags] ``` ## [](#examples)Examples Mount a topic from Tiered Storage to the cluster in the `my-namespace` namespace: ```bash rpk cluster storage mount ``` Mount topic `` from Tiered Storage to the cluster in the `` with `` as the new topic name: ```bash rpk cluster storage mount / --to / ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --to | string | New namespace/topic name for the mounted topic (optional). | | -h, --help | - | Help for mount. 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 535: rpk cluster storage status mount **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-storage-status-mount.md --- # rpk cluster storage status mount --- title: rpk cluster storage status mount latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-storage-status-mount page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-storage-status-mount.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-storage-status-mount.adoc page-git-created-date: "2024-12-03" page-git-modified-date: "2025-05-07" --- > 📝 **NOTE** > > This command is only supported in BYOC and Dedicated clusters. Status of mount/unmount operation on topic in a Redpanda cluster from [Tiered Storage](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#tiered-storage). ## [](#usage)Usage ```bash rpk cluster storage status-mount [MIGRATION ID] [flags] ``` ## [](#aliases)Aliases ```bash status-mount, status-unmount ``` ## [](#examples)Examples Status for a mount/unmount operation: ```bash rpk cluster storage status-mount 123 ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. 
| | -h, --help | - | Help for status-mount. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 536: rpk cluster storage unmount **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-storage-unmount.md --- # rpk cluster storage unmount --- title: rpk cluster storage unmount latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-storage-unmount page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-storage-unmount.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-storage-unmount.adoc page-git-created-date: "2024-12-03" page-git-modified-date: "2025-05-07" --- > 📝 **NOTE** > > This command is only supported in BYOC and Dedicated clusters. Unmount a topic from the Redpanda cluster and secure it in [Tiered Storage](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#tiered-storage). This command performs an operation that: 1. Rejects all writes to the topic. 2. Flushes data to Tiered Storage. 3. Removes the topic from the cluster. Key Points: - During unmounting, any attempted writes or reads will receive an `UNKNOWN_TOPIC_OR_PARTITION` error. - The unmount operation works independently of other topic configurations like `remote.delete=false`. 
- After unmounting, the topic can be remounted to this cluster or a different cluster if the log segments are moved to that cluster’s Tiered Storage. ## [](#usage)Usage ```bash rpk cluster storage unmount [TOPIC] [flags] ``` ## [](#examples)Examples Unmount topic '' from the cluster in the '': ```bash rpk cluster storage unmount / ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for unmount. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 537: rpk cluster storage **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-storage.md --- # rpk cluster storage --- title: rpk cluster storage latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-storage page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-storage.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-storage.adoc page-git-created-date: "2025-05-09" page-git-modified-date: "2025-05-09" --- > 📝 **NOTE** > > This command is only supported in BYOC and Dedicated clusters. Manage the cluster storage. ## [](#usage)Usage ```bash rpk cluster storage [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for storage. 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 538: rpk cluster txn describe-producers **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-txn-describe-producers.md --- # rpk cluster txn describe-producers --- title: rpk cluster txn describe-producers latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-txn-describe-producers page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-txn-describe-producers.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-txn-describe-producers.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Describe transactional producers to partitions. This command describes partitions that active transactional producers are producing to. For more information on the producer ID and epoch columns, see `rpk cluster txn --help`. ## [](#concept)Concept The last timestamp is the timestamp of the last record written by the client. The transaction start offset is the offset at which the transaction began. Consumers configured to read only committed records cannot read past the transaction start offset. 
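To make the effect of the transaction start offset concrete, the following sketch computes how far a consumer that reads only committed records can get on one partition. It is an illustration only: the function name and its inputs are assumptions of this sketch, not rpk output or a Redpanda API. It assumes you already know the end of the readable log and the start offsets of the open transactions on the partition.

```bash
# read_committed_bound LOG_END [TXN_START_OFFSET...]
# Print the offset that a read-committed consumer cannot read past:
# the smallest start offset among open transactions, or LOG_END when
# no transaction is open. All arguments are non-negative integers.
read_committed_bound() {
  local bound="$1"
  shift
  local start
  for start in "$@"; do
    if [ "$start" -lt "$bound" ]; then
      bound="$start"
    fi
  done
  echo "$bound"
}
```

For example, with a log end of 500 and open transactions that started at offsets 320 and 410, `read_committed_bound 500 320 410` prints `320`: the earliest open transaction determines the bound.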
The output includes a few advanced fields that can be used for sanity checking: the last sequence is the last sequence number that the producer has written, and the coordinator epoch is the epoch of the broker that is being written to. The last sequence should always go up and then wrap back to 0 at MaxInt32. The coordinator epoch should remain fixed, or rarely, increase. You can query all topics and partitions that have active producers with `--all`. To filter for specific topics, use `--topics`. You can additionally filter by partitions with `--partitions`. ## [](#usage)Usage ```bash rpk cluster txn describe-producers [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -a, --all | - | Query all producer IDs on any topic. | | -h, --help | - | Help for describe-producers. | | -p, --partitions | int32Slice | Partitions to describe producers for (repeatable) (default []). | | -t, --topics | strings | Topic to describe producers for (repeatable). | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 539: rpk cluster txn describe **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-txn-describe.md --- # rpk cluster txn describe --- title: rpk cluster txn describe latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-txn-describe page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-txn-describe.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-txn-describe.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Describe transactional IDs. This command, in comparison to `list`, is a more detailed per-transaction view of transactional IDs. In addition to the state and producer ID, this command also outputs when a transaction started, the epoch of the producer ID, how long until the transaction times out, and the partitions currently a part of the transaction. For information on what the columns in the output mean, see `rpk cluster txn --help`. By default, all topics in a transaction are merged into one line. To print a row per topic, use `--format=long`. To include partitions with topics, use `--print-partitions`; `--format=json/yaml` will return the equivalent of the long format with print partitions included. If no transactional IDs are requested, all transactional IDs are printed. ## [](#usage)Usage ```bash rpk cluster txn describe [TXN-IDS...] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for describe. | | -p, --print-partitions | - | Include per-topic partitions that are in the transaction. 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 540: rpk cluster txn list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-txn-list.md --- # rpk cluster txn list --- title: rpk cluster txn list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-txn-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-txn-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-txn-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List transactions and their current states. This command lists all known transactions in the cluster, the producer ID for the transactional ID, and the state of the transaction. For information on what the columns in the output mean, see `rpk cluster txn --help`. ## [](#usage)Usage ```bash rpk cluster txn list [flags] ``` ## [](#aliases)Aliases ```bash list, ls ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for list. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. 
| | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 541: rpk cluster txn **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster-txn.md --- # rpk cluster txn --- title: rpk cluster txn latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster-txn page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster-txn.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster-txn.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Information about transactions and transactional producers. ## [](#concept)Concept Transactions allow producing, or consume-modifying-producing, to Redpanda. The consume-modify-produce loop is also referred to as EOS (exactly once semantics). Transactions involve a lot of technical complexity that is largely hidden within clients. This command space helps shed a light on what is actually happening in clients and brokers while transactions are in use. ### [](#transactional-id)Transactional ID The transactional ID is the string you define in clients when actually using transactions. ### [](#producer-id-epoch)Producer ID & Epoch The producer ID is generated within clients when you transactionally produce. 
The producer ID is a number that maps to your transactional ID, allowing requests to be smaller when producing, and allowing some optimizations within brokers when managing transactions. Some clients expose the producer ID, allowing you to track the transactional ID that a producer ID maps to. If possible, it is recommended to monitor the producer ID used in your applications. The producer epoch is a number that somewhat "counts" the number of times your transaction has been initialized or expired. If you have one client that uses a transactional ID, it may receive producer ID 3 epoch 0. Another client that uses that same transactional ID will receive producer ID 3 epoch 1. If the client starts a transaction but does not finish it in time, the cluster will internally bump the epoch to 2. The epoch allows the cluster to "fence" clients: if a client attempts to use a producer ID with an old epoch, the cluster will reject the client’s produce request as stale. ### [](#transaction-state)Transaction State The state of a transaction indicates what is currently happening with a transaction. A high-level overview of transactional states: - Empty: The transactional ID is ready, but there are no partitions nor groups added to it. There is no active transaction. - Ongoing: The transactional ID is being used in a transaction that has begun. - PrepareCommit: A commit is in progress. - PrepareAbort: An abort is in progress. - PrepareEpochFence: The transactional ID is timing out. - Dead: The transactional ID has expired and/or is not in use. ### [](#last-stable-offset)Last Stable Offset The last stable offset is the offset at which a transaction began and past which clients cannot consume if they are configured to read only committed offsets. The last stable offset can be seen when describing active transactional producers by looking for the earliest transaction start offset per partition.
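
The rule in the last sentence (take the earliest transaction start offset per partition) can be sketched in a few lines of shell. This is a minimal illustration, not part of rpk: the `earliest_txn_start` function and the four-column row layout (`TOPIC PARTITION PRODUCER-ID TXN-START-OFFSET`) are assumptions made for the example and do not reflect the actual output format of `rpk cluster txn describe-producers`.

```bash
# Hypothetical helper: reads rows of "TOPIC PARTITION PRODUCER-ID TXN-START-OFFSET"
# on stdin (an assumed layout, not rpk's real output) and prints the earliest
# transaction start offset for each topic/partition.
earliest_txn_start() {
  awk '{
    key = $1 "/" $2
    # Keep the smallest start offset seen so far for this topic/partition.
    if (!(key in min) || $4 + 0 < min[key]) min[key] = $4 + 0
  }
  END { for (k in min) print k, min[k] }' | sort
}

# Two producers have open transactions on orders/0, one on orders/1.
printf '%s\n' \
  'orders 0 3 120' \
  'orders 0 7 95' \
  'orders 1 3 40' | earliest_txn_start
# orders/0 95
# orders/1 40
```

In this sample, a consumer reading only committed records could not read past offset 95 on `orders/0`, even though another transaction on that partition started later at offset 120.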
## [](#usage)Usage ```bash rpk cluster txn [command] [flags] ``` ## [](#aliases)Aliases ```bash txn, transaction ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | -h, --help | - | Help for txn. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 542: rpk cluster **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-cluster/rpk-cluster.md --- # rpk cluster --- title: rpk cluster latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-cluster/rpk-cluster page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-cluster/rpk-cluster.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-cluster/rpk-cluster.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Interact with a Redpanda cluster. ## [](#usage)Usage ```bash rpk cluster [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for cluster. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. 
| | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 543: rpk **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-commands.md --- # rpk --- title: rpk latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-commands page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-commands.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-commands.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- `rpk` is a command line interface (CLI) toolbox that lets you configure, manage, and tune Redpanda clusters. It also lets you manage topics, groups, and access control lists (ACLs). `rpk` stands for Redpanda Keeper. ## [](#rpk)rpk `rpk` is the Redpanda CLI toolbox. ### [](#usage)Usage ```bash rpk [command] ``` ### [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for rpk. | | -v, --verbose | - | Enable verbose logging (default false). 
| ## [](#related-topics)Related topics - [Introduction to rpk](../../../manage/rpk/rpk-install/) * * * ## [](#suggested-reading)Suggested reading - [Introducing rpk container](https://redpanda.com/blog/rpk-container/) - [Get started with rpk commands](https://redpanda.com/blog/getting-started-rpk/) --- # Page 544: rpk generate app **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-generate/rpk-generate-app.md --- # rpk generate app --- title: rpk generate app latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-generate/rpk-generate-app page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-generate/rpk-generate-app.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-generate/rpk-generate-app.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- > 📝 **NOTE** > > This command is only supported in Serverless clusters. Generate a sample application to connect with Redpanda. This command generates a starter application to produce and consume from the settings defined in the `rpk profile`. Its goal is to get you producing and consuming quickly with Redpanda in a language that is familiar to you. By default, this runs interactively, prompting you to select a language and a user with which to create your application. To use this without interactivity, specify how you want your application to be created using flags. The `--language` flag lets you specify the language. There is no default. Available language: `go`. The `--new-sasl-user` flag lets you generate a new SASL user with admin ACLs. If you don’t want to use your current profile user or don’t want to create a new one, you can use the `--no-user` flag to generate the starter app without the user.
## [](#examples)Examples - Generate an app with interactive prompts: ```bash rpk generate app ``` - Generate an app in a specified language with the existing SASL user: ```bash rpk generate app --language ``` - Generate an app in the specified language with a new SASL user: ```bash rpk generate app -l --new-sasl-user : ``` - Generate an app in the `tmp` directory, but take no action on the user: ```bash rpk generate app -l --no-user --output /tmp ``` ## [](#usage)Usage ```bash rpk generate app [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for app. | | -l, --language | string | The language you want the code sample to be generated with. Available language: go. | | --new-sasl-credentials | string | If provided, rpk will generate and use these credentials (:). | | --no-user | - | Generates the sample app without SASL user. | | -o, --output | string | The path where the app will be written. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 545: rpk generate grafana-dashboard **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-generate/rpk-generate-grafana-dashboard.md --- # rpk generate grafana-dashboard --- title: rpk generate grafana-dashboard latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-generate/rpk-generate-grafana-dashboard page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-generate/rpk-generate-grafana-dashboard.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-generate/rpk-generate-grafana-dashboard.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-12-11" --- Generate Grafana dashboards for Redpanda metrics. Use this command to generate sample Grafana dashboards for Redpanda metrics. These dashboards can be imported into a Grafana or Grafana Cloud instance. To select a specific dashboard, use the `--dashboard` flag followed by the dashboard name. For example, to generate the operations dashboard, run: ```bash rpk generate grafana-dashboard --dashboard operations ``` The selected dashboard will be downloaded from Redpanda Data’s [observability GitHub repository](https://github.com/redpanda-data/observability). > 📝 **NOTE** > > The legacy dashboard is still available as an option (`legacy`), but it isn’t downloaded from GitHub. Instead, the generated dashboard is based on which metrics endpoint is used (`--metrics-endpoint`). ## [](#available-dashboards)Available dashboards You can select one of the following dashboard types: | Name | Description | | --- | --- | | consumer-metrics | Monitoring of Java Kafka consumers, using the Prometheus JMX Exporter and the Kafka Sample Configuration. 
| | consumer-offsets | Metrics and KPIs that provide details of topic consumers and how far they are lagging behind the end of the log. | | operations (default) | Provides an overview of KPIs for a Redpanda cluster with health indicators. This is suitable for ops or SRE to monitor on a daily or continuous basis. | | serverless | Monitoring dashboard for Redpanda Serverless clusters. | | topic-metrics | Provides throughput, read/write rates, and on-disk sizes of each/all topics. | | legacy | Generates dashboard based on selected metrics endpoint (--metrics-endpoint). Modify prometheus datasource and job-name with --datasource and --job-name flags. | ## [](#usage)Usage ```bash rpk generate grafana-dashboard [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --dashboard | string | The name of the dashboard you wish to download. Use --dashboard help for more info (default: operations). | | -h, --help | - | Help for grafana-dashboard. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 546: rpk generate shell-completion **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-generate/rpk-generate-shell-completion.md --- # rpk generate shell-completion --- title: rpk generate shell-completion latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-generate/rpk-generate-shell-completion page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-generate/rpk-generate-shell-completion.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-generate/rpk-generate-shell-completion.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Shell completion can help autocomplete `rpk` commands when you press tab. ## [](#bash)Bash Bash autocompletion relies on the bash-completion package. You can test whether you have it by running `type _init_completion`. If you do not, install the package through your package manager. If you have bash-completion installed, and the command still fails, you likely need to add the following line to your `~/.bashrc`: ```bash source /usr/share/bash-completion/bash_completion ``` To ensure autocompletion of `rpk` exists in all shell sessions, add the following to your `~/.bashrc`: ```bash command -v rpk >/dev/null && . <(rpk generate shell-completion bash) ``` Alternatively, to globally enable `rpk` completion, you can run the following: ```bash rpk generate shell-completion bash > /etc/bash_completion.d/rpk ``` ## [](#zsh)Zsh To enable autocompletion in any zsh session for any user, follow these steps: Determine which directory in your `$fpath` to use to store the completion file. You can inspect your `fpath` by running: ```zsh echo $fpath ``` Choose one of the directories listed. 
For example, if `/usr/local/share/zsh/site-functions` is present in your `fpath`, you can place the `_rpk` completion file there: ```zsh rpk generate shell-completion zsh > /usr/local/share/zsh/site-functions/_rpk ``` If the directory you chose is not already in `fpath`, add it to your `.zshrc`: ```zsh fpath+=(/usr/local/share/zsh/site-functions) ``` Finally, ensure that `compinit` is run. Add (or verify) the following in your `.zshrc`: ```zsh autoload -U compinit && compinit ``` After restarting your shell, `rpk` completion should be active. ## [](#fish)Fish To enable autocompletion in any `fish` session, run: ```fish rpk generate shell-completion fish > ~/.config/fish/completions/rpk.fish ``` ## [](#usage)Usage ```bash rpk generate shell-completion [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for shell-completion. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 547: rpk generate **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-generate/rpk-generate.md --- # rpk generate --- title: rpk generate latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-generate/rpk-generate page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-generate/rpk-generate.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-generate/rpk-generate.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- ## [](#rpk-generate)rpk generate Generate a configuration template for related services. ## [](#usage)Usage ```bash rpk generate [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for generate. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 548: rpk group delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-group/rpk-group-delete.md --- # rpk group delete --- title: rpk group delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-group/rpk-group-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-group/rpk-group-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-group/rpk-group-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Delete consumer groups explicitly through `rpk group delete`. This allows you to proactively manage offsets, for example, when you’ve created temporary groups for quick investigation or testing and you want to clear offsets sooner than the automatic cleanup. Consumer groups are automatically deleted when the last committed offset expires. Group offset deletion can happen through: - Kafka `OffsetDelete` API: Offsets can be explicitly deleted using the Kafka `OffsetDelete` API. See [`rpk group offset delete`](../rpk-group-offset-delete/). - Periodic Offset Expiration: Offsets expire automatically when the group has been empty for a set duration. ## [](#usage)Usage ```bash rpk group delete [GROUPS...] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for delete. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. 
| | -v, --verbose | - | Enable verbose logging. | --- # Page 549: rpk group describe **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-group/rpk-group-describe.md --- # rpk group describe --- title: rpk group describe latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-group/rpk-group-describe page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-group/rpk-group-describe.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-group/rpk-group-describe.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Describe group offset status & lag. This command describes group members, calculates their lag, and prints detailed information about the members. The `COORDINATOR-PARTITION` column indicates the partition in the `__consumer_offsets` topic responsible for the group, if topic details are available; run with `--verbose` for more information if it is missing. The `--regex` flag (`-r`) parses arguments as regular expressions and describes groups that match any of the expressions. ## [](#usage)Usage ```bash rpk group describe [GROUPS...] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | -h, --help | - | Help for describe. | | -i, --instance-ID | - | Include each group member’s instance ID. | | -c, --print-commits | - | Print only the group commits section. | | -s, --print-summary | - | Print only the group summary section. | | -r, --regex | string | Parse arguments as regex. Describe any group that matches any input group expression. 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | ## [](#examples)Examples Describe groups `` and ``: ```bash rpk group describe ``` Describe any group starting with f and ending in r: ```bash rpk group describe -r '^f.*' '.*r$' ``` Describe all groups: ```bash rpk group describe -r '*' ``` Describe any one-character group: ```bash rpk group describe -r . ``` --- # Page 550: rpk group list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-group/rpk-group-list.md --- # rpk group list --- title: rpk group list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-group/rpk-group-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-group/rpk-group-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-group/rpk-group-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List all groups. This command lists all groups currently known to Redpanda, including empty groups that have not yet expired. The BROKER column is which broker node is the coordinator for the group. This command can be used to track down unknown groups, or to list groups that need to be cleaned up. The STATE column shows which state the group is in: - `PreparingRebalance`: The group is preparing to rebalance. - `CompletingRebalance`: The group is waiting for the leader to provide assignments. 
- `Stable`: The group is not empty and has no group membership changes in process. - `Dead`: Transient state as the group is being removed. - `Empty`: The group currently has no members. ## [](#usage)Usage ```bash rpk group list [flags] ``` ## [](#aliases)Aliases ```bash list, ls ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for list. | | -s, --states | strings | Comma-separated list of group states to filter for. Possible states: [PreparingRebalance, CompletingRebalance, Stable, Dead, Empty]. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 551: rpk group offset-delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-group/rpk-group-offset-delete.md --- # rpk group offset-delete --- title: rpk group offset-delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-group/rpk-group-offset-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-group/rpk-group-offset-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-group/rpk-group-offset-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Forcefully delete offsets for a Kafka group. 
The broker will only allow the request to succeed if the group is in an Empty state (no subscriptions) or there are no subscriptions for offsets for topic/partitions requested to be deleted. Use either the `--from-file` or the `--topic` option. They are mutually exclusive. To indicate which topics or topic partitions you’d like to remove offsets from, use the `--topic` (`-t`) flag with the topic name, optionally followed by a colon and a comma-separated list of partition IDs (for example, `-t foo:0,1,2`). Supplying no list will delete all offsets for all partitions for a given topic. You may also provide a text file to indicate topic/partition tuples. Use the `--from-file` flag for this option. The file must contain lines of topic/partitions separated by a tab or space. Example: topic\_a 0 topic\_a 1 topic\_b 0 ## [](#usage)Usage ```bash rpk group offset-delete [GROUP] --from-file FILE --topic foo:0,1,2 [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -f, --from-file | string | File of topic/partition tuples for which to delete offsets. | | -h, --help | - | Help for offset-delete. | | -t, --topic | stringArray | topic:partition_id (repeatable; e.g. -t foo:0,1,2 ). | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 552: rpk group seek **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-group/rpk-group-seek.md --- # rpk group seek --- title: rpk group seek latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-group/rpk-group-seek page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-group/rpk-group-seek.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-group/rpk-group-seek.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Modify a group’s current offsets. This command allows you to modify a group’s offsets. Sometimes, you may need to rewind a group if you had a mistaken deploy, or fast-forward a group if it is falling behind on messages that can be skipped. The `--to` option allows you to seek to the start of partitions, end of partitions, or after a specific timestamp. The default is to seek any topic previously committed. Using `--topics` allows you to set commits for only the specified topics; all other commits will remain untouched. Topics with no commits will not be committed unless allowed with `--allow-new-topics`. The `--to-group` option allows you to seek to commits that are in another group. This is a merging operation: if g1 is consuming topics A and B, and g2 is consuming only topic B, `rpk group seek g1 --to-group g2` will update g1’s commits for topic B only. The `--topics` flag can be used to further narrow which topics are updated. Unlike `--to`, all non-filtered topics are committed, even topics not yet being consumed, meaning `--allow-new-topics` is not needed. 
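
The merge performed by `--to-group` can be modeled offline to make the g1/g2 example concrete. The sketch below is a simplified model, not what rpk runs internally: the `merge_commits` function, the `TOPIC PARTITION OFFSET` row layout, and the assumption that merging happens per topic/partition pair are all introduced here for illustration.

```bash
# Hypothetical model of `rpk group seek g1 --to-group g2`.
# $1: file with the target group's commits, $2: file with the source group's.
# Rows are "TOPIC PARTITION OFFSET" (an assumed layout for this sketch).
merge_commits() {
  # Files are read in order, so a source-group row replaces the target-group
  # row for the same topic/partition; untouched rows pass through unchanged.
  awk '{ commits[$1 " " $2] = $3 }
       END { for (k in commits) print k, commits[k] }' "$1" "$2" | sort
}

g1=$(mktemp); g2=$(mktemp)
printf '%s\n' 'A 0 10' 'B 0 20' > "$g1"   # g1 consumes topics A and B
printf '%s\n' 'B 0 5' > "$g2"             # g2 consumes only topic B
merge_commits "$g1" "$g2"
# A 0 10
# B 0 5
rm -f "$g1" "$g2"
```

Only the commit for topic B changes; g1's commit for topic A is left as it was, which matches the behavior described above.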
The `--to-file` option allows you to seek to offsets specified in a text file with the following format: \[TOPIC\] \[PARTITION\] \[OFFSET\] \[TOPIC\] \[PARTITION\] \[OFFSET\] Each line contains the topic, the partition, and the offset to seek to. As with the prior options, `--topics` allows filtering which topics are updated. Similar to `--to-group`, all non-filtered topics are committed, even topics not yet being consumed, meaning `--allow-new-topics` is not needed. The `--to`, `--to-group`, and `--to-file` options are mutually exclusive. If you are not authorized to describe or read some topics used in a group, you will not be able to modify offsets for those topics. ## [](#examples)Examples Seek group G to June 1st, 2021: ```bash rpk group seek g --to 1622505600 ``` or ```bash rpk group seek g --to 1622505600000 ``` or ```bash rpk group seek g --to 1622505600000000000 ``` Seek group X to the commits of group Y topic foo: ```bash rpk group seek X --to-group Y --topics foo ``` Seek group G’s topics foo, bar, and biz to the end: ```bash rpk group seek G --to end --topics foo,bar,biz ``` Seek group G to the beginning of a topic it was not previously consuming: ```bash rpk group seek G --to start --topics foo --allow-new-topics ``` ## [](#usage)Usage ```bash rpk group seek [GROUP] --to (start|end|timestamp) --to-group ... --topics ... [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --allow-new-topics | - | Allow seeking to new topics not currently consumed (implied with --to-group or --to-file). | | -h, --help | - | Help for seek. | | --to | string | Where to seek (start, end, unix second | millisecond | nanosecond). | | --to-file | string | Seek to offsets as specified in the file. | | --to-group | string | Seek to the commits of another group. | | --topics | strings | Only seek these topics, if any are specified.
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 553: rpk group **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-group/rpk-group.md --- # rpk group --- title: rpk group latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-group/rpk-group page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-group/rpk-group.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-group/rpk-group.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Describe, list, and delete consumer groups and manage their offsets. Consumer groups allow you to horizontally scale consuming from topics. A non-group consumer consumes all records from all partitions you assign it. In contrast, consumer groups allow many consumers to coordinate and divide work. If you have two members in a group consuming topics A and B, each with three partitions, then both members consume three partitions. If you add another member to the group, then each of the three members will consume two partitions. This allows you to horizontally scale consuming of topics. The unit of scaling is a single partition. If you add more consumers to a group than there are total partitions to consume, then some consumers will be idle.
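The arithmetic above can be sketched in a few lines of shell. This is an illustration only: `split_partitions` is a hypothetical helper, and it assumes an even split, whereas the real assignment is decided by the client's partition assignment strategy.

```bash
# Hypothetical helper (not part of rpk): spread TOTAL partitions over
# MEMBERS consumers as evenly as possible and print one count per member.
split_partitions() {
  total=$1
  members=$2
  base=$((total / members))   # partitions every member gets
  extra=$((total % members))  # leftover partitions, one each for the first members
  out=""
  i=0
  while [ "$i" -lt "$members" ]; do
    n=$base
    if [ "$i" -lt "$extra" ]; then
      n=$((base + 1))
    fi
    out="$out$n "
    i=$((i + 1))
  done
  echo "${out% }"
}

split_partitions 6 2   # two members, topics A and B with three partitions each
split_partitions 6 3   # a third member joins
split_partitions 6 8   # more members than partitions: a count of 0 means idle
```

A count of `0` in the last call corresponds to an idle consumer, because a partition is the smallest unit of work that can be handed out.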
More commonly, you have many more partitions than consumer group members and each member consumes a chunk of available partitions. One scenario where you may want more members than partitions is if you want active standbys to take over load immediately if any consuming member dies. How group members divide work is entirely client driven (the "partition assignment strategy" or "balancer" depending on the client). Brokers know nothing about how consumers are assigning partitions. A broker’s role in group consuming is to choose which member is the leader of a group, forward that member’s assignment to every other member, and ensure all members are alive through heartbeats. Consumers periodically commit their progress when consuming partitions. Through these commits, you can monitor just how far behind a consumer is from the latest messages in a partition. This is called "lag". Large lag implies that the client is having problems, which could be from the server being too slow, or the client being oversubscribed in the number of partitions it is consuming, or the server being in a bad state that requires restarting or removing from the server pool, and so on. You can manually manage offsets for a group, which allows you to rewind or forward commits. If you notice that a recent deploy of your consumers had a bug, you may want to stop all members, rewind the commits to before the latest deploy, and restart the members with a patch. This command allows you to list all groups, describe a group (to view the members and their lag), and manage offsets. ## [](#usage)Usage ```bash rpk group [command] ``` ## [](#aliases)Aliases ```bash group, g ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for group. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml.
| | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 554: rpk help **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-help.md --- # rpk help --- title: rpk help latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-help page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-help.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-help.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-08" --- Help provides additional information for any command in the application. Simply type `rpk help [command]` for full details. ## [](#usage)Usage ```bash rpk help [command] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for help. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging.
| --- # Page 555: rpk plugin install **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-plugin/rpk-plugin-install.md --- # rpk plugin install --- title: rpk plugin install latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-plugin/rpk-plugin-install page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-plugin/rpk-plugin-install.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-plugin/rpk-plugin-install.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Install an `rpk plugin`. An `rpk plugin` must be saved in `$HOME/.local/bin` or in a directory that is in your `$PATH`. By default, this command installs plugins to `$HOME/.local/bin`. This can be overridden by specifying the `--dir` flag. If `--dir` is not present, `rpk` will create `$HOME/.local/bin` if it does not exist. ## [](#usage)Usage ```bash rpk plugin install [PLUGIN] [flags] ``` ## [](#aliases)Aliases ```bash install, download ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --dir | string | Destination directory to save the installed plugin (default: "$HOME/.local/bin"). | | -h, --help | - | Help for install. | | -u, --update | - | Update a locally installed plugin if it differs from the current remote version. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 556: rpk plugin list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-plugin/rpk-plugin-list.md --- # rpk plugin list --- title: rpk plugin list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-plugin/rpk-plugin-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-plugin/rpk-plugin-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-plugin/rpk-plugin-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List all available plugins. By default, this command fetches the remote manifest and prints plugins available for download. Any plugin that is already downloaded is prefixed with an asterisk. If a locally installed plugin has a different `SHA-256 SUM` than the one specified in the manifest, or if the `SHA-256 SUM` could not be calculated for the local plugin, an additional message is printed. You can specify `--local` to print all locally installed plugins, as well as whether you have "shadowed" plugins (the same plugin specified multiple times). ## [](#usage)Usage ```bash rpk plugin list [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for list. | | -l, --local | - | List locally installed plugins and shadowed plugins. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging.
| --- # Page 557: rpk plugin uninstall **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-plugin/rpk-plugin-uninstall.md --- # rpk plugin uninstall --- title: rpk plugin uninstall latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-plugin/rpk-plugin-uninstall page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-plugin/rpk-plugin-uninstall.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-plugin/rpk-plugin-uninstall.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Uninstall or remove an existing local plugin. This command lists locally installed plugins and removes the first plugin that matches the requested removal. If `--include-shadowed` is specified, this command also removes all shadowed plugins of the same name. To remove a command under a nested namespace, concatenate the namespace. For example, for the nested namespace `rpk foo bar`, use the name `foo_bar`. ## [](#usage)Usage ```bash rpk plugin uninstall [NAME] [flags] ``` ## [](#aliases)Aliases ```bash uninstall, rm ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for uninstall. | | --include-shadowed | - | Also remove shadowed plugins that have the same name. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 558: rpk plugin **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-plugin/rpk-plugin.md --- # rpk plugin --- title: rpk plugin latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-plugin/rpk-plugin page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-plugin/rpk-plugin.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-plugin/rpk-plugin.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List, download, update, and remove `rpk` plugins. Plugins augment `rpk` with new commands. For a plugin to be used, it must be in `$HOME/.local/bin` or somewhere discoverable by `rpk` in your `$PATH`. All plugins follow a defined naming scheme: ```bash .rpk- .rpk.ac- ``` All plugins are prefixed with either `.rpk-` or `.rpk.ac-`. When `rpk` starts up, it searches all directories in your `$PATH` for any executable binary that begins with either of those prefixes. For any binary it finds, `rpk` adds a command for that name to the `rpk` command space itself. No plugin name can shadow an existing `rpk` command, and only one plugin can exist under a given name at once. Plugins are added to the `rpk` command space on a first-seen basis. If you have two plugins named `rpk-foo`, and the second is discovered later on in the `$PATH` directories, then only the first will be used. The second will be ignored. Plugins that have an `.rpk.ac-` prefix indicate that they support the `--help-autocomplete` flag. If `rpk` sees this, `rpk` will exec the plugin with that flag when `rpk` starts up, and the plugin will return all commands it supports as well as short and long help text for each command.
`rpk` uses this return to build a shadow command space within `rpk` itself so that it looks as if the plugin exists within `rpk`. This is particularly useful if you enable autocompletion. The expected return for plugins from `--help-autocomplete` is an array of the following: ```go type pluginHelp struct { Path string `json:"path,omitempty"` Short string `json:"short,omitempty"` Long string `json:"long,omitempty"` Example string `json:"example,omitempty"` Args []string `json:"args,omitempty"` } ``` where `path` is an underscore-delimited argument path to a command. For example, `foo_bar_baz` corresponds to the command `rpk foo bar baz`. ## [](#usage)Usage ```bash rpk plugin [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for plugin. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging.
| --- # Page 559: rpk profile clear **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-clear.md --- # rpk profile clear --- title: rpk profile clear latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-clear page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-clear.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-clear.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Clear the current profile. This command clears and removes configuration values of the current profile, which can be useful to unset a production cluster profile. ## [](#usage)Usage ```bash rpk profile clear [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for clear. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 560: rpk profile create **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-create.md --- # rpk profile create --- title: rpk profile create latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-create page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-create.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-create.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Create an `rpk profile`. There are multiple ways to create a profile. A name must be provided if not using `--from-cloud` or `--from-rpk-container`. - You can use `--from-redpanda` to generate a new profile from an existing `redpanda.yaml` file. The special value `current` creates a profile from the current `redpanda.yaml` as it is loaded within `rpk`. - You can use `--from-rpk-container` to generate a profile from an existing cluster created using the `rpk container start` command. The name is not needed when using this flag. - You can use `--from-profile` to generate a profile from an existing profile or from a profile in a yaml file. First, the filename is checked, then an existing profile name is checked. The special value `current` creates a new profile from the existing profile with any active environment variables or flags applied. - You can use `--from-cloud` to generate a profile from an existing cloud cluster ID. Note that you must be logged in with `rpk cloud login` first. The special value `prompt` will prompt to select a cloud cluster to create a profile for. - For serverless clusters that support both public and private networking, you will be prompted to select a network type unless you specify `--serverless-network`.
To avoid prompts in automation, explicitly set `--serverless-network` to `public` or `private`. - You can use `--set key=value` to directly set fields. The key can either be the name of a `-X` flag or the path to the field in the profile’s YAML format. For example, using `--set tls.enabled=true` OR `--set kafka_api.tls.enabled=true` is equivalent. The former corresponds to the `-X` flag `tls.enabled`, while the latter corresponds to the path `kafka_api.tls.enabled` in the profile’s YAML. The `--set` flag is always applied last and can be used to set additional fields in tandem with `--from-redpanda` or `--from-cloud`. The `--set` flag supports autocompletion, suggesting the `-X` key format. If you begin writing a YAML path, the flag will suggest the rest of the path. It is recommended to always use the `--description` flag; the description is printed in the output of [`rpk profile list`](../rpk-profile-list/). Once the command completes successfully, `rpk` switches to the newly created profile. ## [](#usage)Usage ```bash rpk profile create [NAME] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -d, --description | string | Optional description of the profile. | | --from-cloud | string | [="prompt"] Create and switch to a new profile generated from a Redpanda Cloud cluster ID. | | --from-profile | string | Create and switch to a new profile from an existing profile or from a profile in a yaml file. | | --from-redpanda | string | Create and switch to a new profile from a redpanda.yaml file. | | --from-rpk-container | - | Create and switch to a new profile generated from a running cluster created with rpk container. | | -h, --help | - | Help for create. | | -s, --set | strings | Create and switch to a new profile, setting profile fields with key=value pairs. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. 
| | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 561: rpk profile current **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-current.md --- # rpk profile current --- title: rpk profile current latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-current page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-current.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-current.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Print the current `rpk profile` by name. This command simply prints the current profile name. This may be useful in scripts, or a custom prompt variable (for example, PS1), or to confirm what you have selected. ## [](#usage)Usage ```bash rpk profile current [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for current. | | -n, --no-newline | - | Do not print a newline after the profile name. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 562: rpk profile delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-delete.md --- # rpk profile delete --- title: rpk profile delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Delete an `rpk profile`. Deleting a profile removes it from the `rpk.yaml` file. If the deleted profile was the selected, current profile, `rpk` will use in-memory defaults until a new profile is selected. ## [](#usage)Usage ```bash rpk profile delete [NAME] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for delete. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 563: rpk profile edit-globals **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-edit-globals.md --- # rpk profile edit-globals --- title: rpk profile edit-globals latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-edit-globals page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-edit-globals.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-edit-globals.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Edit `rpk profile` globals. This command opens your default editor to edit the `rpk` global configurations. ## [](#usage)Usage ```bash rpk profile edit-globals [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for edit-globals. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 564: rpk profile edit **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-edit.md --- # rpk profile edit --- title: rpk profile edit latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-edit page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-edit.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-edit.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Edit an `rpk profile`. This command opens your default editor to edit the specified profile, or the current profile if no profile is specified. If the profile does not exist, this command creates it and switches to it. ## [](#usage)Usage ```bash rpk profile edit [NAME] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for edit. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 565: rpk profile list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-list.md --- # rpk profile list --- title: rpk profile list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List `rpk profile`. Lists the profiles available from your `rpk.yaml` file. ## [](#usage)Usage ```bash rpk profile list [flags] ``` ## [](#aliases)Aliases ```bash list, ls ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for list. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 566: rpk profile print-globals **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-print-globals.md --- # rpk profile print-globals --- title: rpk profile print-globals latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-print-globals page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-print-globals.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-print-globals.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Print `rpk profile` global configuration. ## [](#usage)Usage ```bash rpk profile print-globals [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for print-globals. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 567: rpk profile print **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-print.md --- # rpk profile print --- title: rpk profile print latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-print page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-print.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-print.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Print `rpk` profile configuration. If no name is specified, this command prints the current profile as it exists in the `rpk.yaml` file. To print both the profile as it exists in the `rpk.yaml` file and the current profile as it is loaded in `rpk` with internal defaults, user-specified flags, and environment variables applied, use the `-v/--verbose` flag. ## [](#usage)Usage ```bash rpk profile print [NAME] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for print. | | --raw | - | Print raw configuration from rpk.yaml, without environment variables nor flags applied. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 568: rpk profile prompt **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-prompt.md --- # rpk profile prompt --- title: rpk profile prompt latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-prompt page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-prompt.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-prompt.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Print a profile name formatted for a PS1 prompt. This command prints ANSI-escaped text per your current profile’s `prompt` field. If the current profile does not have a prompt, this prints nothing. If the prompt is invalid, this exits `0` with no message. To validate the current prompt, use the `--validate` flag. This command may introduce other `%` variables in the future. If you want to print a `%` directly, use `%%` to escape it. > 📝 **NOTE** > > - To use this in `zsh`, be sure to add `setopt PROMPT_SUBST` to your `.zshrc`. > > - To edit your `PS1`, use something like `PS1='$(rpk profile prompt)'` in your shell rc file. ## [](#format)Format The `prompt` field supports space- or comma-separated modifiers and a quoted string to be modified. Inside the string, the variable `%p` or `%n` refers to the profile name. A few examples: ```text prompt: hi-white, bg-red, bold, "[%p]" prompt: hi-red "PROD" prompt: white, "dev-%n" ``` If you want to have multiple formats, you can wrap each formatted section in parentheses. ```text prompt: ("--") (hi-white bg-red bold "[%p]") ``` ## [](#colors)Colors All ANSI colors are supported, with names matching the color name: `black`, `red`, `green`, `yellow`, `blue`, `magenta`, `cyan`, `white`. 
The `hi-` prefix indicates a high-intensity color: `hi-black`, `hi-red`, for example. The `bg-` prefix modifies the background color: `bg-black`, `bg-hi-red`, for example. ## [](#modifiers)Modifiers Four modifiers are supported: "bold", "faint", "underline", and "invert". ## [](#usage)Usage ```bash rpk profile prompt [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for prompt. | | --validate | - | Exit with an error message if the prompt is invalid. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 569: rpk profile rename-to **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-rename-to.md --- # rpk profile rename-to --- title: rpk profile rename-to latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-rename-to page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-rename-to.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-rename-to.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Rename the current `rpk profile`. ## [](#usage)Usage ```bash rpk profile rename-to [NAME] [flags] ``` ## [](#aliases)Aliases ```bash rename-to, rename ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for rename-to. 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 570: rpk profile set-globals **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-set-globals.md --- # rpk profile set-globals --- title: rpk profile set-globals latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-set-globals page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-set-globals.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-set-globals.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Set `rpk` globals fields. This command takes a list of key=value pairs to write to the global config section of `rpk.yaml`. The globals section contains a set of settings that apply to all profiles and changes the way that `rpk` acts. For a list of global flags and what they mean, see [`rpk -X`](../../rpk-x-options/) and look for any key that begins with "globals". This command supports autocompletion of valid keys. You can also use the format `set key value` if you intend to only set one key. ## [](#usage)Usage ```bash rpk profile set-globals [KEY=VALUE]+ [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for set-globals. 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 571: rpk profile set **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-set.md --- # rpk profile set --- title: rpk profile set latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-set page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-set.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-set.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Set fields in the current `rpk profile`. As in the create command, this command takes a list of `key=value` pairs to write to the current profile. The key can either be the name of a `-X` flag or the path to the field in the profile’s yaml format. For example, using `--set tls.enabled=true` or `--set kafka_api.tls.enabled=true` is equivalent. The former corresponds to the `-X` flag `tls.enabled`, while the latter corresponds to the path `kafka_api.tls.enabled` in the profile’s yaml. This command supports autocompletion of valid keys, suggesting the `-X` key format. If you begin writing a YAML path, this command will suggest the rest of the path. You can also use the format `set key value` if you intend to only set one key. 
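The equivalence between the `-X` key form and the YAML path form can be sketched as a small dry-run script. The `run` helper and the `DRY_RUN` variable are assumptions made for this example and are not part of `rpk`; by default the script only prints the commands it would execute.

```bash
#!/usr/bin/env bash
# Sketch: equivalent ways to enable TLS in the current rpk profile.
# DRY_RUN and run() are hypothetical helpers for this example, not rpk features.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Key written in the -X flag format.
run rpk profile set tls.enabled=true

# The same setting addressed by its path in the profile's YAML.
run rpk profile set kafka_api.tls.enabled=true

# Single-key form: `set key value`.
run rpk profile set tls.enabled true
```

Set `DRY_RUN=0` to apply the changes to the current profile instead of printing them.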
> ⚠️ **CAUTION** > > Profile files may contain sensitive information such as passwords or SASL credentials. Do not commit `rpk.yaml` files to version control systems like Git. ## [](#usage)Usage ```bash rpk profile set [KEY=VALUE]+ [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for set. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 572: rpk profile use **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile-use.md --- # rpk profile use --- title: rpk profile use latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile-use page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile-use.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile-use.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Select the Profile to use. See [`rpk profile`](../rpk-profile/) for more details. ## [](#usage)Usage ```bash rpk profile use [NAME] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for use. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. 
| | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 573: rpk profile **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-profile/rpk-profile.md --- # rpk profile --- title: rpk profile latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-profile/rpk-profile page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-profile/rpk-profile.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-profile/rpk-profile.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Manage `rpk profiles`. An `rpk profile` talks to a single Redpanda cluster. You can create multiple profiles for multiple clusters and swap between them with `rpk profile use`. Multiple profiles may be useful if, for example, you use `rpk` to talk to a localhost cluster, a dev cluster, and a prod cluster, and you want to keep your configuration in one place. ## [](#usage)Usage ```bash rpk profile [flags] [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for profile. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 574: rpk registry compatibility-level get **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-compatibility-level-get.md --- # rpk registry compatibility-level get --- title: rpk registry compatibility-level get latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-compatibility-level-get page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-compatibility-level-get.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-compatibility-level-get.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Get the global or per-subject compatibility levels. Running this command with no subject returns the global compatibility level. Use the `--global` flag to get the global level at the same time as per-subject levels. ## [](#usage)Usage ```bash rpk registry compatibility-level get [SUBJECT...] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --global | - | Return the global level in addition to subject levels. | | -h, --help | - | Help for get. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 575: rpk registry compatibility-level set **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-compatibility-level-set.md --- # rpk registry compatibility-level set --- title: rpk registry compatibility-level set latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-compatibility-level-set page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-compatibility-level-set.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-compatibility-level-set.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Set the global or per-subject compatibility levels. Running this command without a subject sets the global compatibility level. To set the global level at the same time as per-subject levels, use the `--global` flag. ## [](#concept)Concept ### [](#levels)Levels - BACKWARD (default): Consumers using the new schema (for example, version 10) can read data from producers using the previous schema (for example, version 9). - BACKWARD\_TRANSITIVE: Consumers using the new schema (for example, version 10) can read data from producers using all previous schemas (for example, versions 1-9). - FORWARD: Consumers using the previous schema (for example, version 9) can read data from producers using the new schema (for example, version 10). - FORWARD\_TRANSITIVE: Consumers using any previous schema (for example, versions 1-9) can read data from producers using the new schema (for example, version 10). - FULL: A new schema and the previous schema (for example, versions 10 and 9) are both backward and forward compatible with each other. 
- FULL\_TRANSITIVE: Each schema is both backward and forward compatible with all registered schemas. - NONE: No schema compatibility checks are done. ## [](#usage)Usage ```bash rpk registry compatibility-level set [SUBJECT...] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --global | - | Set the global level in addition to subject levels. | | -h, --help | - | Help for set. | | --level | string | Level to set, one of NONE, BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 576: rpk registry compatibility-level **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-compatibility-level.md --- # rpk registry compatibility-level --- title: rpk registry compatibility-level latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-compatibility-level page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-compatibility-level.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-compatibility-level.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Manage global or per-subject compatibility levels. 
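As a rough sketch of how the `set` subcommand and its `--level` flag fit together, the script below validates a level name against the documented list and prints the resulting command line without running it. The helper functions `is_valid_level` and `set_level_cmd` and the subject name `orders-value` are hypothetical and exist only for this illustration.

```bash
#!/usr/bin/env bash
# Sketch: build `rpk registry compatibility-level set` command lines.
# is_valid_level and set_level_cmd are hypothetical helpers, not rpk features.

is_valid_level() {
  # Accept only the levels listed in the documentation.
  case "$1" in
    NONE|BACKWARD|BACKWARD_TRANSITIVE|FORWARD|FORWARD_TRANSITIVE|FULL|FULL_TRANSITIVE)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

set_level_cmd() {
  # Usage: set_level_cmd LEVEL [SUBJECT...]
  level="$1"
  shift
  if ! is_valid_level "$level"; then
    echo "unknown compatibility level: $level" >&2
    return 1
  fi
  cmd="rpk registry compatibility-level set"
  for subject in "$@"; do
    cmd="$cmd $subject"
  done
  # Only print the command; nothing is sent to Schema Registry.
  echo "$cmd --level $level"
}

# Per-subject level for a placeholder subject.
set_level_cmd FULL orders-value

# No subject: the command targets the global level.
set_level_cmd BACKWARD_TRANSITIVE
```

Printing the command instead of executing it keeps the sketch safe to run without a cluster; paste the printed line into a shell with a configured profile to apply it.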
## [](#usage)Usage ```bash rpk registry compatibility-level [flags] [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | -h, --help | - | Help for compatibility-level. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 577: rpk registry mode get **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-mode-get.md --- # rpk registry mode get --- title: rpk registry mode get latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-mode-get page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-mode-get.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-mode-get.adoc page-git-created-date: "2024-08-09" page-git-modified-date: "2025-05-07" --- Check the mode Schema Registry is in. Running this command with no subject returns the global mode for Schema Registry. Alternatively, use the `--global` flag to return the global mode at the same time as per-subject modes. ## [](#usage)Usage ```bash rpk registry mode get [SUBJECT...] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --global | - | Return the global mode in addition to subject modes. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. 
| | -h, --help | - | Help for get. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 578: rpk registry mode reset **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-mode-reset.md --- # rpk registry mode reset --- title: rpk registry mode reset latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-mode-reset page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-mode-reset.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-mode-reset.adoc page-git-created-date: "2024-08-09" page-git-modified-date: "2025-05-07" --- Reset the mode Schema Registry runs in. This command deletes any subject modes and reverts to the global default. The command also prints the subject mode before reverting to the global default. ## [](#usage)Usage ```bash rpk registry mode reset [SUBJECT...] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | -h, --help | - | Help for reset. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. 
| | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 579: rpk registry mode set **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-mode-set.md --- # rpk registry mode set --- title: rpk registry mode set latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-mode-set page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-mode-set.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-mode-set.adoc page-git-created-date: "2024-08-09" page-git-modified-date: "2025-05-07" --- Set the mode Schema Registry runs in. Running this command with no subject sets the global mode for Schema Registry. Alternatively, use the `--global` flag to set the global mode for Schema Registry at the same time as per-subject modes. Acceptable mode values: - `READONLY` - `READWRITE` - `IMPORT` You can only enable `IMPORT` mode on an empty schema registry (if setting mode globally) or an empty subject (if setting at the subject level). Empty means no schemas have ever been registered. Soft deletions are not sufficient, so you must hard-delete any existing schemas before enabling `IMPORT` mode. To override this emptiness check, use the `--force` flag. ## [](#usage)Usage ```bash rpk registry mode set [SUBJECT...] 
[flags] ``` ## [](#examples)Examples Set the global schema registry mode to `READONLY`: ```bash rpk registry mode set --mode READONLY ``` Set the schema registry mode to `READWRITE` in subjects `` and ``: ```bash rpk registry mode set --mode READWRITE ``` Set the schema registry mode to IMPORT, overriding the emptiness check: ```bash rpk registry mode set --mode IMPORT --global --force ``` > 📝 **NOTE** > > Replace the placeholder values with your own values. ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --force | - | Forces the setting mode to IMPORT when there are existing schemas. | | --global | - | Set the global schema registry mode in addition to subject modes. | | -h, --help | - | Help for set. | | --mode | string | Schema registry mode to set. Acceptable values: READONLY, READWRITE, IMPORT (case insensitive). | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 580: rpk registry mode **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-mode.md --- # rpk registry mode --- title: rpk registry mode latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-mode page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-mode.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-mode.adoc page-git-created-date: "2024-08-09" page-git-modified-date: "2025-05-07" --- Manage the mode Schema Registry runs in. Alternatively, you can use the [Schema Registry API](../../../../manage/schema-reg/schema-reg-api/#use-readonly-mode-for-disaster-recovery) to do this. ## [](#usage)Usage ```bash rpk registry mode [command] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | -h, --help | - | Help for mode. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 581: rpk registry schema check-compatibility **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-schema-check-compatibility.md --- # rpk registry schema check-compatibility --- title: rpk registry schema check-compatibility latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-schema-check-compatibility page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-schema-check-compatibility.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-schema-check-compatibility.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Check schema compatibility with existing schemas in the subject. ## [](#usage)Usage ```bash rpk registry schema check-compatibility [SUBJECT] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for check-compatibility. | | --references | string | Comma-separated list of references (name:subject:version), or path to reference file. | | --schema | string | Schema file path to check. Must be .avro, .json or .proto. | | --schema-version | string | Schema version to check compatibility with (latest, 0, 1…​). | | --type | string | Schema type (avro, json, protobuf). Overrides schema file extension. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --profile | string | Profile to use. 
See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 582: rpk registry schema create **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-schema-create.md --- # rpk registry schema create --- title: rpk registry schema create latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-schema-create page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-schema-create.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-schema-create.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Create a schema for the given subject. This uploads a schema to the registry, creating the schema if it does not exist. The schema type is detected by the filename extension: `.avro` or `.avsc` for Avro, `.json` for JSON, and `.proto` for Protobuf. You can manually specify the type with the `--type` flag. You can pass references using the `--references` flag, which accepts either a comma-separated list of `name:subject:version` entries or a path to a file. The file must contain lines of name, subject, and version separated by a tab or space, or the equivalent in JSON or YAML format. 
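The reference file format described above can be sketched as follows. The reference names, subjects, and versions are placeholders, and the script only writes a temporary file and prints the `rpk` commands rather than running them.

```bash
#!/usr/bin/env bash
# Sketch: build a references file for `rpk registry schema create --references`.
# All names, subjects, and versions below are placeholders for illustration.
workdir="$(mktemp -d)"
refs_file="$workdir/references.txt"

# One reference per line: name, subject, version, separated by a space.
cat > "$refs_file" <<'EOF'
common.proto common-value 1
address.proto address-value 3
EOF

# Sanity check: count lines that do not have exactly three fields.
bad_lines="$(awk 'NF != 3' "$refs_file" | wc -l | tr -d ' ')"
echo "malformed reference lines: $bad_lines"

# The same references as an inline, comma-separated name:subject:version list.
inline_refs="$(awk '{ printf "%s%s:%s:%s", sep, $1, $2, $3; sep = "," }' "$refs_file")"

# Print both forms of the command instead of executing them.
echo "rpk registry schema create foo --schema /path/to/file.proto --references $refs_file"
echo "rpk registry schema create foo --schema /path/to/file.proto --references $inline_refs"
```

The file form is easier to review and version when a schema has many references; the inline form is convenient for one or two.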
## [](#examples)Examples Create a Protobuf schema with subject `foo`: ```bash rpk registry schema create foo --schema path/to/file.proto ``` Create an Avro schema, passing the type with the `--type` flag: ```bash rpk registry schema create foo --schema /path/to/file --type avro ``` Create a Protobuf schema that references the schema in subject `my_subject`, version 1: ```bash rpk registry schema create foo --schema /path/to/file.proto --references my_name:my_subject:1 ``` Create a schema with a specific ID and version in import mode: ```bash rpk registry schema create foo --schema /path/to/file.proto --id 42 --schema-version 3 ``` ## [](#usage)Usage ```bash rpk registry schema create SUBJECT --schema {filename} [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for create. | | --id | int | Optional schema ID to use when creating the schema in import mode (default -1). | | --references | string | Comma-separated list of references (name:subject:version) or path to reference file. | | --schema | string | Schema filepath to upload, must be .avro, .avsc, or .proto. | | --schema-version | int | Optional schema version to use when creating the schema in import mode (requires --id; default -1). | | --type | string | Schema type (avro or protobuf); overrides schema file extension. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 583: rpk registry schema delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-schema-delete.md --- # rpk registry schema delete --- title: rpk registry schema delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-schema-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-schema-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-schema-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Delete a specific schema for the given subject. ## [](#usage)Usage ```bash rpk registry schema delete SUBJECT --schema-version {version} [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for delete. | | --permanent | - | Perform a hard (permanent) delete of the schema. | | --schema-version | string | Schema version to check compatibility with (latest, 0, 1…​). | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 584: rpk registry schema get **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-schema-get.md --- # rpk registry schema get --- title: rpk registry schema get latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-schema-get page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-schema-get.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-schema-get.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Get a schema by version, ID, or by an existing schema. This command looks up an existing schema or schemas in one of the following mutually exclusive ways: - By version, returning a schema for a required subject and version. - By ID, returning all subjects using the schema, or filtered by the provided subject. - By schema, checking if the schema has been created in the subject. To print the schema, use the `--print-schema` flag. ## [](#usage)Usage ```bash rpk registry schema get [SUBJECT] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --deleted | - | If true, also return deleted schemas. | | -h, --help | - | Help for get. | | --id | int | Schema ID to look up usage; subject optional. | | --print-schema | - | Prints the schema in JSON format. | | --schema | string | Schema filepath to look up, must be .avro, .avsc, .json, or .proto. | | --schema-version | string | Schema version to look up (latest, 0, 1…). | | --type | string | Schema type of the file used for the lookup (avro, json, protobuf). Overrides schema file extension. 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 585: rpk registry schema list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-schema-list.md --- # rpk registry schema list --- title: rpk registry schema list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-schema-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-schema-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-schema-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List the schemas by subject, or list all schemas. ## [](#usage)Usage ```bash rpk registry schema list [SUBJECT...] [flags] ``` ## [](#aliases)Aliases ```bash list, ls ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --deleted | - | If true, list deleted schemas as well. | | -h, --help | - | Help for list. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. 
| | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 586: rpk registry schema references **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-schema-references.md --- # rpk registry schema references --- title: rpk registry schema references latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-schema-references page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-schema-references.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-schema-references.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Retrieve a list of schemas that reference the subject. ## [](#usage)Usage ```bash rpk registry schema references SUBJECT --schema-version {version} [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --deleted | - | If true, list deleted schemas as well. | | -h, --help | - | Help for references. | | --schema-version | string | Schema version to check compatibility with (latest, 0, 1…​). | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 587: rpk registry schema **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-schema.md --- # rpk registry schema --- title: rpk registry schema latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-schema page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-schema.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-schema.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Manage schemas in the Schema Registry. ## [](#usage)Usage ```bash rpk registry schema [command] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | -h, --help | - | Help for schema. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 588: rpk registry subject delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-subject-delete.md --- # rpk registry subject delete --- title: rpk registry subject delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-subject-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-subject-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-subject-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Soft or hard delete subjects. ## [](#usage)Usage ```bash rpk registry subject delete [SUBJECT...] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for delete. | | --permanent | - | Perform a hard (permanent) delete of the subject. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 589: rpk registry subject list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-subject-list.md --- # rpk registry subject list --- title: rpk registry subject list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-subject-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-subject-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-subject-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Display all subjects. ## [](#usage)Usage ```bash rpk registry subject list [flags] ``` ## [](#aliases)Aliases ```bash list, ls ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --deleted | - | If true, list deleted subjects as well. | | -h, --help | - | Help for list. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 590: rpk registry subject **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry-subject.md --- # rpk registry subject --- title: rpk registry subject latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry-subject page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry-subject.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry-subject.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List or delete Schema Registry subjects. ## [](#usage)Usage ```bash rpk registry subject [command] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | -h, --help | - | Help for subject. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 591: rpk registry **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-registry/rpk-registry.md --- # rpk registry --- title: rpk registry latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-registry/rpk-registry page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-registry/rpk-registry.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-registry/rpk-registry.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Commands to interact with the Schema Registry. ## [](#usage)Usage ```bash rpk registry [command] [flags] ``` ## [](#aliases)Aliases ```bash registry, sr ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for registry. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 592: rpk security acl create **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-acl-create.md --- # rpk security acl create --- title: rpk security acl create latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-acl-create page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-acl-create.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-acl-create.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Create ACLs. Following the multiplying effect of combining flags, the create command works on a straightforward basis: every ACL combination is a created ACL. As mentioned in the `rpk security acl` help text, if no host is specified, an allowed principal is allowed access from all hosts. The wildcard principal `*` allows all principals. At least one principal, one host, one resource, and one operation is required to create a single ACL. 
## [](#examples)Examples Allow all permissions to user bar on topic `foo` and group `g`: ```bash rpk security acl create --allow-principal bar --operation all --topic foo --group g ``` Allow all permissions to role bar on topic `foo` and group `g`: ```bash rpk security acl create --allow-role bar --operation all --topic foo --group g ``` Allow read permissions to all users on topics biz and baz: ```bash rpk security acl create --allow-principal '*' --operation read --topic biz,baz ``` Allow write permissions to user buzz to transactional ID `txn`: ```bash rpk security acl create --allow-principal User:buzz --operation write --transactional-id txn ``` Allow read permissions to user `panda` on topic `bar` and schema registry subject `bar-value`: ```bash rpk security acl create --allow-principal panda --operation read --topic bar --registry-subject bar-value ``` Grant schema migration permissions for migrating schemas between clusters: ```bash # Source cluster (read-only) rpk security acl create --allow-principal User:migrator-user \ --operation read,describe --registry-global --brokers # Target cluster (read-write and IMPORT mode management) rpk security acl create --allow-principal User:migrator-user \ --operation write,describe,alter_configs,describe_configs \ --registry-global --brokers ``` > 📝 **NOTE** > > These are Schema Registry ACLs only. You also require Kafka ACLs for topics, consumer groups, and cluster operations. See [Configure Access Control Lists](../../../../security/authorization/acl/). ## [](#usage)Usage ```bash rpk security acl create [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --allow-host | strings | Hosts from which access will be granted (repeatable). | | --allow-principal | strings | Principals for which these permissions will be granted (repeatable). | | --allow-role | strings | Roles for which these permissions will be granted (repeatable). | | --cluster | - | Whether to grant ACLs to the cluster. 
| | --deny-host | strings | Hosts from which access will be denied (repeatable). | | --deny-principal | strings | Principals for which these permissions will be denied (repeatable). | | --deny-role | strings | Roles for which these permissions will be denied (repeatable). | | --group | strings | Group to grant ACLs for (repeatable). | | -h, --help | - | Help for create. | | --operation | strings | Operation to grant (repeatable). | | --registry-global | - | Whether to grant ACLs for the schema registry. | | --registry-subject | strings | Schema Registry subjects to grant ACLs for (repeatable). | | --resource-pattern-type | string | Pattern to use when matching resource names (literal or prefixed) (default "literal"). | | --topic | strings | Topic to grant ACLs for (repeatable). | | --transactional-id | strings | Transactional IDs to grant ACLs for (repeatable). | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 593: rpk security acl delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-acl-delete.md --- # rpk security acl delete --- title: rpk security acl delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-acl-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-acl-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-acl-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Delete ACLs. See the `rpk security acl` help text for a full write-up on ACLs. Delete flags have a multiplying effect similar to that of creating ACLs, but delete is more advanced: deletion works on a filter basis. Any unspecified flag defaults to matching everything (all operations, all allowed principals, and so on). To ensure that you do not accidentally delete more than you intend, this command prints everything that matches your input filters and prompts for confirmation before the delete request is issued. If the filters match more than 10 ACLs, the command asks for confirmation twice. As mentioned, not specifying flags matches everything. If no resources are specified, all resources are matched. If no operations are specified, all operations are matched. You can also opt in to matching everything with "any": `--operation any` matches any operation. 
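Because an under-specified filter can match far more ACLs than intended, one cautious workflow is to preview the match set before deleting anything. The following is a minimal sketch, not an official procedure: `preview_acl_delete` is a hypothetical helper (it is not an rpk command), and the principal `bar` and topic `foo` are placeholder names. The helper only appends the documented `--dry` flag, prints the resulting command, and runs it when `rpk` is available on the `PATH`.

```bash
# preview_acl_delete is a hypothetical wrapper, not part of rpk.
# It prints the dry-run command and executes it only if rpk is installed.
preview_acl_delete() {
  echo "rpk security acl delete $* --dry"
  if command -v rpk >/dev/null 2>&1; then
    rpk security acl delete "$@" --dry
  fi
}

# Filter on allowed principal "bar" and topic "foo" (placeholder names).
# Flags that are left out, such as --operation, match everything.
preview_acl_delete --allow-principal bar --topic foo
```

After reviewing the dry-run output, the same filter flags can be passed to `rpk security acl delete` without `--dry`, which still prompts for confirmation by default.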
The `--resource-pattern-type` flag, defaulting to "any", configures how to filter resource names: - "any" returns exact name matches of either prefixed or literal pattern type - "match" returns wildcard matches, prefix patterns that match your input, and literal matches - "prefix" returns prefix patterns that match your input (prefix "fo" matches "foo") - "literal" returns exact name matches ## [](#examples)Examples Delete all permissions to user bar on topic `foo` and group `g`: ```bash rpk security acl delete --allow-principal bar --operation all --topic foo --group g ``` Suppose two ACLs were created for the same role (`red-role`), one that allows access to topic `foo` and one that denies access to topic `bar`: ```bash rpk security acl create --topic foo --operation all --allow-role red-role rpk security acl create --topic bar --operation all --deny-role red-role ``` You can delete just one of these ACLs: ```bash rpk security acl delete --topic foo --operation all --allow-role red-role ``` ## [](#usage)Usage ```bash rpk security acl delete [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --allow-host | strings | Allowed host ACLs to remove (repeatable). | | --allow-principal | strings | Allowed principal ACLs to remove (repeatable). | | --allow-role | strings | Allowed role to remove this ACL from (repeatable). | | --cluster | - | Whether to remove ACLs for the cluster. | | --deny-host | strings | Denied host ACLs to remove (repeatable). | | --deny-principal | strings | Denied principal ACLs to remove (repeatable). | | --deny-role | strings | Denied role for ACLs to remove (repeatable). | | -d, --dry | - | Dry run: validate what would be deleted. | | --group | strings | Group to remove ACLs for (repeatable). | | -h, --help | - | Help for delete. | | --no-confirm | - | Disable confirmation prompt. | | --operation | strings | Operation to remove (repeatable). 
| | -f, --print-filters | - | Print the filters that were requested (failed filters are always printed). | | --registry-global | - | Whether to remove ACLs for the schema registry. | | --registry-subject | strings | Schema Registry subjects to remove ACLs for (repeatable). | | --resource-pattern-type | string | Pattern to use when matching resource names (any, match, literal, or prefixed) (default "any"). | | --topic | strings | Topic to remove ACLs for (repeatable). | | --transactional-id | strings | Transactional IDs to remove ACLs for (repeatable). | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 594: rpk security acl list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-acl-list.md --- # rpk security acl list --- title: rpk security acl list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-acl-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-acl-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-acl-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List ACLs. See the `rpk security acl` help text for a full write-up on ACLs. List flags have a multiplying effect similar to that of creating ACLs, but list is more advanced: listing works on a filter basis. 
Any unspecified flag defaults to matching everything (all operations, or all allowed principals, etc). As mentioned, not specifying flags matches everything. If no resources are specified, all resources are matched. If no operations are specified, all operations are matched. You can also opt in to matching everything with "any": --operation any matches any operation. The --resource-pattern-type, defaulting to "any", configures how to filter resource names: - "any" returns exact name matches of either prefixed or literal pattern type - "match" returns wildcard matches, prefix patterns that match your input, and literal matches - "prefix" returns prefix patterns that match your input (prefix "fo" matches "foo") - "literal" returns exact name matches The list command lists ACLs for both Kafka and Schema Registry. To limit the results to a specific subsystem, use the `--subsystem` flag with either `kafka` or `registry`. ## [](#examples)Examples List all ACLs: ```bash rpk security acl list ``` List all Schema Registry ACLs: ```bash rpk security acl list --subsystem registry ``` List all ACLs for topic "foo": ```bash rpk security acl list --topic foo ``` List all ACLs for user "bar" on topic "foo": ```bash rpk security acl list --allow-principal bar --topic foo ``` List all ACLs for role "admin" on schema registry subject "foo-value": ```bash rpk security acl list --allow-role admin --registry-subject foo-value ``` ## [](#usage)Usage ```bash rpk security acl list [flags] ``` ## [](#aliases)Aliases ```bash list, ls, describe ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --allow-host | strings | Allowed host ACLs to match (repeatable). | | --allow-principal | strings | Allowed principal ACLs to match (repeatable). | | --allow-role | strings | Allowed role for ACLs to match (repeatable). | | --cluster | - | Whether to match ACLs to the cluster. | | --deny-host | strings | Denied host ACLs to match (repeatable). 
| | --deny-principal | strings | Denied principal ACLs to match (repeatable). | | --deny-role | strings | Denied role for ACLs to match (repeatable). | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --group | strings | Group to match ACLs for (repeatable). | | -h, --help | - | Help for list. | | --operation | strings | Operation to match (repeatable). | | -f, --print-filters | - | Print the filters that were requested (failed filters are always printed). | | --registry-global | - | Whether to match ACLs for the schema registry. | | --registry-subject | strings | Schema Registry subjects to match ACLs for (repeatable). | | --resource-pattern-type | string | Pattern to use when matching resource names (any, match, literal, or prefixed) (default "any"). | | --subsystem | strings | Subsystem to match ACLs for. Possible values: kafka, registry, kafka,registry (both). Default: kafka,registry. | | --topic | strings | Topic to match ACLs for (repeatable). | | --transactional-id | strings | Transactional IDs to match ACLs for (repeatable). | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 595: rpk security acl **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-acl.md --- # rpk security acl --- title: rpk security acl latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-acl page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-acl.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-acl.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Manage ACLs and SASL users. These commands let you create SASL users and create, list, and delete ACLs. The help text below is specific to ACLs. To learn about SASL users, see the help text under the `user` command. When using SASL, ACLs allow or deny you access to certain requests. The `create`, `delete`, and `list` commands help you manage your ACLs. An ACL is made up of five components: - a principal (the user) or role - a host, which the principal (or role) is allowed or denied requests from - what resource to access (such as topic name, group ID) - the operation (such as read, write) - the permission (whether to allow or deny the above) ACL commands work on a multiplicative basis. If creating, specifying two principals and two permissions creates four ACLs: both permissions for the first principal, as well as both permissions for the second principal. Adding two resources further doubles the ACLs created. It is recommended to be as specific as possible when granting ACLs. Granting more ACLs than necessary per principal may inadvertently allow clients to do things they should not, such as deleting topics or joining the wrong consumer group. 
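The multiplicative rule can be pictured as a nested loop over every value supplied for each flag. The sketch below is an illustration only: the principal, operation, and topic names are hypothetical, it treats the "two permissions" in the description as two operations (an assumption), and it only enumerates combinations; it does not call `rpk` or reproduce its implementation.

```bash
# Hypothetical values, chosen only to illustrate the counting rule.
principals="User:alice User:bob"
operations="read write"
topics="foo bar"

count=0
for p in $principals; do
  for op in $operations; do
    for t in $topics; do
      # Each combination corresponds to one ACL that would be created.
      echo "allow $p to $op topic $t"
      count=$((count + 1))
    done
  done
done
echo "total ACLs: $count"
```

Two principals and two operations give four combinations, and two topics double that to eight, which is why narrowing each flag to the minimum set of values keeps the granted ACLs small.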
> 💡 **TIP** > > To set multiple principals in a single comma-separated string, you must enclose the string with quotes. Otherwise, `rpk` splits the string on commas and fails to read the option correctly. > > For example, use double quotes: > > ```bash > rpk security acl create --allow-principal="\"C=UK,ST=London,L=London,O=Redpanda,OU=engineering,CN=__schema_registry\"" > ``` > > Alternatively, use single quotes: > > ```bash > rpk security acl create --allow-principal='"C=UK,ST=London,L=London,O=Redpanda,OU=engineering,CN=__schema_registry"' > ``` ## [](#principals)Principals All ACLs require a principal or a role. A principal is composed of a user and a type. Within Redpanda, only the "User" type is supported. Having prefixes for new types ensures that potential future authorizers can add authorization using other types, such as "Group". When you create a user, you need to add ACLs for it before it can be used. You can create/delete/list ACLs for that user with either `User:bar` or `bar` in the `--allow-principal` and `--deny-principal` flags. This command will add the `User:` prefix for you if it is missing. The wildcard `*` matches any user. Creating an ACL with user `*` grants or denies the permission for all users. ## [](#hosts)Hosts Hosts can be seen as an extension of the principal, and effectively gate where the principal can connect from. When creating ACLs, unless otherwise specified, the default host is the wildcard `*` which allows or denies the principal from all hosts (where allow & deny are based on whether `--allow-principal` or `--deny-principal` is used). If specifying hosts, you must pair the `--allow-host` flag with the `--allow-principal` flag, and the `--deny-host` flag with the `--deny-principal` flag. ## [](#roles)Roles You can bind ACLs to a role. A role has only one part: the name. In contrast to principals, there is no need to supply the type. 
If a type-like prefix is present, it is treated as text rather than as principal type information. When you create a role, you must bind or associate ACLs to it before it can be used. You can create, delete, and list ACLs for that role by passing its name to the `--allow-role` and `--deny-role` flags. Note that the wildcard role name `*` is not permitted here. For example, `rpk security acl create --allow-role '*' …` produces an error. ## [](#resources)Resources A resource is what an ACL allows or denies access to. There are six resources within Redpanda: topics, groups, the cluster itself, transactional IDs, schema registry, and schema registry subjects. Names for each of these resources can be specified with their respective flags. Resources combine with the operation that is allowed or denied on that resource. The next section describes which operations are required for which requests, and further fleshes out the concept of a resource. By default, resources are specified on an exact name match (a `literal` match). The `--resource-pattern-type` flag can be used to specify that a resource name is `prefixed`, meaning to allow anything with the given prefix. A literal name of `foo` will match only the topic `foo`, while the prefixed name of `foo-` will match both `foo-bar` and `foo-baz`. The special wildcard resource name `*` matches any name of the given resource type (`--topic '*'` matches all topics). ## [](#operations)Operations Paired with resources, operations are the actions that are allowed or denied. Redpanda has the following operations: | Operation | Description | | --- | --- | | all | Allows all operations below. | | read | Allows reading a given resource. | | write | Allows writing to a given resource. | | create | Allows creating a given resource (except for Redpanda Schema Registry). | | delete | Allows deleting a given resource. | | alter | Allows altering non-configurations. | | describe | Allows querying non-configurations. 
| | describe_configs | Allows describing configurations. | | alter_configs | Allows altering configurations. | You can run `rpk security acl --help-operations` to see which operations are required for which requests. To set up a general producing and consuming client, invoke `rpk security acl create` three times with the following flags (adding your `--allow-principal` to each): `rpk security acl create --operation write,read,describe --topic [topics]` `rpk security acl create --operation describe,read --group [group.id]` `rpk security acl create --operation describe,write --transactional-id [transactional.id]` ## [](#permissions)Permissions A client can be allowed access or denied access. By default, all permissions are denied. You only need to specifically deny a permission if you allow a wide set of permissions and then want to deny a specific permission in that set. For example, you could allow all operations, and then specifically deny writing to topics. ## [](#management)Management Creating ACLs works on a specific ACL basis, but listing and deleting ACLs works on filters. A filter can match many ACLs, so they can be listed or deleted at once. Because this can be risky for deleting, the delete command prompts for confirmation by default. More details and examples for creating, listing, and deleting can be seen in each of the commands. Using SASL requires setting `enable_sasl: true` in the redpanda section of your `redpanda.yaml`. User management is a separate, simpler concept that is described in the user command. ## [](#usage)Usage ```bash rpk security acl [command] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for acl. | | --help-operations | - | Print more help about ACL operations. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml.
| | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 596: rpk security role assign **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-role-assign.md --- # rpk security role assign --- title: rpk security role assign latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-role-assign page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-role-assign.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-role-assign.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2026-01-20" --- Assign a Redpanda role to a principal. The `--principal` flag accepts principals in the form of a `PrincipalPrefix` and a name separated by a colon (for example, `Group:analytics`). If `PrincipalPrefix` is not provided, it defaults to `User:`. ## [](#examples)Examples Assign role `redpanda-admin` to user `red`: ```bash rpk security role assign redpanda-admin --principal red ``` Assign role `redpanda-admin` to users `red` and `panda`: ```bash rpk security role assign redpanda-admin --principal red,panda ``` Assign role `topic-reader` to group `analytics`: ```bash rpk security role assign topic-reader --principal Group:analytics ``` Assign role `ops-admin` to both a user and a group: ```bash rpk security role assign ops-admin --principal alice,Group:sre ``` ## [](#usage)Usage ```bash rpk security role assign [ROLE] --principal [PRINCIPALS...]
[flags] ``` ## [](#aliases)Aliases ```bash assign, add ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for assign. | | --principal | strings | Principal to assign the role to (repeatable). | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 597: rpk security role create **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-role-create.md --- # rpk security role create --- title: rpk security role create latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-role-create page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-role-create.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-role-create.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2026-01-20" --- Create a role in Redpanda. After creating a role you may bind ACLs to the role using the `--allow-role` flag in the `rpk security acl create` command. ## [](#usage)Usage ```bash rpk security role create [ROLE] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for create. 
| | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings; -X help for detail or -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 598: rpk security role delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-role-delete.md --- # rpk security role delete --- title: rpk security role delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-role-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-role-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-role-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2026-01-20" --- Delete a role in Redpanda. This action removes all ACLs associated with the role and unassigns its members. Use the `--no-confirm` flag to skip the confirmation prompt. ## [](#usage)Usage ```bash rpk security role delete [ROLE] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --no-confirm | - | Disable confirmation prompt. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml.
| | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 599: rpk security role describe **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-role-describe.md --- # rpk security role describe --- title: rpk security role describe latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-role-describe page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-role-describe.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-role-describe.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2026-01-20" --- Describe a Redpanda role. This command describes a role, including the ACLs associated to the role, and lists members who are assigned the role. ## [](#examples)Examples Describe the role `red` (print members and ACLs): ```bash rpk security role describe red ``` Print only the members of role `red`: ```bash rpk security role describe red --print-members ``` Print only the ACL associated to the role `red`: ```bash rpk security role describe red --print-permissions ``` ## [](#usage)Usage ```bash rpk security role describe [ROLE] [flags] ``` ## [](#aliases)Aliases ```bash describe, info ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for describe. | | -a, --print-all | - | Print all sections. | | -m, --print-members | - | Print the members section. 
| | -p, --print-permissions | - | Print the role permissions section. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 600: rpk security role list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-role-list.md --- # rpk security role list --- title: rpk security role list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-role-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-role-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-role-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2026-01-20" --- List roles created in Redpanda. 
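A role appears in this listing only after it has been created, and it becomes useful only once ACLs are bound to it and principals are assigned. The following is a minimal sketch of that lifecycle, using only commands and flags documented on the neighboring `rpk security` pages. The role name `topic-reader`, the topic prefix `orders-`, and the group `analytics` are illustrative assumptions, and `run` is a hypothetical dry-run helper (not part of `rpk`) that prints each command and executes it only when `APPLY=1` is set.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the command, and runs it only if APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

ROLE="topic-reader"   # illustrative role name

# 1. Create the role.
run rpk security role create "$ROLE"

# 2. Bind an ACL to the role: read and describe on topics prefixed with "orders-".
run rpk security acl create --allow-role "$ROLE" \
  --operation read,describe --topic orders- --resource-pattern-type prefixed

# 3. Assign the role to a group principal.
run rpk security role assign "$ROLE" --principal Group:analytics

# 4. Confirm the assignment by listing the roles for that principal.
run rpk security role list --principal Group:analytics
```

Run the script as-is to review the commands, then rerun it with `APPLY=1` against a cluster where you have permission to manage roles and ACLs.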
## [](#examples)Examples List all roles in Redpanda: ```bash rpk security role list ``` List all roles assigned to the user `red`: ```bash rpk security role list --principal red ``` List all roles with the prefix `agent-`: ```bash rpk security role list --prefix "agent-" ``` List all roles assigned to the group `analytics`: ```bash rpk security role list --principal Group:analytics ``` ## [](#usage)Usage ```bash rpk security role list [flags] ``` ## [](#aliases)Aliases ```bash list, ls ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for list. | | --prefix | string | Return the roles matching the specified prefix. | | --principal | string | Return the roles matching the specified principal; if no principal prefix is given, User: is used. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 601: rpk security role unassign **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-role-unassign.md --- # rpk security role unassign --- title: rpk security role unassign latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-role-unassign page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-role-unassign.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-role-unassign.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2026-01-20" --- Unassign a Redpanda role from a principal. The `--principal` flag accepts principals in the form of a `PrincipalPrefix` and a name separated by a colon (for example, `Group:contractors`). If `PrincipalPrefix` is not provided, it defaults to `User:`. ## [](#examples)Examples Unassign role `redpanda-admin` from user `red`: ```bash rpk security role unassign redpanda-admin --principal red ``` Unassign role `redpanda-admin` from users `red` and `panda`: ```bash rpk security role unassign redpanda-admin --principal red,panda ``` Unassign role `topic-reader` from group `contractors`: ```bash rpk security role unassign topic-reader --principal Group:contractors ``` ## [](#usage)Usage ```bash rpk security role unassign [ROLE] --principal [PRINCIPALS...] [flags] ``` ## [](#aliases)Aliases ```bash unassign, remove ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for unassign. | | --principal | strings | Principal to unassign the role from (repeatable). | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings.
See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 602: rpk security role **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-role.md --- # rpk security role --- title: rpk security role latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-role page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-role.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-role.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2026-01-20" --- Manage Redpanda roles. ## [](#usage)Usage ```bash rpk security role [command] [flags] ``` ## [](#aliases)Aliases ```bash role, access, roles ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | -h, --help | - | Help for role. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 603: rpk security secret create **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-secret-create.md --- # rpk security secret create --- title: rpk security secret create latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-secret-create page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-secret-create.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-secret-create.adoc page-git-created-date: "2025-05-09" page-git-modified-date: "2025-05-09" --- Create a new secret for your cluster. Scopes define the areas where the secret can be used. Available scopes are: - `redpanda_connect` - `redpanda_cluster` You can set one or both scopes on a secret. ## [](#usage)Usage ```bash rpk security secret create [flags] ``` ## [](#examples)Examples To create a secret and set its scope to `redpanda_connect`: ```bash rpk security secret create --name NETT --value value --scopes redpanda_connect ``` To set the scope to both `redpanda_connect` and `redpanda_cluster`: ```bash rpk security secret create --name NETT2 --value value --scopes redpanda_connect,redpanda_cluster ``` You can also pass the scopes as a string: ```bash rpk security secret create --name NETT2 --value value --scopes "redpanda_connect,redpanda_cluster" ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for create. | | --name | string | Name of the secret (required). Must be in uppercase and can only contain letters, digits, and underscores. | | --scopes | stringArray | Scope(s) of the secret, for example, redpanda_connect (required). | | --value | string | Value of the secret (required). 
| | --config | string | Redpanda or rpk config file. Default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or run rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | rpk profile to use. | | -v, --verbose | - | Enable verbose logging. | --- # Page 604: rpk security secret delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-secret-delete.md --- # rpk security secret delete --- title: rpk security secret delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-secret-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-secret-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-secret-delete.adoc page-git-created-date: "2025-05-09" page-git-modified-date: "2025-05-09" --- Delete an existing secret from your cluster. Deleting a secret is irreversible. Ensure you have backups or no longer need the secret before proceeding. ## [](#usage)Usage ```bash rpk security secret delete [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for delete. | | --name | string | Name of the secret to delete (required). | | --config | string | Redpanda or rpk config file. Default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or run rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | rpk profile to use. 
| | -v, --verbose | - | Enable verbose logging. | --- # Page 605: rpk security secret list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-secret-list.md --- # rpk security secret list --- title: rpk security secret list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-secret-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-secret-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-secret-list.adoc page-git-created-date: "2025-05-09" page-git-modified-date: "2025-05-09" --- List all secrets in your cluster. ## [](#usage)Usage ```bash rpk security secret list [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for list. | | --name-contains | string | Filter secrets whose names contain the specified substring. | | --config | string | Redpanda or rpk config file. Default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or run rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | rpk profile to use. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 606: rpk security secret update **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-secret-update.md --- # rpk security secret update --- title: rpk security secret update latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-secret-update page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-secret-update.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-secret-update.adoc page-git-created-date: "2025-05-09" page-git-modified-date: "2025-05-09" --- Update an existing secret for your cluster. Scopes define the areas where the secret can be used. Available scopes are: - `redpanda_connect` - `redpanda_cluster` You can set one or both scopes on a secret. Updating a secret’s scopes will overwrite its current scopes. ## [](#usage)Usage ```bash rpk security secret update [flags] ``` ## [](#examples)Examples To update the value of the secret: ```bash rpk security secret update --name NETT --value new_value ``` To update the scope of a secret to both `redpanda_connect` and `redpanda_cluster`: ```bash rpk security secret update --name NETT2 --value value --scopes redpanda_connect,redpanda_cluster ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for update. | | --name | string | Name of the secret. The name must be in uppercase and can only contain letters, digits, and underscores. You cannot update the name of an existing secret. | | --scopes | stringArray | Scope(s) of the secret (for example, redpanda_connect). | | --value | string | New value of the secret. | | --config | string | Redpanda or rpk config file. 
Default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or run rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | rpk profile to use. | | -v, --verbose | - | Enable verbose logging. | --- # Page 607: rpk security secret **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-secret.md --- # rpk security secret --- title: rpk security secret latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-secret page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-secret.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-secret.adoc page-git-created-date: "2025-05-09" page-git-modified-date: "2025-05-09" --- Manage secrets for your cluster. ## [](#usage)Usage ```bash rpk security secret [flags] rpk security secret [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for secret. | | --config | string | Redpanda or rpk config file. Default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or run rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | rpk profile to use. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 608: rpk security user create **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-user-create.md --- # rpk security user create --- title: rpk security user create latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-user-create page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-user-create.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-user-create.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Create a SASL user. This command creates a single SASL user with the given password, optionally with a custom mechanism. SASL consists of three parts: a username, a password, and a mechanism. The mechanism determines which authentication flow the client will use for this user/pass. Redpanda currently supports two mechanisms: SCRAM-SHA-256, the default, and SCRAM-SHA-512, which is the same flow but uses sha512 rather than sha256. Using SASL requires setting `enable_sasl: true` in the redpanda section of your `redpanda.yaml`. Before a created SASL account can be used, you must also create ACLs to grant the account access to certain resources in your cluster. See the acl help text for more info. ## [](#usage)Usage ```bash rpk security user create [USER] -p [PASS] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for create. | | --mechanism | string | SASL mechanism to use for the user you are creating (scram-sha-256, scram-sha-512, case insensitive) (default: scram-sha-256). | | --password | string | New user’s password (NOTE: if using --password for the admin API, use --new-password). 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 609: rpk security user delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-user-delete.md --- # rpk security user delete --- title: rpk security user delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-user-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-user-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-user-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Delete a SASL user. This command deletes the specified SASL account from Redpanda. This does not delete any ACLs that may exist for this user. ## [](#usage)Usage ```bash rpk security user delete [USER] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for delete. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. 
| | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 610: rpk security user list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-user-list.md --- # rpk security user list --- title: rpk security user list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-user-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-user-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-user-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List SASL users. ## [](#usage)Usage ```bash rpk security user list [flags] ``` ## [](#aliases)Aliases ```bash list, ls ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for list. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 611: rpk security user update **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-user-update.md --- # rpk security user update --- title: rpk security user update latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-user-update page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-user-update.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-user-update.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Update SASL user credentials > ⚠️ **CAUTION** > > The default value for the `--mechanism` flag is `SCRAM-SHA-256`. Set the flag when using a different mechanism to avoid unexpected changes. ## [](#usage)Usage ```bash rpk security user update [USER] --new-password [PW] --mechanism [MECHANISM] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for update. | | --mechanism | string | SASL mechanism to use for the user you are updating. Case insensitive. Acceptable values: SCRAM-SHA-256, SCRAM-SHA-512. | | --new-password | string | New user’s password. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 612: rpk security user **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security-user.md --- # rpk security user --- title: rpk security user latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security-user page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security-user.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security-user.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Manage SCRAM users. If SCRAM is enabled, a SCRAM user is what you use to talk to Redpanda, and ACLs control what your user has access to. See `rpk security acl --help` for more information about ACLs, and `rpk security user create --help` for more information about creating SCRAM users. Using SCRAM requires setting `kafka_enable_authorization: true` and `authentication_method: sasl` in the redpanda section of your `redpanda.yaml`, and setting `sasl_mechanisms` with `SCRAM` for your Redpanda cluster. ## [](#usage)Usage ```bash rpk security user [command] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for user. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 613: rpk security **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-security/rpk-security.md --- # rpk security --- title: rpk security latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-security/rpk-security page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-security/rpk-security.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-security/rpk-security.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- ## [](#usage)Usage ```bash rpk security [command] [flags] ``` ## [](#aliases)Aliases ```bash security, sec ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for security. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 614: rpk shadow config generate **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-shadow/rpk-shadow-config-generate.md --- # rpk shadow config generate --- title: rpk shadow config generate latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-shadow/rpk-shadow-config-generate page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-shadow/rpk-shadow-config-generate.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-shadow/rpk-shadow-config-generate.adoc page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Generate a configuration file for creating a shadow link. By default, this command creates a sample configuration file with placeholder values that you customize for your environment. Use the `--for-cloud` flag to generate a configuration suitable for Redpanda Cloud deployments. Use the `--print-template` flag to generate a configuration template with detailed field documentation. The command prints the configuration to standard output unless you use the `--output` flag to save it to a file. After you generate the configuration file, update the placeholder values with your actual connection details and settings. Then use [`rpk shadow create`](../rpk-shadow-create/) to create the shadow link. 
## [](#usage)Usage ```bash rpk shadow config generate [flags] ``` ## [](#examples)Examples Generate a sample configuration and print it to standard output: ```bash rpk shadow config generate ``` Generate a configuration template with all the field documentation: ```bash rpk shadow config generate --print-template ``` Save the sample configuration to a file: ```bash rpk shadow config generate -o shadow-link.yaml ``` Save the template with documentation to a file: ```bash rpk shadow config generate --print-template -o shadow-link.yaml ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --for-cloud | - | Generate configuration suitable for Cloud deployments. | | -o, --output | string | File path identifying where to save the generated configuration file. If not specified, prints to standard output. | | --print-template | - | Generate a configuration template with field documentation instead of a sample configuration. | | -h, --help | - | Help for generate. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 615: rpk shadow create **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-shadow/rpk-shadow-create.md --- # rpk shadow create --- title: rpk shadow create latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-shadow/rpk-shadow-create page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-shadow/rpk-shadow-create.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-shadow/rpk-shadow-create.adoc page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Creates a Redpanda shadow link. This command creates a shadow link using a configuration file that defines the connection details and synchronization settings. Before you create a shadow link, generate a configuration file with [`rpk shadow config generate`](../rpk-shadow-config-generate/) and update it with your source cluster details. The command prompts you to confirm the creation. Use the `--no-confirm` flag to skip the confirmation prompt. When creating a shadow link in Redpanda Cloud, use the `--for-cloud` flag. First log in and select the cluster where you want to create the shadow link before running this command. See [`rpk cloud login`](../../rpk-cloud/rpk-cloud-login/) and [`rpk cloud cluster select`](../../rpk-cloud/rpk-cloud-cluster-select/). For SCRAM authentication, store your password in the shadow cluster’s secrets store (using either the cluster’s secret store or [`rpk security secret`](../../rpk-security/rpk-security-secret/)), then reference it in your configuration file using `${secrets.SECRET_NAME}` syntax. After you create the shadow link, use [`rpk shadow status`](../rpk-shadow-status/) to monitor the replication progress. 
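The workflow described above (generate a configuration, edit it, create the link, then check its status) can be sketched as a small Python helper. This is only an illustration: it assembles the `rpk` argument lists and does not run them. The helper names are hypothetical, the file name `shadow-link.yaml` and link name `my-shadow-link` are placeholders borrowed from the examples in these pages, and only flags listed in the flag tables of these pages are used.

```python
from typing import List


def generate_cmd(output: str, for_cloud: bool = False) -> List[str]:
    """Build the argv for `rpk shadow config generate` (hypothetical helper)."""
    argv = ["rpk", "shadow", "config", "generate", "--output", output]
    if for_cloud:
        argv.append("--for-cloud")
    return argv


def create_cmd(config_file: str, no_confirm: bool = False) -> List[str]:
    """Build the argv for `rpk shadow create` (hypothetical helper)."""
    argv = ["rpk", "shadow", "create", "--config-file", config_file]
    if no_confirm:
        argv.append("--no-confirm")
    return argv


def status_cmd(link_name: str) -> List[str]:
    """Build the argv for `rpk shadow status` (hypothetical helper)."""
    return ["rpk", "shadow", "status", link_name]


def workflow(config_file: str, link_name: str, for_cloud: bool = False) -> List[List[str]]:
    """Return the three commands in the order the text describes them."""
    return [
        generate_cmd(config_file, for_cloud=for_cloud),
        create_cmd(config_file),
        status_cmd(link_name),
    ]


if __name__ == "__main__":
    # Placeholder names; edit the generated file before running the create step.
    for argv in workflow("shadow-link.yaml", "my-shadow-link", for_cloud=True):
        print(" ".join(argv))
```

Each list could be passed to `subprocess.run` once the generated file has been edited with real connection details; the sketch deliberately stops short of executing anything.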
## [](#usage)Usage ```bash rpk shadow create [flags] ``` ## [](#examples)Examples Create a shadow link using a configuration file: ```bash rpk shadow create --config-file shadow-link.yaml ``` Create a shadow link without a confirmation prompt: ```bash rpk shadow create -c shadow-link.yaml --no-confirm ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -c, --config-file | string | Path to configuration file to use for the shadow link; use --help for details. | | --no-confirm | - | Disable confirmation prompt. | | -h, --help | - | Help for create. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 616: rpk shadow delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-shadow/rpk-shadow-delete.md --- # rpk shadow delete --- title: rpk shadow delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-shadow/rpk-shadow-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-shadow/rpk-shadow-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-shadow/rpk-shadow-delete.adoc page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Delete a Redpanda shadow link. This command deletes a shadow link by name. By default, you cannot delete a shadow link that has active shadow topics. 
Use [`rpk shadow failover`](../rpk-shadow-failover/) first to deactivate topics before deletion, or use the `--force` flag to delete the shadow link and failover all its active shadow topics. The command prompts you to confirm the deletion. Use the `--no-confirm` flag to skip the confirmation prompt. The `--force` flag automatically disables the confirmation prompt. > ⚠️ **WARNING** > > Deleting a shadow link with `--force` permanently removes all shadow topics and stops replication. This operation cannot be undone. ## [](#usage)Usage ```bash rpk shadow delete [LINK_NAME] [flags] ``` ## [](#examples)Examples Delete a shadow link: ```bash rpk shadow delete my-shadow-link ``` Delete a shadow link without confirmation: ```bash rpk shadow delete my-shadow-link --no-confirm ``` Force delete a shadow link with active shadow topics: ```bash rpk shadow delete my-shadow-link --force ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -f, --force | - | If set, forces a delete while there are active shadow topics; disables confirmation prompts as well. | | --no-confirm | - | Disable confirmation prompt. | | -h, --help | - | Help for delete. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 617: rpk shadow describe **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-shadow/rpk-shadow-describe.md --- # rpk shadow describe --- title: rpk shadow describe latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-shadow/rpk-shadow-describe page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-shadow/rpk-shadow-describe.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-shadow/rpk-shadow-describe.adoc page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Describes a Redpanda shadow link. This command shows the shadow link configuration, including connection settings, synchronization options, and filters. Use the flags to display specific sections or all sections of the configuration. By default, the command displays the overview and client configuration sections. Use the flags to display additional sections such as topic synchronization, consumer offset synchronization, and security synchronization settings. The command uses the Redpanda ID of the cluster you are currently logged into. To use a different cluster, either log in and create a profile for it, or use the `--redpanda-id` flag to specify it directly. 
## [](#usage)Usage ```bash rpk shadow describe [LINK_NAME] [flags] ``` ## [](#examples)Examples Describe a shadow link with default sections (overview and client): ```bash rpk shadow describe my-shadow-link ``` Display all configuration sections: ```bash rpk shadow describe my-shadow-link --print-all ``` Display specific sections: ```bash rpk shadow describe my-shadow-link --print-overview --print-topic ``` Display only the client configuration: ```bash rpk shadow describe my-shadow-link -c ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -a, --print-all | - | Print all sections. | | -c, --print-client | - | Print the client configuration section. | | -r, --print-consumer | - | Print the detailed consumer offset configuration section. | | -o, --print-overview | - | Print the overview section. | | -s, --print-security | - | Print the detailed security configuration section. | | -t, --print-topic | - | Print the detailed topic configuration section. | | -h, --help | - | Help for describe. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 618: rpk shadow failover **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-shadow/rpk-shadow-failover.md --- # rpk shadow failover --- title: rpk shadow failover latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-shadow/rpk-shadow-failover page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-shadow/rpk-shadow-failover.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-shadow/rpk-shadow-failover.adoc page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Failover a Redpanda shadow link. Failover converts shadow topics into regular topics on the shadow cluster, allowing producers and consumers to interact with them directly. After failover, the shadow link stops replicating data from the source cluster. Use the `--all` flag to failover all shadow topics associated with the shadow link, or use the `--topic` flag to failover a specific topic. You must specify either `--all` or `--topic`. The command prompts you to confirm the failover operation. Use the `--no-confirm` flag to skip the confirmation prompt. > ⚠️ **WARNING** > > Failover is a critical operation. After failover, shadow topics become regular topics and replication stops. Ensure your applications are ready to connect to the shadow cluster before performing a failover. 
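The rule that a failover must name either all topics or a specific topic can be checked before a command is ever issued. The following Python sketch builds the argument list and rejects a call that specifies neither. The function name is hypothetical, and rejecting a call that sets both options is a choice made by this sketch, not documented `rpk` behavior.

```python
from typing import List, Optional


def failover_cmd(
    link_name: str,
    *,
    all_topics: bool = False,
    topic: Optional[str] = None,
    no_confirm: bool = False,
) -> List[str]:
    """Build the argv for `rpk shadow failover` (hypothetical helper)."""
    if not all_topics and topic is None:
        # Mirrors the documented requirement: either --all or --topic.
        raise ValueError("specify all_topics=True or a topic")
    if all_topics and topic is not None:
        # Assumption of this sketch only: keep the intent unambiguous.
        raise ValueError("specify only one of all_topics or topic")

    argv = ["rpk", "shadow", "failover", link_name]
    if all_topics:
        argv.append("--all")
    else:
        argv += ["--topic", topic]
    if no_confirm:
        argv.append("--no-confirm")
    return argv


if __name__ == "__main__":
    print(" ".join(failover_cmd("my-shadow-link", topic="my-topic")))
```

Validating up front keeps an irreversible operation from being launched with an ambiguous scope.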
## [](#usage)Usage ```bash rpk shadow failover [LINK_NAME] [flags] ``` ## [](#examples)Examples Failover all topics for a shadow link: ```bash rpk shadow failover my-shadow-link --all ``` Failover a specific topic: ```bash rpk shadow failover my-shadow-link --topic my-topic ``` Failover without confirmation: ```bash rpk shadow failover my-shadow-link --all --no-confirm ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --all | - | Failover all shadow topics associated with the shadow link. | | --no-confirm | - | Disable confirmation prompt. | | --topic | string | Specific topic to failover. If --all is not set, at least one topic must be provided. | | -h, --help | - | Help for failover. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 619: rpk shadow list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-shadow/rpk-shadow-list.md --- # rpk shadow list --- title: rpk shadow list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-shadow/rpk-shadow-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-shadow/rpk-shadow-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-shadow/rpk-shadow-list.adoc page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Lists Redpanda shadow links. 
This command lists all shadow links on the shadow cluster, showing their names, unique identifiers, and current states. Use this command to get an overview of all configured shadow links and their operational status. ## [](#usage)Usage ```bash rpk shadow list [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | -h, --help | - | Help for list. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 620: rpk shadow status **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-shadow/rpk-shadow-status.md --- # rpk shadow status --- title: rpk shadow status latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-shadow/rpk-shadow-status page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-shadow/rpk-shadow-status.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-shadow/rpk-shadow-status.adoc page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Shows the status of a Redpanda shadow link. This command shows the current status of a shadow link, including the overall state, task statuses, and per-topic replication progress. Use this command to monitor replication health and track how closely shadow topics follow the source cluster. 
By default, the command displays all status sections. Use the `--print-*` flags to select specific sections (overview, task status, or topic status). The `--format json|yaml` flag changes only the output format, not which sections are included. ## [](#usage)Usage ```bash rpk shadow status [LINK_NAME] [flags] ``` ## [](#examples)Examples Display the status of a shadow link: ```bash rpk shadow status my-shadow-link ``` Display specific sections: ```bash rpk shadow status my-shadow-link --print-overview --print-topic ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -a, --print-all | - | Print all sections. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | -o, --print-overview | - | Print the overview section. | | -k, --print-task | - | Print the task status section. | | -t, --print-topic | - | Print the detailed topic status section. | | -h, --help | - | Help for status. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 621: rpk shadow update **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-shadow/rpk-shadow-update.md --- # rpk shadow update --- title: rpk shadow update latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-shadow/rpk-shadow-update page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-shadow/rpk-shadow-update.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-shadow/rpk-shadow-update.adoc page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Updates a shadow link. This command opens your default editor with the current shadow link configuration. Update the fields you want to change, save the file, and close the editor. The command applies only the changed fields to the shadow link. You cannot change the shadow link name. If you need to rename a shadow link, delete it and create a new one with the desired name. The command uses the editor set in your `EDITOR` environment variable. If `EDITOR` is not set, the command uses `vi` on Unix-like systems. ## [](#usage)Usage ```bash rpk shadow update [LINK_NAME] [flags] ``` ## [](#examples)Examples Update a shadow link configuration: ```bash rpk shadow update my-shadow-link ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for update. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 622: rpk shadow **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-shadow/rpk-shadow.md --- # rpk shadow --- title: rpk shadow latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-shadow/rpk-shadow page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-shadow/rpk-shadow.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-shadow/rpk-shadow.adoc page-git-created-date: "2025-12-12" page-git-modified-date: "2025-12-12" --- Manage Redpanda shadow links. Shadowing is Redpanda’s enterprise-grade disaster recovery solution that establishes asynchronous, offset-preserving replication between two distinct Redpanda clusters. A cluster is able to create a dedicated client that continuously replicates source cluster data, including offsets, timestamps, and cluster metadata. ## [](#usage)Usage ```bash rpk shadow [command] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for shadow. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 623: rpk topic add-partitions **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-topic/rpk-topic-add-partitions.md --- # rpk topic add-partitions --- title: rpk topic add-partitions latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-topic/rpk-topic-add-partitions page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-topic/rpk-topic-add-partitions.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-topic/rpk-topic-add-partitions.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Add partitions to existing topics. > 📝 **NOTE** > > Existing topic data is not redistributed to the newly-added partitions. ## [](#usage)Usage ```bash rpk topic add-partitions [TOPICS...] --num [#] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -f, --force | - | Force change the partition count in internal topics. For example, the internal topic __consumer_offsets. | | -h, --help | - | Help for add-partitions. | | -n, --num | int | Number of partitions to add to each topic. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 624: rpk topic alter-config **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-topic/rpk-topic-alter-config.md --- # rpk topic alter-config --- title: rpk topic alter-config latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-topic/rpk-topic-alter-config page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-topic/rpk-topic-alter-config.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-topic/rpk-topic-alter-config.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Set, delete, add, and remove key/value configs for a topic. This command allows you to incrementally alter the configuration for multiple topics at a time. Incremental altering supports four operations: 1. Setting a key=value pair 2. Deleting a key’s value 3. Appending a new value to a list-of-values key 4. Subtracting (removing) an existing value from a list-of-values key The `--dry` option validates the requested configuration change without applying it. ## [](#usage)Usage ```bash rpk topic alter-config [TOPICS...] --set key=value --delete key2,key3 [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --append | stringArray | key=value; Value to append to a list-of-values key (repeatable). | | -d, --delete | stringArray | Key to delete (repeatable). | | --dry | - | Dry run: validate the alter request, but do not apply. | | -h, --help | - | Help for alter-config. | | -s, --set | stringArray | key=value; Pair to set (repeatable). | | --subtract | stringArray | key=value; Value to remove from list-of-values key (repeatable). 
| | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 625: rpk topic consume **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-topic/rpk-topic-consume.md --- # rpk topic consume --- title: rpk topic consume latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-topic/rpk-topic-consume page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-topic/rpk-topic-consume.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-topic/rpk-topic-consume.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Consume records from topics. Consuming records reads from any number of input topics, formats each record according to `--format`, and prints them to `STDOUT`. The output formatter understands a wide variety of formats. The default output format `--format json` is a special format that outputs each record as JSON. ## [](#formatting)Formatting Formatting is based on percent escapes and modifiers. 
Backslashes can be used for common escapes:

| Escape | Description |
| --- | --- |
| \t | Tabs |
| \n | Newlines |
| \r | Carriage returns |
| \\ | Backslashes |
| \xNN | Hex encoded characters |

The percent encodings are represented like this:

| Percent encoding | Description |
| --- | --- |
| %t | Topic |
| %T | Topic length |
| %k | Key |
| %K | Key length |
| %v | Value |
| %V | Value length |
| %h | Begin the header specification |
| %H | Number of headers |
| %p | Partition |
| %o | Offset |
| %e | Leader epoch |
| %d | Timestamp (formatting described below) |
| %x | Producer ID |
| %y | Producer epoch |
| %[ | Partition log start offset |
| %\| | Partition last stable offset |
| %] | Partition high watermark |
| %a | Record attributes (formatting described below) |
| %% | Percent sign |
| %{ | Left brace |
| %} | Right brace |
| %i | Number of records formatted |

### [](#modifiers)Modifiers

Text and numbers can be formatted in many different ways, and the default format can be changed within brace modifiers. `%v` prints a value, while `%v{hex}` prints the value hex encoded. `%T` prints the length of a topic in ASCII, while `%T{big8}` prints the length of the topic as a single byte (an eight-bit number).

All modifiers go within braces following a percent-escape.
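As a rough illustration of what these modifiers produce, the following Python sketch emulates `%v{hex}`, the default ASCII number format, and `%T{big8}` on sample data. It is not how rpk implements formatting, and the function names are hypothetical. The one-byte width assumed for `big8` comes from the Numbers table in this page, which lists `big8` as an alias for `byte`.

```python
def hex_modifier(raw: bytes) -> str:
    """Emulate a text field with {hex}: each byte becomes two hex characters."""
    return raw.hex()


def ascii_number(n: int) -> bytes:
    """Emulate the default number format: the decimal digits as ASCII."""
    return str(n).encode("ascii")


def byte_number(n: int) -> bytes:
    """Emulate {byte} / {big8}: keep only the lowest eight bits of the number."""
    return bytes([n & 0xFF])


if __name__ == "__main__":
    topic = b"foo"  # sample topic name, chosen for illustration
    print(hex_modifier(topic))       # 666f6f
    print(ascii_number(len(topic)))  # b'3'
    print(byte_number(len(topic)))   # b'\x03'
```

The mask in `byte_number` reflects the truncation rule this page states for number modifiers: a length of 256 written as a single byte comes out as a null byte.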
### [](#numbers)Numbers

Formatting number values can have the following modifiers:

| Format | Description |
| --- | --- |
| ascii | Print the number as ASCII (default) |
| hex64 | Sixteen hex characters |
| hex32 | Eight hex characters |
| hex16 | Four hex characters |
| hex8 | Two hex characters |
| hex4 | One hex character |
| big64 | Eight byte big endian number |
| big32 | Four byte big endian number |
| big16 | Two byte big endian number |
| big8 | Alias for byte |
| little64 | Eight byte little endian number |
| little32 | Four byte little endian number |
| little16 | Two byte little endian number |
| little8 | Alias for byte |
| byte | One byte number |
| bool | true if the number is non-zero, false if the number is zero |

All numbers are truncated as necessary per the modifier. Printing `%V{byte}` for a length 256 value prints a single null, whereas printing `%V{big16}` prints the bytes 1 and 0.

When writing number sizes, the size corresponds to the size of the raw values, not the size of encoded values. `%T %t{hex}` for the topic `foo` prints `3 666f6f`, not `6 666f6f`.

### [](#timestamps)Timestamps

By default, the timestamp field is printed as a millisecond number value. In addition to the number modifiers above, timestamps can be printed with either `Go` formatting:

```go
%d{go[2006-01-02T15:04:05Z07:00]}
```

Or `strftime` formatting:

```go
%d{strftime[%F]}
```

An arbitrary number of brackets (or braces, or # symbols) can wrap your date formatting:

```go
%d{strftime### [%F] ###}
```

This prints `[YYYY-MM-DD]`, while the three # symbols on each side are used to wrap the formatting. For more information on Go time formatting, see the [Go documentation](https://pkg.go.dev/time). For more information on `strftime` formatting, run `man strftime`.

## [](#attributes)Attributes

Each record (or batch of records) has a set of possible attributes. Internally, these are packed into bit flags.
Printing an attribute requires first selecting which attribute you want to print, and then optionally specifying how you want it to be printed: ```bash %a{compression} %a{compression;number} %a{compression;big64} %a{compression;hex8} ``` Compression is by default printed as text (`none`, `gzip`, …). Compression can be printed as a number with `;number`, where number is any number formatting option described above. No compression is `0`, gzip is `1`, etc. ```bash %a{timestamp-type} %a{timestamp-type;big64} ``` The record’s timestamp type prints as: - `-1` for very old records (before timestamps existed) - `0` for client-generated timestamps - `1` for broker-generated timestamps > 📝 **NOTE** > > Number formatting can be controlled with `;number`. ```bash %a{transactional-bit} %a{transactional-bit;bool} ``` Prints `1` if the record is a part of a transaction or `0` if it is not. ```bash %a{control-bit} %a{control-bit;bool} ``` Prints `1` if the record is a commit marker or `0` if it is not. ## [](#text)Text Text fields without modifiers default to writing the raw bytes. Alternatively, there are the following modifiers: | Modifier | Description | | --- | --- | | %t{hex} | Hex encoding | | %k{base64} | Base64 standard encoding | | %k{base64raw} | Base64 encoding raw | | %v{unpack[iIqQc.$]} | The unpack modifier has a further internal specification, similar to timestamps above. | Unpacking text can allow translating binary input into readable output. If a value is a big-endian uint32, `%v` prints the raw four bytes, while `%v{unpack[>I]}` prints the number as ASCII. If unpacking exhausts the input before something is unpacked fully, an error message is appended to the output. ## [](#headers)Headers Headers are formatted with percent encoding inside of the modifier: ```none %h{ %k=%v{hex} } ``` This prints all headers with a space before the key and after the value, an equals sign between the key and value, and with the value hex encoded.
Header formatting actually just parses the internal format as a record format, so all of the above rules about `%K`, `%V`, text, and numbers apply. ## [](#values)Values Values for consumed records can be omitted by using the `--meta-only` flag. Tombstone records (records with a `null` value) have their value omitted from the JSON output by default. All other records, including those with an empty-string value (`""`), will have their values printed. ## [](#offsets)Offsets The `--offset` flag allows for specifying where to begin consuming, and optionally, where to stop consuming. The literal words `start` and `end` specify consuming from the start and the end. | Offset | Description | | --- | --- | | start | Consume from the beginning | | end | Consume from the end | | :end | Consume until the current end | | +oo | Consume oo after the current start offset | | -oo | Consume oo before the current end offset | | oo | Consume after an exact offset | | oo: | Alias for oo | | :oo | Consume until an exact offset | | o1:o2 | Consume from exact offset o1 until exact offset o2 | | @t | Consume starting from a given timestamp | | @t: | alias for @t | | @:t | Consume until a given timestamp | | @t1:t2 | Consume from timestamp t1 until timestamp t2 | Each timestamp option is evaluated until one succeeds. 
| Timestamp | Description | | --- | --- | | 13 digits | Parsed as a unix millisecond | | 9 digits | Parsed as a unix second | | YYYY-MM-DD | Parsed as a day, UTC | | YYYY-MM-DDTHH:MM:SSZ | Parsed as RFC3339, UTC; fractional seconds optional (.MMM) | | -dur | Duration; from now (as t1) or from t1 (as t2) | | dur | For t2 in @t1:t2, relative duration from t1 | | end | For t2 in @t1:t2, the current end of the partition | Durations are parsed simply: ```none 3ms three milliseconds 10s ten seconds 9m nine minutes 1h one hour 1m3ms one minute and three milliseconds ``` For example: ```none -o @2022-02-14:1h consume 1h of time on Valentine's Day 2022 -o @-48h:-24h consume from 2 days ago to 1 day ago -o @-1m:end consume from 1m ago until now -o @:-1hr consume from the start until an hour ago ``` ## [](#examples)Examples A key and value, separated by a space and ending in newline: ```none -f '%k %v\n' ``` A key length as four big endian bytes and the key as hex: ```none -f '%K{big32}%k{hex}' ``` A little endian uint32 and a string unpacked from a value: ```none -f '%v{unpack[is$]}' ``` ## [](#usage)Usage ```bash rpk topic consume TOPICS... [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -b, --balancer | string | Group balancer to use if group consuming (range, roundrobin, sticky, cooperative-sticky) (default "cooperative-sticky"). | | --fetch-max-bytes | int32 | Maximum amount of bytes per fetch request per broker (default 1048576). | | --fetch-max-wait | duration | Maximum amount of time to wait when fetching from a broker before the broker replies (default 5s). | | -f, --format | string | Output format (see --help for details) (default "json"). | | -g, --group | string | Group to use for consuming (incompatible with -p). | | -h, --help | - | Help for consume. | | --meta-only | - | Print all record info except the record value (for -f json). | | -n, --num | int | Quit after consuming this number of records (0 is unbounded). 
| | -o, --offset | string | Offset to consume from / to (start, end, 47, +2, -3) (default "start"). | | -p, --partitions | int32Slice | Comma-delimited list of specific partitions to consume (default []). | | --pretty-print | - | Pretty print each record over multiple lines (for -f json) (default true). | | --print-control-records | - | Opt in to printing control records. | | --rack | string | Rack to use for consuming, which opts into follower fetching. | | --read-committed | - | Opt in to reading only committed offsets. | | -r, --regex | - | Parse topics as regex; consume any topic that matches any expression. | | --use-schema-registry | strings | [=key,value] If present, rpk will decode the key and the value with the schema registry. Also accepts --use-schema-registry=key or --use-schema-registry=value. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging.
| --- # Page 626: rpk topic create **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-topic/rpk-topic-create.md --- # rpk topic create --- title: rpk topic create latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-topic/rpk-topic-create page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-topic/rpk-topic-create.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-topic/rpk-topic-create.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Create topics. Topics created with this command will have the same number of partitions, replication factor, and key/value configs. ## [](#usage)Usage ```bash rpk topic create [TOPICS...] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -d, --dry | - | Dry run: validate the topic creation request; do not create topics. | | -h, --help | - | Help for create. | | --if-not-exists | - | Only create the topic if it does not already exist. | | -p, --partitions | int32 | Number of partitions to create per topic; -1 defaults to the cluster property default_topic_partitions (default -1). | | -r, --replicas | int16 | Replication factor (must be odd); -1 defaults to the cluster’s default_topic_replications (default -1). In Redpanda Cloud, the replication factor is set to 3. | | -c, --topic-config | string (repeatable) | Topic properties can be set by using key=value pairs. For example -c cleanup.policy=compact. This flag is repeatable, so you can set multiple parameters in a single command. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings.
See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | > ❗ **IMPORTANT** > > Starting in Redpanda v25.3, several topic properties support enhanced tristate behavior. Properties like `retention.ms`, `retention.bytes`, `segment.ms`, and others now distinguish between zero values (immediate eligibility for cleanup/compaction) and negative values (disable the feature entirely). Previously, zero and negative values were treated the same way. Review your topic configurations if you currently use zero values for these properties. ## [](#examples)Examples ### [](#create-a-topic)Create a topic Create a topic named `my-topic`: ```bash rpk topic create my-topic ``` Output: ```bash TOPIC STATUS my-topic OK ``` ### [](#create-multiple-topics)Create multiple topics Create two topics (`my-topic-1`, `my-topic-2`) at the same time with one command: ```bash rpk topic create my-topic-1 my-topic-2 ``` Output: ```bash TOPIC STATUS my-topic-1 OK my-topic-2 OK ``` ### [](#set-a-topic-property)Set a topic property Create topic `my-topic-3` with the topic property `cleanup.policy=compact`: ```bash rpk topic create my-topic-3 -c cleanup.policy=compact ``` Output: ```bash TOPIC STATUS my-topic-3 OK ``` ### [](#create-topic-with-multiple-partitions)Create topic with multiple partitions Create topic `my-topic-4` with 20 partitions: ```bash rpk topic create my-topic-4 -p 20 ``` Output: ```bash TOPIC STATUS my-topic-4 OK ``` ### [](#create-topic-with-multiple-replicas)Create topic with multiple replicas > ❗ **IMPORTANT** > > The replication factor must be a positive, odd number (such as 3), and it must be equal to or less than the number of available brokers. 
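The constraint in the note above can be checked before calling `rpk topic create`. A minimal sketch, assuming you already know how many brokers are available; the function name and its arguments are hypothetical and not part of `rpk`:

```bash
# Hypothetical pre-flight check: succeeds only if the replication factor is
# positive, odd, and no larger than the number of available brokers.
valid_replication_factor() {
  rf=$1
  brokers=$2
  [ "$rf" -gt 0 ] && [ $((rf % 2)) -eq 1 ] && [ "$rf" -le "$brokers" ]
}

if valid_replication_factor 3 3; then
  echo "replication factor 3 is acceptable for 3 brokers"
fi
```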
Create topic `my-topic-5` with 3 replicas: ```bash rpk topic create my-topic-5 -r 3 ``` Output: ```bash TOPIC STATUS my-topic-5 OK ``` ### [](#combine-flags)Combine flags You can combine flags in any way you want. This example creates two topics, `topic-1` and `topic-2`, each with 20 partitions, 3 replicas, and the cleanup policy set to compact: ```bash rpk topic create -c cleanup.policy=compact -r 3 -p 20 topic-1 topic-2 ``` Output: ```bash TOPIC STATUS topic-1 OK topic-2 OK ``` --- # Page 627: rpk topic delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-topic/rpk-topic-delete.md --- # rpk topic delete --- title: rpk topic delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-topic/rpk-topic-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-topic/rpk-topic-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-topic/rpk-topic-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Delete topics. This command deletes all requested topics, printing the success or fail status per topic. The `--regex` or `-r` flag opts into parsing the input topics as regular expressions and deleting any non-internal topic that matches any of the expressions. The input expressions are wrapped with `^` and `$` so that the expression must match the whole topic name (which also prevents accidental delete-everything mistakes). The topic list command accepts the same input regex format as this delete command. If you want to check what your regular expressions will delete before actually deleting them, you can check the output of `rpk topic list -r`.
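Because each expression must match the whole topic name, it helps to see which names an expression selects before deleting anything. The sketch below applies the same whole-name anchoring to a hypothetical list of topic names using `grep -E -x`. It does not call `rpk`, and `grep`'s regex dialect may differ from the one `rpk` uses for more advanced syntax.

```bash
# Hypothetical topic names, standing in for real `rpk topic list` output.
topics='foo
fizz
bar
r
xfoo'

# grep -x requires the expression to match the entire line,
# which mirrors wrapping the expression in ^ and $.
preview_matches() {
  printf '%s\n' "$topics" | grep -E -x -e "$1"
}

preview_matches 'f.*'   # prints foo and fizz, but not xfoo
preview_matches '.'     # prints r, the only one-character name
```

Against a real cluster, `rpk topic list -r 'f.*'` remains the authoritative preview of what `rpk topic delete -r 'f.*'` would remove.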
## [](#examples)Examples Deletes topics foo and bar: ```bash rpk topic delete foo bar ``` Deletes any topic starting with `f` and any topics ending in `r`: ```bash rpk topic delete -r '^f.*' '.*r$' ``` Deletes all topics: ```bash rpk topic delete -r '.*' ``` Deletes any one-character topics, using the expression `.`: `rpk topic delete -r '.'` ## [](#usage)Usage ```bash rpk topic delete [TOPICS...] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for delete. | | -r, --regex | - | Parse topics as regex; delete any topic that matches any input topic expression. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 628: rpk topic describe **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-topic/rpk-topic-describe.md --- # rpk topic describe --- title: rpk topic describe latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-topic/rpk-topic-describe page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-topic/rpk-topic-describe.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-topic/rpk-topic-describe.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- This command prints detailed information about topics. There are three potential views: a summary of the topic, the topic configurations, and a detailed partitions section. By default, the summary and configs sections are printed.
Using the `--format` flag with either JSON or YAML prints all the topic information. The `--regex` flag (`-r`) parses arguments as regular expressions and describes topics that match any of the expressions. ## [](#usage)Usage ```bash rpk topic describe [TOPICS] [flags] ``` ## [](#aliases)Aliases ```bash describe, info ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for describe. | | -a, --print-all | - | Print all sections. | | -c, --print-configs | - | Print the config section. | | --format | string | Output format. Possible values: json, yaml, text, wide, help. Default: text. | | -p, --print-partitions | - | Print the detailed partitions section. | | -s, --print-summary | - | Print the summary section. | | -r, --regex | - | Parse arguments as regex; describe any topic that matches any input topic expression. | | --stable | - | Include the stable offsets column in the partitions section; only relevant if you produce to this topic transactionally. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 629: rpk topic list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-topic/rpk-topic-list.md --- # rpk topic list --- title: rpk topic list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-topic/rpk-topic-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-topic/rpk-topic-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-topic/rpk-topic-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List topics, optionally listing specific topics. This command lists all topics that you have access to by default. If specifying topics or regular expressions, this command can be used to know exactly what topics you would delete if using the same input to the delete command. Alternatively, you can request specific topics to list, which can be used to check authentication errors (do you not have access to a topic you were expecting to see?), or to list all topics that match regular expressions. The `--regex` or `-r` flag opts into parsing the input topics as regular expressions and listing any non-internal topic that matches any of the expressions. The input expressions are wrapped with `^` and `$` so that the expression must match the whole topic name. Regular expressions cannot be used to match internal topics; as such, specifying both `-i` and `-r` will exit with failure. Lastly, the `--detailed` or `-d` flag opts in to printing extra per-partition information. ## [](#usage)Usage ```bash rpk topic list [flags] ``` ## [](#aliases)Aliases ```bash list, ls ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -d, --detailed | - | Print per-partition information for topics. | | --format | string | Output format. Possible values: json, yaml, text, wide, help.
Default: text. | | -h, --help | - | Help for list. | | -i, --internal | - | Print internal topics. | | -r, --regex | - | Parse topics as regex; list any topic that matches any input topic expression. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 630: rpk topic produce **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-topic/rpk-topic-produce.md --- # rpk topic produce --- title: rpk topic produce latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-topic/rpk-topic-produce page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-topic/rpk-topic-produce.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-topic/rpk-topic-produce.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Produce records to a topic. Producing records reads from `STDIN`, parses input according to `--format`, and produce records to Redpanda. The input formatter understands a wide variety of formats. Parsing input operates on either sizes or on delimiters, both of which can be specified in the same formatting options. If using sizes to specify something, the size must come before what it is specifying. Delimiters match on an exact text basis. This command will quit with an error if any input fails to match your specified format. ## [](#formatting)Formatting Formatting is based on percent escapes and modifiers. 
Backslashes can be used for common escapes: | Escape | Description | | --- | --- | | \t | Tabs | | \n | Newlines | | \r | Carriage returns | | \\ | Backslashes | | \xNN | Hex encoded characters | The percent encodings are represented like this: | Percent encoding | Description | | --- | --- | | %t | Topic | | %T | Topic length | | %k | Key | | %K | Key length | | %v | Value | | %V | Value length | | %h | Begin the header specification | | %H | Number of headers | | %p | Partition | | %o | Offset | | %e | Leader epoch | | %d | Timestamp (formatting described below) | | %x | Producer ID | | %y | Producer epoch | | %[ | Partition log start offset | | %| | Partition last stable offset | | %] | Partition high watermark | | %a | Record attributes (formatting described below) | | %% | Percent sign | | %{ | Left brace | | %} | Right brace | | %i | Number of records formatted | ### [](#modifiers)Modifiers Text and numbers can be formatted in many different ways, and the default format can be changed within brace modifiers. `%v` prints a value, while `%v{hex}` prints the value hex encoded. `%T` prints the length of a topic in ASCII, while `%T{big8}` prints the length of the topic as a single byte (an eight-bit big-endian number). All modifiers go within braces following a percent-escape.
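When producing, number modifiers describe how a size is read from the input, and a size must come before the field it describes. As a sketch of what such input looks like, the hypothetical helper below writes one record laid out for a format such as `%K{big16}%k%V{big16}%v`: a two-byte big-endian key length, the key, a two-byte big-endian value length, and the value. The format string is an assumption built from the rules on this page rather than an example taken from it, and the helper assumes ASCII keys and values shorter than 65536 bytes.

```bash
# Write a length as two big-endian bytes (big16).
put_big16() {
  # The inner printf calls render each byte as a 3-digit octal escape,
  # which the outer printf then emits as a raw byte.
  printf "\\$(printf '%03o' $(( $1 / 256 )))\\$(printf '%03o' $(( $1 % 256 )))"
}

# Hypothetical encoder: key length, key, value length, value.
encode_record() {
  key=$1
  value=$2
  put_big16 "${#key}"
  printf '%s' "$key"
  put_big16 "${#value}"
  printf '%s' "$value"
}

# Inspect the encoded bytes as hex.
encode_record id hello | od -An -tx1 | tr -d ' \n'; echo
# prints 00026964000568656c6c6f
```

Under the assumption above, piping `encode_record id hello` into `rpk topic produce foo -f '%K{big16}%k%V{big16}%v'` should yield a record with key `id` and value `hello`.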
### [](#numbers)Numbers Formatting number values can have the following modifiers: | Format | Description | | --- | --- | | ascii | Print the number as ASCII (default) | | hex64 | Sixteen hex characters | | hex32 | Eight hex characters | | hex16 | Four hex characters | | hex8 | Two hex characters | | hex4 | One hex character | | big64 | Eight byte big endian number | | big32 | Four byte big endian number | | big16 | Two byte big endian number | | big8 | Alias for byte | | little64 | Eight byte little endian number | | little32 | Four byte little endian number | | little16 | Two byte little endian number | | little8 | Alias for byte | | byte | One byte number | | bool | true if the number is non-zero, false if the number is zero | All numbers are truncated as necessary per the modifier. Printing `%V{byte}` for a length 256 value prints a single null, whereas printing `%V{big16}` prints the bytes 1 and 0. When writing number sizes, the size corresponds to the size of the raw values, not the size of encoded values. `%T %t{hex}` for the topic `foo` prints `3 666f6f`, not `6 666f6f`. ### [](#timestamps)Timestamps By default, the timestamp field is printed as a millisecond number value. In addition to the number modifiers above, timestamps can be printed with either `Go` formatting: ```go %d{go[2006-01-02T15:04:05Z07:00]} ``` Or `strftime` formatting: ```go %d{strftime[%F]} ``` An arbitrary number of brackets (or braces, or # symbols) can wrap your date formatting: ```go %d{strftime### [%F] ###} ``` This prints `[YYYY-MM-DD]`, while the surrounding three # on each side are used to wrap the formatting. For more information on Go time formatting, see the [Go documentation](https://pkg.go.dev/time). For more information on `strftime` formatting, run `man strftime`. ## [](#schema-registry)Schema registry Records can be encoded using a specified schema from our schema registry.
Use the `--schema-id` or `--schema-key-id` flag to set the schema ID; `rpk` then retrieves the schema and encodes the record accordingly. Passing `topic` as the value of either flag selects the Topic Name Strategy, which derives the schema subject name from the topic itself. For example, produce to `foo`, encoding with the latest schema in the subject `foo-value`: ```bash rpk topic produce foo --schema-id=topic ``` For protobuf schemas, you can specify the fully qualified name (FQN) of the message to encode the record with, using the `--schema-type` or `--schema-key-type` flag. If the schema contains only one message, specifying the message name is unnecessary. For example, produce to `foo` using schema ID 1 and message FQN Person.Name: ```bash rpk topic produce foo --schema-id 1 --schema-type Person.Name ``` ## [](#tombstones)Tombstones By default, records produced without a value will have an empty-string value, `""`. The following example produces a record with the key `not_a_tombstone_record` and the value `""`: ```bash rpk topic produce foo -k not_a_tombstone_record [Enter] ``` Tombstone records (records with a `null` value) can be produced by using the `-Z` flag and creating empty-string value records. Using the same example from above, but adding the `-Z` flag, produces a record with the key `tombstone_record` and the value `null`: ```bash rpk topic produce foo -k tombstone_record -Z [Enter] ``` Note that records produced with the string value `"null"` are not considered tombstones by Redpanda. ## [](#examples)Examples In the following examples, many records can be parsed at once. The produce command reads input and tokenizes based on your specified format. Every time the format is completely matched, a record is produced and parsing begins anew.
- A key and value, separated by a space and ending in newline: `-f '%k %v\n'` - A four byte topic, four byte key, and four byte value: `-f '%T{4}%K{4}%V{4}%t%k%v'` - A value to a specific partition, if using a non-negative `--partition` flag: `-f '%p %v\n'` - A big-endian uint16 key size, the text " foo ", and then that key: `-f '%K{big16} foo %k'` - A value that can be two or three characters followed by a newline: `-f '%v{re#...?#}\n'` - A key and a json value, separated by a space: `-f '%k %v{json}'` ## [](#miscellaneous)Miscellaneous Producing requires a topic to produce to. The topic can be specified either directly as an argument, or in the input text through `%t`. A parsed topic takes precedence over the default passed-in topic. If no topic is specified directly and no topic is parsed, this command will quit with an error. The input format can parse partitions to produce directly to with `%p`. Doing so requires specifying a non-negative `--partition` flag. Any parsed partition takes precedence over the `--partition` flag; specifying the flag is the main requirement for being able to directly control which partition to produce to. You can also specify an output format to write when a record is produced successfully. The output format follows the same formatting rules as the topic consume command. See that command’s help text for a detailed description. ## [](#usage)Usage ```bash rpk topic produce [TOPIC] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --acks | int | Number of acks required for producing (-1=all, 0=none, 1=leader) (default -1). | | --allow-auto-topic-creation | - | Auto-create non-existent topics; requires auto_create_topics_enabled on the broker. | | -z, --compression | string | Compression to use for producing batches (none, gzip, snappy, lz4, zstd) (default "snappy"). | | --delivery-timeout | duration | Per-record delivery timeout, if non-zero, min 1s.
| | -f, --format | string | Input record format (default "%v\n"). | | -H, --header | stringArray | Headers in format key:value to add to each record (repeatable). | | -h, --help | - | Help for produce. | | -k, --key | string | A fixed key to use for each record (parsed input keys take precedence). | | --max-message-bytes | int32 | If non-negative, maximum size of a record batch before compression (default -1). | | -o, --output-format | string | What to write to stdout when a record is successfully produced (default "Produced to partition %p at offset %o with timestamp %d.\n"). | | -p, --partition | int32 | Partition to directly produce to, if non-negative (also allows %p parsing to set partitions) (default -1). | | --schema-id | string | Schema ID to encode the record value with; use topic for the TopicName strategy. | | --schema-key-id | string | Schema ID to encode the record key with; use topic for the TopicName strategy. | | --schema-key-type | string | Name of the protobuf message type to be used to encode the record key using schema registry. | | --schema-type | string | Name of the protobuf message type to be used to encode the record value using schema registry. | | -Z, --tombstone | - | Produce empty values as tombstones. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging.
| --- # Page 631: rpk topic trim-prefix **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-topic/rpk-topic-trim-prefix.md --- # rpk topic trim-prefix --- title: rpk topic trim-prefix latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-topic/rpk-topic-trim-prefix page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-topic/rpk-topic-trim-prefix.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-topic/rpk-topic-trim-prefix.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Trim records from topics. This command allows you to trim records from topics, where Redpanda sets the LogStartOffset for partitions to the requested offset. All segments whose base offset is less than the requested offset are deleted, and any records within the segment before the requested offset can no longer be read. The `--offset/-o` flag allows you to indicate which offset you want to set the partition’s low watermark (start offset) to. It can be a single integer value denoting the offset, or it can be a timestamp if you prefix the value with an '@'. You can select which partitions to trim using the `--partitions/-p` flag. The `--from-file` option allows you to trim the offsets specified in a text file with the following format: \[TOPIC\] \[PARTITION\] \[OFFSET\] \[TOPIC\] \[PARTITION\] \[OFFSET\] ... or the equivalent keyed JSON/YAML file. > ⚠️ **WARNING** > > When you delete records from a topic with a timestamp, Redpanda advances the partition start offset to the first record whose timestamp is after the threshold. If record timestamps are not in order with respect to offsets, this may result in unintended deletion of data.
Before using a timestamp, verify that timestamps increase in the same order as offsets in the topic to avoid accidental data loss. For example: > > ```bash > rpk topic consume -n 50 --format '%o %d{go[2006-01-02T15:04:05Z07:00]} %k %v' > ``` ## [](#examples)Examples - Trim records in 'foo' topic to offset 120 in partition 1: ```bash rpk topic trim-prefix foo --offset 120 --partitions 1 ``` - Trim records in all partitions of topic foo prior to a specific timestamp: ```bash rpk topic trim-prefix foo -o "@1622505600" ``` - Trim records from a JSON file: ```bash rpk topic trim-prefix --from-file /tmp/to_trim.json ``` ## [](#usage)Usage ```bash rpk topic trim-prefix [TOPIC] [flags] ``` ## [](#aliases)Aliases ```bash trim-prefix, trim ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -f, --from-file | string | File of topic/partition/offset entries to trim offsets for. | | -h, --help | - | Help for trim-prefix. | | --no-confirm | - | Disable confirmation prompt. | | -o, --offset | string | Offset to set the partition’s start offset to, either as an integer or timestamp (@). | | -p, --partitions | int32Slice | Comma-separated list of partitions to trim records from (default to all) (default []). | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging.
| --- # Page 632: rpk topic **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-topic/rpk-topic.md --- # rpk topic --- title: rpk topic latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-topic/rpk-topic page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-topic/rpk-topic.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-topic/rpk-topic.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Create, delete, produce to and consume from Redpanda topics. ## [](#usage)Usage ```bash rpk topic [flags] [command] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for topic. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 633: rpk transform build **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-transform/rpk-transform-build.md --- # rpk transform build --- title: rpk transform build latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-transform/rpk-transform-build page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-transform/rpk-transform-build.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-transform/rpk-transform-build.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Build a data transform. This command looks in the current working directory for a `transform.yaml` file. It installs the appropriate build plugin, then builds a `.wasm` file. When invoked, it passes extra arguments directly to the underlying toolchain. For example, to add debug symbols and use the `asyncify` scheduler for `tinygo`: ```bash rpk transform build -- -scheduler=asyncify -no-debug=false ``` Language-specific details: TinyGo - By default, TinyGo builds are release builds (-opt=2) with goroutines disabled, for maximum performance. ## [](#usage)Usage ```bash rpk transform build [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for build. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 634: rpk transform delete **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-transform/rpk-transform-delete.md --- # rpk transform delete --- title: rpk transform delete latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-transform/rpk-transform-delete page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-transform/rpk-transform-delete.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-transform/rpk-transform-delete.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Delete a data transform. ## [](#usage)Usage ```bash rpk transform delete [NAME] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for delete. | | --no-confirm | - | Disable confirmation prompt. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 635: rpk transform deploy **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-transform/rpk-transform-deploy.md --- # rpk transform deploy --- title: rpk transform deploy latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-transform/rpk-transform-deploy page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-transform/rpk-transform-deploy.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-transform/rpk-transform-deploy.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Deploy a data transform. When run in the same directory as a `transform.yaml`, this reads the configuration file, then looks for a `.wasm` file with the same name as your project. If the input and output topics are specified in the configuration file, those are used. Otherwise, the topics can be specified on the command line using the `--input-topic` and `--output-topic` flags. You can specify environment variables for the transform using the `--var` flag. Variables are separated by an equal sign. For example: `--var=KEY=VALUE`. The `--var` flag can be repeated to specify multiple variables. You can specify the `--from-offset` flag to identify where on the input topic the transform should begin processing. Expressed as: - `@T` - Begin reading records with committed timestamp >= T (UNIX time, ms from epoch) - `+N` - Begin reading N records from the start of each input partition - `-N` - Begin reading N records prior to the end of each input partition Note that the broker will only respect `--from-offset` on the first deploy for a given transform. Re-deploying the transform will cause processing to pick up at the last committed offset. This state is maintained until the transform is deleted. 
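As a rough sketch of the `@T` form described above, the following snippet converts a UNIX timestamp in seconds to milliseconds and assembles the matching command line. The timestamp is an arbitrary example value (2024-03-12T12:00:00Z), and the snippet only prints the command rather than running it.

```bash
# Example instant in seconds since the epoch (2024-03-12T12:00:00Z).
# This value is an arbitrary choice for illustration.
epoch_s=1710244800

# The @T form of --from-offset expects milliseconds since the epoch.
epoch_ms=$((epoch_s * 1000))
from_offset="@${epoch_ms}"

# Print the command instead of running it, so the snippet is safe to try anywhere.
echo "rpk transform deploy --from-offset ${from_offset}"
```

Because the broker only respects `--from-offset` on the first deploy of a transform, a command like this matters only for a transform that has not been deployed before or that has been deleted.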
## [](#usage)Usage ```bash rpk transform deploy [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | --compression | string | Output batch compression type. | | --file | string | The WebAssembly module to deploy. | | --from-offset | string | Process an input topic partition from a relative offset. | | -h, --help | - | Help for deploy. | | -i, --input-topic | string | The input topic to apply the transform to. | | --name | string | The name of the transform. | | -o, --output-topic | strings | The output topic to write the transform results to (repeatable). | | --var | environmentVariable | Specify an environment variable in the form of KEY=VALUE. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | ## [](#examples)Examples Deploy Wasm files directly without a `transform.yaml` file: ```bash rpk transform deploy --file transform.wasm --name myTransform \ --input-topic my-topic-1 \ --output-topic my-topic-2 --output-topic my-topic-3 ``` Deploy a transformation with multiple environment variables: ```bash rpk transform deploy --var FOO=BAR --var FIZZ=BUZZ ``` Configure compression for batches output by data transforms. 
The default setting is `none`, but you can choose from the following options: - none - gzip - snappy - lz4 - zstd Configure this at deployment using `rpk` with the `--compression` flag: ```bash rpk transform deploy --compression ``` --- # Page 636: rpk transform init **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-transform/rpk-transform-init.md --- # rpk transform init --- title: rpk transform init latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-transform/rpk-transform-init page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-transform/rpk-transform-init.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-transform/rpk-transform-init.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Initialize a transform. Create a new data transform using a template in the current directory. ## [](#example)Example To create the project in a new directory, specify the directory name in the command: ```bash rpk transform init foobar ``` This initializes a transform project in the foobar directory. ## [](#usage)Usage ```bash rpk transform init [DIRECTORY] [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for init. | | --install-deps | - | Whether to install dependencies for the project (default: prompt). | | -l, --language | string | The language used to develop the transform. | | --name | string | The name of the transform. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. 
| | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 637: rpk transform list **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-transform/rpk-transform-list.md --- # rpk transform list --- title: rpk transform list latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-transform/rpk-transform-list page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-transform/rpk-transform-list.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-transform/rpk-transform-list.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- List data transforms. This command lists all data transforms in a cluster and shows the state of each individual transform processor, such as whether it has errored or how many records are pending processing (lag). There is a processor assigned to each partition on the input topic, and each processor is a separate entity that can make progress or fail independently. The `--detailed/-d` flag opts in to printing extra per-processor information. ## [](#usage)Usage ```bash rpk transform list [flags] ``` ## [](#aliases)Aliases ```bash list, ls ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -d, --detailed | - | Print per-partition information for data transforms. | | --format | string | Output format: json,yaml,text,wide,help. Default: text. | | -h, --help | - | Help for list. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. 
| | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 638: rpk transform logs **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-transform/rpk-transform-logs.md --- # rpk transform logs --- title: rpk transform logs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-transform/rpk-transform-logs page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-transform/rpk-transform-logs.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-transform/rpk-transform-logs.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- View logs for a transform. A data transform’s STDOUT and STDERR are captured at runtime and written to the internally managed topic `_redpanda.transform_logs`. This command reads the logs for a single transform over a period of time and prints them to STDOUT. The logs can be printed in various formats. By default, only logs that have already been emitted are displayed. Use the `--follow` flag to stream new logs continuously. ## [](#filtering)Filtering The `--head` and `--tail` flags are mutually exclusive and limit the number of log entries from the beginning or end of the range, respectively. The `--since` and `--until` flags define a time range. Use one or both flags to limit the log output to a desired period of time. 
Both flags accept values in the following formats: | Value | Description | | --- | --- | | now | the current time, useful for --since=now | | 13 digits | parsed as a Unix millisecond | | 9 digits | parsed as a Unix second | | YYYY-MM-DD | parsed as a day, UTC | | YYYY-MM-DDTHH:MM:SSZ | parsed as RFC3339, UTC; fractional seconds optional (.MMM) | | -dur | a negative duration from now | | dur | a positive duration from now | Durations are parsed simply: | Value | Description | | --- | --- | | 3ms | three milliseconds | | 10s | ten seconds | | 9m | nine minutes | | 1h | one hour | | 1m3ms | one minute and three milliseconds | ## [](#formatting)Formatting Logs can be displayed in a variety of formats using `--format`. The default `--format=text` prints the log record’s body line by line. When `--format=wide` is specified, the output includes a prefix that is the date of the log line and a level for the record. The INFO level corresponds to being emitted on the transform’s STDOUT, while the WARN level is used for STDERR. The `--format=json` flag emits logs in the JSON encoded version of the Open Telemetry LogRecord protocol buffer. ## [](#examples)Examples Reads logs within the last hour: ```bash rpk transform logs --since=-1h ``` Reads logs prior to 30 minutes ago: ```bash rpk transform logs --until=-30m ``` The following command reads logs between noon and 1pm on March 12th: ```bash rpk transform logs my-transform --since=2024-03-12T12:00:00Z --until=2024-03-12T13:00:00Z ``` ## [](#usage)Usage ```bash rpk transform logs NAME [flags] ``` ## [](#aliases)Aliases ```bash logs, log ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -f, --follow | - | Specify if the logs should be streamed. | | --format | string | Output format (json,yaml,text,wide,help) (default "text"). | | --head | int | The number of log entries to fetch from the start. | | -h, --help | - | Help for logs. 
| | --since | timestamp | Start reading logs after this time (now, -10m, 2024-02-10). See Filtering for format details. | | --tail | int | The number of log entries to fetch from the end. | | --until | timestamp | Read logs up until this time (-1h, 2024-02-10T13:00:00Z). See Filtering for format details. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings; '-X help' for detail or '-X list' for terser detail. | | --profile | string | rpk profile to use. | | -v, --verbose | - | Enable verbose logging. | --- # Page 639: rpk transform **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-transform/rpk-transform.md --- # rpk transform --- title: rpk transform latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-transform/rpk-transform page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-transform/rpk-transform.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-transform/rpk-transform.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-07" --- Develop, deploy, and manage Redpanda data transforms. ## [](#usage)Usage ```bash rpk transform [command] [flags] ``` ## [](#aliases)Aliases ```bash transform, wasm, transfrom ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for transform. | | --config | string | Redpanda or rpk config file; default search paths are ~/.config/rpk/rpk.yaml, $PWD, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. 
See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. | --- # Page 640: rpk version **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-version.md --- # rpk version --- title: rpk version latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-version page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-version.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-version.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-08" --- Prints the current `rpk` and Redpanda version and allows you to list the Redpanda version running on each broker in your cluster. To list the Redpanda version of each broker in your cluster you may pass the Admin API hosts via flags, profile, or environment variables. To get only the rpk version, use `rpk --version`. ## [](#usage)Usage ```bash rpk version [flags] ``` ## [](#flags)Flags | Value | Type | Description | | --- | --- | --- | | -h, --help | - | Help for version. | | --config | string | Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml. | | -X, --config-opt | stringArray | Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail. | | --profile | string | Profile to use. See rpk profile for more details. | | -v, --verbose | - | Enable verbose logging. 
| --- # Page 641: rpk -X **URL**: https://docs.redpanda.com/redpanda-cloud/reference/rpk/rpk-x-options.md --- # rpk -X --- title: rpk -X latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: rpk/rpk-x-options page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: rpk/rpk-x-options.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/rpk/rpk-x-options.adoc page-git-created-date: "2024-07-25" page-git-modified-date: "2025-05-08" --- Use the `rpk -X` flag to override any rpk-specific configuration option. Every configuration flag for `rpk` is a `key=value` option following the `-X` flag. For example, `rpk -X tls.enabled=true` enables TLS for the Kafka API. Every `-X` option can be translated into an environment variable by prefixing with `RPK_`, converting to uppercase, and replacing periods (`.`) with underscores (`_`). For example, the flag `tls.enabled` has the equivalent environment variable `RPK_TLS_ENABLED`. > ❗ **IMPORTANT** > > - Flags common across all `rpk` commands in previous versions (for example, `--brokers`, `--tls-enabled`) are deprecated. > > - The functionality of all deprecated flags is supported through `-X` options. > 💡 **TIP** > > - For persistent configuration across commands and sessions, Redpanda Data recommends using [rpk profiles](../../../manage/rpk/config-rpk-profile/) instead of environment variables or `-X` flags. > > - `rpk` supports command-line (tab) completion for `-X` flag keys. > > - Each `rpk` command’s `--help` text prints information specific to the command. To view a description of `-X` options, run `rpk -X list` to list supported options, or run `rpk -X help` to get details about supported options. 
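The naming rule above can be sketched as a small shell function. This is only an illustration of the documented mapping: `to_rpk_env_name` is a hypothetical helper written for this example, not part of `rpk`.

```bash
# Hypothetical helper (not part of rpk): derive the environment
# variable name that corresponds to an -X option key.
to_rpk_env_name() {
  # Replace periods with underscores, uppercase the key,
  # then add the RPK_ prefix.
  printf 'RPK_%s\n' "$(printf '%s' "$1" | tr '.' '_' | tr '[:lower:]' '[:upper:]')"
}

to_rpk_env_name tls.enabled                      # prints RPK_TLS_ENABLED
to_rpk_env_name admin.tls.insecure_skip_verify   # prints RPK_ADMIN_TLS_INSECURE_SKIP_VERIFY
```

Underscores already present in a key, such as those in `insecure_skip_verify`, pass through unchanged, so only the periods need translating.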
## [](#configuration-priority)Configuration priority `rpk` resolves configuration values in the following priority order, where higher-priority options override lower-priority ones: 1. **Command-line flags** (including `-X` options): Apply to the current command only 2. **Environment variables**: `RPK_*` environment variables last for the shell session 3. **rpk profile** (`rpk.yaml`): Persistent across sessions (recommended) 4. **Redpanda configuration** (`redpanda.yaml` rpk section): System-wide defaults ## [](#environment-variables)Environment variables ### [](#rpk-environment-variables)RPK\_\* environment variables Every `-X` option has a corresponding `RPK_*` environment variable. Convert by prefixing with `RPK_`, converting to uppercase, and replacing dots with underscores: | -X Option | Environment Variable | | --- | --- | | brokers | RPK_BROKERS | | tls.enabled | RPK_TLS_ENABLED | | tls.insecure_skip_verify | RPK_TLS_INSECURE_SKIP_VERIFY | | tls.ca | RPK_TLS_CA | | tls.cert | RPK_TLS_CERT | | tls.key | RPK_TLS_KEY | | sasl.mechanism | RPK_SASL_MECHANISM | | user | RPK_USER | | pass | RPK_PASS | | admin.hosts | RPK_ADMIN_HOSTS | | admin.tls.enabled | RPK_ADMIN_TLS_ENABLED | | admin.tls.insecure_skip_verify | RPK_ADMIN_TLS_INSECURE_SKIP_VERIFY | | admin.tls.ca | RPK_ADMIN_TLS_CA | | admin.tls.cert | RPK_ADMIN_TLS_CERT | | admin.tls.key | RPK_ADMIN_TLS_KEY | | registry.hosts | RPK_REGISTRY_HOSTS | | registry.tls.enabled | RPK_REGISTRY_TLS_ENABLED | | registry.tls.insecure_skip_verify | RPK_REGISTRY_TLS_INSECURE_SKIP_VERIFY | | registry.tls.ca | RPK_REGISTRY_TLS_CA | | registry.tls.cert | RPK_REGISTRY_TLS_CERT | | registry.tls.key | RPK_REGISTRY_TLS_KEY | | cloud.client_id | RPK_CLOUD_CLIENT_ID | | cloud.client_secret | RPK_CLOUD_CLIENT_SECRET | | globals.prompt | RPK_GLOBALS_PROMPT | | globals.no_default_cluster | RPK_GLOBALS_NO_DEFAULT_CLUSTER | | globals.command_timeout | RPK_GLOBALS_COMMAND_TIMEOUT | | globals.dial_timeout | RPK_GLOBALS_DIAL_TIMEOUT | | globals.request_timeout_overhead | 
RPK_GLOBALS_REQUEST_TIMEOUT_OVERHEAD | | globals.retry_timeout | RPK_GLOBALS_RETRY_TIMEOUT | | globals.fetch_max_wait | RPK_GLOBALS_FETCH_MAX_WAIT | | globals.kafka_protocol_request_client_id | RPK_GLOBALS_KAFKA_PROTOCOL_REQUEST_CLIENT_ID | ## [](#duration-format)Duration format Duration values use Go’s standard duration format. A duration string is a sequence of decimal numbers with unit suffixes: - `ns` = nanoseconds - `us` or `µs` = microseconds - `ms` = milliseconds - `s` = seconds - `m` = minutes - `h` = hours **Examples**: `30s`, `1m30s`, `2h`, `500ms`, `1h15m30s` You can combine multiple units: `2h45m30s` means 2 hours, 45 minutes, and 30 seconds. ## [](#configuration-examples)Configuration examples To persist configuration across sessions, use a [rpk profile](../../../manage/rpk/config-rpk-profile/): Create a profile: ```bash rpk profile create \ --set brokers=, \ --set user= \ --set pass= \ --set sasl.mechanism= \ --set tls.enabled=true \ --description "" ``` Use the profile for commands: ```bash rpk topic list --profile ``` For temporary use or automation scripts, set environment variables: ```bash export RPK_BROKERS="," export RPK_USER="" export RPK_PASS="" export RPK_SASL_MECHANISM="" export RPK_TLS_ENABLED="true" ``` ## [](#options)Options The following options are available: ### [](#brokers)brokers A comma-delimited list of broker `host:port` pairs to connect to the Kafka API. **Type**: string **Default**: `localhost:9092` **Example**: `brokers=127.0.0.1:9092,localhost:9094` **Usage**: ```none rpk topic list -X brokers=, ``` * * * ### [](#tls-enabled)tls.enabled A boolean that enables `rpk` to speak TLS to your broker’s Kafka API listeners. You can use this if you have well known certificates set up on your Kafka API. If you use mTLS, specifying mTLS certificate filepaths automatically opts into `tls.enabled`. 
**Type**: boolean **Default**: `false` **Example**: `tls.enabled=true` **Usage**: ```none rpk topic list -X tls.enabled= ``` * * * ### [](#tls-insecure_skip_verify)tls.insecure\_skip\_verify A boolean that disables `rpk` from verifying the broker’s certificate chain. **Type**: boolean **Default**: `false` **Example**: `tls.insecure_skip_verify=true` **Usage**: ```none rpk topic list -X tls.insecure_skip_verify= ``` * * * ### [](#tls-ca)tls.ca A filepath to a PEM-encoded CA certificate file to talk to your broker’s Kafka API listeners with mTLS. You may need this option if your listeners are using a certificate by a well known authority that is not bundled with your operating system. **Type**: string **Default**: "" **Example**: `tls.ca=/path/to/ca.pem` **Usage**: ```none rpk topic list -X tls.ca= ``` * * * ### [](#tls-cert)tls.cert A filepath to a PEM-encoded client certificate file to talk to your broker’s Kafka API listeners with mTLS. **Type**: string **Default**: "" **Example**: `tls.cert=/path/to/cert.pem` **Usage**: ```none rpk topic list -X tls.cert= ``` * * * ### [](#tls-key)tls.key A filepath to a PEM-encoded client key file to talk to your broker’s Kafka API listeners with mTLS. **Type**: string **Default**: "" **Example**: `tls.key=/path/to/key.pem` **Usage**: ```none rpk topic list -X tls.key= ``` * * * ### [](#sasl-mechanism)sasl.mechanism The SASL mechanism to use for authentication. **Type**: string **Default**: "" **Acceptable values**: `SCRAM-SHA-256`, `SCRAM-SHA-512`, `PLAIN` > 📝 **NOTE** > > With Redpanda, the Admin API can be configured to require basic authentication with your Kafka API SASL credentials. This defaults to `SCRAM-SHA-256` if no mechanism is specified. **Example**: `sasl.mechanism=SCRAM-SHA-256` **Usage**: ```none rpk topic list -X sasl.mechanism= ``` * * * ### [](#user)user The SASL username to use for authentication. It’s also used for the Admin API if you have configured it to require basic authentication. 
**Type**: string **Default**: "" **Example**: `user=myusername` **Usage**: ```none rpk topic list -X user= ``` * * * ### [](#pass)pass The SASL password to use for authentication. It’s also used for the Admin API if you have configured it to require basic authentication. **Type**: string **Default**: "" **Example**: `pass=mypassword` **Usage**: ```none rpk topic list -X pass= ``` * * * ### [](#admin-hosts)admin.hosts A comma-delimited list of admin hosts to connect to. **Type**: string **Default**: `localhost:9644` **Example**: `admin.hosts=192.168.1.1:9644,192.168.1.2:9644` * * * ### [](#admin-tls-enabled)admin.tls.enabled A boolean that enables `rpk` to speak TLS to your broker’s Admin API listeners. You can use this if you have well known certificates set up on your Admin API. If you use mTLS, specifying mTLS certificate filepaths automatically opts into `admin.tls.enabled`. **Type**: boolean **Default**: `false` **Example**: `admin.tls.enabled=true` **Usage**: ```none rpk cluster info -X admin.tls.enabled= ``` * * * ### [](#admin-tls-insecure_skip_verify)admin.tls.insecure\_skip\_verify A boolean that disables `rpk` from verifying the broker’s certificate chain. **Type**: boolean **Default**: `false` **Example**: `admin.tls.insecure_skip_verify=true` **Usage**: ```none rpk cluster info -X admin.tls.insecure_skip_verify= ``` * * * ### [](#admin-tls-ca)admin.tls.ca A filepath to a PEM-encoded CA certificate file to talk to your broker’s Admin API listeners with mTLS. You may also need this if your listeners are using a certificate by a well known authority that is not yet bundled with your operating system. **Type**: string **Default**: "" **Example**: `admin.tls.ca=/path/to/ca.pem` **Usage**: ```none rpk cluster info -X admin.tls.ca= ``` * * * ### [](#admin-tls-cert)admin.tls.cert A filepath to a PEM-encoded client certificate file to talk to your broker’s Admin API listeners with mTLS. 
**Type**: string **Default**: "" **Example**: `admin.tls.cert=/path/to/cert.pem` **Usage**: ```none rpk cluster info -X admin.tls.cert= ``` * * * ### [](#admin-tls-key)admin.tls.key A filepath to a PEM-encoded client key file to talk to your broker’s Admin API listeners with mTLS. **Type**: string **Default**: "" **Example**: `admin.tls.key=/path/to/key.pem` **Usage**: ```none rpk cluster info -X admin.tls.key= ``` * * * ### [](#registry-hosts)registry.hosts A comma-delimited list of Schema Registry hosts to connect to. **Type**: string **Default**: `localhost:8081` **Example**: `registry.hosts=192.168.1.1:8081,192.168.1.2:8081` **Usage**: ```none rpk registry schema list -X registry.hosts=, ``` * * * ### [](#registry-tls-enabled)registry.tls.enabled A boolean that enables `rpk` to use TLS with your broker’s Schema Registry API listeners. You can use this if you have well known certificates set up on your Schema Registry API. If you use mTLS, specifying mTLS certificate filepaths automatically opts into `registry.tls.enabled`. **Type**: boolean **Default**: `false` **Example**: `registry.tls.enabled=true` **Usage**: ```none rpk registry schema list -X registry.tls.enabled= ``` * * * ### [](#registry-tls-insecure_skip_verify)registry.tls.insecure\_skip\_verify A boolean that disables `rpk` from verifying the broker’s certificate chain. **Type**: boolean **Default**: `false` **Example**: `registry.tls.insecure_skip_verify=true` **Usage**: ```none rpk registry schema list -X registry.tls.insecure_skip_verify= ``` * * * ### [](#registry-tls-ca)registry.tls.ca A filepath to a PEM-encoded CA certificate file to talk to your broker’s Schema Registry API listeners with mTLS. 
**Type**: string **Default**: "" **Example**: `registry.tls.ca=/path/to/ca.pem` **Usage**: ```none rpk registry schema list -X registry.tls.ca= ``` * * * ### [](#registry-tls-cert)registry.tls.cert A filepath to a PEM-encoded client certificate file to talk to your broker’s Schema Registry API listeners with mTLS. **Type**: string **Default**: "" **Example**: `registry.tls.cert=/path/to/cert.pem` **Usage**: ```none rpk registry schema list -X registry.tls.cert= ``` * * * ### [](#registry-tls-key)registry.tls.key A filepath to a PEM-encoded client key file to talk to your broker’s Schema Registry API listeners with mTLS. **Type**: string **Default**: "" **Example**: `registry.tls.key=/path/to/key.pem` **Usage**: ```none rpk registry schema list -X registry.tls.key= ``` * * * ### [](#cloud-client_id)cloud.client\_id An OAuth client ID to use for authenticating with the Redpanda Cloud API. **Type**: string **Default**: "" **Example**: `cloud.client_id=abcdef123456` **Usage**: ```none rpk cloud cluster list -X cloud.client_id= ``` * * * ### [](#cloud-client_secret)cloud.client\_secret An OAuth client secret to use for authenticating with the Redpanda Cloud API. **Type**: string **Default**: "" **Example**: `cloud.client_secret=secretvalue789` **Usage**: ```none rpk cloud cluster list -X cloud.client_secret= ``` * * * ### [](#globals-prompt)globals.prompt A format string to use for the default prompt. See [`rpk profile prompt`](../rpk-profile/rpk-profile-prompt/) for more information. **Type**: string **Default**: `bg-red "%n"` **Example**: `globals.prompt="%n"` **Usage**: ```none rpk profile edit -X globals.prompt= ``` * * * ### [](#globals-no_default_cluster)globals.no\_default\_cluster A boolean that disables `rpk` from communicating to `localhost:9092` if no other cluster is specified. 
**Type**: boolean **Default**: `false` **Example**: `globals.no_default_cluster=true` **Usage**: ```none rpk topic list -X globals.no_default_cluster= ``` * * * ### [](#globals-command_timeout)globals.command\_timeout Sets a timeout for all commands issued through rpk. **Type**: [duration](#duration-format) **Default**: `30s` **Example**: `globals.command_timeout=30s` * * * ### [](#globals-dial_timeout)globals.dial\_timeout A duration that `rpk` will wait for a connection to be established before timing out. **Type**: [duration](#duration-format) **Default**: `3s` **Example**: `globals.dial_timeout=3s` **Usage**: ```none rpk topic list -X globals.dial_timeout= ``` * * * ### [](#globals-request_timeout_overhead)globals.request\_timeout\_overhead A duration that limits how long `rpk` waits for responses. **Type**: [duration](#duration-format) **Default**: `10s` > 📝 **NOTE** > > `globals.request_timeout_overhead` applies in addition to any request-internal timeout. > > For example, `ListOffsets` has no `Timeout` field, so `rpk` will wait `request_timeout_overhead` for a response. However, `JoinGroup` has a `RebalanceTimeoutMillis` field, so `request_timeout_overhead` is applied on top of the rebalance timeout. **Example**: `globals.request_timeout_overhead=5s` **Usage**: ```none rpk topic list -X globals.request_timeout_overhead= ``` * * * ### [](#globals-retry_timeout)globals.retry\_timeout This timeout specifies how long `rpk` will retry Kafka API requests. **Type**: [duration](#duration-format) **Default**: `30s` This timeout is evaluated before any backoff: - If a request fails, `rpk` first checks if the retry timeout has elapsed. - If the retry timeout has elapsed, `rpk` stops retrying. - Otherwise, `rpk` waits for the backoff and then retries. 
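The order of checks in the list above can be sketched in shell as follows. This illustrates the described behavior only and is not how `rpk` is implemented: the `retry_until_timeout` function, its arguments, and the fixed backoff are assumptions made for this example.

```bash
# Hypothetical helper: retry a command, checking the retry timeout
# BEFORE each backoff, in the order described above.
# Usage: retry_until_timeout TIMEOUT_SECONDS BACKOFF_SECONDS COMMAND [ARGS...]
retry_until_timeout() {
  timeout_s=$1
  backoff_s=$2
  shift 2
  start=$(date +%s)
  until "$@"; do
    now=$(date +%s)
    # The request failed: first check whether the retry timeout has elapsed.
    if [ $((now - start)) -ge "$timeout_s" ]; then
      return 1   # Timeout elapsed, stop retrying.
    fi
    # Otherwise wait for the backoff, then retry.
    sleep "$backoff_s"
  done
  return 0
}

# Example call (commented out so the sketch has no side effects):
# retry_until_timeout 30 1 rpk topic list
```

With this ordering, a failure that occurs after the timeout has elapsed ends the loop immediately, without waiting for one more backoff.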
**Example**: `globals.retry_timeout=11s` **Usage**: ```none rpk topic list -X globals.retry_timeout= ``` * * * ### [](#globals-fetch_max_wait)globals.fetch\_max\_wait This timeout specifies the maximum duration that brokers will wait before replying to a fetch request with available data. **Type**: [duration](#duration-format) **Default**: `5s` **Example**: `globals.fetch_max_wait=5s` **Usage**: ```none rpk topic consume my-topic -X globals.fetch_max_wait= ``` * * * ### [](#globals-kafka_protocol_request_client_id)globals.kafka\_protocol\_request\_client\_id This string value is the client ID that `rpk` uses when issuing Kafka protocol requests to Redpanda. This client ID shows up in Redpanda logs and metrics. Changing it can be useful if you want to have your own `rpk` client stand out from others that are also interacting with the cluster. **Type**: string **Default**: `rpk` **Example**: `globals.kafka_protocol_request_client_id=my-rpk-client` **Usage**: ```none rpk topic list -X globals.kafka_protocol_request_client_id= ``` --- # Page 642: Tiers and Regions **URL**: https://docs.redpanda.com/redpanda-cloud/reference/tiers.md --- # Tiers and Regions --- title: Tiers and Regions latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: tiers/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: tiers/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/tiers/index.adoc description: When you create a cluster, you select your region. For BYOC and Dedicated clusters, you also select a usage tier, which provides tested workload configurations for throughput, partitions (pre-replication), and connections. 
page-git-created-date: "2024-06-06" page-git-modified-date: "2025-07-01" --- - [Serverless Regions](serverless-regions/) Learn about supported regions for Serverless clusters. - [BYOC Tiers and Regions](byoc-tiers/) Learn about supported tiers and regions for BYOC clusters. - [Dedicated Tiers and Regions](dedicated-tiers/) Learn about supported tiers and regions for Dedicated clusters. --- # Page 643: BYOC Tiers and Regions **URL**: https://docs.redpanda.com/redpanda-cloud/reference/tiers/byoc-tiers.md --- # BYOC Tiers and Regions --- title: BYOC Tiers and Regions latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: tiers/byoc-tiers page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: tiers/byoc-tiers.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/tiers/byoc-tiers.adoc description: Learn about supported tiers and regions for BYOC clusters. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-08-01" --- ## [](#byoc-usage-tiers)BYOC usage tiers When you create a BYOC cluster, you select your usage tier. Each tier provides tested workload configurations for maximum throughput, partitions, and connections. 

| Tier | Ingress | Egress | Partitions (pre-replication) | Connections |
| --- | --- | --- | --- | --- |
| Tier 1 | 20 MBps | 60 MBps | 2,000 | 9,000 |
| Tier 2 | 50 MBps | 150 MBps | 5,600 | 22,500 |
| Tier 3 | 100 MBps | 200 MBps | 11,200 | 45,000 |
| Tier 4 | 200 MBps | 400 MBps | 22,600 | 90,000 |
| Tier 5 | 400 MBps | 800 MBps | 45,600 | 180,000 |
| Tier 6 | 800 MBps | 1,600 MBps | 90,000 | 180,000 |
| Tier 7 | 1,200 MBps | 2,400 MBps | 112,500 | 270,000 |
| Tier 8 | 1,600 MBps | 3,200 MBps | 112,500 | 360,000 |
| Tier 9 | 2,000 MBps | 4,000 MBps | 112,500 | 450,000 |

> 📝 **NOTE**
>
> - Partition counts are based on clusters running Redpanda version 25.1 or higher and on the assumption that the replication factor is 3 (default). If you set a higher replication factor, the maximum value for partitions will be lower.
>
> - On Azure, tiers 1-5 are supported.
>
> - Redpanda supports compute-optimized tiers with AWS Graviton3 processors.
>
> - Depending on the workload, it may not be possible to achieve all maximum values. For example, a high number of partitions may make it more difficult to reach the maximum value in throughput.
>
> - Connections are regulated per broker for best performance. For example, in a tier 1 cluster with 3 brokers, there could be up to 3,000 connections per broker.
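
As a rough illustration of the per-broker arithmetic in the last note, the sketch below divides a tier's connection limit evenly across brokers. The `connections_per_broker` helper and the even split are assumptions for this example; how Redpanda actually enforces the per-broker limit is not described here.

```python
# Connection limits per BYOC tier, copied from the table above.
TIER_CONNECTIONS = {
    1: 9_000,
    2: 22_500,
    3: 45_000,
    4: 90_000,
    5: 180_000,
    6: 180_000,
    7: 270_000,
    8: 360_000,
    9: 450_000,
}


def connections_per_broker(tier: int, brokers: int) -> int:
    """Estimate the per-broker connection budget (hypothetical helper).

    Assumes the tier-wide limit is split evenly across brokers.
    """
    if brokers <= 0:
        raise ValueError("brokers must be a positive integer")
    return TIER_CONNECTIONS[tier] // brokers


print(connections_per_broker(tier=1, brokers=3))  # 3000
```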
## [](#byoc-supported-regions)BYOC supported regions ### Google Cloud Platform (GCP) | Region | | --- | | asia-east1 | | asia-northeast1 | | asia-south1 | | asia-southeast1 | | australia-southeast1 | | europe-southwest1 | | europe-west1 | | europe-west2 | | europe-west3 | | europe-west4 | | europe-west9 | | northamerica-northeast1 | | southamerica-east1 | | southamerica-west1 | | us-central1 | | us-east1 | | us-east4 | | us-west1 | | us-west2 | ### Amazon Web Services (AWS) | Region | | --- | | af-south-1 | | ap-east-1 | | ap-northeast-1 | | ap-south-1 | | ap-southeast-1 | | ap-southeast-2 | | ap-southeast-3 | | ca-central-1 | | eu-central-1 | | eu-north-1 | | eu-south-1 | | eu-west-1 | | eu-west-2 | | eu-west-3 | | me-central-1 | | sa-east-1 | | us-east-1 | | us-east-2 | | us-west-2 | ### Azure | Region | | --- | | centralus | | eastus | | eastus2 | | germanywestcentral | | northeurope | | norwayeast | | swedencentral | | uksouth | | westeurope | | westus2 | --- # Page 644: Dedicated Tiers and Regions **URL**: https://docs.redpanda.com/redpanda-cloud/reference/tiers/dedicated-tiers.md --- # Dedicated Tiers and Regions --- title: Dedicated Tiers and Regions latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: tiers/dedicated-tiers page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: tiers/dedicated-tiers.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/tiers/dedicated-tiers.adoc description: Learn about supported tiers and regions for Dedicated clusters. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-08-01" --- ## [](#dedicated-usage-tiers)Dedicated usage tiers When you create a Dedicated cluster, you select your usage tier. Each tier provides tested workload configurations for maximum throughput, partitions, and connections. 

| Tier | Ingress | Egress | Partitions (pre-replication) | Connections |
| --- | --- | --- | --- | --- |
| Tier 1 | 20 MBps | 60 MBps | 2,000 | 9,000 |
| Tier 2 | 50 MBps | 150 MBps | 5,600 | 22,500 |
| Tier 3 | 100 MBps | 200 MBps | 11,300 | 45,000 |
| Tier 4 | 200 MBps | 400 MBps | 22,800 | 90,000 |
| Tier 5 | 400 MBps | 800 MBps | 45,600 | 180,000 |

> 📝 **NOTE**
>
> - Partition counts are based on clusters running Redpanda version 25.1 or higher and on the assumption that the replication factor is 3 (default). If you set a higher replication factor, the maximum value for partitions will be lower.
>
> - Depending on the workload, it may not be possible to achieve all maximum values. For example, a high number of partitions may make it more difficult to reach the maximum value in throughput.
>
> - Connections are regulated per broker for best performance. For example, in a tier 1 cluster with 3 brokers, there could be up to 3,000 connections per broker.

## [](#dedicated-supported-regions)Dedicated supported regions

### Google Cloud Platform (GCP)

| Region |
| --- |
| asia-east1 |
| asia-northeast1 |
| asia-south1 |
| asia-southeast1 |
| australia-southeast1 |
| europe-west1 |
| europe-west2 |
| europe-west3 |
| northamerica-northeast1 |
| southamerica-east1 |
| us-central1 |
| us-east1 |

### Amazon Web Services (AWS)

| Region |
| --- |
| ap-northeast-1 |
| ap-south-1 |
| ap-southeast-1 |
| ap-southeast-2 |
| ca-central-1 |
| eu-central-1 |
| eu-west-1 |
| eu-west-2 |
| eu-west-3 |
| us-east-1 |
| us-east-2 |
| us-west-2 |

### Azure

| Region |
| --- |
| centralus |
| eastus |
| eastus2 |
| northeurope |
| norwayeast |
| uksouth |

--- # Page 645: Serverless Regions **URL**: https://docs.redpanda.com/redpanda-cloud/reference/tiers/serverless-regions.md --- # Serverless Regions --- title: Serverless Regions latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: tiers/serverless-regions
page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: tiers/serverless-regions.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/reference/pages/tiers/serverless-regions.adoc description: Learn about supported regions for Serverless clusters. page-git-created-date: "2025-06-04" page-git-modified-date: "2025-11-19" --- ## [](#serverless-supported-regions)Serverless supported regions ### Amazon Web Services (AWS) | Region | | --- | | ap-northeast-1 | | ap-south-1 | | ap-southeast-1 | | eu-central-1 | | eu-west-2 | | us-east-1 | | us-west-2 | ### Google Cloud Platform (GCP) | Region | | --- | | us-central1 | > 📝 **NOTE** > > Serverless on GCP is currently in a [beta](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#beta) release. See also: [Serverless usage limits](../../../get-started/cluster-types/serverless/#serverless-usage-limits) --- # Page 646: Redpanda Cloud Security **URL**: https://docs.redpanda.com/redpanda-cloud/security.md --- # Redpanda Cloud Security --- title: Redpanda Cloud Security latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/index.adoc description: Learn about the fundamental building blocks of the Redpanda Cloud security. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-05-07" --- - [Authentication](cloud-authentication/) Learn about Redpanda Cloud authentication. - [Authorization](authorization/) Learn about Redpanda Cloud authorization. - [Encryption](cloud-encryption/) Learn how Redpanda Cloud provides data encryption in transit and at rest. 
- [Availability](cloud-availability/) Learn how Redpanda Cloud supports deploying clusters in single or multiple availability zones (AZs). - [Secrets](secrets/) Learn how Redpanda Cloud manages secrets. - [Safety and Reliability](cloud-safety-reliability/) Learn how Redpanda Cloud tests for data inconsistency, liveness, and availability during adverse events. --- # Page 647: Authorization **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization.md --- # Authorization --- title: Authorization latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/index.adoc description: Learn about Redpanda Cloud authorization. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-05-07" --- - [Authorization](cloud-authorization/) Learn about user authorization and agent authorization in Redpanda Cloud. - [Role-Based Access Control (RBAC)](rbac/) Learn about configuring role-based access control (RBAC) in the control plane and in the data plane. - [Group-Based Access Control (GBAC)](gbac/) Configure group-based access control (GBAC) in the control plane and in the data plane. - [Configure ACLs](acl/) Learn how to use ACLs to configure fine-grained access to Redpanda resources. - [AWS IAM Policies](cloud-iam-policies/) See the IAM policies used by AWS. - [GCP IAM Policies](cloud-iam-policies-gcp/) See the IAM policies used by GCP. - [Azure IAM Policies](cloud-iam-policies-azure/) See the IAM policies used by Azure. 
--- # Page 648: Configure ACLs **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/acl.md --- # Configure ACLs --- title: Configure ACLs latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/acl page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/acl.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/acl.adoc description: Learn how to use ACLs to configure fine-grained access to Redpanda resources. page-git-created-date: "2025-08-25" page-git-modified-date: "2026-01-12" --- Access control lists (ACLs) provide a way to configure fine-grained access to Redpanda resources. ACLs are permission rules that determine which actions users or roles can perform on Redpanda resources. Redpanda stores ACLs internally, replicated with [Raft](https://raft.github.io/) to provide the same consensus guarantees as your data. > 📝 **NOTE** > > For complex organizational hierarchies or large numbers of users, consider using [role-based access control](../rbac/rbac_dp/) for a more flexible and efficient way to manage user permissions. ## [](#acls-overview)ACLs overview ACLs control access by defining: - **Who** can access resources (principals - users or roles) - **What** they can access (clusters, topics, consumer groups, transactional IDs, Schema Registry subjects, and Schema Registry operations) - **How** they can interact with those resources (operations like read, write, describe) - **Where** they can connect from (host restrictions) ACLs work with SASL/SCRAM and mTLS authentication methods to provide comprehensive security. ## [](#manage-acls)Manage ACLs You can create and manage ACLs in the following ways: - **Redpanda Cloud**: Select **Security** from the left navigation menu, select the **ACLs** tab. 
After the ACL is created, you can add users or roles to it.
- **Command Line**: Use the `rpk` command-line tool for programmatic management.

For example, suppose you want to create a user named `analytics-user` who can read from topics starting with `logs-` and write to a topic called `processed-data`:

```bash
# 1. Create the user
rpk security user create analytics-user --password 'secure-password'

# 2. Grant read access to topics with "logs-" prefix
rpk security acl create --allow-principal analytics-user \
  --operation read,describe --topic 'logs-' \
  --resource-pattern-type prefixed

# 3. Grant write access to the processed-data topic
rpk security acl create --allow-principal analytics-user \
  --operation write,describe --topic processed-data
```

## [](#acl-terminology)ACL terminology

Understanding these terms helps you configure least-privilege access.

| Term | Definition | Example |
| --- | --- | --- |
| Principal | The entity (user, role, or group) requesting access | User:analytics-user, RedpandaRole:data-engineers, Group:engineering |
| Resource | The Redpanda component being accessed (cluster, topic, consumer group, transactional ID, Schema Registry subject, and Schema Registry operation) | Topic: sensor-data, Group: analytics-group, Cluster: kafka-cluster |
| Operation | The action being performed on the resource | READ, WRITE, CREATE, DELETE, DESCRIBE |
| Host | The IP address or hostname from which access is allowed/denied | 192.168.1.100, * (any host) |
| Permission | Whether access is allowed or denied | ALLOW, DENY |

An ACL rule combines these elements to create a permission statement:

`ALLOW User:analytics-user to READ topic:sensor-data from host:192.168.1.100`

ACL commands work on a multiplicative basis. If you specify two principals and two permissions, you create four ACLs: both permissions for each principal.

### [](#principals)Principals

All ACLs require a principal. A principal is composed of two parts: the type, and the name.
Redpanda supports the types "User", "RedpandaRole", and "Group". When you create user "bar", Redpanda expects you to add ACLs for "User:bar". To grant permissions to an OIDC group, use the `Group:` prefix (for example, `Group:engineering`). See [Configure GBAC in the Data Plane](../gbac/gbac_dp/).

The `--allow-principal` and `--deny-principal` flags add this prefix for you, if necessary.

The special character \* matches any name, meaning an ACL with principal `User:*` grants or denies the permission for any user.

> 💡 **TIP**
>
> To set multiple principals in a single comma-separated string, you must enclose the string with quotes. Otherwise, `rpk` splits the string on commas and fails to read the option correctly.
>
> For example, use double quotes:
>
> ```bash
> rpk security acl create --allow-principal="\"C=UK,ST=London,L=London,O=Redpanda,OU=engineering,CN=__schema_registry\""
> ```
>
> Alternatively, use single quotes:
>
> ```bash
> rpk security acl create --allow-principal='"C=UK,ST=London,L=London,O=Redpanda,OU=engineering,CN=__schema_registry"'
> ```

### [](#hosts)Hosts

Hosts can be seen as an extension of the principal and can effectively gate where the principal can connect from. When creating ACLs, unless otherwise specified, the default host is the wildcard `*`, which allows or denies the principal from all hosts. When specifying hosts, you must pair the `--allow-host` flag with the `--allow-principal` flag and the `--deny-host` flag with the `--deny-principal` flag.

### [](#resources)Resources

A resource is what an ACL allows or denies access to. The following resources are available within Redpanda:

- `cluster`
- `topics`
- `groups`
- `transactionalid`

Starting in v25.2, Redpanda also supports the following ACL resources for Schema Registry:

- `subject`: Controls ACL access for specific Schema Registry subjects. Specify using the flag `--registry-subject`.
- `registry`: Controls whether to grant ACL access to global (top-level) Schema Registry operations. Specify using the flag `--registry-global`.

> ❗ **IMPORTANT**
>
> ACLs for Schema Registry must be enabled for each cluster. See [Schema Registry Authorization](../../../manage/schema-reg/schema-reg-authorization/).

Resources combine with the operation that is allowed or denied on that resource. By default, resources are specified on an exact name match (a "literal" match). Names for each of these resources can be specified with their respective flags.

Use the `--resource-pattern-type` flag to specify that a resource name is "prefixed", which matches any resource whose name starts with the given prefix. A literal name of "foo" matches only the topic "foo", while the prefixed name of "foo-" matches both "foo-bar" and "foo-jazz". The special wildcard resource name '\*' matches any name of the given resource type (`--topic '*'` matches all topics).

### [](#operations)Operations

Operations define what actions are allowed or denied on resources.
Here are the available operations with common use cases:

| Operation | Description | Common use case |
| --- | --- | --- |
| READ | Allows reading data from a resource | Consumers reading from topics, fetching consumer group offsets |
| WRITE | Allows writing data to a resource | Producers publishing messages to topics |
| CREATE | Allows creating new resources | Auto-creating topics, creating new consumer groups |
| DELETE | Allows deleting resources | Removing topics, deleting consumer groups |
| DESCRIBE | Allows querying resource metadata | Listing topics, getting topic configurations |
| ALTER | Allows modifying resource properties | Changing topic partition counts, updating consumer group settings |
| DESCRIBE_CONFIGS | Allows viewing resource configurations | Reading topic settings, broker configurations |
| ALTER_CONFIGS | Allows modifying resource configurations | Changing topic retention policies, updating broker settings |
| IDEMPOTENT_WRITE | Allows idempotent produce semantics initialization | Required for idempotent producers (InitProducerID) |
| ALL | Grants all operations above | Administrative access to resources |

Common combinations:

- Producer: `WRITE` + `DESCRIBE` on topics
- Consumer: `READ` + `DESCRIBE` on topics, `READ` on consumer groups
- Admin: `ALL` on cluster and specific resources

### [](#producingconsuming)Producing/Consuming

For quick reference, here are the ACL requirements for common client scenarios:

| Client type | Required ACLs |
| --- | --- |
| Simple producer | WRITE + DESCRIBE on target topics |
| Simple consumer | READ + DESCRIBE on target topics; READ on consumer group |
| Transactional producer | WRITE + DESCRIBE on target topics; WRITE on transactional ID |
| Consumer group admin | READ + DESCRIBE on target topics; READ + DESCRIBE + DELETE on consumer groups |

Command examples:

```bash
# Basic producer access
rpk security acl create --allow-principal producer-user \
  --operation write,describe --topic my-topic

# Basic consumer access
rpk security acl create --allow-principal consumer-user \
  --operation read,describe --topic my-topic
rpk security acl create --allow-principal consumer-user \
  --operation read --group my-consumer-group
```

The following operations are necessary for each individual client request, where **resource** corresponds to the resource flag, and "for xyz" corresponds to the resource names in the request.

**PRODUCING/CONSUMING**

- Produce: WRITE on TOPIC for topics; WRITE on TRANSACTIONAL\_ID for the transaction.id
- Fetch: READ on TOPIC for topics
- ListOffsets: DESCRIBE on TOPIC for topics
- Metadata: DESCRIBE on TOPIC for topics; CREATE on CLUSTER for kafka-cluster (if automatically creating topics) or, CREATE on TOPIC for topics (if automatically creating topics)
- InitProducerID: IDEMPOTENT\_WRITE on CLUSTER or, WRITE on any TOPIC or, WRITE on TRANSACTIONAL\_ID for transactional.id (if using transactions)
- OffsetForLeaderEpoch: DESCRIBE on TOPIC for topics

**GROUP CONSUMING**

- FindCoordinator: DESCRIBE on GROUP for group; DESCRIBE on TRANSACTIONAL\_ID for transactional.id (transactions)
- OffsetCommit: READ on GROUP for groups; READ on TOPIC for topics
- OffsetFetch: DESCRIBE on GROUP for groups; DESCRIBE on TOPIC for topics
- OffsetDelete: DELETE on GROUP for groups; READ on TOPIC for topics
- JoinGroup: READ on GROUP for group
- Heartbeat: READ on GROUP for group
- LeaveGroup: READ on GROUP for group
- SyncGroup: READ on GROUP for group

**TRANSACTIONS (including FindCoordinator above)**

- AddPartitionsToTxn: WRITE on TRANSACTIONAL\_ID for transactional.id; WRITE on TOPIC for topics
- AddOffsetsToTxn: WRITE on TRANSACTIONAL\_ID for transactional.id; READ on GROUP for group
- EndTxn: WRITE on TRANSACTIONAL\_ID for transactional.id
- TxnOffsetCommit: WRITE on TRANSACTIONAL\_ID for transactional.id; READ on GROUP for group; READ on TOPIC for topics

**ADMIN**

- CreateTopics: CREATE on CLUSTER for kafka-cluster; CREATE on TOPIC for topics; DESCRIBE\_CONFIGS on TOPIC for topics, for returning topic configs on create
- CreatePartitions: ALTER on TOPIC for topics
- DeleteTopics: DELETE on TOPIC for topics; DESCRIBE on TOPIC for topics, if deleting by topic ID (in addition to prior ACL)
- DeleteRecords: DELETE on TOPIC for topics
- DescribeGroup: DESCRIBE on GROUP for groups
- ListGroups: DESCRIBE on GROUP for groups or, DESCRIBE on CLUSTER for kafka-cluster
- DeleteGroups: DELETE on GROUP for groups
- DescribeConfigs: DESCRIBE\_CONFIGS on CLUSTER for cluster (broker describing); DESCRIBE\_CONFIGS on TOPIC for topics (topic describing)
- AlterConfigs: ALTER\_CONFIGS on CLUSTER for cluster (broker altering); ALTER\_CONFIGS on TOPIC for topics (topic altering)
- AlterPartitionAssignments: ALTER on CLUSTER for kafka-cluster
- ListPartitionReassignments: DESCRIBE on CLUSTER for kafka-cluster
- AlterReplicaLogDirs: ALTER on CLUSTER for kafka-cluster
- DescribeLogDirs: DESCRIBE on CLUSTER for kafka-cluster
- AlterClientQuotas: ALTER on CLUSTER for kafka-cluster
- DescribeClientQuotas: DESCRIBE\_CONFIGS on CLUSTER for kafka-cluster
- AlterUserScramCreds: ALTER on CLUSTER for kafka-cluster
- DescribeUserScramCreds: DESCRIBE\_CONFIGS on CLUSTER for kafka-cluster
- DescribeProducers: READ on TOPIC for topics
- DescribeTransactions: DESCRIBE on TRANSACTIONAL\_ID for transactional.id; DESCRIBE on TOPIC for topics
- ListTransactions: DESCRIBE on TRANSACTIONAL\_ID for transactional.id

**REGISTRY**

- GetGlobalConfig: DESCRIBE\_CONFIGS on REGISTRY for schema registry
- UpdateGlobalConfig: ALTER\_CONFIGS on REGISTRY for schema registry
- GetGlobalMode: DESCRIBE\_CONFIGS on REGISTRY for schema registry
- UpdateGlobalMode: ALTER\_CONFIGS on REGISTRY for schema registry
- GetReferencedBy: DESCRIBE on REGISTRY for schema registry
- ListSchemasForId: DESCRIBE on REGISTRY for schema registry
- ListSchemaTypes: (no ACLs required)
- HealthCheck: (no ACLs required)

**SUBJECT**

- ListSubjects: DESCRIBE on SUBJECT for subject
- CheckSchema: READ on SUBJECT for subject
- RegisterSchema: WRITE on SUBJECT for subject
- GetSchemaByVersion: READ on SUBJECT for subject
- GetSchemaRaw: READ on SUBJECT for subject
- ListSubjectVersions: DESCRIBE on SUBJECT for subject
- DeleteSchemaVersion: DELETE on SUBJECT for subject
- DeleteSubject: DELETE on SUBJECT for subject
- GetSubjectConfig: DESCRIBE\_CONFIGS on SUBJECT for subject
- UpdateSubjectConfig: ALTER\_CONFIGS on SUBJECT for subject
- DeleteSubjectConfig: ALTER\_CONFIGS on SUBJECT for subject
- GetSubjectMode: DESCRIBE\_CONFIGS on SUBJECT for subject
- UpdateSubjectMode: ALTER\_CONFIGS on SUBJECT for subject
- DeleteSubjectMode: ALTER\_CONFIGS on SUBJECT for subject
- CheckCompatibility: READ on SUBJECT for subject
- GetSchemaById: READ on SUBJECT for subject

To get this information with `rpk`, run:

```bash
rpk security acl --help-operations
```

In flag form to set up a general producing/consuming client, you can invoke `rpk security acl create` up to three times with the following (including your `--allow-principal`):

```bash
--operation write,read,describe --topic [topics]
--operation describe,read --group [group.id]
--operation describe,write --transactional-id [transactional.id]
```

### [](#permissions)Permissions

A client can be allowed access or denied access. By default, all permissions are denied. You only need to specifically deny a permission if you allow a wide set of permissions and then want to deny a specific permission in that set. You could allow all operations, and then specifically deny writing to topics.

### [](#management)Management

Commands for managing users and ACLs work on a specific ACL basis, but listing and deleting ACLs works on filters. Filters allow matching many ACLs to be printed, listed, and deleted at the same time. Because this can be risky for deleting, the delete command prompts for confirmation by default.

## [](#acls-best-practices)ACLs best practices

Follow these recommendations for secure and manageable ACL configurations.
Security best practices:

- Principle of least privilege: Grant only the minimum permissions required for each user or role
- Avoid wildcards: Use specific resource names instead of `*` whenever possible
- Separate environments: Use different principals for development, staging, and production
- Regular audits: Periodically review and clean up unused ACLs

Management best practices:

- Use descriptive names: Choose clear user and topic names that indicate their purpose
- Group related permissions: Create roles for users with similar access patterns
- Document ACL decisions: Keep records of why specific permissions were granted

Common pitfalls to avoid:

- Over-privileging: Granting `ALL` operations when specific ones would suffice
- Forgetting consumer groups: Not granting necessary group permissions for consumers
- Host restrictions: Accidentally blocking legitimate client connections with overly restrictive host rules
- Pattern confusion: Mixing up literal vs. prefixed resource pattern types
- Skipping tests: Deploying to production without first verifying that permissions work as expected

## [](#manage-acls-with-rpk)Manage ACLs with rpk

Use [`rpk security acl`](../../../reference/rpk/rpk-security/rpk-security-acl/) to manage ACLs and SASL/SCRAM users from the command line.

### [](#basic-workflow)Basic workflow

Follow this typical workflow when setting up ACLs:

1. **Create a user**: `rpk security user create --password ''`
2. **Create ACLs**: `rpk security acl create --allow-principal --operation --topic `
3. **Verify access**: `rpk security acl list --allow-principal `

Example setup:

```bash
# 1. Create user
rpk security user create data-processor \
  --password 'secure-password' \
  -X admin.hosts=localhost:9644

# 2. Grant topic access (a prefixed pattern matches on the literal prefix, so no trailing '*')
rpk security acl create --allow-principal data-processor \
  --operation read,write,describe --topic 'data-' \
  --resource-pattern-type prefixed

# 3. Grant consumer group access
rpk security acl create --allow-principal data-processor \
  --operation read,describe --group data-processing-group

# 4. Verify the setup
rpk security acl list --allow-principal data-processor
```

### [](#command-overview)Command overview

Here’s how `rpk` commands interact with Redpanda:

| Command | Protocol | Default port | Purpose |
| --- | --- | --- | --- |
| list | Kafka API | 9092 | View existing ACLs |
| create | Kafka API | 9092 | Create new ACLs |
| delete | Kafka API | 9092 | Remove ACLs |

### [](#global-flags)Global flags

Every [`rpk security acl`](../../../reference/rpk/rpk-security/rpk-security-acl/) command can use these flags:

| Flag | Description |
| --- | --- |
| -X brokers | Comma-separated list of broker ip:port pairs (for example, --brokers '192.168.78.34:9092,192.168.78.35:9092,192.179.23.54:9092' ). Alternatively, you can set the RPK_BROKERS environment variable with the comma-separated list of broker addresses. |
| --config | Redpanda configuration file. If not set, the file is searched in the default locations. |
| -h, --help | Help. |
| --password | SASL password to be used for authentication. |
| --sasl-mechanism | The authentication mechanism to use. Supported values: SCRAM-SHA-256, SCRAM-SHA-512. |
| --tls-cert | The certificate to be used for TLS authentication with the broker. |
| --tls-enabled | Enable TLS for the Kafka API (not necessary if specifying custom certificates). This is assumed to be true when passing other --tls flags. |
| --tls-key | The certificate key to be used for TLS authentication with the broker. |
| --tls-truststore | The truststore to be used for TLS communication with the broker. |
| --user | SASL user to be used for authentication. |

### [](#create-acls)Create ACLs

The create command creates one ACL for every combination of the principals, hosts, resources, and operations that you specify. At least one principal, one host, one resource, and one operation are required to create a single ACL.
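
To make the combination rule concrete, the following Python sketch enumerates the ACLs that a single create call would produce. It is a simplified model rather than `rpk` code: the `expand_acls` helper and the dictionary layout are hypothetical, and the host defaults to `*` as described under Hosts.

```python
from itertools import product


def expand_acls(principals, operations, resources, hosts=("*",)):
    """Return one ACL per principal/host/resource/operation combination.

    Hypothetical helper used only to illustrate the multiplicative rule.
    """
    return [
        {"principal": p, "host": h, "resource": r, "operation": op}
        for p, h, r, op in product(principals, hosts, resources, operations)
    ]


acls = expand_acls(
    principals=["User:alice", "User:bob"],
    operations=["read", "describe"],
    resources=["topic:foo"],
)
print(len(acls))  # 4
```

Two principals and two operations on one resource yield four ACLs; adding a second topic to the same call would double that to eight.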
```bash rpk security acl create/delete [globalACLFlags] [localFlags] ``` You can use the global flags and some other local flags. Following are the available local flags: | Flag | Description | | --- | --- | | --allow-host | Host for which access will be granted (repeatable). | | --allow-principal | Principals to which permissions will be granted (repeatable). | | --allow-role | Role to which permissions will be granted (repeatable). | | --cluster | Whether to grant ACLs to the cluster. | | --deny-host | Host from which access will be denied (repeatable). | | --deny-principal | Principal to which permissions will be denied (repeatable). | | --deny-role | Role to which permissions will be denied (repeatable). | | --group | Group to grant ACLs for (repeatable). | | -h, --help | Help. | | --name-pattern | The name pattern type to be used when matching the resource names. | | --operation | Operation that the principal will be allowed or denied. Can be passed many times. | | --resource-pattern-type | Pattern to use when matching resource names (literal or prefixed) (default "literal"). | | --topic | Topic to grant ACLs for (repeatable). | | --transactional-id | Transactional IDs to grant ACLs for (repeatable). | | --registry-subject | Schema Registry subject to grant ACLs for (repeatable). | | --registry-global | Grants ACLs for global Schema Registry operations (no name required). 
| Examples: To allow all permissions to user bar on topic "foo" and group "g", run: ```bash rpk security acl create --allow-principal bar --operation all --topic foo --group g ``` To allow read permissions to all users on topics biz and baz, run: ```bash rpk security acl create --allow-principal '*' --operation read --topic biz,baz ``` To allow write permissions to user buzz to transactional id "txn", run: ```bash rpk security acl create --allow-principal User:buzz --operation write --transactional-id txn ``` ### [](#list-and-delete-acls)List and delete ACLs Like create, list and delete have a multiplying effect, but delete is more advanced. Both commands work on a filter basis: any unspecified flag defaults to matching everything (all operations, all allowed principals, and so on). To ensure that you don’t accidentally delete more than you intend, the delete command prints everything that matches your input filters and prompts for confirmation before it issues the delete request. Any filter that matches more than 10 ACLs also prompts for confirmation. If no resources are specified, all resources are matched. If no operations are specified, all operations are matched. You can also opt in to matching everything explicitly. For example, `--operation any` matches any operation. The `--resource-pattern-type` flag, which defaults to `any`, configures how to filter resource names: - `any` returns exact name matches of either prefixed or literal pattern type - `match` returns wildcard matches, prefix patterns that match your input, and literal matches - `prefixed` returns prefix patterns that match your input (prefix "fo" matches "foo") - `literal` returns exact name matches To list or delete ACLs, run: ```bash rpk security acl list/delete [globalACLFlags] [localFlags] ``` In addition to the global flags, you can use the following local flags: | Flag | Description | | --- | --- | | --allow-host | Allowed host ACLs to list/remove. 
(repeatable) | | --allow-principal | Allowed principal ACLs to list/remove. (repeatable) | | --cluster | Whether to list/remove ACLs for the cluster. | | --deny-host | Denied host ACLs to list/remove. (repeatable) | | --deny-principal | Denied principal ACLs to list/remove. (repeatable) | | -d, --dry | Dry run: validate what would be deleted. | | --group | Group to list/remove ACLs for. (repeatable) | | -h, --help | Help. | | --no-confirm | Disable confirmation prompt. | | --operation | Operation to list/remove. (repeatable) | | -f, --print-filters | Print the filters that were requested. (failed filters are always printed) | | --resource-pattern-type | Pattern to use when matching resource names. (any, match, literal, or prefixed) (default "any") | | --topic | Topic to list/remove ACLs for. (repeatable) | | --transactional-id | Transactional IDs to list/remove ACLs for. (repeatable) | | --registry-subject | Schema Registry subject(s) to list/remove ACLs for. (repeatable) | | --registry-global | Match ACLs for global Schema Registry operations. | ### [](#user)User This command manages SCRAM users. When SASL is enabled, clients authenticate to Redpanda as a SCRAM user, and ACLs control what that user can access. Using SASL requires setting `kafka_enable_authorization: true` in the Redpanda section of your `redpanda.yaml`. ```bash rpk security user [command] [globalACLFlags] [globalUserFlags] ``` Following are the available global user flags: | Flag | Description | Supported Value | | --- | --- | --- | | -X admin.hosts | The comma-separated list of IP addresses (IP:port). You must specify one for each broker. | strings | | -h, --help | Help. | | ### [](#user-create)User create This command creates a single SASL/SCRAM user with the given password, and optionally with a custom mechanism. The mechanism determines which authentication flow the client uses for this user/password. 
Redpanda `rpk` supports the following mechanisms: `SCRAM-SHA-256` (default) and `SCRAM-SHA-512`, which uses the same flow with SHA-512. Before a created SASL account can be used, you must also create ACLs to grant the account access to certain resources in your cluster. To create a SASL/SCRAM user, run: ```bash rpk security user create [user] -p [password] [globalACLFlags] [globalUserFlags] [localFlags] ``` Here are the local flags: | Flag | Description | | --- | --- | | -h, --help | Help. | | --mechanism | SASL mechanism to use: scram-sha-256 or scram-sha-512. Default is scram-sha-256. | ### [](#user-delete)User delete This command deletes the specified SASL account from Redpanda. Deleting the account does not delete any ACLs that exist for this user: you might want to re-create the user later, and some ACLs apply to wildcard users rather than to a specific user. ```bash rpk security user delete [USER] [globalACLFlags] [globalUserFlags] ``` ### [](#user-list)User list This command lists SASL users. ```bash rpk security user list [globalACLFlags] [globalUserFlags] ``` You can also use the shortened form `ls` in place of `list`. --- # Page 649: Authorization **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/cloud-authorization.md --- # Authorization --- title: Authorization latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/cloud-authorization page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/cloud-authorization.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/cloud-authorization.adoc description: Learn about user authorization and agent authorization in Redpanda Cloud. 
page-git-created-date: "2024-06-06" page-git-modified-date: "2026-04-07" --- There are two types of authorization in Redpanda Cloud: - User authorization - Use [role-based access control (RBAC)](../rbac/) in the [control plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#control-plane) and in the [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) to assign users access to specific resources. For example, you could grant everyone access to clusters in a development resource group while limiting access to clusters in a production resource group. Or, you could limit access to geographically dispersed clusters in accordance with data residency laws. This removes the need to manually maintain and verify a set of ACLs for a user base that may contain thousands of users. - Use [group-based access control (GBAC)](../gbac/) in the [control plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#control-plane) and in the [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) to manage permissions at the group level using OIDC. Assign OIDC groups to roles or create ACLs with `Group:` principals, so that users inherit access based on their group membership in your identity provider. Because group membership is managed by your identity provider, onboarding and offboarding require no changes in Redpanda. - Use Kafka [access control lists (ACLs)](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#access-control-list-acl) to grant users permission to perform specific types of operations on specific resources (such as topics, groups, clusters, or transactional IDs). ACLs provide fine-grained access control for provisioned users. For authentication, ACLs work with SASL/SCRAM and with mTLS with principal mapping. 
- BYOC agent authorization When deploying an agent as part of BYOC cluster provisioning, Redpanda Cloud automatically assigns IAM policies to the agent. The IAM policy permissions granted to the agent provide it the authorization required to fully manage Redpanda Cloud clusters in [AWS](../cloud-iam-policies/), [Azure](../cloud-iam-policies-azure/), or [GCP](../cloud-iam-policies-gcp/). > ❗ **IMPORTANT** > > IAM policies do not apply or act as deployment permissions, and there are no explicit user actions associated with IAM policies. Rather, IAM policy permissions apply to Redpanda Cloud agents _only_, and serve to provide Redpanda agents access to AWS, GCP, or Azure clusters so Redpanda brokers can communicate with them. --- # Page 650: Azure IAM Policies **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/cloud-iam-policies-azure.md --- # Azure IAM Policies --- title: Azure IAM Policies latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/cloud-iam-policies-azure page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/cloud-iam-policies-azure.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/cloud-iam-policies-azure.adoc description: See the IAM policies used by Azure. page-git-created-date: "2024-08-01" page-git-modified-date: "2024-10-21" --- When you run `rpk cloud byoc azure apply` to create a BYOC cluster, you grant IAM permissions to the Redpanda Cloud agent. IAM permissions allow the agent to access the Azure API to create and manage cluster resources. The permissions follow the principle of least privilege, limiting access to only what is necessary. IAM permissions are not required by Redpanda Cloud users. 
> 📝 **NOTE** > > - This page lists the IAM permissions Redpanda needs to create [BYOC clusters](../../../get-started/cluster-types/byoc/azure/create-byoc-cluster-azure/). This _does not_ pertain to [BYOVNet clusters](../../../get-started/cluster-types/byoc/azure/vnet-azure/). > > - No IAM permissions are required for Redpanda Cloud users. IAM policies do not grant user access to a cluster; rather, they grant the deployed Redpanda agent access, so that brokers can communicate with the BYOC clusters. Azure RBAC (role-based access control) is scoped to resource groups. For example: ```none "/subscriptions//resourceGroups/rg-rpcloud-cqh5itt4650ot3irs5mg", "/subscriptions//resourceGroups/rg-rpcloud-cqh5itt4650ot3irs5mg-network", "/subscriptions//resourceGroups/rg-rpcloud-cqh5itt4650ot3irs5mg-storage" ], "permissions": [ { ``` ## [](#azure-iam-policies)Azure IAM policies IAM policies are assigned to deployed Redpanda agents for BYOC Azure clusters that use the following Azure services: actions = \[ # Ability to read the resource group "Microsoft.Resources/subscriptions/resourcegroups/read", # Storage Containers "Microsoft.Storage/storageAccounts/blobServices/containers/delete", "Microsoft.Storage/storageAccounts/blobServices/containers/read", "Microsoft.Storage/storageAccounts/blobServices/containers/write", "Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action", # Create DNS Zones "Microsoft.Network/dnszones/read", "Microsoft.Network/dnszones/write", "Microsoft.Network/dnszones/delete", # Workaround for TF needing to import the zone when it already exists. 
"Microsoft.Network/dnszones/SOA/read", # Private link read "Microsoft.Network/privatelinkservices/read", # The agent needs access to the storage account in order to access the data "Microsoft.Storage/storageAccounts/read", # Manage AKS Clusters "Microsoft.ContainerService/managedClusters/read", "Microsoft.ContainerService/managedClusters/delete", "Microsoft.ContainerService/managedClusters/write", "Microsoft.ContainerService/managedClusters/agentPools/read", "Microsoft.ContainerService/managedClusters/agentPools/write", "Microsoft.ContainerService/managedClusters/agentPools/delete", "Microsoft.ContainerService/managedClusters/agentPools/upgradeNodeImageVersion/action", # Without this, cannot create node pools to the specified AKS cluster "Microsoft.ContainerService/managedClusters/listClusterUserCredential/action", # Allows joining to a VNet "Microsoft.Network/virtualNetworks/read", "Microsoft.Network/virtualNetworks/subnets/join/action", "Microsoft.Network/virtualNetworks/subnets/read", "Microsoft.Network/virtualNetworks/subnets/write", "Microsoft.Network/virtualNetworks/subnets/delete", # Allow agent to manage role assignments for the Redpanda cluster "Microsoft.Authorization/roleAssignments/read", "Microsoft.Authorization/roleAssignments/write", "Microsoft.Authorization/roleAssignments/delete", # Allow agent to manage role definitions for the Redpanda cluster "Microsoft.Authorization/roleDefinitions/write", "Microsoft.Authorization/roleDefinitions/read", "Microsoft.Authorization/roleDefinitions/delete", # Allow agent to manage identities for the Redpanda cluster "Microsoft.ManagedIdentity/userAssignedIdentities/read", "Microsoft.ManagedIdentity/userAssignedIdentities/write", "Microsoft.ManagedIdentity/userAssignedIdentities/delete", "Microsoft.ManagedIdentity/userAssignedIdentities/assign/action", "Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials/read", 
"Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials/write", "Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials/delete", # Allow agent to manage tiered storage bucket for the Redpanda cluster "Microsoft.Storage/storageAccounts/read", "Microsoft.Storage/storageAccounts/write", "Microsoft.Storage/storageAccounts/delete", "Microsoft.Storage/storageAccounts/blobServices/read", "Microsoft.Storage/storageAccounts/blobServices/write", # Allow agent to read public IPs "Microsoft.Network/publicIPAddresses/read", "Microsoft.Network/publicIPAddresses/write", "Microsoft.Network/publicIPAddresses/delete", # Creating the RP storage account requires these additional permissions to workaround https://github.com/hashicorp/terraform-provider-azurerm/issues/25521 "Microsoft.Storage/storageAccounts/queueServices/read", "Microsoft.Storage/storageAccounts/fileServices/read", "Microsoft.Storage/storageAccounts/fileServices/shares/read", "Microsoft.Storage/storageAccounts/listkeys/action", # Read the keyvault "Microsoft.KeyVault/vaults/read" \] data\_actions = \[ # Storage Containers "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete", "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read", "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write", "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action", "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action" \] --- # Page 651: GCP IAM Policies **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/cloud-iam-policies-gcp.md --- # GCP IAM Policies --- title: GCP IAM Policies latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/cloud-iam-policies-gcp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: 
authorization/cloud-iam-policies-gcp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/cloud-iam-policies-gcp.adoc description: See the IAM policies used by GCP. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-10-21" --- When you run `rpk cloud byoc gcp apply` to create a BYOC cluster, you grant IAM permissions to the Redpanda Cloud agent. IAM permissions allow the agent to access the GCP API to create and manage cluster resources. The permissions follow the principle of least privilege, limiting access to only what is necessary. IAM permissions are not required by Redpanda Cloud users. > 📝 **NOTE** > > - This page lists the IAM permissions the Redpanda agent service account uses to manage [BYOC cluster](../../../get-started/cluster-types/byoc/gcp/create-byoc-cluster-gcp/) resources. Your GCP account does not need these permissions for the initial Terraform bootstrap. This does _not_ pertain to permissions for [BYOVPC clusters](../../../get-started/cluster-types/byoc/gcp/vpc-byo-gcp/). > > - No IAM permissions are required for Redpanda Cloud users. IAM policies do not grant user access to a cluster; rather, they grant the deployed Redpanda agent access, so that brokers can communicate with the BYOC clusters. ## [](#gcp-iam-policies)GCP IAM policies The Redpanda agent service account for GCP is granted the following roles/permissions to manage Redpanda cluster resources: | Role/Permission | Description | | --- | --- | | compute.addresses.get | Allows a user to retrieve a specified address. | | compute.autoscalers.get | Allows a user to retrieve a specified autoscaler. | | compute.autoscalers.list | Allows a user to list autoscalers in a specified zone. | | compute.firewalls.create | Allows a user to create firewall rules to control inbound and outbound traffic for GCP instances. 
| | compute.firewalls.delete | Allows a user or service account to remove existing firewall rules from within a GCP project, modifying the network security configuration. | | compute.firewalls.get | Allows a user to view the details and configuration of a specific firewall rule for GCP projects. | | compute.firewalls.update | Allows a user to modify a specified firewall. | | compute.forwardingRules.create | Allows a user to create new forwarding rules within a project. | | compute.forwardingRules.delete | Allows a user to delete existing forwarding rules within a project. | | compute.forwardingRules.get | Allows a user to retrieve details about a specific forwarding rule within a project. | | compute.forwardingRules.pscCreate | Allows a user to create Private Service Connect forwarding rules within a project. | | compute.forwardingRules.pscDelete | Allows a user to delete Private Service Connect forwarding rules within a project. | | compute.forwardingRules.pscSetLabels | Allows a user to set or modify labels on Private Service Connect forwarding rules within a project. | | compute.forwardingRules.pscSetTarget | Allows a user to update the target service for a Private Service Connect forwarding rule. | | compute.forwardingRules.pscUpdate | Allows a user to update Private Service Connect forwarding rules within a project. | | compute.forwardingRules.setLabels | Allows a user to set, update, or remove labels on forwarding rules. | | compute.forwardingRules.setTarget | Allows a user to update the target of an existing forwarding rule. | | compute.forwardingRules.use | Allows a user to use a forwarding rule for traffic routing or other operations, without the ability to modify or delete it. | | compute.globalOperations.get | Allows a user to retrieve information about a specific global operation in a GCP project. | | compute.instanceGroupManagers.create | Allows a user to create a managed instance group. 
| | compute.instanceGroupManagers.delete | Allows a user to delete a specified managed instance group. | | compute.instanceGroupManagers.get | Allows a user or service account to retrieve details like the configuration, status, and properties of an instance group manager within GCP. | | compute.instanceGroupManagers.update | Allows a user to modify a specified managed instance group. | | compute.instanceGroups.create | Allows a user to create an instance group. | | compute.instanceGroups.delete | Allows a user to delete a specified instance group. | | compute.instanceGroups.get | Allows a user to retrieve a specified instance group. | | compute.instanceGroups.update | Allows a user to modify a specified instance group. | | compute.instances.create | Allows a user to create an instance. | | compute.instances.delete | Allows a user to delete a specified instance. | | compute.instances.get | Allows a user to retrieve a specified instance. | | compute.instances.list | Allows a user to list instances contained within a specified zone. | | compute.instances.reset | Allows a user to perform a reset on the specified instance. | | compute.instances.setDeletionProtection | Allows a user to enable deletion protection on a specified instance. | | compute.instances.update | Allows a user to modify a specified instance. | | compute.instances.use | Allows a user to use VM instances for operations, such as connecting to or interacting with the VM, but it does not grant the ability to modify or manage the instance itself. | | compute.instanceTemplates.create | Allows a user to create an instance template. | | compute.instanceTemplates.delete | Allows a user to delete a specified instance template. | | compute.instanceTemplates.get | Allows a user to retrieve a specified instance template. | | compute.networks.create | Allows a user to create a network. | | compute.networks.delete | Allows a user to delete a specified network. 
| | compute.networks.getEffectiveFirewalls | Allows a user to retrieve the effective firewalls for a specified network. | | compute.networks.update | Allows a user to modify a specified network. | | compute.networks.updatePolicy | Allows a user to update the configuration of existing GCP network resources. | | compute.networks.use | Allows a user to use a VPC network and its associated resources for tasks like launching instances or using network services, but it does not grant permission to modify the network itself. | | compute.projects.get | Allows a user or service account to retrieve information (such as project metadata, quotas, and configuration settings) about a specific GCP project. | | compute.regionBackendServices.create | Allows a user to create backend services in a specific region for a regional load balancer. | | compute.regionBackendServices.delete | Allows a user to delete backend services within a specific region. | | compute.regionBackendServices.get | Allows a user to retrieve information about a backend service within a specific region. | | compute.regionBackendServices.use | Allows a user to use a backend service in a specific region for operations like routing traffic, but does not grant the ability to modify or delete the backend service. | | compute.regionNetworkEndpointGroups.attachNetworkEndpoints | Allows a user to attach network endpoints to a regional network endpoint group (NEG). | | compute.regionNetworkEndpointGroups.create | Allows a user to create a NEG within a specific region. | | compute.regionNetworkEndpointGroups.delete | Allows a user to delete a NEG in a specific region. | | compute.regionNetworkEndpointGroups.detachNetworkEndpoints | Allows a user to remove network endpoints from a regional NEG. | | compute.regionNetworkEndpointGroups.get | Allows a user to retrieve information about a specific NEG within a region. 
| | compute.regionNetworkEndpointGroups.use | Allows a user to use a NEG within a specific region, typically for traffic routing and load balancing operations, without granting the ability to modify or delete the NEG itself. | | compute.regions.get | Allows a user to retrieve a specified region. | | compute.regions.list | Allows a user to retrieve a list of the available regions in a GCP project. | | compute.routers.get | Allows a user to retrieve a specified router. | | compute.serviceAttachments.create | Allows a user to create service attachments for Google Cloud services within a specific project or region. | | compute.serviceAttachments.delete | Allows a user to delete service attachments that are configured in a project or region. | | compute.serviceAttachments.get | Allows a user to retrieve information about an existing service attachment in a project or region. | | compute.serviceAttachments.list | Allows a user to list all service attachments within a project or region. | | compute.serviceAttachments.update | Allows a user to update or modify a service attachment in a project or region. | | compute.subnetworks.get | Allows a user to retrieve a specified subnetwork. | | compute.zoneOperations.get | Allows a user to retrieve a specified zone operation. | | compute.zoneOperations.list | Allows a user to list zone operations. | | compute.zones.get | Allows a user to retrieve a specified zone. | | compute.zones.list | Allows a user to retrieve a list of the available zones in a GCP project. | | dns.changes.create | Allows a user to create and update DNS resource record sets. | | dns.changes.get | Allows a user to retrieve the information about an existing DNS change. | | dns.changes.list | Allows a user to retrieve a list of changes to DNS resource record sets. | | dns.managedZones.create | Allows a user to create a new managed zone. A DNS managed zone holds the Domain Name System (DNS) records for the same DNS name suffix. 
| | dns.managedZones.delete | Allows a user or service account to delete managed zones within the Google Cloud DNS project. | | dns.managedZones.get | Allows a user or service account to retrieve information about a specific DNS managed zone. This permission is used in the context of Google Cloud DNS, which is a scalable and reliable domain name system (DNS) service. | | dns.managedZones.list | Allows a user or service account to list the managed zones within a Google Cloud DNS project. | | dns.managedZones.update | Allows a user to update or modify the configuration of a managed DNS zone within a Google Cloud DNS project. | | dns.projects.get | Allows a user to retrieve information about an existing GCP DNS project. | | dns.resourceRecordSets.create | Allows a user to create resource record sets within a DNS zone. | | dns.resourceRecordSets.delete | Allows a user to delete resource record sets within a DNS zone. | | dns.resourceRecordSets.get | Allows a user or service account to retrieve information about resource record sets within a managed DNS zone. | | dns.resourceRecordSets.list | Allows a user or service account to retrieve a list of resource record sets that are part of a particular DNS zone. | | dns.resourceRecordSets.update | Allows a user or service account to make changes to the resource records in a DNS zone. | | iam.roles.create | Allows a user to create a custom role for a GCP project or an organization. | | iam.roles.delete | Allows a user to delete a custom role from a GCP project or an organization. | | iam.roles.get | Allows a user to retrieve information about a specific role, including its permissions. | | iam.roles.list | Allows a user to list predefined roles, or the custom roles for a project or an organization. | | iam.roles.undelete | Allows a user to undelete a custom role from an organization or a project. | | iam.roles.update | Allows a user to update an IAM custom role. 
| | iam.serviceAccounts.actAs | Allows a service account to act as another service account or user within a GCP project. This permission is used to delegate authority to one service account to impersonate or perform actions on behalf of another service account or user. | | iam.serviceAccounts.create | Allows a user to create a service account for a project. | | iam.serviceAccounts.delete | Allows a user to delete a service account for a project. | | iam.serviceAccounts.get | Allows a user or service account to retrieve metadata and configuration information about a particular service account within a project. This includes information such as the email address, display name, and IAM policies associated with the service account. | | iam.serviceAccounts.getIamPolicy | Allows a user to retrieve the IAM policy for a service account. | | iam.serviceAccounts.setIamPolicy | Allows a user to set the IAM policy for a service account. | | iam.serviceAccounts.update | Allows a user to modify the service account for a project. | | logging.logEntries.create | Allows a user to write log entries. | | resourcemanager.projects.get | Allows a user or service account to view project details, such as project ID, name, labels, and other project-level settings. This permission controls the ability to retrieve the metadata and configuration of a project in GCP using the Resource Manager API. | | resourcemanager.projects.getIamPolicy | Allows a user or service account to retrieve the IAM access control policy for a specified project. Permission is denied if the policy or the resource does not exist. | | resourcemanager.projects.setIamPolicy | Allows a user or service account to set the IAM access control policy for the specified project. | | storage.buckets.get | Allows a user to retrieve metadata and configuration information about a specific bucket in Google Cloud Storage. 
Users with this permission can view details such as the bucket’s name, location, storage class, access control settings, and other attributes. | | storage.buckets.getIamPolicy | Allows a user to retrieve the IAM policy for a bucket. | | storage.buckets.setIamPolicy | Allows a user to set the IAM policy for a bucket. | | Storage Object Admin | Grants full control of bucket objects. The Redpanda Agent Storage Admin grant is scoped to a single bucket. | | Kubernetes Engine Admin | Full management of Kubernetes clusters and their Kubernetes API objects. | --- # Page 652: AWS IAM Policies **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/cloud-iam-policies.md --- # AWS IAM Policies --- title: AWS IAM Policies latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/cloud-iam-policies page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/cloud-iam-policies.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/cloud-iam-policies.adoc description: See the IAM policies used by AWS. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-10-21" --- When you run `rpk cloud byoc aws apply` to create a BYOC cluster, you grant IAM permissions to the Redpanda Cloud agent. IAM permissions allow the agent to access the AWS API to create and manage cluster resources. The permissions follow the principle of least privilege, limiting access to only what is necessary. IAM permissions are not required by Redpanda Cloud users. > 📝 **NOTE** > > - This page lists the IAM permissions Redpanda needs to create [BYOC clusters](../../../get-started/cluster-types/byoc/aws/create-byoc-cluster-aws/). This does _not_ pertain to [BYOVPC clusters](../../../get-started/cluster-types/byoc/aws/vpc-byo-aws/). 
> > - IAM permissions are not required for Redpanda Cloud users. IAM policies do not grant user access to a cluster; rather, they grant the deployed Redpanda agent access, so that brokers can communicate with the BYOC clusters. ## [](#aws-iam-policies)AWS IAM policies IAM policies are assigned to deployed Redpanda agents for BYOC AWS clusters that use the following AWS services: - [Amazon Elastic Compute Cloud (AWS EC2)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html) - [Amazon Elastic Compute Cloud Auto Scaling (AWS EC2 Auto Scaling)](https://aws.amazon.com/ec2/autoscaling/) - [Amazon Simple Storage Service (AWS S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) - [Amazon Route 53](https://aws.amazon.com/route53/) - [Amazon DynamoDB](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html) ### [](#actions-allowed-with-wildcard-resources)Actions allowed with wildcard resources The following actions apply only to Redpanda agents with wildcard resources. 
RedpandaAgentActionsOnlyAllowedWithWildcardResources ```js statement { sid = "RedpandaAgentActionsOnlyAllowedWithWildcardResources" effect = "Allow" actions = [ "ec2:CreateTags", "ec2:DescribeAccountAttributes", "ec2:DescribeImages", "ec2:DescribeInstances", "ec2:DescribeInstanceTypes", "ec2:CreateLaunchTemplate", "ec2:CreateLaunchTemplateVersion", "ec2:DescribeLaunchTemplateVersions", "ec2:DescribeLaunchTemplates", "iam:ListPolicies", "iam:ListRoles", "iam:GetOpenIDConnectProvider", "iam:DeleteOpenIDConnectProvider", "autoscaling:DescribeScalingActivities", "autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeTags", "autoscaling:DescribeTerminationPolicyTypes", "autoscaling:DescribeInstanceRefreshes", "autoscaling:DescribeLaunchConfigurations", "iam:CreateServiceLinkedRole", "ec2:CreatePlacementGroup", "ec2:DeletePlacementGroup", "ec2:DescribePlacementGroups", "eks:DescribeNodegroup", "eks:DeleteNodegroup" ] resources = [ "*", ] } ``` ### [](#run-in-ec2-instances)Run in EC2 instances The following actions apply only to Redpanda agents running in AWS EC2 instances. RedpandaAgentEC2RunInstances ```js statement { sid = "RedpandaAgentEC2RunInstances" effect = "Allow" actions = [ "ec2:RunInstances", ] resources = [ "arn:aws:ec2:*:${local.aws_account_id}:instance/*", "arn:aws:ec2:*:${local.aws_account_id}:network-interface/*", "arn:aws:ec2:*:${local.aws_account_id}:volume/*", "arn:aws:ec2:*:${local.aws_account_id}:security-group/*", "arn:aws:ec2:*:${local.aws_account_id}:subnet/*", "arn:aws:ec2:*:${local.aws_account_id}:launch-template/*", "arn:aws:ec2:*::image/*", ] } ``` ### [](#delete-launch-templates)Delete launch templates The following actions apply only to Redpanda agents deleting AWS launch templates. 
RedpandaAgentLaunchTemplateDeletion ```js statement { sid = "RedpandaAgentLaunchTemplateDeletion" effect = "Allow" actions = [ "ec2:DeleteLaunchTemplate", ] resources = [ "arn:aws:ec2:*:${local.aws_account_id}:launch-template/*", ] condition { test = "StringEquals" variable = "ec2:ResourceTag/redpanda-id" values = [ var.redpanda_id, ] } } ``` ### [](#manage-security-groups)Manage security groups The following actions apply only to Redpanda agents managing AWS security groups. RedpandaAgentSecurityGroups ```js statement { sid = "RedpandaAgentSecurityGroups" effect = "Allow" actions = [ "ec2:AuthorizeSecurityGroupEgress", "ec2:AuthorizeSecurityGroupIngress", "ec2:CreateSecurityGroup", "ec2:DeleteSecurityGroup", "ec2:RevokeSecurityGroupEgress", "ec2:RevokeSecurityGroupIngress", "ec2:UpdateSecurityGroupRuleDescriptionsIngress", "ec2:UpdateSecurityGroupRuleDescriptionsEgress", "ec2:ModifySecurityGroupRules", ] resources = [ "arn:aws:ec2:*:${local.aws_account_id}:security-group/*", "arn:aws:ec2:*:${local.aws_account_id}:vpc/${local.network_config.vpc_id}", ] } ``` ### [](#manage-eks-clusters)Manage EKS clusters The following actions apply only to Redpanda agents managing Amazon Elastic Kubernetes Service (Amazon EKS) clusters. RedpandaAgentEKSCluster ```js statement { sid = "RedpandaAgentEKSCluster" effect = "Allow" actions = [ "eks:*", ] resources = [ "arn:aws:eks:*:${local.aws_account_id}:cluster/redpanda-${var.redpanda_id}", ] } ``` ### [](#manage-instance-profiles)Manage instance profiles The following actions apply only to Redpanda agents managing AWS instance profiles.
RedpandaAgentInstanceProfile ```js statement { sid = "RedpandaAgentInstanceProfile" effect = "Allow" actions = [ "iam:AddRoleToInstanceProfile", "iam:RemoveRoleFromInstanceProfile", "iam:CreateInstanceProfile", "iam:DeleteInstanceProfile", "iam:GetInstanceProfile", "iam:TagInstanceProfile", ] resources = [ "arn:aws:iam::${local.aws_account_id}:instance-profile/redpanda-${var.redpanda_id}*", "arn:aws:iam::${local.aws_account_id}:instance-profile/redpanda-agent-${var.redpanda_id}*", ] } ``` ### [](#create-eks-oidc-providers)Create EKS OIDC providers The following actions apply only to Redpanda agents creating and accessing AWS EKS OIDC providers. RedpandaAgentEKSOIDCProvider ```js statement { sid = "RedpandaAgentEKSOIDCProvider" effect = "Allow" actions = [ "iam:CreateOpenIDConnectProvider", "iam:TagOpenIDConnectProvider", "iam:UntagOpenIDConnectProvider", ] resources = [ "arn:aws:iam::${local.aws_account_id}:oidc-provider/oidc.eks.*.amazonaws.com", ] } statement { sid = "RedpandaAgentEKSOIDCProviderCACertThumbprintUpdate" effect = "Allow" actions = [ "iam:UpdateOpenIDConnectProviderThumbprint", ] resources = [ "arn:aws:iam::${local.aws_account_id}:oidc-provider/oidc.eks.*.amazonaws.com", "arn:aws:iam::${local.aws_account_id}:oidc-provider/oidc.eks.*.amazonaws.com/id/*", ] condition { test = "StringEquals" variable = "aws:ResourceTag/redpanda-id" values = [ var.redpanda_id, ] } } ``` ### [](#manage-iam-policies)Manage IAM policies The following actions apply only to Redpanda agents managing AWS IAM policies. 
RedpandaAgentIAMPolicies ```js statement { sid = "RedpandaAgentIAMPolicies" effect = "Allow" actions = [ "iam:CreatePolicy", "iam:DeletePolicy", "iam:GetPolicy", "iam:GetPolicyVersion", "iam:ListPolicyVersions", "iam:TagPolicy" ] resources = [ "arn:aws:iam::${local.aws_account_id}:policy/aws_ebs_csi_driver-redpanda-${var.redpanda_id}", "arn:aws:iam::${local.aws_account_id}:policy/cert_manager_policy-${var.redpanda_id}", "arn:aws:iam::${local.aws_account_id}:policy/external_dns_policy-${var.redpanda_id}", "arn:aws:iam::${local.aws_account_id}:policy/load_balancer_controller-${var.redpanda_id}", "arn:aws:iam::${local.aws_account_id}:policy/redpanda-agent-${var.redpanda_id}*", "arn:aws:iam::${local.aws_account_id}:policy/redpanda-${var.redpanda_id}-autoscaler", "arn:aws:iam::${local.aws_account_id}:policy/redpanda-cloud-storage-manager-${var.redpanda_id}", "arn:aws:iam::${local.aws_account_id}:policy/secrets_manager_policy-${var.redpanda_id}", "arn:aws:iam::${local.aws_account_id}:policy/redpanda-connectors-secrets-manager-${var.redpanda_id}", "arn:aws:iam::${local.aws_account_id}:policy/redpanda-console-secrets-manager-${var.redpanda_id}", ] } ``` ### [](#manage-iam-roles)Manage IAM roles The following actions apply only to Redpanda agents managing AWS IAM roles. 
RedpandaAgentIAMRoleManagement ```js statement { sid = "RedpandaAgentIAMRoleManagement" effect = "Allow" actions = [ "iam:CreateRole", "iam:DeleteRole", "iam:AttachRolePolicy", "iam:DetachRolePolicy", "iam:GetRole", "iam:TagRole", "iam:PassRole", "iam:ListAttachedRolePolicies", "iam:ListInstanceProfilesForRole", "iam:ListRolePolicies", ] resources = [ "arn:aws:iam::${local.aws_account_id}:role/redpanda-cloud-storage-manager-${var.redpanda_id}", "arn:aws:iam::${local.aws_account_id}:role/redpanda-agent-${var.redpanda_id}*", "arn:aws:iam::${local.aws_account_id}:role/redpanda-${var.redpanda_id}*", "arn:aws:iam::${local.aws_account_id}:role/redpanda-connectors-secrets-manager-${var.redpanda_id}*", "arn:aws:iam::${local.aws_account_id}:role/redpanda-console-secrets-manager-${var.redpanda_id}*", ] } ``` ### [](#manage-s3-buckets)Manage S3 buckets The following actions apply only to Redpanda agents managing AWS Simple Storage Service (S3) buckets. RedpandaAgentS3ManagementBucket ```js statement { sid = "RedpandaAgentS3ManagementBucket" effect = "Allow" actions = [ "s3:*", ] resources = [ data.aws_s3_bucket.management.arn, "${data.aws_s3_bucket.management.arn}/*", ] } ``` ### [](#manage-s3-cloud-bucket-storage)Manage S3 cloud bucket storage The following actions apply only to Redpanda agents managing AWS S3 cloud bucket storage. RedpandaAgentS3CloudStorageBucket ```js statement { sid = "RedpandaAgentS3CloudStorageBucket" effect = "Allow" actions = [ "s3:List*", "s3:Get*", "s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketPolicy", "s3:DeleteBucketPolicy", ] resources = [ local.redpanda_cloud_storage_bucket_arn, "${local.redpanda_cloud_storage_bucket_arn}/*", ] } ``` ### [](#manage-virtual-private-cloud-vpc)Manage virtual private cloud (VPC) The following actions apply only to Redpanda agents managing AWS VPCs.
RedpandaAgentVPCManagement ```js statement { sid = "RedpandaAgentVPCManagement" effect = "Allow" actions = [ "ec2:DescribeVpcs", "ec2:DescribeVpcAttribute", "ec2:DescribeSecurityGroups", "ec2:CreateInternetGateway", "ec2:DeleteInternetGateway", "ec2:AttachInternetGateway", "ec2:DescribeInternetGateways", "ec2:CreateNatGateway", "ec2:DeleteNatGateway", "ec2:DescribeNatGateways", "ec2:CreateRoute", "ec2:DeleteRoute", "ec2:CreateRouteTable", "ec2:DeleteRouteTable", "ec2:DescribeRouteTables", "ec2:AssociateRouteTable", "ec2:CreateSubnet", "ec2:DeleteSubnet", "ec2:DescribeSubnets", "ec2:CreateVpcEndpoint", "ec2:ModifyVpcEndpoint", "ec2:DeleteVpcEndpoints", "ec2:DescribeVpcEndpoints", "ec2:DescribeVpcEndpointServices", "ec2:DescribeVpcPeeringConnections", "ec2:ModifyVpcPeeringConnectionOptions", "ec2:DescribeNetworkAcls", "ec2:DescribeNetworkInterfaces", "ec2:AttachNetworkInterface", "ec2:DetachNetworkInterface", "ec2:DescribeAvailabilityZones", ] resources = [ "*", ] } ``` ### [](#delete-network-interface)Delete network interface The following actions apply only to Redpanda agents deleting AWS network interfaces. RedpandaAgentNetworkInterfaceDelete ```js statement { sid = "RedpandaAgentNetworkInterfaceDelete" effect = "Allow" actions = [ "ec2:DeleteNetworkInterface", ] resources = [ "arn:aws:ec2:*:${local.aws_account_id}:network-interface/*", ] } ``` ### [](#create-vpc-peering)Create VPC peering The following actions apply only to Redpanda agents creating AWS VPC peering. RedpandaAgentVPCPeeringsCreate ```js statement { sid = "RedpandaAgentVPCPeeringsCreate" effect = "Allow" actions = [ "ec2:CreateVpcPeeringConnection", ] resources = [ "arn:aws:ec2:*:${local.aws_account_id}:vpc/${local.network_config.vpc_id}", ] } ``` ### [](#delete-vpc-peering)Delete VPC peering The following actions apply only to Redpanda agents deleting AWS VPC peering.
RedpandaAgentVPCPeeringsDelete ```js statement { sid = "RedpandaAgentVPCPeeringsDelete" effect = "Allow" actions = [ "ec2:DeleteVpcPeeringConnection", "ec2:ModifyVpcPeeringConnectionOptions", ] resources = [ "arn:aws:ec2:*:${local.aws_account_id}:vpc-peering-connection/*", ] condition { test = "StringEquals" variable = "ec2:ResourceTag/redpanda-id" values = [ var.redpanda_id, ] } } ``` ### [](#manage-dynamodb-terraform-backend)Manage DynamoDB Terraform backend The following actions apply only to Redpanda agents managing the AWS DynamoDB Terraform backend. RedpandaAgentTFBackend ```js statement { sid = "RedpandaAgentTFBackend" effect = "Allow" actions = [ "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem", ] resources = [ "arn:aws:dynamodb:*:${local.aws_account_id}:table/rp-${local.aws_account_id}*", ] } ``` ### [](#manage-route-53)Manage Route 53 The following actions apply only to Redpanda agents managing the AWS Route 53 service. RedpandaAgentRoute53Management ```js statement { sid = "RedpandaAgentRoute53Management" effect = "Allow" actions = [ "route53:CreateHostedZone", "route53:GetChange", "route53:ChangeTagsForResource", "route53:GetHostedZone", "route53:ListTagsForResource", "route53:ListResourceRecordSets", "route53:ChangeResourceRecordSets", "route53:GetDNSSEC", "route53:DeleteHostedZone", ] resources = [ "*", ] } ``` ### [](#manage-auto-scaling)Manage Auto Scaling The following actions apply only to Redpanda agents managing AWS Auto Scaling.
RedpandaAgentAutoscaling ```js statement { sid = "RedpandaAgentAutoscaling" effect = "Allow" actions = [ "autoscaling:*", ] resources = [ "arn:aws:autoscaling:*:${local.aws_account_id}:autoScalingGroup:*:autoScalingGroupName/redpanda-${var.redpanda_id}*", "arn:aws:autoscaling:*:${local.aws_account_id}:autoScalingGroup:*:autoScalingGroupName/redpanda-agent-${var.redpanda_id}*" ] } ``` --- # Page 653: Group-Based Access Control (GBAC) **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/gbac.md --- # Group-Based Access Control (GBAC) --- title: Group-Based Access Control (GBAC) latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/gbac/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/gbac/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/gbac/index.adoc description: Configure group-based access control (GBAC) in the control plane and in the data plane. page-git-created-date: "2026-04-07" page-git-modified-date: "2026-04-07" --- Configure GBAC in the control plane and in the data plane to manage permissions using OIDC groups from your identity provider. - [Configure GBAC in the Control Plane](gbac/) Configure GBAC to manage access to organization-level resources, like clusters, resource groups, and networks, using OIDC groups from your identity provider. - [Configure GBAC in the Data Plane](gbac_dp/) Configure GBAC to manage access for provisioned users to cluster-level resources, like topics and consumer groups, using OIDC groups from your identity provider. 
--- # Page 654: Configure GBAC in the Data Plane **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/gbac/gbac_dp.md --- # Configure GBAC in the Data Plane --- title: Configure GBAC in the Data Plane latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/gbac/gbac_dp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/gbac/gbac_dp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/gbac/gbac_dp.adoc description: Configure GBAC to manage access for provisioned users to cluster-level resources, like topics and consumer groups, using OIDC groups from your identity provider. page-topic-type: how-to learning-objective-1: Configure the cluster properties that enable GBAC learning-objective-2: Assign an OIDC group to an RBAC role learning-objective-3: "Create a group-based ACL using the Group: principal prefix" page-git-created-date: "2026-04-07" page-git-modified-date: "2026-04-07" --- > 📝 **NOTE** > > This feature is available for BYOC and Dedicated clusters. Group-based access control (GBAC) in the [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) lets you manage Redpanda permissions at scale using the groups that already exist in your identity provider (IdP). Instead of creating and maintaining per-user permissions in Redpanda, you define access once for a group and your IdP controls who belongs to it. When users join or leave a team, their Redpanda access updates automatically at next login with no changes needed in Redpanda. GBAC extends [OIDC authentication](../../../cloud-authentication/#single-sign-on) and supports two ways to grant permissions to groups: create [ACLs](../../acl/) with `Group:` principals, or assign groups as members of [RBAC](../../rbac/) roles. 
Both approaches can be used independently or together. After reading this page, you will be able to: - Configure the cluster properties that enable GBAC - Assign an OIDC group to an RBAC role - Create a group-based ACL using the Group: principal prefix ## [](#prerequisites)Prerequisites To use GBAC, you need: - [OIDC authentication](../../../cloud-authentication/#single-sign-on) configured and enabled on your cluster. - Your IdP configured to include group claims in the OIDC access token (for example, a `groups` claim). ## [](#how-gbac-works)How GBAC works When a user authenticates with OIDC, Redpanda reads a configurable claim from the JWT access token (for example, `$.groups`) and extracts the list of groups the user belongs to. Redpanda then matches those group names against `Group:` principals in its ACLs and role assignments. Group membership is managed entirely by your IdP. Redpanda never stores or manages group membership directly. It reads group information from the OIDC token at authentication time. Changes you make in the IdP (adding or removing group memberships) take effect at the user’s next authentication, when a new token is issued. GBAC works across the following Redpanda APIs: - Kafka API - Schema Registry - HTTP Proxy ### [](#authorization-patterns)Authorization patterns GBAC supports two usage patterns: - Group as an ACL principal: Create an ACL with a `Group:` principal. Users in that group receive that permission directly. - Group assigned to a role: Assign a group as a member of a role-based access control (RBAC) role. All users in the group inherit the role’s ACLs. Both patterns can be used together. When a user belongs to multiple groups, they inherit the combined permissions of all groups. Redpanda evaluates all authorization sources (user ACLs, role ACLs, group ACLs, and group-to-role ACLs) in a single unified flow. Deny rules are checked first across all sources. 
If any source produces a deny, Redpanda rejects the request regardless of allows from other sources. If no deny is found, Redpanda checks for an allow across all sources. If no allow is found, Redpanda denies the request by default. Figure 1. Authorization evaluation flow: a request is checked against all sources (user ACLs, role ACLs, group ACLs, and group-to-role ACLs) first for a deny, then for an allow, and is denied by default if neither is found. ## [](#supported-identity-providers)Supported identity providers GBAC works with any OIDC-compliant identity provider. These providers are commonly used with Redpanda: - [Auth0](https://auth0.com/docs/secure/tokens/json-web-tokens/create-custom-claims): Configure group claims in Auth0 Actions or Rules. - [Okta](https://developer.okta.com/docs/concepts/universal-directory/): Assign groups to applications and include them in token claims. - [Microsoft Entra ID (Azure AD)](https://learn.microsoft.com/en-us/entra/identity/hybrid/connect/how-to-connect-fed-group-claims): Configure group claims in the application manifest. For IdP-specific configuration steps, see your provider’s documentation. ## [](#limitations)Limitations - Azure AD group limit: Users with more than 200 group memberships in Azure AD receive a URL reference in their token instead of a list of group names. Redpanda does not follow that URL and cannot resolve groups in this case. Mitigation: Filter token claims to include only the groups relevant to Redpanda. - Nested groups: Redpanda does not recursively resolve nested group hierarchies. If group A contains group B, only the direct memberships reported in the token are used.
Use [`nested_group_behavior: suffix`](../../../../reference/properties/cluster-properties/#nested_group_behavior) to extract the last path segment from hierarchical group names when needed. - No wildcard ACLs for groups: ACL matching for `Group:` principals uses literal string comparison only. Wildcard patterns are not supported. ## [](#configure-token-claim-extraction)Configure token claim extraction Different identity providers store group information in different locations within the JWT token. In Redpanda Cloud, group claim extraction is configured through your SSO connection settings. 1. In the Cloud UI, navigate to **Organization IAM > Single sign-on**, then select your IdP connection. 2. For Mapping mode, select **use\_map**. 3. Configure Attributes (JSON) to map attribute names to claim paths, including `federated_groups` for group claims. A claim path is a [JSON path](https://goessner.net/articles/JsonPath/) expression that tells Redpanda where to find group information in the OIDC token. The appropriate claim path for each attribute may vary per IdP. For example, Okta exposes group claims in `${context.userinfo.groups}`. In this case, you must also include `groups` in **Userinfo scope**. ### [](#token-structure-examples)Token structure examples The following examples show how Redpanda extracts group principals from different token formats. #### [](#flat-group-values-default)Flat group values (default) With `oidc_group_claim_path: "$.groups"`, Redpanda extracts principals `Group:engineering` and `Group:analytics` from the token. ```json {"groups": ["engineering", "analytics"]} ``` #### [](#nested-claim)Nested claim With `oidc_group_claim_path: "$.realm_access.roles"`, Redpanda extracts principals `Group:eng` and `Group:fin` from the token. 
```json {"realm_access": {"roles": ["eng", "fin"]}} ``` #### [](#path-style-group-names-with-no-suffix-extraction-default)Path-style group names with no suffix extraction (default) With `nested_group_behavior: "none"` (the default), Redpanda maps the full path to principals `Group:/departments/eng/platform` and `Group:/departments/eng/infra`. ```json {"groups": ["/departments/eng/platform", "/departments/eng/infra"]} ``` #### [](#csv-formatted-group-claim)CSV-formatted group claim Some identity providers return group claims as a single comma-separated string instead of an array. ```json {"groups": "engineering,analytics,finance"} ``` Redpanda automatically splits comma-separated values and extracts principals `Group:engineering`, `Group:analytics`, and `Group:finance`. ## [](#create-group-based-acls)Create group-based ACLs You can grant permissions directly to a group by creating an [ACL](../../acl/) with a `Group:` principal. This works the same as creating an ACL for a user, but uses the `Group:` prefix instead of `User:`. ### rpk To grant cluster-level access to the `engineering` group: ```bash rpk security acl create --allow-principal Group:engineering --operation describe --cluster ``` To grant topic-level access: ```bash rpk security acl create \ --allow-principal Group:engineering \ --operation read,describe \ --topic 'analytics-' \ --resource-pattern-type prefixed ``` ### Redpanda Cloud In Redpanda Cloud, group-based ACLs are managed through roles. To create an ACL for an OIDC group: 1. From **Security** on the left navigation menu, select the **Roles** tab. 2. Click **Create role** to open the role creation form, or select an existing role and click **Edit**. 3. For **User/principal**, enter the group principal using the `Group:` format. For example, `Group:engineering`. 4. Define the permissions (ACLs) you want to grant to users in the group. 
You can configure ACLs for clusters, topics, consumer groups, transactional IDs, Schema Registry subjects, and Schema Registry operations. 5. Click **Create** (or **Update** if editing an existing role). > 📝 **NOTE** > > Redpanda Cloud assigns ACLs through roles. To grant permissions to a group, create a role for that group, add the group as a principal, and define the ACLs on the role. To create ACLs with a `Group:` principal directly (without creating a role), use `rpk`. ### Data Plane API 1. First, retrieve your cluster’s Data Plane API URL: ```bash export DATAPLANE_API_URL=$(curl -s https://api.redpanda.com/v1/clusters/ \ -H "Content-Type: application/json" \ -H "Authorization: Bearer " | jq -r .cluster.dataplane_api) ``` 2. Make a [`POST /v1/acls`](/api/doc/cloud-dataplane/operation/operation-aclservice_createacl) request with a `Group:` principal. For example, to grant the `engineering` group read access to a topic: ```bash curl -X POST "${DATAPLANE_API_URL}/v1/acls" \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "resource_type": "RESOURCE_TYPE_TOPIC", "resource_name": "analytics-events", "resource_pattern_type": "RESOURCE_PATTERN_TYPE_LITERAL", "principal": "Group:engineering", "host": "*", "operation": "OPERATION_READ", "permission_type": "PERMISSION_TYPE_ALLOW" }' ``` ## [](#assign-groups-to-roles)Assign groups to roles To manage permissions at scale, assign a group to an [RBAC](../../rbac/) role. All users in the group inherit the role’s ACLs automatically. 
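The way group-to-role assignment feeds into the deny-first evaluation described under "Authorization patterns" can be sketched in a few lines of Python. This is only an illustrative model, not Redpanda's implementation: the `Rule` type, the `Role:` label used for role-attached rules, and the `is_allowed` and `effective_principals` helpers are all hypothetical names chosen for this example, and matching is literal, as it is for `Group:` principals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical model of one ACL entry. "Role:<name>" is only a label
    # used in this sketch for rules attached to a role.
    principal: str   # "User:<name>", "Group:<name>", or "Role:<name>"
    operation: str
    resource: str
    permission: str  # "allow" or "deny"


def effective_principals(user, groups, role_members):
    """Collect every principal whose rules apply to this user."""
    principals = {f"User:{user}"} | {f"Group:{g}" for g in groups}
    # A role applies if the user or any of the user's groups is a member.
    roles = {role for role, members in role_members.items() if principals & members}
    return principals | {f"Role:{r}" for r in roles}


def is_allowed(user, groups, operation, resource, rules, role_members):
    """Deny-first evaluation across user, group, and role sources."""
    principals = effective_principals(user, groups, role_members)
    matching = [
        r for r in rules
        if r.principal in principals
        and r.operation == operation
        and r.resource == resource  # literal comparison only
    ]
    if any(r.permission == "deny" for r in matching):
        return False  # any deny wins over allows from other sources
    return any(r.permission == "allow" for r in matching)  # default deny


# Example data (made up for illustration).
role_members = {"DataEngineers": {"Group:engineering"}}
rules = [
    Rule("Role:DataEngineers", "read", "analytics-events", "allow"),
    Rule("Group:contractors", "read", "analytics-events", "deny"),
]

print(is_allowed("alice", ["engineering"], "read", "analytics-events", rules, role_members))
print(is_allowed("bob", ["engineering", "contractors"], "read", "analytics-events", rules, role_members))
print(is_allowed("carol", [], "read", "analytics-events", rules, role_members))
```

In this model, `alice` is allowed through the role her group belongs to, `bob` is rejected because one of his groups carries a deny, and `carol` is rejected by default because no rule matches.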
### rpk To assign a group to a role: ```bash rpk security role assign <role-name> --principal Group:<group-name> ``` For example, to assign the `engineering` group to the `DataEngineers` role: ```bash rpk security role assign DataEngineers --principal Group:engineering ``` To remove a group from a role: ```bash rpk security role unassign <role-name> --principal Group:<group-name> ``` For example: ```bash rpk security role unassign DataEngineers --principal Group:engineering ``` ### Redpanda Cloud To assign a group to a role in Redpanda Cloud: 1. From **Security** on the left navigation menu, select the **Roles** tab. 2. Select the role you want to assign the group to. 3. Click **Edit**. 4. For **User/principal**, enter the group name using the `Group:` format. For example, `Group:engineering`. 5. Click **Update**. To remove a group from a role: 1. From **Security** on the left navigation menu, select the **Roles** tab. 2. Select the role that has the group assignment you want to remove. 3. Click **Edit**. 4. For **User/principal**, remove the `Group:` entry. 5. Click **Update**. ### Data Plane API Make a [`PUT /v1/roles/{role_name}`](/api/doc/cloud-dataplane/operation/operation-securityservice_updaterolemembership) request to assign a group to a role: ```bash curl -X PUT "${DATAPLANE_API_URL}/v1/roles/DataEngineers" \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "add": [{"principal": "Group:engineering"}] }' ``` To remove a group from a role, use the `remove` field: ```bash curl -X PUT "${DATAPLANE_API_URL}/v1/roles/DataEngineers" \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "remove": [{"principal": "Group:engineering"}] }' ``` ## [](#view-groups-and-roles)View groups and roles Use the following commands to inspect group assignments and role memberships. ### [](#list-groups-assigned-to-a-role)List groups assigned to a role #### rpk To see which groups are assigned to a role, use `--print-members`.
Groups are listed alongside other principals such as `User:` and appear as `Group:` entries: ```bash rpk security role describe <role-name> --print-members ``` For example: ```bash rpk security role describe DataEngineers --print-members ``` To list all roles assigned to a specific group: ```bash rpk security role list --principal Group:<group-name> ``` For example: ```bash rpk security role list --principal Group:engineering ``` #### Redpanda Cloud To view groups assigned to a role in Redpanda Cloud: 1. From **Security** on the left navigation menu, select the **Roles** tab. 2. Select the role you want to inspect. 3. The role details page lists all principals, including any `Group:` entries. #### Data Plane API To list all members of a role (including groups), make a [`GET /v1/roles/{role_name}/members`](/api/doc/cloud-dataplane/operation/operation-securityservice_listrolemembers) request: ```bash curl -X GET "${DATAPLANE_API_URL}/v1/roles/DataEngineers/members" \ -H "Authorization: Bearer " ``` The response includes a `members` array. Group members appear with the `Group:` prefix in the `principal` field. To list all roles assigned to a specific group, make a [`GET /v1/roles`](/api/doc/cloud-dataplane/operation/operation-securityservice_listroles) request with a principal filter: ```bash curl -X GET "${DATAPLANE_API_URL}/v1/roles?filter.principal=Group:engineering" \ -H "Authorization: Bearer " ``` ## [](#audit-logging)Audit logging When [audit logging](../../../../manage/audit-logging/) is enabled, Redpanda includes group information in the following event types: - Authentication events: Events across Kafka API, HTTP Proxy, and Schema Registry include the user’s IdP group memberships in the `user.groups` field with type `idp_group`. - Authorization events: When an authorization decision matches a group ACL, the matched group appears in the `actor.user.groups` field with type `idp_group`.
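When reviewing audit events, it can help to pull the group names out of both field locations named above. The following Python sketch shows one way to do that once events have been parsed into dictionaries. The event shape is an assumption made for illustration: only the `user.groups` and `actor.user.groups` paths and the `idp_group` type come from this page, while the `name` key on each group entry, the sample events, and the `groups_in_event` helper are hypothetical.

```python
def groups_in_event(event):
    """Return IdP group names found in a parsed audit event.

    Assumes (hypothetically) that each group entry is a dict with
    "type" and "name" keys; adjust to the real event schema.
    """
    names = []
    # The two field locations described for authentication and
    # authorization events, respectively.
    for path in (("user", "groups"), ("actor", "user", "groups")):
        node = event
        for key in path:
            node = node.get(key, {}) if isinstance(node, dict) else {}
        for group in node or []:
            if isinstance(group, dict) and group.get("type") == "idp_group":
                names.append(group.get("name"))
    return names


# Made-up sample events for illustration only.
auth_event = {
    "user": {
        "name": "alice",
        "groups": [
            {"type": "idp_group", "name": "engineering"},
            {"type": "idp_group", "name": "analytics"},
        ],
    }
}
authz_event = {
    "actor": {
        "user": {
            "name": "alice",
            "groups": [{"type": "idp_group", "name": "engineering"}],
        }
    }
}

print(groups_in_event(auth_event))
print(groups_in_event(authz_event))
```

A helper like this could feed a simple filter, for example to list every authorization decision that matched a particular group.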
## [](#next-steps)Next steps - [Set up audit logging](../../../../manage/audit-logging/) to monitor group-based access events. ## [](#suggested-reading)Suggested reading - [Configure GBAC in the Control Plane](../gbac/) - [Configure RBAC in the Control Plane](../../rbac/rbac/) - [Configure RBAC in the Data Plane](../../rbac/rbac_dp/) - [Single sign-on](../../../cloud-authentication/#single-sign-on) --- # Page 655: Configure GBAC in the Control Plane **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/gbac/gbac.md --- # Configure GBAC in the Control Plane --- title: Configure GBAC in the Control Plane latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/gbac/gbac page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/gbac/gbac.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/gbac/gbac.adoc description: Configure GBAC to manage access to organization-level resources, like clusters, resource groups, and networks, using OIDC groups from your identity provider. page-topic-type: how-to learning-objective-1: Register an OIDC group in Redpanda Cloud learning-objective-2: Assign a predefined or custom role to a group learning-objective-3: Manage group-based access at the organization level page-git-created-date: "2026-04-07" page-git-modified-date: "2026-04-07" --- > 📝 **NOTE** > > This feature is available for BYOC and Dedicated clusters. Use Redpanda Cloud group-based access control (GBAC) in the [control plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#control-plane) to manage access to organization resources based on OIDC groups from your identity provider (IdP). For example, you can grant one group access to development clusters while restricting production access to another group. 
You can also restrict access to geographically dispersed clusters to support data residency requirements. When a user’s group membership changes in the IdP, their Redpanda access updates automatically. After reading this page, you will be able to: - Register an OIDC group in Redpanda Cloud - Assign a predefined or custom role to a group - Manage group-based access at the organization level ## [](#gbac-terminology)GBAC terminology **Group**: A group is a collection of users defined in your IdP. With GBAC, you can assign groups to roles or ACLs in Redpanda Cloud, so that users inherit permissions based on their group membership in your IdP. **Role**: A role is a list of permissions. Permissions are attached to roles. Users assigned multiple roles receive the union of all permissions defined in those roles. Redpanda Cloud has several predefined roles that you cannot modify or delete, including Reader, Writer, and Admin. You can also create custom roles. **Role binding**: Role binding assigns a role to an account. Administrators can add, edit, or remove role bindings for a user. When you change the permissions for a given role, all users and service accounts with that role automatically get the modified permissions. ## [](#manage-organization-access)Manage organization access In the Redpanda Cloud Console, the **Organization IAM** page lets you create groups. When you create a group, you define its permissions with role binding. When you edit a group, you can change its role bindings to update the group’s permissions. When you change the permissions for a given role, all groups with that role automatically get the modified permissions. Various resources can be assigned as the scope of a role, including the following: - Organization - Resource group - Network - Network peering - Cluster (Serverless clusters have a different set of permissions from BYOC and Dedicated clusters.) 
- MCP server You can manage GBAC configurations with the [Redpanda Cloud Console](https://cloud.redpanda.com) or with the [Control Plane API](/api/doc/cloud-controlplane/). ## [](#configure-group-claim-extraction)Configure group claim extraction Different identity providers structure group information differently in their OIDC tokens. Before you register groups, configure your SSO connection to tell Redpanda Cloud where to find group claims in the token. In Redpanda Cloud, group claim extraction is configured through your SSO connection settings. 1. In the Cloud UI, navigate to **Organization IAM > Single sign-on**, then select your IdP connection. 2. For Mapping mode, select **use\_map**. 3. Configure Attributes (JSON) to map attribute names to claim paths, including `federated_groups` for group claims. A claim path is a [JSON path](https://goessner.net/articles/JsonPath/) expression that tells Redpanda where to find group information in the OIDC token. The appropriate claim path for each attribute may vary per IdP. For example, Okta exposes group claims in `${context.userinfo.groups}`. In this case, you must also include `groups` in **Userinfo scope**. ### [](#token-structure-examples)Token structure examples The following examples show how Redpanda extracts group principals from different token formats. #### [](#flat-group-values-default)Flat group values (default) With `oidc_group_claim_path: "$.groups"`, Redpanda extracts principals `Group:engineering` and `Group:analytics` from the token. ```json {"groups": ["engineering", "analytics"]} ``` #### [](#nested-claim)Nested claim With `oidc_group_claim_path: "$.realm_access.roles"`, Redpanda extracts principals `Group:eng` and `Group:fin` from the token. 
```json {"realm_access": {"roles": ["eng", "fin"]}} ``` #### [](#path-style-group-names-with-no-suffix-extraction-default)Path-style group names with no suffix extraction (default) With `nested_group_behavior: "none"` (the default), Redpanda maps the full path to principals `Group:/departments/eng/platform` and `Group:/departments/eng/infra`. ```json {"groups": ["/departments/eng/platform", "/departments/eng/infra"]} ``` #### [](#csv-formatted-group-claim)CSV-formatted group claim Some identity providers return group claims as a single comma-separated string instead of an array. ```json {"groups": "engineering,analytics,finance"} ``` Redpanda automatically splits comma-separated values and extracts principals `Group:engineering`, `Group:analytics`, and `Group:finance`. ## [](#register-groups)Register groups To assign an IdP group to a role or ACL, you must first register the group in Redpanda Cloud. ### Cloud UI 1. Navigate to **Organization IAM > Groups**. 2. Click **Create group**. 3. Enter a **Name** that matches the group in your IdP exactly (for example, `engineering`). 4. Optionally, enter a **Description**, and configure a **Role binding** to assign the group to a role with a specific scope and resource. 5. Click **Create**. ### Control Plane API Make a [`POST /v1/groups`](/api/doc/cloud-controlplane/operation/operation-groupservice_creategroup) request to the [Control Plane API](../../../../manage/api/cloud-byoc-controlplane-api/): ```bash curl -X POST 'https://api.redpanda.com/v1/groups' \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer ' \ -d '{ "group": { "name": "", "description": "" } }' ``` Set `name` to the name that matches the group in your IdP (for example, `engineering`). The name must match exactly for GBAC to map the group correctly. ## [](#predefined-roles)Predefined roles Redpanda Cloud provides several predefined roles that you cannot modify or delete, including Reader, Writer, and Admin.
You can see all predefined roles along with their permissions on the **Roles** tab of **Organization IAM**. ## [](#custom-roles)Custom roles In addition to the predefined roles, administrators can create custom roles to mix and match permissions for specific use cases. Custom roles let you grant only the permissions a group needs, without the broad access of predefined roles. Custom roles are created on the **Roles** tab in **Organization IAM**. For steps to create a custom role, see [Custom roles in RBAC](../../rbac/rbac/#custom-roles). When you register a group or edit a group’s role binding, you can assign any predefined or custom role to the group. ## [](#suggested-reading)Suggested reading - [Configure GBAC in the Data Plane](../gbac_dp/) - [Configure RBAC in the Control Plane](../../rbac/rbac/) - [Configure RBAC in the Data Plane](../../rbac/rbac_dp/) - [Single sign-on](../../../cloud-authentication/#single-sign-on) --- # Page 656: Role-Based Access Control (RBAC) **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/rbac.md --- # Role-Based Access Control (RBAC) --- title: Role-Based Access Control (RBAC) latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/rbac/index page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/rbac/index.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/rbac/index.adoc description: Learn about configuring role-based access control (RBAC) in the control plane and in the data plane. page-git-created-date: "2025-02-26" page-git-modified-date: "2025-08-25" --- - [Configure RBAC in the Control Plane](rbac/) Configure RBAC to manage access to organization-level resources like clusters, resource groups, and networks. 
- [Configure RBAC in the Data Plane](rbac_dp/) Configure RBAC to manage access for provisioned users to cluster-level resources, like topics and consumer groups. --- # Page 657: Configure RBAC in the Data Plane **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/rbac/rbac_dp.md --- # Configure RBAC in the Data Plane --- title: Configure RBAC in the Data Plane latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/rbac/rbac_dp page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/rbac/rbac_dp.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/rbac/rbac_dp.adoc description: Configure RBAC to manage access for provisioned users to cluster-level resources, like topics and consumer groups. page-topic-type: how-to learning-objective-1: Configure cluster-level permissions for provisioned users learning-objective-2: Assign roles to users in the data plane learning-objective-3: Use RBAC with supported authentication methods page-git-created-date: "2025-02-26" page-git-modified-date: "2026-04-07" --- > 📝 **NOTE** > > This feature is available for BYOC and Dedicated clusters. Use role-based access control (RBAC) in the [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) to configure cluster-level permissions for provisioned users at scale. After reading this page, you will be able to: - Configure cluster-level permissions for provisioned users - Assign roles to users in the data plane - Use RBAC with supported authentication methods ## [](#rbac-overview)RBAC overview RBAC addresses the challenge of access management at scale. Instead of managing individual ACLs for each user, RBAC groups permissions into roles that you can assign to multiple users. 
Roles can reflect organizational structure or job duties. This approach decouples users and permissions, allowing a one-to-many mapping that reduces the number of custom ACLs needed. Benefits of RBAC: - Simplified management: Create roles once, assign to many users - Easier onboarding: New employees inherit permissions by role assignment - Faster audits: Review permissions by role rather than individual user - Better compliance: Roles align with organizational structure and job duties - Reduced errors: Fewer individual ACL assignments mean fewer mistakes ## [](#manage-roles)Manage roles Administrators can manage RBAC configurations with `rpk` or Redpanda Cloud. In Redpanda Cloud, select **Security** from the left navigation menu, and then select the **Roles** tab. After the role is created, you can add users/principals to it. For `rpk`, use [`rpk security`](../../../../reference/rpk/rpk-security/rpk-security/). For example, suppose you want to create a `DataAnalysts` role for users who need to read from analytics topics and write to reporting topics: ```bash # 1. Create the role rpk security role create DataAnalysts # 2. Grant read access to analytics topics rpk security acl create --operation read,describe \ --topic 'analytics-' --resource-pattern-type prefixed \ --allow-role DataAnalysts # 3. Grant write access to reporting topics rpk security acl create --operation write,describe \ --topic 'reports-' --resource-pattern-type prefixed \ --allow-role DataAnalysts # 4. Assign users to the role rpk security role assign DataAnalysts --principal alice,bob,charlie # 5. Verify the setup rpk security role describe DataAnalysts ``` All three users (`alice`, `bob`, `charlie`) now have identical permissions without managing individual ACLs for each user. 
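To make the one-to-many mapping concrete, the following Python sketch models the `DataAnalysts` example in memory. It is an illustration only and not how Redpanda evaluates ACLs: the `Acl` class, `ROLE_ACLS`, `ROLE_MEMBERS`, and `is_allowed` are hypothetical names, and the model covers only prefixed `ALLOW` rules on topics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    """One ALLOW rule on topics whose names start with `prefix` (hypothetical model)."""
    operation: str
    prefix: str


# ACLs attached to the role, mirroring the rpk example above.
ROLE_ACLS = {
    "DataAnalysts": [
        Acl("read", "analytics-"),
        Acl("describe", "analytics-"),
        Acl("write", "reports-"),
        Acl("describe", "reports-"),
    ],
}

# Users assigned to each role.
ROLE_MEMBERS = {"DataAnalysts": {"alice", "bob", "charlie"}}


def is_allowed(user: str, operation: str, topic: str) -> bool:
    """Return True if any role the user holds has a matching prefixed ALLOW rule."""
    for role, members in ROLE_MEMBERS.items():
        if user not in members:
            continue
        for acl in ROLE_ACLS.get(role, []):
            if acl.operation == operation and topic.startswith(acl.prefix):
                return True
    return False
```

In this model, adding a name to `ROLE_MEMBERS["DataAnalysts"]` grants that user every rule of the role at once, which is the effect the role assignment has in the example.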
## [](#rbac-terminology)RBAC terminology Understanding RBAC terminology is essential for effective role management: | Term | Definition | Example | | --- | --- | --- | | Role | A named collection of ACLs that can be assigned to users | DataEngineers, ApplicationDevelopers, ReadOnlyUsers | | Principal | A user account in the system (same as ACL principals) | User:alice, User:bob, User:analytics-service | | Permission | An ACL rule that allows or denies specific operations | ALLOW READ on topic:sensor-data, DENY DELETE on cluster | | Assignment | The association between a user and one or more roles | User alice has roles DataEngineers and TopicAdmins | RBAC workflow: 1. **Create roles**: Define roles that match your organizational needs 2. **Grant permissions**: Create ACLs specifying the role as allowed/denied 3. **Assign users**: Associate users with appropriate roles 4. **Automatic inheritance**: Users gain all permissions from their assigned roles Under the RBAC framework, you create **roles**, grant **permissions** to those roles, and assign the roles to **users**. When you change the permissions for a given role, all users with that role automatically gain the modified permissions. You grant or deny permissions for a role by creating an ACL and specifying the RBAC role as either allowed or denied respectively. Redpanda treats all **users** as security principals and defines them with the `Type:Name` syntax (for example, `User:mike`). You can omit the `Type` when defining a principal and Redpanda will assume the `User:` type. All examples here use the full syntax for clarity. See [access control lists](../../acl/) for more information on defining ACLs and working with principals. ### [](#roles)Roles You can assign any number of roles to a given user. When installing a new Redpanda cluster, no roles are provisioned by default. 
When performing an upgrade from older versions of Redpanda, all existing SASL/SCRAM users are assigned to the placeholder `User` role to help you more readily migrate away from pure ACLs. As a security measure, this default role has no assigned ACLs.

### [](#policy-conflicts)Policy conflicts

You can assign a combination of ACLs and roles to any given principal. ACLs allow permissions, deny permissions, or specify a combination of both. As a result, users may at times have role assignments with conflicting policies.

Permission resolution rules:

A user is permitted to perform an operation if and only if:

1. No `DENY` permission exists matching the operation
2. An `ALLOW` permission exists matching the operation

Examples:

| User’s direct ACLs | Role-based ACLs | Result | Explanation |
| --- | --- | --- | --- |
| ALLOW READ topic:logs | Role has DENY READ topic:logs | ❌ denied | DENY always takes precedence |
| DENY WRITE topic:sensitive | Role has ALLOW WRITE topic:* | ❌ denied | Specific DENY blocks wildcard ALLOW |
| No direct ACLs | Role has ALLOW READ topic:data | ✅ allowed | Role permission applies |
| ALLOW READ topic:public | No role ACLs for this topic | ✅ allowed | Direct permission applies |

## [](#rbac-best-practices)RBAC best practices

Follow these recommendations for effective role-based access control:

Role design

- Use descriptive names: Choose role names that clearly indicate their purpose (`DataEngineers`, `ReadOnlyAnalysts`)
- Follow job functions: Align roles with actual job responsibilities and organizational structure
- Keep roles focused: Create specific roles rather than overly broad ones (`TopicReaders` vs `SuperUsers`)
- Plan for growth: Design roles that can accommodate new team members and evolving needs

Permission management

- Start with minimal permissions: Grant only the access required for the role’s function
- Use wildcards carefully: Prefixed patterns like `analytics-*` are useful but review regularly
- Avoid `DENY` rules: Prefer
specific `ALLOW` rules over complex `DENY`/`ALLOW` combinations - Document role purpose: Maintain clear documentation about what each role is intended for Operational guidelines - Regular reviews: Audit roles and assignments quarterly to ensure they remain appropriate - Least privilege: Users should have the minimum roles needed for their current responsibilities - Temporary access: Create time-limited roles for contractors or temporary project access - Monitor usage: Track which roles and permissions are actively used vs. dormant ## [](#manage-users-and-roles)Manage users and roles Administrators can manage RBAC configurations with `rpk` or Redpanda Cloud. Common management tasks: - Create roles: Define new roles for organizational functions - Assign permissions: Add ACLs to roles to define what they can access - Assign users: Associate users with appropriate roles - Modify roles: Add or remove permissions from existing roles - Audit access: Review roles and assignments for compliance Typical workflow: 1. Create role 2. Add ACL permissions 3. Assign users 4. Test access 5. Monitor and adjust ### [](#create-a-role)Create a role Creating a new role is a two-step process. First you define the role, giving it a unique and descriptive name. Second, you assign one or more ACLs to allow or deny access for the new role. This defines the permissions that are inherited by all users assigned to the role. It is possible to have an empty role with no ACLs assigned. #### rpk To create a new role, run: ```bash rpk security role create ``` After the role is created, administrators create new ACLs and assign this role either allow or deny permissions. For example: ```bash rpk security acl create ... --allow-role ``` Example of creating a new role named `red`: ```bash rpk security role create red ``` ```bash Successfully created role "red" ``` #### Redpanda Cloud To create a new role: 1. From **Security** on the left navigation menu, select the **Roles** tab. 2. 
Click **Create role**. 3. Provide a name for the role and an optional origin host for users to connect from. 4. Define the permissions (ACLs) for the role. You can create ACLs for clusters, topics, consumer groups, transactional IDs, Schema Registry subjects, and Schema Registry operations. > 💡 **TIP** > > You can assign more than one user/principal to the role when creating it. 5. Click **Create**. ### [](#delete-a-role)Delete a role When a role is deleted, Redpanda carries out the following actions automatically: - All role ACLs are deleted. - All users' assignments to the role are removed. Redpanda lists all impacted ACLs and role assignments when running this command. You receive a prompt to confirm the deletion action. The delete operation is irreversible. #### rpk To delete a role, run: ```bash rpk security role delete ``` Example of deleting a role named `red`: ```bash rpk security role delete red ``` ```bash PERMISSIONS =========== PRINCIPAL HOST RESOURCE-TYPE RESOURCE-NAME RESOURCE-PATTERN-TYPE OPERATION PERMISSION ERROR RedpandaRole:red * TOPIC books LITERAL ALL ALLOW RedpandaRole:red * TOPIC videos LITERAL ALL ALLOW PRINCIPALS (1) ============== NAME TYPE panda User ? Confirm deletion of role "red"? This action will remove all associated ACLs and unassign role members Yes Successfully deleted role "red" ``` #### Redpanda Cloud To delete an existing role: 1. From **Security** on the left navigation menu, select the **Roles** tab. 2. Click the role you want to delete. This shows all currently assigned permissions (ACLs) and principals (users). 3. Click **Delete**. 4. Click **Delete**. ### [](#assign-a-role)Assign a role You can assign a role to any security principal. Principals are referred to using the format: `Type:Name`. Redpanda currently supports only the `User` type. If you omit the type, Redpanda assumes the `User` type by default. 
With this command, you can assign the role to multiple principals at the same time by using a comma separator between each principal. #### rpk To assign a role to a principal, run: ```bash rpk security role assign --principal ``` Example of assigning a role named `red`: ```bash rpk security role assign red --principal bear,panda ``` ```bash Successfully assigned role "red" to NAME PRINCIPAL-TYPE bear User panda User ``` #### Redpanda Cloud To assign a role to a principal, edit the role or edit the user. Option 1: Edit the role 1. From **Security** on the left navigation menu, select the **Roles** tab. 2. Select the role you want to assign to one or more users/principals. 3. Click **Edit**. 4. Below the list of permissions, find the Principals section. You can add any number of users/principals to the role. After listing all new users/principals, click **Update**. Option 2: Edit the user 1. From **Security** on the left navigation menu, select the **Users** tab. 2. Select the user you want to assign one or more roles to. 3. In the **Assign roles** input field, select the roles you want to add to this user. 4. After adding all roles, click **Update**. ### [](#unassign-a-role)Unassign a role You can remove a role assignment from a security principal without deleting the role. Principals are referred to using the format: `Type:Name`. Redpanda currently supports only the `User` type. If you omit the type, Redpanda assumes the `User` type by default. With this command, you can remove the role from multiple principals at the same time by using a comma separator between each principal. 
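As a rough illustration of the `Type:Name` convention and the comma-separated form, the following Python sketch normalizes a principal list the way the text describes: a missing type falls back to `User`. The function name `parse_principals` is hypothetical, and this is not rpk's actual parsing code.

```python
def parse_principals(value: str) -> list[tuple[str, str]]:
    """Split a comma-separated principal list into (type, name) pairs.

    A principal without an explicit type is assumed to be of type "User".
    """
    principals = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        if ":" in item:
            ptype, name = item.split(":", 1)
        else:
            ptype, name = "User", item
        principals.append((ptype, name))
    return principals
```

Under this sketch, `bear,panda` and `User:bear,User:panda` describe the same two principals.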
#### rpk To remove a role assignment from a principal, run: ```bash rpk security role unassign --principal ``` Example of unassigning a role named `red`: ```bash rpk security role unassign red --principal panda ``` ```bash Successfully unassigned role "red" from NAME PRINCIPAL-TYPE panda User ``` #### Redpanda Cloud There are two ways to remove a role from a principal: Option 1: Edit the role 1. From **Security** on the left navigation menu, select the **Roles** tab. 2. Select the role you want to remove from one or more principals. 3. Click **Edit**. 4. Below the list of permissions, find the Principals section. Click **x** beside the name of any principals you want to remove from the role. 5. After you have removed all needed principals, click **Update**. Option 2: Edit the user 1. From **Security** on the left navigation menu, select the **Users** tab. 2. Select the user you want to remove from one or more roles. 3. Click **x** beside the name of any roles you want to remove this user from. 4. After you have removed the user from all roles, click **Update**. ### [](#edit-role-permissions)Edit role permissions You can add or remove ACLs from any of the roles you have previously created. #### rpk To modify an existing role by adding additional ACLs to it, run: ```bash rpk security acl create ... --allow-role ``` ```bash rpk security acl create ... --deny-role ``` To use `rpk` to remove ACLs from a role, run: ```bash rpk security acl delete ... --allow-role rpk security acl delete ... --deny-role ``` When you run `rpk security acl delete`, Redpanda deletes all ACLs matching the parameters supplied. Make sure to match the exact ACL you want to delete. If you supply only the `--allow-role` flag, for example, Redpanda will delete every ACL granting that role authorization to a resource. 
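One way to picture this matching behavior is as a filter over the ACL list in which every field you leave unset matches any value. The Python sketch below is a simplified, hypothetical model (the `Acl` tuple and `select_for_delete` are illustrative names, and real ACLs carry more fields, such as host and pattern type); it is not rpk's implementation.

```python
from typing import NamedTuple, Optional


class Acl(NamedTuple):
    role: str
    permission: str  # "ALLOW" or "DENY"
    topic: str
    operation: str


def select_for_delete(
    acls: list[Acl],
    role: str,
    permission: str,
    topic: Optional[str] = None,
    operation: Optional[str] = None,
) -> list[Acl]:
    """Return the ACLs a delete with this filter would remove.

    Fields left as None act as wildcards, so a filter that names only the
    role and permission selects every ACL of that kind for the role.
    """
    return [
        acl
        for acl in acls
        if acl.role == role
        and acl.permission == permission
        and (topic is None or acl.topic == topic)
        and (operation is None or acl.operation == operation)
    ]


ACLS = [
    Acl("red", "ALLOW", "books", "ALL"),
    Acl("red", "ALLOW", "videos", "ALL"),
    Acl("red", "DENY", "secrets", "READ"),
]
```

Here, filtering on the role and `ALLOW` alone selects both the `books` and `videos` ACLs, while also naming the topic narrows the selection to a single ACL.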
To list all the ACLs associated with a role, run:

```bash
rpk security acl list --allow-role --deny-role
```

See also:

- [rpk security acl create](../../../../reference/rpk/rpk-security/rpk-security-acl-create/)
- [rpk security acl delete](../../../../reference/rpk/rpk-security/rpk-security-acl-delete/)
- [rpk security acl list](../../../../reference/rpk/rpk-security/rpk-security-acl-list/)

#### Redpanda Cloud

To edit the ACLs for an existing role:

1. From **Security** on the left navigation menu, select the **Roles** tab.
2. Select the role you want to edit and click **Edit**.
3. While editing the role, you can update the optional origin host for users to connect from.
4. You can add or remove ACLs for the role. As when creating a new role, you can create or modify ACLs for topics, consumer groups, transactional IDs, Schema Registry subjects, and Schema Registry operations.
5. After making all changes, click **Update**.

### [](#list-all-roles)List all roles

Redpanda lets you view a list of all existing roles.

#### rpk

To view a list of all active roles, run:

```bash
rpk security role list
```

Example of listing all roles:

```bash
rpk security role list
```

```bash
NAME
red
```

#### Redpanda Cloud

To view all existing roles:

1. From **Security** on the left navigation menu, select the **Roles** tab.

All roles are listed in a paginated view. You can also filter the view using the input field at the top of the list.

### [](#describe-a-role)Describe a role

When managing roles, you may need to review the ACLs the role grants or the list of principals assigned to the role.
#### rpk To view the details of a given role, run: ```bash rpk security role describe ``` Example of describing a role named `red`: ```bash rpk security role describe red ``` ```bash PERMISSIONS =========== PRINCIPAL HOST RESOURCE-TYPE RESOURCE-NAME RESOURCE-PATTERN-TYPE OPERATION PERMISSION ERROR RedpandaRole:red * TOPIC books LITERAL ALL ALLOW RedpandaRole:red * TOPIC videos LITERAL ALL ALLOW PRINCIPALS (1) ============== NAME TYPE panda User ``` #### Redpanda Cloud To view details of an existing role: 1. From **Security** on the left navigation menu, select the **Roles** tab. 2. Find the role you want to view and click the role name. All roles are listed in a paginated view. You can also filter the view using the input field at the top of the list. ## [](#suggested-reading)Suggested reading - [`rpk security`](../../../../reference/rpk/rpk-security/rpk-security/) - Complete security command reference - [`rpk security acl`](../../../../reference/rpk/rpk-security/rpk-security-acl/) - ACL management commands - [Access Control Lists](../../acl/) - Understanding the underlying ACL system ## [](#suggested-reading-2)Suggested reading - [Configure RBAC in the Control Plane](../rbac/) - [Configure GBAC in the Control Plane](../../gbac/gbac/) - [Configure GBAC in the Data Plane](../../gbac/gbac_dp/) --- # Page 658: Configure RBAC in the Control Plane **URL**: https://docs.redpanda.com/redpanda-cloud/security/authorization/rbac/rbac.md --- # Configure RBAC in the Control Plane --- title: Configure RBAC in the Control Plane latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: authorization/rbac/rbac page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: authorization/rbac/rbac.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/authorization/rbac/rbac.adoc description: 
Configure RBAC to manage access to organization-level resources like clusters, resource groups, and networks. page-topic-type: how-to learning-objective-1: Assign predefined or custom roles to users and service accounts learning-objective-2: Manage role bindings at the organization level learning-objective-3: Create custom roles with granular permissions page-git-created-date: "2025-02-26" page-git-modified-date: "2026-04-07" --- > 📝 **NOTE** > > This feature is available for BYOC and Dedicated clusters. Use Redpanda Cloud role-based access control (RBAC) in the [control plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#control-plane) to manage access to resources in your organization. For example, you can grant everyone in a team access to clusters in a development resource group while limiting access to clusters in a production resource group. You can also restrict access to geographically dispersed clusters to support data residency requirements. After reading this page, you will be able to: - Assign predefined or custom roles to users and service accounts - Manage role bindings at the organization level - Create custom roles with granular permissions ## [](#rbac-terminology)RBAC terminology **Role**: A role is a list of permissions. With RBAC, permissions are attached to roles. Users assigned multiple roles receive the union of all permissions defined in those roles. Redpanda Cloud has several predefined roles that you cannot modify or delete, including Reader, Writer, and Admin. You can also create custom roles. **Account**: An RBAC account is either a user account (human user) or a service account (machine or programmatic user). **Role binding**: Role binding assigns a role to an account. Administrators can add, edit, or remove role bindings for a user. When you change the permissions for a given role, all users and service accounts with that role automatically get the modified permissions. 
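The "union of all permissions" rule can be sketched in a few lines of Python. The role names and permission strings below are hypothetical placeholders rather than actual Redpanda Cloud roles or permissions, and the sketch ignores the resource and scope of each role binding.

```python
# Hypothetical roles and permission identifiers, for illustration only.
ROLE_PERMISSIONS = {
    "cluster-viewer": {"cluster.read"},
    "network-operator": {"network.read", "network.update"},
}


def effective_permissions(roles: list[str]) -> set[str]:
    """Return the union of the permissions granted by each assigned role."""
    permissions: set[str] = set()
    for role in roles:
        permissions |= ROLE_PERMISSIONS.get(role, set())
    return permissions
```

Because permissions are looked up through the role at evaluation time, adding a permission to `ROLE_PERMISSIONS["cluster-viewer"]` changes the result for every account bound to that role, which mirrors the automatic propagation described above.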
## [](#manage-organization-access)Manage organization access In the Redpanda Cloud Console, the **Organization IAM** page lists your organization’s existing users and service accounts and their associated roles. You can edit a user’s access, invite new users, and create service accounts. When you add a user, you define their permissions with role binding. Service accounts are assigned the Admin role for all resources in the organization. On the **Organization IAM - Users** page, select a user to see their assigned roles. For example, for a user with Admin access on the organization, the user’s _Resource_ is the organization name, the _Scope_ is organization, and the _Role_ is Admin. Various resources can be assigned as the scope of a role. For example: - Organization - Resource group - Network - Network peering - Cluster (Serverless clusters have a different set of permissions from BYOC and Dedicated clusters.) - MCP server > 📝 **NOTE** > > Redpanda topics are not included. For topic-level access control, see [Configure RBAC in the Data Plane](../rbac_dp/). Users can have multiple roles, as long as they are each for a different resource and scope. For example, you could assign a user the Reader role on the organization, the Admin role on a specific resource group, and the Writer role on a specific cluster. When you delete a role, Redpanda removes it from any user or service account it is attached to, and permissions are revoked. ## [](#predefined-roles)Predefined roles Redpanda Cloud provides several predefined roles that you cannot modify or delete, including Reader, Writer, and Admin. You can see all predefined roles along with their permissions on the **Roles** tab of **Organization IAM**. ## [](#custom-roles)Custom roles In addition to the predefined roles, administrators can create custom roles to mix and match permissions for specific use cases. Custom roles let you grant only the permissions a user needs, without the broad access of predefined roles. 
To create a custom role, use the [Redpanda Cloud Console](https://cloud.redpanda.com) or the [Control Plane API](/api/doc/cloud-controlplane/). In the Redpanda Cloud Console: 1. In the left navigation menu, select **Organization IAM**, then select the **Roles** tab. 2. Click **Create role**. 3. Enter a **Name** and optional **Description** for the role. 4. Select permissions from the available categories: **Control Plane**, **Data Plane**, **IAM**, and **Billing**. Each category contains multiple permission groups (for example, Cluster, Network, or Topic), and each group contains individual operations such as Create, Read, Update, and Delete. You can select operations individually or select all operations for a group. 5. Click **Create**. After creating a custom role, you can assign it to users through role bindings on the **Users** tab. ## [](#suggested-reading)Suggested reading - [Configure RBAC in the Data Plane](../rbac_dp/) - [Configure GBAC in the Control Plane](../../gbac/gbac/) - [Configure GBAC in the Data Plane](../../gbac/gbac_dp/) --- # Page 659: Authentication **URL**: https://docs.redpanda.com/redpanda-cloud/security/cloud-authentication.md --- # Authentication --- title: Authentication latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cloud-authentication page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cloud-authentication.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/cloud-authentication.adoc description: Learn about Redpanda Cloud authentication. page-git-created-date: "2024-06-06" page-git-modified-date: "2026-04-07" --- Redpanda Cloud uses authentication to verify who can access your clusters and perform actions. 
- **User authentication**: How people access the Redpanda Cloud UI and manage resources
- **Service authentication**: How applications and services connect to your Redpanda clusters

## [](#user-authentication)User authentication

Redpanda provides user authentication to your Redpanda organization through email/password or single sign-on (SSO) with an OIDC-based identity provider (IdP).

### [](#emailpassword)Email/password

Passwords are salted and hashed with [bcrypt](https://en.wikipedia.org/wiki/Bcrypt) before they are stored. Hashing is a one-way function, so the original password cannot be recovered from the stored value.

### [](#single-sign-on)Single sign-on

Redpanda integrates with any OIDC-compliant IdP that supports discovery, including [Okta](#integrate-with-okta), [Microsoft Entra ID](#integrate-with-microsoft-entra-id), Auth0, Active Directory Federation Services (AD-FS), and JumpCloud. After SSO is enabled for an organization, new users in that organization can authenticate with SSO.

You must integrate your IdP with Redpanda Cloud to use SSO. On the **Users** page, users with admin permissions see a **Single sign-on** tab and can add connections for up to two different IdPs. Enter the client ID, client secret, and discovery URI for the IdP. (See your IdP documentation for these values. The discovery URI may go by a different name, such as the well-known URL or the `issuer_url`.) By default, the connection is added in a disabled state. Edit the connection to enable it.

You can choose to enable auto-enroll in the connection, which gives new users who sign in from that IdP access to your Redpanda organization. When you enable auto-enroll, you select the Reader, Writer, or Admin role to assign to users who log in with that IdP. Setup differs across IdPs.

If your IdP provides OIDC group information, you can also use [group-based access control (GBAC)](../authorization/gbac/) to manage permissions at the group level.
Register your IdP groups in Redpanda Cloud and assign roles to those groups, so that users automatically inherit permissions based on their group membership. > 📝 **NOTE** > > Before you can delete an SSO connection, an admin must manually delete all users associated with that connection. #### [](#integrate-with-okta)Integrate with Okta To integrate with Okta, follow the [Okta documentation](https://help.okta.com/en-us/Content/Topics/Apps/Apps_App_Integration_Wizard_OIDC.htm) to create an application within Okta for Redpanda. The Redpanda callback location (that is, the redirect location where Okta sends the user) is the following: ```none https://auth.prd.cloud.redpanda.com/login/callback ``` Okta provides the following fields required for SSO configuration on the Redpanda **Users** page: `clientId`, `clientSecret`, and `discoveryUrl`. The discovery URL for Okta generally looks like the following (where `an_id` could be “default”): ```none https://.okta.com/oauth2//.well-known/openid-configuration ``` #### [](#integrate-with-microsoft-entra-id)Integrate with Microsoft Entra ID To integrate with Microsoft Entra ID, create a Web application registration that uses the OIDC Authorization Code flow with PKCE: 1. In the [Microsoft Entra admin center](https://entra.microsoft.com/), go to **App registrations** and click **New registration**. 1. Name: `Redpanda Cloud`. 2. Supported account types: **Accounts in this organizational directory only (Single tenant)**. 3. Redirect URI: select **Web**, and paste the Callback URL from Redpanda Cloud. To find the Callback URL, go to **Users** > **Single sign-on** in Redpanda Cloud and click **Add connection**. Copy the **Callback URL**. > ❗ **IMPORTANT** > > The platform type must be **Web**. Because Redpanda Cloud uses the OIDC Authorization Code flow with PKCE and a server-side callback, the app must be configured as Web (not SPA or mobile). 4. Click **Register**. 2. 
After registration, a corresponding Enterprise application (service principal) appears under **Enterprise applications**. If your organization restricts access, assign users/groups to this Enterprise application to allow access to Redpanda Cloud.
3. On the application registration for Redpanda Cloud, click **Endpoints** and copy the **OpenID Connect metadata document** URL.
4. In Redpanda Cloud, on the **Users** > **Single sign-on** page, paste that endpoint address into the **Discovery URI** field. Then, complete the SSO configuration:
    1. For **Client ID**, copy and paste the **Application (client) ID** from the Azure app for Redpanda Cloud.
    2. For **Client secret**, copy and paste the secret you get from adding a client secret on the Certificates & secrets page for the Azure app for Redpanda Cloud.
    3. For **Realm**, enter your Microsoft Entra ID tenant domain name.
    4. Click **Save**.
5. On the Redpanda Cloud **Users** > **Single sign-on** page, edit your new Entra ID connection to enable single sign-on.

Users whose email address is in that realm (domain) can now access your Redpanda Cloud account.

> 📝 **NOTE**
>
> - No additional claims required for SSO: Redpanda Cloud SSO relies on the claims your IdP provides under the standard OIDC scopes (`openid`, `profile`, `email`). You do not need to configure optional claims or group claims in ID/Access tokens.
>
> - Group claims required for [group-based access control (GBAC)](../authorization/gbac/): If you plan to use GBAC, you must configure your IdP to include group claims in OIDC tokens.
>
> - No API permissions required: You do not need Microsoft Graph or any other API permissions. Microsoft Graph `User.Read` may be listed (some tenants add it during app creation), but Redpanda Cloud performs OIDC sign-in only and does not call Microsoft Graph.
>
> - First-login consent only: On first sign-in, users are prompted to consent to the standard OIDC scopes `openid`, `profile`, and `email`.
After the first consent, users should not be prompted again unless consent is revoked or the app configuration changes. ##### [](#tips-for-integrating-entra-id)Tips for Integrating Entra ID If users are repeatedly prompted for consent or cannot log in: - Ensure the app is configured as Web with the exact Redirect URI from Redpanda Cloud. - Remove any extra API permissions (for example, `Microsoft Graph: User.Read`). - Avoid adding non-standard claims or scopes. ### [](#multi-factor-authentication-mfa)Multi-factor authentication (MFA) Improve account security by requiring a second verification step when logging in to Redpanda Cloud. Redpanda Cloud supports time-based one-time passwords (TOTP) using an authenticator app (for example, Google Authenticator, Microsoft Authenticator, 1Password). You can enable MFA for your own account, and organization administrators can enable MFA for all members of the organization. During the initial MFA setup, after entering your login credentials, you’re prompted to scan a QR code to get a TOTP code from an authenticator app. Enter that 6-digit code to access Redpanda Cloud. Subsequent logins also require entering a TOTP code, but you can choose to remember the device to skip the MFA prompt on that device for the next 30 days. As part of the initial setup, you’re also prompted to save a separate recovery code. Keep the recovery code offline and secure. You can use that recovery code to regain access to Redpanda Cloud, if necessary (for example, if your phone is lost). #### [](#enable-mfa-individual-users)Enable MFA (individual users) Users can enable MFA for their own accounts. 1. In the Cloud UI, select your profile avatar and choose **Manage user**. 2. Open the **Security** tab. 3. Click **Enable** to set up multi-factor authentication. #### [](#enforce-mfa-organization-admins)Enforce MFA (organization admins) Administrators can require MFA for all users in an organization. 1. In the Cloud UI, go to **Organization IAM**. 2. 
Open the **MFA** tab. 3. Click **Enable** to require MFA for all members of this organization. #### [](#troubleshooting)Troubleshooting - **New phone or lost access:** If you can’t access your authenticator app, select to try another access method and enter your recovery code. - **TOTP code not accepted:** Ensure the code hasn’t expired and that your phone’s time is set automatically; time drift can cause invalid codes. - **Remembered device prompts again:** The 30-day trust is device- and browser-specific. Clearing cookies, switching browsers, or using a new device requires re-verification. ### [](#user-impersonation)User impersonation > ❗ **IMPORTANT** > > To unlock this feature for your account, contact [Redpanda Support](https://support.redpanda.com/hc/en-us/requests/new). BYOC and Dedicated clusters support unified authentication and authorization between the Redpanda Cloud UI and Redpanda with user impersonation. With user impersonation enabled, the topics and resources users see in the UI match exactly what they can access with the Cloud API or `rpk`. You can use the same credentials to authenticate to both Redpanda Cloud and the underlying Redpanda cluster, with consistent permissions across all interfaces. This ensures accurate audit logs and unified identity enforcement across all client applications, including the Cloud UI. - **Without user impersonation**: Redpanda Cloud uses a static service account to access your cluster. All UI requests appear to come from this generic admin user. - **With user impersonation**: Redpanda Cloud uses your individual user credentials and evaluates permissions using [access control lists (ACLs)](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#access-control-list-acl) and [role-based access control (RBAC)](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#rbac) in the data plane. Each user sees only the resources they have permission to access. To enable user impersonation: 1. 
Go to the **Cluster settings** page and select the option to enable user impersonation. 2. Configure permissions for your users on the cluster **Security** page using ACLs or RBAC roles. > ❗ **IMPORTANT** > > After enabling user impersonation: > > - **Admin users** continue to have full access as before > > - **Reader and Writer users** will lose access to the cluster until you explicitly grant them permissions through ACLs or RBAC roles on the **Security** page > > > Plan to configure user permissions before or immediately after enabling this feature to avoid access disruption. ## [](#service-authentication)Service authentication Your applications and tools need to authenticate when connecting to Redpanda APIs. Redpanda Cloud supports different authentication methods depending on which API you’re using and your cloud provider. - **SASL** (Simple Authentication and Security Layer): Username and password authentication over encrypted connections - **mTLS** (Mutual TLS): Certificate-based authentication where both client and server verify each other’s identity - **Basic authentication**: Username and password sent in HTTP headers over encrypted connections > 📝 **NOTE** > > mTLS authentication is supported on AWS and GCP clusters only. Azure clusters currently support SASL and basic authentication only. ### [](#authentication-methods-by-api)Authentication methods by API Different APIs support different authentication methods: - **Kafka API**: Redpanda Cloud supports both SASL (over TLS 1.2) and [mTLS](#mtls) authentication for Kafka clients connecting to Redpanda clusters over the TCP endpoint or listener. - **HTTP Proxy API** and **Schema Registry API**: Redpanda Cloud supports HTTP basic authentication (encrypted over TLS 1.2) and [mTLS](#mtls) for client authentication. For AWS and GCP, you can simultaneously enable mTLS and SASL for the Kafka API, and mTLS and basic authentication for the HTTP APIs (HTTP Proxy and Schema Registry). 
When you enable both authentication methods, Redpanda creates separate listeners:

- One mTLS listener on a specific port
- One SASL/basic authentication listener on a different port

This allows clients to choose which authentication method to use when connecting.

| Cloud provider | API | Supported authentication methods |
| --- | --- | --- |
| AWS (see [Enable mTLS and SASL](#enable-mtls-and-sasl)) | Kafka API | SASL (SASL/SCRAM, SASL/PLAIN), mTLS |
| AWS | HTTP Proxy API | Basic authentication, mTLS |
| AWS | Schema Registry API | Basic authentication, mTLS |
| GCP (see [Enable mTLS and SASL](#enable-mtls-and-sasl)) | Kafka API | SASL (SASL/SCRAM, SASL/PLAIN), mTLS |
| GCP | HTTP Proxy API | Basic authentication, mTLS |
| GCP | Schema Registry API | Basic authentication, mTLS |
| Azure | Kafka API | SASL (SASL/SCRAM, SASL/PLAIN) |
| Azure | HTTP Proxy API | Basic authentication |
| Azure | Schema Registry API | Basic authentication |

> 📝 **NOTE**
>
> Each Redpanda Cloud [data plane](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#data-plane) runs its own dedicated agent, which authenticates and connects against the control plane over a single TLS 1.2 encrypted TCP connection.

The following features use IAM policies to generate dynamic and short-lived credentials to interact with cloud provider APIs:

- Data plane agent
- Tiered Storage
- Redpanda Console
- Kafka Connect

[IAM policies](../authorization/cloud-iam-policies/) have constrained permissions so that each service can only access or manage its own data plane-scoped resources, following the principle of least privilege.

### [](#configure-service-authentication)Configure service authentication

When you create a new cluster using the [Cloud UI](https://cloud.redpanda.com/), the cluster is enabled by default with SASL for the Kafka API and basic authentication for the HTTP Proxy API and Schema Registry API.
### [](#requirements)Requirements To configure service authentication using the Cloud API, you must have: - A service account in your Redpanda organization with administrative privileges - Access to the [Cloud API](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) #### [](#configuration-methods-by-interface)Configuration methods by interface - **Cloud UI**: Create clusters with default SASL/basic authentication, or enable mTLS for HTTP Proxy and Schema Registry on existing clusters - **Cloud API**: Required to: - Create mTLS-enabled clusters - Enable mTLS for the Kafka API on existing clusters - Enable both mTLS and SASL/basic authentication simultaneously ### [](#authenticate-to-the-cloud-api)Authenticate to the Cloud API 1. Create a service account in your organization, if you haven’t already. In the Redpanda Cloud UI, go to the **Service account** tab of the [Organization IAM](https://cloud.redpanda.com/organization-iam?tab=service-accounts) page to create a service account. 2. Retrieve the client ID and secret by clicking **Copy ID** and **Copy Secret**. 3. Obtain an access token by making a `POST` request to `https://auth.prd.cloud.redpanda.com/oauth/token` with the ID and secret in the request body. ```bash AUTH_TOKEN=`curl -s --request POST \ --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \ --header 'content-type: application/x-www-form-urlencoded' \ --data grant_type=client_credentials \ --data client_id= \ --data client_secret= \ --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token` ``` Make sure to replace the following variables: | Placeholder variable | Description | | --- | --- | | | Client ID. | | | Client secret. | ### [](#mtls)Enable mTLS authentication For clusters with mTLS authentication, Redpanda creates a dedicated mTLS-enabled listener for each API service (Kafka API, HTTP Proxy, or Schema Registry) where you’ve enabled this authentication method. 
After you enable mTLS, [get the API endpoints](#retrieve-api-endpoints) and [verify that mTLS authentication is in effect](#verify-mtls). > 📝 **NOTE** > > - mTLS authentication is supported on AWS and GCP clusters only. > > - If you enable mTLS authentication, you cannot disable it later. #### [](#create-a-new-cluster-with-mtls-enabled)Create a new cluster with mTLS enabled 1. Follow the steps to create a resource group and network for [BYOC](../../manage/api/cloud-byoc-controlplane-api/#create-a-resource-group) or [Dedicated](../../manage/api/cloud-dedicated-controlplane-api/#create-a-resource-group), if you haven’t already. You’ll need the resource group ID and network ID to create a cluster in the next step. 2. Make a [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster) request to create a new cluster with mTLS enabled. > 📝 **NOTE** > > The following example enables mTLS for the Kafka API. To enable mTLS for HTTP Proxy and Schema Registry, add the `http_proxy.mtls` and `schema_registry.mtls` fields to the request body. You can choose to enable mTLS for any combination of the three services. Show example request to enable mTLS for Kafka API ```bash CLUSTER_CREATE_BODY=`cat << EOF { "cluster": { "cloud_provider": "", "connection_type": "CONNECTION_TYPE_PRIVATE", "name": "", "resource_group_id": "", "network_id": "", "region": "", "zones": [ ], "throughput_tier": "", "type": "", "kafka_api": { "mtls": { "enabled": true, "ca_certificates_pem": [""], "principal_mapping_rules": [""] } } } } EOF` curl -v -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_CREATE_BODY" https://api.redpanda.com/v1/clusters/ ``` Make sure to replace the following variables: | Placeholder variable | Description | | --- | --- | | | ID of the Redpanda cluster. | | | Cloud provider for the cluster (CLOUD_PROVIDER_AWS or CLOUD_PROVIDER_GCP). | | | Name of the Redpanda cluster. 
| | | ID of the resource group. | | | ID of the network. | | | The region where the cluster is created. For example, us-central1. | | | The zones where the cluster is created. For example, ["us-central1-a", "us-central1-b", "us-central1-c"]. | | | The usage tier of the cluster. | | | The Redpanda cluster type, TYPE_BYOC or TYPE_DEDICATED. | | | A trusted Kafka client CA certificate in PEM format. The ca_certificates_pem field accepts a list of certificates. | | | Configurable rule for mapping the Distinguished Name of Kafka client certificates to Kafka principals.For example, the mapping rule RULE:.*CN=([^,]+).*/\\$1/ maps the following certificate subject to a principal named test:Subject: C=US, ST=IL, L=Chicago, O=redpanda, OU=cloud, CN=test, emailAddress=test123@redpanda.comSee Configure Authentication for more details on principal mapping rules. The principal_mapping_rules field accepts a list of rules. | The Create Cluster endpoint returns a long-running operation. You can check the status of the operation by making a `GET` request to the following endpoint: ```bash curl -H "Authorization: Bearer $AUTH_TOKEN" https://api.redpanda.com/v1/operations/ ``` When the operation state is `COMPLETED`, you can [verify that mTLS is enabled](#verify-mtls) for the API endpoints. #### [](#update-an-existing-cluster-to-use-mtls)Update an existing cluster to use mTLS Make a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request to enable mTLS for the Kafka API on a cluster. The following code block shows a request to enable mTLS for the Kafka API. 
To enable mTLS for HTTP Proxy and Schema Registry, add the `http_proxy.mtls` and `schema_registry.mtls` fields to the request body: Show example request ```bash CLUSTER_PATCH_BODY=`cat << EOF { "kafka_api": { "mtls": { "enabled": true, "ca_certificates_pem": [""], "principal_mapping_rules": [""] } } } EOF` curl -v -X PATCH \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_PATCH_BODY" https://api.redpanda.com/v1/clusters/ ``` Make sure to replace the following variables: | Placeholder variable | Description | | --- | --- | | | ID of Redpanda cluster. | | | A trusted Kafka client CA certificate in PEM format. The ca_certificates_pem field accepts a list of certificates. | | | Configurable rule for mapping the Distinguished Name of Kafka client certificates to Kafka principals.For example, the mapping rule RULE:.*CN=([^,]+).*/\\$1/ maps the following certificate subject to a principal named test:Subject: C=US, ST=IL, L=Chicago, O=redpanda, OU=cloud, CN=test, emailAddress=test123@redpanda.comSee Configure Authentication for more details on principal mapping rules. The principal_mapping_rules field accepts a list of rules. | The Update Cluster endpoint returns a long-running operation. You can check the status of the operation by making a `GET` request to the following endpoint: ```bash curl -H "Authorization: Bearer $AUTH_TOKEN" https://api.redpanda.com/v1/operations/ ``` When the operation state is `COMPLETED`, you can [verify that mTLS is enabled](#verify-mtls) for the API endpoints. ### [](#enable-mtls-and-sasl)Enable mTLS and SASL > 📝 **NOTE** > > You can enable mTLS and SASL simultaneously for AWS and GCP clusters only. To unlock this feature for your account, contact your Customer Success Manager. You can choose to enable mTLS and SASL simultaneously for the Kafka API, and mTLS and Basic authentication for HTTP Proxy and Schema Registry. 
The `sasl` field in the API request examples toggles both SASL and basic authentication. #### [](#create-a-new-cluster-with-both-mtls-and-sasl-enabled)Create a new cluster with both mTLS and SASL enabled 1. Follow the steps to create a resource group and network for [BYOC](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster) or [Dedicated](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster), if you haven’t already done so. You’ll need the resource group ID and network ID to create a cluster in the next step. 2. Make a [`POST /v1/clusters`](/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster) request to create a new cluster with both mTLS and SASL or basic authentication enabled. You can enable mTLS and SASL or basic authentication for any combination of the three services. For example, if you want to enable mTLS and SASL simultaneously for the Kafka API and mTLS and basic authentication simultaneously for Schema Registry only, leave out the entire `http_proxy` block from the request body. If you want to enable mTLS only for the Kafka API, and mTLS and basic authentication for HTTP Proxy and Schema Registry, leave out the `kafka_api.sasl` field.
Show example request

```bash
CLUSTER_CREATE_BODY=`cat << EOF
{
  "cluster": {
    "cloud_provider": "",
    "connection_type": "CONNECTION_TYPE_PRIVATE",
    "name": "",
    "resource_group_id": "",
    "network_id": "",
    "region": "",
    "zones": [ ],
    "throughput_tier": "",
    "type": "",
    "kafka_api": {
      "mtls": {
        "enabled": true,
        "ca_certificates_pem": [""],
        "principal_mapping_rules": [""]
      },
      "sasl": { "enabled": true }
    },
    "http_proxy": {
      "mtls": {
        "enabled": true,
        "ca_certificates_pem": [""]
      },
      "sasl": { "enabled": true }
    },
    "schema_registry": {
      "mtls": {
        "enabled": true,
        "ca_certificates_pem": [""]
      },
      "sasl": { "enabled": true }
    }
  }
}
EOF`

curl -v -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d "$CLUSTER_CREATE_BODY" https://api.redpanda.com/v1/clusters/
```

Make sure to replace the following variables: | Placeholder variable | Description | | --- | --- | | | ID of Redpanda cluster. | | | Cloud provider for the cluster (CLOUD_PROVIDER_AWS or CLOUD_PROVIDER_GCP). | | | Name of the Redpanda cluster. | | | ID of the resource group. | | | ID of the network. | | | The region where the cluster is created. For example, us-central1. | | | The zones where the cluster is created. For example, ["us-central1-a", "us-central1-b", "us-central1-c"]. | | | The usage tier of the cluster. | | | The Redpanda cluster type, TYPE_BYOC or TYPE_DEDICATED. | | | A trusted Kafka client CA certificate in PEM format. The ca_certificates_pem field accepts a list of certificates. | | | Configurable rule for mapping the Distinguished Name of Kafka client certificates to Kafka principals. For example, the mapping rule `RULE:.*CN=([^,]+).*/\\$1/` maps the following certificate subject to a principal named `test`: `Subject: C=US, ST=IL, L=Chicago, O=redpanda, OU=cloud, CN=test, emailAddress=test123@redpanda.com`. See Configure Authentication for more details on principal mapping rules. The principal_mapping_rules field accepts a list of rules.
| The Create Cluster endpoint returns a long-running operation. You can check the status of the operation by making a `GET` request to the following endpoint:

```bash
curl -H "Authorization: Bearer $AUTH_TOKEN" https://api.redpanda.com/v1/operations/
```

When the operation state is `COMPLETED`, you can [verify that mTLS is enabled](#verify-mtls) for the API endpoints.

#### [](#update-an-existing-cluster-to-use-mtls-and-sasl)Update an existing cluster to use mTLS and SASL

Make a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request to enable mTLS and SASL on an existing cluster. You can choose to enable mTLS and SASL or basic authentication for any combination of the three services. For example, if you want to enable mTLS and SASL simultaneously for the Kafka API and mTLS and basic authentication simultaneously for Schema Registry only, leave out the entire `http_proxy` block from the request body. If you want to enable mTLS only for the Kafka API, and mTLS and basic authentication for HTTP Proxy and Schema Registry, leave out the `kafka_api.sasl` field.

Show example request

```bash
CLUSTER_PATCH_BODY=`cat << EOF
{
  "kafka_api": {
    "mtls": {
      "enabled": true,
      "ca_certificates_pem": [""],
      "principal_mapping_rules": [""]
    },
    "sasl": { "enabled": true }
  },
  "schema_registry": {
    "mtls": {
      "enabled": true,
      "ca_certificates_pem": [""]
    },
    "sasl": { "enabled": true }
  },
  "http_proxy": {
    "mtls": {
      "enabled": true,
      "ca_certificates_pem": [""]
    },
    "sasl": { "enabled": true }
  }
}
EOF`

curl -v -X PATCH \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d "$CLUSTER_PATCH_BODY" https://api.redpanda.com/v1/clusters/
```

Make sure to replace the following variables: | Placeholder variable | Description | | --- | --- | | | ID of Redpanda cluster. | | | A trusted Kafka client CA certificate in PEM format. The ca_certificates_pem field accepts a list of certificates.
| | | Configurable rule for mapping the Distinguished Name of Kafka client certificates to Kafka principals.For example, the mapping rule RULE:.*CN=([^,]+).*/\\$1/ maps the following certificate subject to a principal named test:Subject: C=US, ST=IL, L=Chicago, O=redpanda, OU=cloud, CN=test, emailAddress=test123@redpanda.comSee Configure Authentication for more details on principal mapping rules. The principal_mapping_rules field accepts a list of rules. | The Update Cluster endpoint returns a long-running operation. You can check the status of the operation by making a `GET` request to the following endpoint: ```bash curl -H "Authorization: Bearer $AUTH_TOKEN" https://api.redpanda.com/v1/operations/ ``` When the operation state is `COMPLETED`, you can [verify that mTLS is enabled](#verify-mtls) for the API endpoints. #### [](#update-an-existing-cluster-to-disable-sasl)Update an existing cluster to disable SASL If you enabled mTLS and SASL on a cluster, you can disable SASL by making a [`PATCH /v1/clusters/{cluster.id}`](/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster) request: Show example request ```bash CLUSTER_PATCH_BODY=`cat << EOF { "kafka_api": { "sasl": { "enabled": false } } } EOF` curl -v -X PATCH \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $AUTH_TOKEN" \ -d "$CLUSTER_PATCH_BODY" https://api.redpanda.com/v1/clusters/ ``` ### [](#retrieve-api-endpoints)Retrieve API endpoints Retrieve the mTLS and SASL-enabled endpoints by calling the `GET /v1/clusters/{id}` endpoint, passing the cluster ID as a parameter. 
```bash
curl -X GET "https://api.redpanda.com/v1/clusters/" \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -H "authorization: Bearer ${AUTH_TOKEN}"
```

The API endpoints are returned in the response body in the following fields:

| API | Field | Example |
| --- | --- | --- |
| Kafka API | `kafka_api.all_seed_brokers` | sasl: `seed-2f92c489.d040oh0mf339m7q5uu0g.byoc.ign.cloud.redpanda.com:9092`; mtls: `seed-2f92c489.d040oh0mf339m7q5uu0g.byoc.ign.cloud.redpanda.com:9093` |
| HTTP Proxy | `http_proxy.all_urls` | sasl: `https://pandaproxy-ce24d80a.d040oh0mf339m7q5uu0g.byoc.ign.cloud.redpanda.com:30082`; mtls: `https://pandaproxy-ce24d80a.d040oh0mf339m7q5uu0g.byoc.ign.cloud.redpanda.com:30083` |
| Schema Registry | `schema_registry.all_urls` | sasl: `https://schema-registry-20b02d09.d040oh0mf339m7q5uu0g.byoc.ign.cloud.redpanda.com:30081`; mtls: `https://schema-registry-20b02d09.d040oh0mf339m7q5uu0g.byoc.ign.cloud.redpanda.com:30080` |

### [](#verify-mtls)Verify mTLS for Kafka API

To verify that mTLS is enabled for the Kafka API, run the following `rpk` command without providing a security certificate or private key:

```bash
rpk cluster info --tls-enabled
```

You should get the following error:

```none
unable to request metadata: remote error: tls: certificate required
```

When you consume, produce to, or manage topics using [`rpk`](../../../current/reference/rpk/rpk-topic/rpk-topic/), you must provide a client certificate and private key. You may use the `--tls-cert` and `--tls-key` options, or [environment variables](../../../current/reference/rpk/rpk-x-options/) with `rpk`.
```bash
rpk topic create test-topic --tls-enabled --tls-cert=/path/to/tls.crt --tls-key=/path/to/tls.key
```

### [](#verify-mtls-http)Verify mTLS for HTTP Proxy and Schema Registry

To verify that mTLS is enabled for the HTTP Proxy and Schema Registry, run the following `curl` commands without providing a client certificate or key:

```bash
# Run the following to verify HTTP Proxy
curl -u "$USERNAME:$PASSWORD" -k \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  --tlsv1.2 --http2 \
  -d '{"records":[{"test":"hello"},{"test":"world"}]}' \
  $HTTP_PROXY_MTLS_URL/topics/

# Run the following to verify Schema Registry
curl -u "$USERNAME:$PASSWORD" -k \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  $SCHEMA_REGISTRY_MTLS_URL/subjects//versions/1
```

You should get an error indicating that the certificate is required.

To successfully connect to the HTTP Proxy and Schema Registry, you must provide a client certificate and private key. The following `curl` commands show example requests to mTLS-enabled endpoints using `test` as the username and `12345` as the password.

```bash
# HTTP Proxy
curl -u test:12345 -k --cert cert.pem --key key.pem \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  --tlsv1.2 --http2 \
  https://pandaproxy-45f811b1.cge5asc6006u7fvep0q0.fmc.dev.cloud.redpanda.com:30082/topics

# Schema Registry
curl -u test:12345 -k --cert cert.pem --key key.pem \
  https://schema-registry-15d24f32.cge5asc6006u7fvep0q0.fmc.dev.cloud.redpanda.com:30081/subjects/Kafka-value/versions/1
```

### [](#verify-sasl)Verify SASL

To verify that SASL is enabled for the Kafka API, run:

```bash
rpk topic create test-topic --tls-enabled --user --password
```

The command should succeed, and you should be able to create a topic named `test-topic`.
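The `rpk` check covers the Kafka API only. Because the `sasl` setting also controls basic authentication for the HTTP APIs, a similar check can be sketched for Schema Registry and HTTP Proxy. The following is a minimal sketch, not an official procedure: the helper names `http_status` and `classify_status` and the variables `SCHEMA_REGISTRY_SASL_URL`, `HTTP_PROXY_SASL_URL`, `USERNAME`, and `PASSWORD` are hypothetical, and it assumes the endpoint answers `401` when credentials are rejected.

```shell
# Sketch: check whether an HTTP endpoint accepts basic authentication.
# All function and variable names here are illustrative assumptions.

# Map an HTTP status code to a short verdict.
classify_status() {
  case "$1" in
    2??) echo "authenticated" ;;  # request accepted with these credentials
    401) echo "rejected" ;;       # assumed response for bad or missing credentials
    000) echo "unreachable" ;;    # curl reports 000 when no HTTP response arrives
    *)   echo "unexpected" ;;
  esac
}

# Print only the HTTP status code of a GET request made with basic auth.
# Arguments: URL, username, password.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' -u "$2:$3" "$1"
}

# Example usage against the SASL/basic authentication URLs returned by the
# Cloud API (uncomment and set the variables first):
# classify_status "$(http_status "$SCHEMA_REGISTRY_SASL_URL/subjects" "$USERNAME" "$PASSWORD")"
# classify_status "$(http_status "$HTTP_PROXY_SASL_URL/topics" "$USERNAME" "$PASSWORD")"
```

Separating the request from the interpretation keeps the status mapping easy to adjust if your endpoints return different codes.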
## [](#suggested-reading)Suggested reading - [Cloud API Overview](/api/doc/cloud-controlplane/topic/topic-cloud-api-overview) - [Cloud API Authentication](/api/doc/cloud-controlplane/authentication) --- # Page 660: Availability **URL**: https://docs.redpanda.com/redpanda-cloud/security/cloud-availability.md --- # Availability --- title: Availability latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cloud-availability page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cloud-availability.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/cloud-availability.adoc description: Learn how Redpanda Cloud supports deploying clusters in single or multiple availability zones (AZs). page-git-created-date: "2024-06-06" page-git-modified-date: "2024-08-01" --- Redpanda Cloud supports the deployment of Redpanda clusters in single or multiple availability zones (AZs), spanning at most three AZs. Brokers are evenly distributed across AZs, and the number of topic replicas is set to `3` by default. Data is evenly distributed across AZs automatically. This behavior is known as [rack awareness](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#rack-awareness). To prevent downtime during cluster upgrades, the Redpanda Cloud cluster operator upgrades one broker at a time. It waits for the health of the cluster to return to its nominal state before continuing with the next broker upgrade, until all brokers are fully rolled out. Redpanda’s Support, Security, and Site Reliability Engineering (SRE) teams monitor Redpanda Cloud clusters 24/7 to ensure they meet availability service level agreements (SLAs). If incidents occur, teams at Redpanda trigger an incident response process to quickly mitigate them. 
--- # Page 661: Encryption **URL**: https://docs.redpanda.com/redpanda-cloud/security/cloud-encryption.md --- # Encryption --- title: Encryption latest-operator-version: v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cloud-encryption page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cloud-encryption.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/cloud-encryption.adoc description: Learn how Redpanda Cloud provides data encryption in transit and at rest. page-git-created-date: "2024-06-06" page-git-modified-date: "2025-11-12" --- Redpanda Cloud provides data at rest and data in transit encryption. ## [](#data-at-rest-encryption)Data at rest encryption For data on disk, Redpanda Cloud relies on the cloud provider’s default volume encryption. The default encryption uses AES-256 block cipher and encryption keys either per disk or data chunk, depending on the cloud provider. For details about how default data at rest encryption works, see: - [AWS SSD instance store volume](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html) - [GCP data encryption at rest](https://cloud.google.com/docs/security/encryption/default-encryption) - [Azure data encryption at rest](https://learn.microsoft.com/en-us/azure/security/fundamentals/encryption-atrest) For Tiered Storage data, every Redpanda Cloud cluster uses a unique and periodically rotated managed master key (SSE-S3). The block cipher uses AES-256. ## [](#data-in-transit-encryption)Data in transit encryption All network traffic transporting customer data is encrypted in transit using asymmetric encryption with TLS 1.2 and TLS 1.3. The network connection to the control plane is also TLS 1.2 encrypted. Data plane TLS certificates are generated and signed by [Let’s Encrypt](https://letsencrypt.org/). 
Redpanda Cloud implements mitigations to prevent bad actors from enumerating cluster endpoints through the public certificate transparency log.

The following protocols and cipher suites are supported and accepted by Redpanda services such as Schema Registry, HTTP Proxy, and Kafka API.

> 📝 **NOTE**
>
> Cipher suites marked \*\* are deprecated.

```bash
Supported Server Cipher(s):
Preferred TLSv1.3  128 bits  TLS_AES_128_GCM_SHA256        Curve 25519 DHE 253
Accepted  TLSv1.3  256 bits  TLS_AES_256_GCM_SHA384        Curve 25519 DHE 253
Accepted  TLSv1.3  256 bits  TLS_CHACHA20_POLY1305_SHA256  Curve 25519 DHE 253
Accepted  TLSv1.3  128 bits  TLS_AES_128_CCM_SHA256        Curve 25519 DHE 253
Preferred TLSv1.2  128 bits  ECDHE-RSA-AES128-GCM-SHA256   Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  AES128-GCM-SHA256 **
Accepted  TLSv1.2  256 bits  ECDHE-RSA-AES256-GCM-SHA384   Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  AES256-GCM-SHA384 **
Accepted  TLSv1.2  256 bits  ECDHE-RSA-CHACHA20-POLY1305   Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-RSA-AES128-SHA **       Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  AES128-SHA **
Accepted  TLSv1.2  128 bits  AES128-CCM **
Accepted  TLSv1.2  256 bits  ECDHE-RSA-AES256-SHA **       Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  AES256-SHA **
Accepted  TLSv1.2  256 bits  AES256-CCM **

Server Key Exchange Group(s):
TLSv1.3  128 bits  secp256r1 (NIST P-256)
TLSv1.3  192 bits  secp384r1 (NIST P-384)
TLSv1.3  260 bits  secp521r1 (NIST P-521)
TLSv1.3  128 bits  x25519
TLSv1.3  224 bits  x448
TLSv1.3  112 bits  ffdhe2048
TLSv1.3  128 bits  ffdhe3072
TLSv1.3  150 bits  ffdhe4096
TLSv1.3  175 bits  ffdhe6144
TLSv1.3  192 bits  ffdhe8192
TLSv1.2  128 bits  secp256r1 (NIST P-256)
TLSv1.2  192 bits  secp384r1 (NIST P-384)
TLSv1.2  260 bits  secp521r1 (NIST P-521)
TLSv1.2  128 bits  x25519
TLSv1.2  224 bits  x448
```

--- # Page 662: Safety and Reliability **URL**: https://docs.redpanda.com/redpanda-cloud/security/cloud-safety-reliability.md --- # Safety and Reliability --- title: Safety and Reliability latest-operator-version:
v26.1.2 latest-console-tag: v3.7.1 latest-connect-version: 4.87.0 latest-redpanda-tag: v26.1.3 docname: cloud-safety-reliability page-component-name: redpanda-cloud page-version: master page-component-version: master page-component-title: Cloud page-relative-src-path: cloud-safety-reliability.adoc page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/cloud-safety-reliability.adoc description: Learn how Redpanda Cloud tests for data inconsistency, liveness, and availability during adverse events. page-git-created-date: "2024-06-06" page-git-modified-date: "2024-08-01" --- Safety, reliability, and security are a top priority at Redpanda and an important part of the product development lifecycle. Redpanda continuously performs chaos testing to check for data inconsistency, liveness, and availability issues during adverse events. It checks for losing brokers, network partition or packet drops, or approaching system limits in terms of disk, CPU, network, or memory utilization. ## [](#auditing-and-testing)Auditing and testing To test and ensure Redpanda Cloud adheres to consistency guarantees, Redpanda has undergone [Jepsen validation and testing](https://jepsen.io/analyses/redpanda-21.10.1). Additionally, the Redpanda Cloud, SRE, and Security teams run periodic game day testing to simulate a failure or event to test systems, processes, and team responses. This game day testing of Redpanda Cloud is designed to verify safety, reliability, observability, and security of features, and to identify any regressions or new gaps in the system, mental models, alerts, or runbooks. The Redpanda Cloud cluster infrastructure is periodically reconciled to prevent state drift from building up and causing incidents. ## [](#packaging)Packaging Redpanda Cloud cluster software artifacts (also known as the meta-package or Install Pack) are packaged and tested together with each release. 
Install Packs undergo a comprehensive certification process on each cloud provider that Redpanda Cloud supports. This process includes testing upgrades from the two most recent Install Pack patch releases. One output of the Install Pack certification process is a Redpanda configuration for different tiers, tailored to each supported cloud provider, machine, and storage type. These tier limits and quotas help Redpanda to configure back pressure mechanisms on behalf of customers.

## [](#self-regulation)Self-regulation

Redpanda Cloud adheres to a system of automatic self-regulation, as demonstrated in the [Tiered Storage](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#tiered-storage) and [data balancing](https://docs.redpanda.com/redpanda-cloud/reference/glossary/#rebalancing) features.

---

# Page 663: Secrets

**URL**: https://docs.redpanda.com/redpanda-cloud/security/secrets.md

---

# Secrets

---
title: Secrets
latest-operator-version: v26.1.2
latest-console-tag: v3.7.1
latest-connect-version: 4.87.0
latest-redpanda-tag: v26.1.3
docname: secrets
page-component-name: redpanda-cloud
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: secrets.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/security/pages/secrets.adoc
description: Learn how Redpanda Cloud manages secrets.
page-git-created-date: "2024-06-06"
page-git-modified-date: "2024-08-01"
---

Redpanda Cloud uses _dynamic secrets_ through IAM roles. The policies on these roles grant only the actions and resources that a user (also known as a principal) strictly needs, following the principle of least privilege.

Redpanda Cloud also uses _static secrets_, stored in either [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/) or [GCP Secret Manager](https://cloud.google.com/secret-manager). Static secrets managed through Redpanda Console never leave their corresponding data plane account or network.
They stay securely stored in AWS Secrets Manager or GCP Secret Manager.

---