Skip to main content
Version: 23.1

Create a Google BigQuery Sink Connector

The Google BigQuery Sink connector enables you to stream any structured data from Redpanda to BigQuery for advanced analytics.

Prerequisites

Before you can create a Google BigQuery Sink connector in the Redpanda Cloud, you must:

  1. Create a Google Cloud account.

  2. In the Google home page:

    1. Select an existing project or create a new one.
    2. Create a new dataset for the project.
    3. (Optional if your data has a schema) After creating the dataset, create a new table to hold the data you intend to stream from Redpanda Cloud topics. Specify a structure for the table using schema values that align with your Redpanda topic data.
      note

      This step is mandatory only if the data in Redpanda does not have a schema. If the data in Redpanda includes a schema, then the connector automatically creates the tables in BigQuery.

  3. Create a custom role.

    The role must have the following permissions:

    bigquery.datasets.get
    bigquery.tables.create
    bigquery.tables.get
    bigquery.tables.getData
    bigquery.tables.list
    bigquery.tables.update
    bigquery.tables.updateData
  4. Create a service account.

  5. Add the custom role to your service account.

  6. Create a service account key, and then download it.

Limitations

The Google BigQuery Sink connector doesn’t support schemas with recursion.

Create a Google BigQuery Sink connector

To create the Google BigQuery Sink connector:

  1. In Redpanda Cloud, click Connectors in the navigation menu, and then click Create Connector.

  2. Select Export to Google BigQuery.

  3. On the Create Connector page, specify the following required connector configuration options:

    PropertyDescription
    Topics to exportA comma-separated list of the cluster topics you want to replicate to Google BigQuery.
    Topics regexA Java regular expression of topics to replicate. For example: specify .* to replicate all available topics in the cluster. Applicable only when Use regular expressions is selected.
    Credentials JSONA JSON key with BigQuery service account credentials.
    ProjectThe BigQuery project to which topic data will be written.
    Default datasetThe default Google BigQuery dataset to be used.
  4. Click Next. Review the connector properties specified, then click Create.

Advanced Google BigQuery Sink connector configuration

In most instances, the preceding basic configuration properties are sufficient. If you require any additional property settings (for example, automatically create BigQuery tables or map topics to tables), then specify any of the following optional advanced connector configuration properties by selecting Show advanced options on the Create Connector page:

PropertyDescription
Auto create tablesAutomatically create BigQuery tables if they don't already exist. If the table does not exist, then it is created based on the record schema.
Topic to table mapMap of topics to tables. Format: comma-separated tuples, for example topic1:table1,topic2:table2.
Allow new BigQuery fieldsIf true, new fields can be added to BigQuery tables during subsequent schema updates.
Allow BigQuery required field relaxationIf true, fields in the BigQuery schema can be changed from REQUIRED to NULLABLE.
Upsert enabledEnables upsert functionality on the connector.
Delete enabledEnable delete functionality on the connector.
Time partitioning typeThe time partitioning type to use when creating tables.

Map data

Use the appropriate key or value converter (input data format) for your data as follows:

  • JSON when your messages are JSON-encoded. Select Message JSON contains schema, with the schema and payload fields. If your messages do not contain schema, manually create tables in BigQuery.
  • AVRO when your messages contain AVRO-encoded messages, with schema stored in the Schema Registry.

By default, the table name is the name of the topic (non-alphanumeric characters replaced with an underscore character, _).

Use Topic to table map to remap topic names. For example, topic1:table1,topic2:table2.

Test the connection

After the connector is created, go to your BigQuery worksheets and query your table:

SELECT * FROM `project.dataset.table`

It may take a couple of minutes for the records to be visible in BigQuery.

Troubleshoot

Google credentials are checked for validity during connector creation, upon clicking Finish. In cases where there are invalid credentials, the connector is not created.

Other issues are reported using a failed task error message.

MessageAction
Not found: Project invalid-project-nameCheck to make sure Project contains a valid BigQuery project.
Not found: Dataset project:invalid-datasetCheck to make sure Default dataset contains a valid BigQuery dataset.
An unexpected error occurred while validating credentials for BigQuery: Failed to create credentials from input streamThe credentials given as a JSON file in the Credentials JSON property are incorrect. Copy a valid key from the Google Cloud service account.
JsonConverter with schemas.enable requires "schema" and "payload" fieldsThe connector encountered an incorrect message format when reading from a topic.
JsonParseException: Unrecognized token 'test': was expecting JSONDuring reading from a topic the connector encountered a message that is invalid JSON.

What do you like about this doc?




Optional: Share your email address if we can contact you about your feedback.

Let us know what we do well: