

GCP Managed Kafka Setup

This guide covers configuring Zenoo Hub to use Google Cloud Managed Service for Apache Kafka. While the main GCP Cloud Provider documentation covers Firestore and Secret Manager, this dedicated guide focuses on Kafka configuration, authentication, IAM permissions, and troubleshooting specific to GCP Managed Kafka.

Overview

Google Cloud Managed Service for Apache Kafka provides a fully managed, Apache Kafka-compatible event streaming service. It eliminates the operational overhead of running and maintaining Kafka clusters while providing:
  • Automatic scaling - Scales capacity based on throughput
  • High availability - Multi-zone redundancy and automatic failover
  • IAM integration - Native GCP authentication and authorization
  • Security - Encryption at rest and in transit by default
  • Monitoring - Built-in Cloud Monitoring integration
Why This Separate Guide?

GCP Managed Kafka has unique authentication and authorization requirements compared to self-hosted Kafka or AWS MSK. This guide addresses:
  • OAuth/SASL_SSL authentication setup (recommended)
  • mTLS authentication (alternative)
  • IAM permission configuration
  • Kafka Streams consumer group coordination
  • Common deployment issues and solutions

Prerequisites

Before proceeding, ensure you have:
  • GCP Managed Kafka cluster created and running
  • Hub application with GCP cloud provider configured (see GCP Provider)
  • Service account with Kafka IAM permissions (see IAM Permissions)
  • Network connectivity from Hub deployment to Managed Kafka cluster
Create Managed Kafka Cluster:
gcloud managed-kafka clusters create zenoo-hub-kafka \
  --location=us-central1 \
  --cpu=3 \
  --memory=3GiB \
  --project=${PROJECT_ID}

Authentication Methods

GCP Managed Kafka supports two authentication methods; OAuth is strongly recommended for new deployments.

OAuth (Recommended)

OAuth authentication uses GCP IAM for both authentication and authorization, providing the simplest and most secure approach.

Benefits

  • No manual ACL management - Permissions controlled via IAM roles
  • Automatic credential rotation - Uses Application Default Credentials
  • Integrated with GCP ecosystem - Consistent with other GCP services
  • No certificate management - No keystores or truststores required
  • Seamless authentication - Works with service accounts, Workload Identity, etc.

Required Dependency

Add the GCP Managed Kafka authentication library to your build.gradle:
dependencies {
    // GCP Managed Kafka OAuth authentication
    implementation 'com.google.cloud.hosted.kafka:managed-kafka-auth-login-handler:1.0.6'
}

Configuration

Configure OAuth authentication in application-gcp.yml:
spring:
  kafka:
    # OAuth uses port 9092 (bootstrapAddress from cluster)
    bootstrap-servers: ${SPRING_KAFKA_BOOTSTRAP_SERVERS:bootstrap.zenoo-hub-kafka.us-central1.managedkafka.hub-instance-test.cloud.goog:9092}

    properties:
      # Security protocol - OAuth uses SASL_SSL
      security.protocol: SASL_SSL

      # SASL/OAuth configuration
      sasl.mechanism: OAUTHBEARER
      sasl.login.callback.handler.class: com.google.cloud.hosted.kafka.auth.GcpLoginCallbackHandler
      sasl.jaas.config: org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;

      # Consumer group timeouts (IMPORTANT for Kafka Streams)
      session.timeout.ms: 300000        # 5 minutes
      max.poll.interval.ms: 300000      # 5 minutes
      rebalance.timeout.ms: 300000      # 5 minutes

    consumer:
      auto-offset-reset: earliest
Key Configuration Notes:
  • Port 9092 - OAuth uses the standard Kafka port (not 9192 for mTLS)
  • Timeouts - Set to 300000ms (5 minutes) to accommodate GCP Managed Kafka's latency characteristics
  • Bootstrap Address - Get from cluster details:
    gcloud managed-kafka clusters describe zenoo-hub-kafka \
      --location=us-central1 \
      --format="value(bootstrapAddress)"
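Mixing up the OAuth and mTLS ports is an easy mistake, so a deployment script can sanity-check the address before starting the application. A minimal sketch; the `check_bootstrap_port` helper is illustrative, not part of Hub:

```python
# Validate that a GCP Managed Kafka bootstrap address uses the port
# matching the chosen auth method: OAuth -> 9092, mTLS -> 9192
# (per the configuration notes above). Illustrative helper.

def check_bootstrap_port(bootstrap: str, auth: str) -> bool:
    expected = {"oauth": 9092, "mtls": 9192}[auth]
    host, _, port = bootstrap.rpartition(":")
    return bool(host) and port.isdigit() and int(port) == expected

addr = "bootstrap.zenoo-hub-kafka.us-central1.managedkafka.hub-instance-test.cloud.goog:9092"
print(check_bootstrap_port(addr, "oauth"))   # expected: True
print(check_bootstrap_port(addr, "mtls"))    # expected: False
```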
    

mTLS (Alternative)

Mutual TLS (mTLS) authentication uses client certificates for authentication. While supported, OAuth is strongly recommended for new deployments.

Benefits

  • Certificate-based authentication - No IAM dependency
  • Explicit trust model - Client certificates verify identity

Challenges

  • Manual ACL management - Must configure Kafka ACLs via API
  • Certificate rotation overhead - Regular keystore/truststore updates required
  • GCP Managed Kafka ACL limitations - Known issues with ACL API (see troubleshooting)
  • Operational complexity - More moving parts to manage

Configuration

spring:
  kafka:
    # mTLS uses port 9192 (sslBootstrapAddress from cluster)
    bootstrap-servers: ${SPRING_KAFKA_BOOTSTRAP_SERVERS:bootstrap.zenoo-hub-kafka.us-central1.managedkafka.hub-instance-test.cloud.goog:9192}

    ssl:
      key-store-location: classpath:certs/client-keystore.jks
      key-store-password: ${KEYSTORE_PASSWORD}
      key-store-type: JKS
      trust-store-location: classpath:certs/client-truststore.jks
      trust-store-password: ${TRUSTSTORE_PASSWORD}
      trust-store-type: JKS
      # key-password: ${KEY_PASSWORD}    # If different from keystore password

    properties:
      security.protocol: SSL
      session.timeout.ms: 300000
      max.poll.interval.ms: 300000
      rebalance.timeout.ms: 300000

    consumer:
      auto-offset-reset: earliest
mTLS Setup:
  1. Generate certificate signing request (CSR)
  2. Sign CSR with Managed Kafka CA
  3. Import signed certificate and CA into keystores
  4. Configure ACLs via Managed Kafka API (see limitations)
Recommendation: Use OAuth unless you have specific requirements for certificate-based authentication.

IAM Permissions for OAuth

When using OAuth authentication, GCP Managed Kafka uses IAM roles for authorization. The service account running your Hub application requires specific IAM roles.

Required Roles

# Set variables
export PROJECT_ID="your-gcp-project"
export SA_EMAIL="hub-service-account@${PROJECT_ID}.iam.gserviceaccount.com"

# Managed Kafka Client (required for all connections)
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/managedkafka.client"

# Consumer Group Editor (required for Kafka Streams)
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/managedkafka.consumerGroupEditor"

Permissions Breakdown

roles/managedkafka.client includes:
  • managedkafka.clusters.connect - Connect to Managed Kafka cluster
  • managedkafka.topics.read - Read messages from topics
  • managedkafka.topics.write - Write messages to topics
roles/managedkafka.consumerGroupEditor includes:
  • managedkafka.consumerGroups.update - Join consumer groups and commit offsets
Critical for Kafka Streams: Kafka Streams applications require both roles. Without consumerGroupEditor, Kafka Streams will remain stuck in REBALANCING state with 0 assigned partitions.

Verify Permissions

# Check service account has required roles
gcloud projects get-iam-policy ${PROJECT_ID} \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:${SA_EMAIL}" \
  --format="table(bindings.role)"

# Expected output should include:
# roles/managedkafka.client
# roles/managedkafka.consumerGroupEditor
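The same check can be automated in a pre-deployment script by parsing the role list the gcloud command emits. A hedged sketch; the `missing_roles` helper is illustrative:

```python
# Check that the roles printed by `gcloud projects get-iam-policy ...`
# include both roles required for Kafka Streams (see breakdown above).
# Illustrative helper, not part of Hub.

REQUIRED_ROLES = {
    "roles/managedkafka.client",
    "roles/managedkafka.consumerGroupEditor",
}

def missing_roles(gcloud_output: str) -> set:
    granted = {line.strip() for line in gcloud_output.splitlines()}
    return REQUIRED_ROLES - granted

output = """ROLE
roles/managedkafka.client
roles/managedkafka.consumerGroupEditor"""
print(missing_roles(output))   # expected: set()
```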

Hub Configuration for Kafka

Complete Hub configuration example integrating GCP cloud provider with Managed Kafka.

Complete application-gcp.yml

# GCP Cloud Provider Configuration for Hub with Managed Kafka

spring:
  profiles:
    include:
      - gcp                 # Enable GCP cloud provider
      - create-topics       # Auto-create topics on startup

  # Kafka Configuration - GCP Managed Kafka
  kafka:
    # OAuth bootstrap address (port 9092)
    bootstrap-servers: ${SPRING_KAFKA_BOOTSTRAP_SERVERS:bootstrap.zenoo-hub-kafka.us-central1.managedkafka.hub-instance-test.cloud.goog:9092}

    properties:
      # OAuth authentication
      security.protocol: SASL_SSL
      sasl.mechanism: OAUTHBEARER
      sasl.login.callback.handler.class: com.google.cloud.hosted.kafka.auth.GcpLoginCallbackHandler
      sasl.jaas.config: org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;

      # Consumer group timeouts (critical for Kafka Streams)
      session.timeout.ms: 300000
      max.poll.interval.ms: 300000
      rebalance.timeout.ms: 300000

    consumer:
      auto-offset-reset: earliest

# Hub Configuration
hub:
  cloud:
    provider:
      type: gcp

  gcp:
    projectId: ${GCP_PROJECT_ID:hub-instance-test}

    firestore:
      prefix: "hub-prod"
      createIndexes: true
      ttlEnabled: true

    secrets:
      prefix: "hub-prod"
      cacheSize: 256
      cacheExpiry: 1h

    metrics:
      enabled: true
      prefix: "hub-prod"

  # Hub Kafka topics configuration
  topics:
    partitions: 20
    retention: 2H

  execution:
    expiration: 5m

  streams:
    prefix: ${PREFIX:hub-prod}
    state:
      dir: /tmp/hub-streams
      cleanup-on-start: true

Environment Variables

Override configuration via environment variables:
export GCP_PROJECT_ID="your-project-id"
export SPRING_KAFKA_BOOTSTRAP_SERVERS="bootstrap.your-cluster.region.managedkafka.project.cloud.goog:9092"

Profile-Based Configuration

Use Spring profiles for different environments:
# Development
spring.profiles: dev
spring.kafka.bootstrap-servers: localhost:9092  # Local Kafka

---
# Staging
spring.profiles: staging
spring.kafka.bootstrap-servers: bootstrap.staging-kafka.us-central1.managedkafka.project.cloud.goog:9092

---
# Production
spring.profiles: prod
spring.kafka.bootstrap-servers: bootstrap.prod-kafka.us-central1.managedkafka.project.cloud.goog:9092

Kafka Streams Configuration

Critical: Hub uses Apache Kafka Streams for stateful stream processing. Ensure ALL Kafka properties are passed to Kafka Streams configuration.

Verification

The Hub’s KafkaConfig.java must include this pattern:
@Bean
public KafkaStreamsConfiguration kStreamsConfigs() {
    Map<String, Object> props = new HashMap<>();

    // CRITICAL: Start with ALL properties from Spring Boot configuration
    // This includes security.protocol, SASL properties, timeouts, etc.
    props.putAll(kafkaProperties.getProperties());

    // Then override with Hub-specific configuration
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId());
    props.put(StreamsConfig.STATE_DIR_CONFIG, hubStreamsConfig.getState().getDir());
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
    props.put(StreamsConfig.REQUEST_TIMEOUT_MS_CONFIG, hubStreamsConfig.getRequestTimeoutMs());
    // ... additional Hub-specific properties

    // Apply security properties (ensures proper SSL/SASL configuration)
    props.putAll(KafkaPropertiesSupport.securityProperties(kafkaProperties));

    return new KafkaStreamsConfiguration(props);
}
Why This Matters: If props.putAll(kafkaProperties.getProperties()) is missing, Kafka Streams will not receive OAuth authentication properties and will fail to connect or remain stuck in REBALANCING.

Common Mistake

Incorrect (causes OAuth failure):
// Missing kafkaProperties.getProperties()
props.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId());
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
// OAuth properties never passed to Kafka Streams!
Correct:
// Include ALL properties first
props.putAll(kafkaProperties.getProperties());  // <- CRITICAL

// Then override Hub-specific config
props.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId());
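The ordering matters because later puts win. The effect can be illustrated with plain dictionaries, mirroring putAll-then-put (property names come from the configuration above; the application.id value is illustrative):

```python
# Illustrates the merge order in kStreamsConfigs(): start from ALL
# Spring Boot Kafka properties, then overlay Hub-specific settings.
# Later writes win, so overrides replace base values while the
# OAuth/SASL properties survive untouched.

spring_properties = {
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "OAUTHBEARER",
    "session.timeout.ms": "300000",
}
hub_overrides = {
    "application.id": "hub-prod",   # illustrative Hub-specific value
}

streams_config = {}
streams_config.update(spring_properties)  # like props.putAll(kafkaProperties.getProperties())
streams_config.update(hub_overrides)      # like the props.put(...) overrides

print(streams_config["sasl.mechanism"])   # expected: OAUTHBEARER
```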

Deployment

Deploy Hub with GCP Managed Kafka on various GCP compute platforms.

GCE VM Deployment

Deploy on Google Compute Engine with attached service account:
# Set variables
export PROJECT_ID="your-project"
export SA_EMAIL="hub-service-account@${PROJECT_ID}.iam.gserviceaccount.com"
export CLUSTER_NAME="zenoo-hub-kafka"
export REGION="us-central1"

# Get Kafka bootstrap address
BOOTSTRAP_ADDRESS=$(gcloud managed-kafka clusters describe ${CLUSTER_NAME} \
  --location=${REGION} \
  --format="value(bootstrapAddress)")

# Create VM with service account
gcloud compute instances create hub-instance \
  --machine-type=e2-standard-2 \
  --service-account=${SA_EMAIL} \
  --scopes=cloud-platform \
  --zone=${REGION}-a \
  --project=${PROJECT_ID}

# SSH to VM and deploy container
gcloud compute ssh hub-instance --zone=${REGION}-a --command="
  docker run -d \
    --name hub-instance \
    --restart=always \
    -e SPRING_KAFKA_BOOTSTRAP_SERVERS=${BOOTSTRAP_ADDRESS}:9092 \
    -e GCP_PROJECT_ID=${PROJECT_ID} \
    -e SPRING_PROFILES_ACTIVE=gcp \
    -p 5080:5080 \
    us.gcr.io/${PROJECT_ID}/hub:latest
"

GKE Deployment

Deploy on Google Kubernetes Engine with Workload Identity:
1. Configure Workload Identity:
# Bind Kubernetes service account to GCP service account
gcloud iam service-accounts add-iam-policy-binding ${SA_EMAIL} \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[default/hub-service-account]" \
  --project=${PROJECT_ID}
2. Kubernetes Manifests:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hub-service-account
  annotations:
    iam.gke.io/gcp-service-account: hub-service-account@PROJECT_ID.iam.gserviceaccount.com

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: hub-config
data:
  application.yml: |
    spring:
      profiles:
        active: gcp
      kafka:
        bootstrap-servers: bootstrap.zenoo-hub-kafka.us-central1.managedkafka.PROJECT_ID.cloud.goog:9092
    hub:
      gcp:
        projectId: PROJECT_ID

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hub-instance
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hub
  template:
    metadata:
      labels:
        app: hub
    spec:
      serviceAccountName: hub-service-account
      containers:
      - name: hub
        image: us.gcr.io/PROJECT_ID/hub:latest
        ports:
        - containerPort: 5080
        env:
        - name: SPRING_CONFIG_LOCATION
          value: file:/config/application.yml
        volumeMounts:
        - name: config
          mountPath: /config
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        livenessProbe:
          httpGet:
            path: /actuator/health
            port: 5080
          initialDelaySeconds: 120
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 5080
          initialDelaySeconds: 30
          periodSeconds: 10
      volumes:
      - name: config
        configMap:
          name: hub-config

---
apiVersion: v1
kind: Service
metadata:
  name: hub-service
spec:
  selector:
    app: hub
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5080
  type: LoadBalancer
3. Deploy:
kubectl apply -f hub-deployment.yaml

Cloud Run Deployment

Deploy on Cloud Run (serverless):
# Build and push container
gcloud builds submit --tag us.gcr.io/${PROJECT_ID}/hub:latest

# Deploy to Cloud Run
gcloud run deploy hub-instance \
  --image=us.gcr.io/${PROJECT_ID}/hub:latest \
  --service-account=${SA_EMAIL} \
  --set-env-vars="SPRING_KAFKA_BOOTSTRAP_SERVERS=${BOOTSTRAP_ADDRESS}:9092,GCP_PROJECT_ID=${PROJECT_ID},SPRING_PROFILES_ACTIVE=gcp" \
  --region=${REGION} \
  --platform=managed \
  --allow-unauthenticated \
  --memory=2Gi \
  --cpu=2 \
  --project=${PROJECT_ID}
Note: Cloud Run may require VPC egress for Managed Kafka connectivity.

Verification

Verify Kafka Streams and Kafka connectivity after deployment.

Check Kafka Streams State

Kafka Streams should transition to RUNNING state within 30-60 seconds of application startup.
# Replace with your instance IP or hostname
export INSTANCE_IP="34.58.162.65"

# Check health endpoint
curl -s http://${INSTANCE_IP}:5080/actuator/health | python3 -m json.tool | grep -A 5 kafkaStreams
Expected Output:
"kafkaStreams": {
    "status": "UP",
    "details": {
        "state": "RUNNING"
    }
}
Warning Signs:
  • state: REBALANCING (stuck indefinitely)
  • state: ERROR
  • status: DOWN
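For scripted smoke tests, the health payload can be checked programmatically instead of eyeballed. A minimal sketch that follows the grep'd fragment shown above (the helper name is illustrative; in a full Actuator response the component may sit under a `components` key):

```python
# Extract the Kafka Streams state from the /actuator/health fragment
# shown above; RUNNING is healthy, REBALANCING/ERROR are warning signs.
import json

def kafka_streams_state(health_json: str) -> str:
    health = json.loads(health_json)
    return health["kafkaStreams"]["details"]["state"]

payload = json.dumps({
    "kafkaStreams": {"status": "UP", "details": {"state": "RUNNING"}}
})
state = kafka_streams_state(payload)
print(state)   # expected: RUNNING
```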

Verify Partition Assignment

Check that consumer partitions are assigned:
curl -s http://${INSTANCE_IP}:5080/actuator/metrics/kafka.consumer.coordinator.assigned.partitions | python3 -m json.tool
Expected Output:
{
    "name": "kafka.consumer.coordinator.assigned.partitions",
    "measurements": [
        {
            "statistic": "VALUE",
            "value": 80.0    # Should be > 0
        }
    ]
}
Warning: If value: 0.0, partitions are not assigned (authorization issue or configuration problem).

Check Connection Stability

Verify stable Kafka connections:
curl -s http://${INSTANCE_IP}:5080/actuator/metrics/kafka.consumer.connection.count | python3 -m json.tool
Expected Output:
{
    "name": "kafka.consumer.connection.count",
    "measurements": [
        {
            "statistic": "VALUE",
            "value": 32.0    # Should be stable and > 0
        }
    ]
}
Monitor for 5 minutes - connection count should remain stable (not constantly increasing/decreasing).
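The "stable" criterion can be made concrete by sampling the metric repeatedly and checking the spread. A sketch under stated assumptions (the sampling loop is omitted and the spread threshold is an illustrative choice, not a documented limit):

```python
# Decide whether kafka.consumer.connection.count samples look stable:
# every sample positive, and the min-max spread small. Illustrative.

def connections_stable(samples, max_spread=4.0):
    return bool(samples) and min(samples) > 0 and (max(samples) - min(samples)) <= max_spread

print(connections_stable([32.0, 32.0, 33.0, 32.0]))  # expected: True
print(connections_stable([32.0, 5.0, 60.0]))         # expected: False (churning)
```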

Check for Errors

Monitor application logs for errors:
# GCE
gcloud compute ssh hub-instance --zone=${REGION}-a --command="docker logs hub-instance --tail 100 2>&1 | grep -i error"

# GKE
kubectl logs -l app=hub --tail=100 | grep -i error

# Cloud Run
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=hub-instance AND severity>=ERROR" --limit=50
No errors expected - OAuth authentication should be silent when working correctly.

Troubleshooting

Common issues and solutions when deploying Hub with GCP Managed Kafka.

Kafka Streams Stuck in REBALANCING

Symptoms:
  • Health check shows state: REBALANCING indefinitely (> 2 minutes)
  • Actuator metrics show 0 assigned partitions
  • Failed rebalance count increasing
Causes and Solutions:

Cause 1: Missing IAM Permissions

Most common issue: the service account lacks the required Managed Kafka IAM roles.

Diagnosis:
# Check service account roles
gcloud projects get-iam-policy ${PROJECT_ID} \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:${SA_EMAIL}" \
  --format="table(bindings.role)"
Solution:
# Grant required roles
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/managedkafka.client"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/managedkafka.consumerGroupEditor"

# Restart application after granting roles
docker restart hub-instance
Verification: After granting roles, Kafka Streams should transition to RUNNING within 30 seconds.

Cause 2: Kafka Properties Not Passed to Streams

OAuth properties are not reaching the Kafka Streams configuration.

Diagnosis: Check application logs for OAuth token errors:
docker logs hub-instance 2>&1 | grep -i "oauth\|token\|sasl"
Solution: Verify KafkaConfig.kStreamsConfigs() includes:
props.putAll(kafkaProperties.getProperties());  // Must be present!

Cause 3: Timeout Configuration Too Short

Consumer group coordination times out because the configured timeouts are too short.

Diagnosis: Check whether the timeouts are configured:
docker exec hub-instance env | grep -i timeout
Solution: Ensure application-gcp.yml has adequate timeouts:
spring:
  kafka:
    properties:
      session.timeout.ms: 300000        # Must be present
      max.poll.interval.ms: 300000      # Must be present
      rebalance.timeout.ms: 300000      # Must be present
GCP Managed Kafka has higher latencies than self-hosted Kafka. 5-minute timeouts (300000ms) are recommended.
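A startup-time configuration check for these three properties can be scripted. A hedged sketch (property names and the 300000 ms floor come from the recommendation above; the helper is illustrative):

```python
# Verify the three consumer-group timeouts recommended for GCP
# Managed Kafka are present and at least 5 minutes (300000 ms).
REQUIRED_TIMEOUTS = ("session.timeout.ms", "max.poll.interval.ms", "rebalance.timeout.ms")

def timeouts_ok(props, minimum_ms=300_000):
    return all(int(props.get(key, 0)) >= minimum_ms for key in REQUIRED_TIMEOUTS)

props = {
    "session.timeout.ms": "300000",
    "max.poll.interval.ms": "300000",
    "rebalance.timeout.ms": "300000",
}
print(timeouts_ok(props))                            # expected: True
print(timeouts_ok({"session.timeout.ms": "10000"}))  # expected: False
```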

Authentication Failures

Symptoms:
  • Application fails to start
  • Logs show OAuth token errors
  • Connection refused errors

Problem: OAuth Library Missing

Error Message:
java.lang.ClassNotFoundException: com.google.cloud.hosted.kafka.auth.GcpLoginCallbackHandler
Cause: OAuth authentication library not on the classpath.

Solution: Add the dependency to build.gradle:
implementation 'com.google.cloud.hosted.kafka:managed-kafka-auth-login-handler:1.0.6'
Rebuild and redeploy:
./gradlew clean build
docker build -t hub:latest .

Problem: Application Default Credentials Not Found

Error Message:
com.google.auth.oauth2.DefaultCredentialsProvider: Unable to detect Application Default Credentials
Cause: No credentials available in the environment.

Solution: For GCE/GKE, attach a service account:
# GCE
gcloud compute instances set-service-account INSTANCE_NAME \
  --service-account=${SA_EMAIL} \
  --zone=ZONE

# GKE - Use Workload Identity (see Deployment section)
For local testing:
gcloud auth application-default login

Problem: Service Account Lacks managedkafka.client Role

Error Message:
org.apache.kafka.common.errors.SaslAuthenticationException: Authentication failed
Cause: The service account can authenticate but lacks permission to connect to the cluster.

Solution:
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/managedkafka.client"

Connection Issues

Symptoms:
  • Cannot connect to bootstrap servers
  • Timeout errors
  • Network unreachable

Problem: Wrong Bootstrap Address

Error Message:
org.apache.kafka.common.errors.TimeoutException: Timeout waiting for bootstrap servers
Cause: Incorrect bootstrap server address or port.

Solution: Verify the bootstrap address:
gcloud managed-kafka clusters describe ${CLUSTER_NAME} \
  --location=${REGION} \
  --format="value(bootstrapAddress)"
OAuth uses port 9092 (not 9192):
spring.kafka.bootstrap-servers: bootstrap.cluster.region.managedkafka.project.cloud.goog:9092

Problem: Network Firewall Rules

Error Message:
java.net.ConnectException: Connection timed out
Cause: Firewall blocking egress to Managed Kafka.

Solution: Check VPC firewall rules:
gcloud compute firewall-rules list --project=${PROJECT_ID}
Egress is allowed by default; create an explicit allow rule only if your VPC has a policy that denies egress:
gcloud compute firewall-rules create allow-managed-kafka \
  --direction=EGRESS \
  --action=ALLOW \
  --rules=tcp:9092 \
  --destination-ranges=0.0.0.0/0 \
  --network=default

Problem: Private Cluster Without Private Google Access

Cause: Managed Kafka cluster is in a VPC without Private Google Access enabled.

Solution: Enable Private Google Access on the subnet:
gcloud compute networks subnets update SUBNET_NAME \
  --enable-private-ip-google-access \
  --region=${REGION}

Problem: DNS Resolution Failure

Error Message:
java.net.UnknownHostException: bootstrap.cluster.region.managedkafka.project.cloud.goog
Cause: DNS not configured correctly.

Solution: Test DNS resolution:
nslookup bootstrap.cluster.region.managedkafka.project.cloud.goog
dig bootstrap.cluster.region.managedkafka.project.cloud.goog
Ensure the VM resolves DNS through the GCP metadata server (169.254.169.254).

Performance Issues

Problem: High Consumer Lag

Symptoms:
  • Messages not processed in time
  • Consumer lag increasing
Diagnosis:
# Check consumer lag via gcloud
gcloud managed-kafka consumer-groups describe GROUP_ID \
  --cluster=${CLUSTER_NAME} \
  --location=${REGION}
Solutions:
  1. Increase parallelism:
hub:
  streams:
    threadCount: 8    # Increase from default 4
  2. Increase partition count:
hub:
  topics:
    partitions: 40    # Increase from 20
  3. Scale horizontally (add more Hub instances)
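When combining these remedies, keep in mind that partitions cap consumer parallelism: stream threads beyond the partition count sit idle. A small illustration (the instance and thread counts here are hypothetical, not Hub defaults):

```python
# Effective parallelism of a consumer group is bounded by the
# partition count: extra stream threads beyond it stay idle.
def effective_parallelism(partitions, instances, threads_per_instance):
    return min(partitions, instances * threads_per_instance)

# 20 partitions, 2 instances x 12 threads: 4 threads sit idle
print(effective_parallelism(20, 2, 12))  # expected: 20
# Raising partitions to 40 lets all 24 threads work, with headroom
print(effective_parallelism(40, 2, 12))  # expected: 24
```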

Problem: Slow Rebalancing

Symptoms:
  • Rebalancing takes several minutes
  • Frequent rebalances disrupt processing
Solution: Increase rebalance timeout:
spring:
  kafka:
    properties:
      rebalance.timeout.ms: 600000    # Increase to 10 minutes
And ensure max.poll.interval.ms is adequate:
spring:
  kafka:
    properties:
      max.poll.interval.ms: 600000    # Increase if processing is slow

Best Practices

Use OAuth Over mTLS

Recommendation: Always use OAuth/SASL_SSL authentication for new deployments. Reasons:
  • Simpler operations: No certificate rotation or keystore management
  • IAM integration: Consistent permission model across GCP services
  • Automatic credential refresh: No manual key rotation
  • Better security: Short-lived OAuth tokens vs long-lived certificates
When to use mTLS:
  • Compliance requirements mandate certificate-based authentication
  • Migrating from existing mTLS setup (consider migrating to OAuth)

Configure Appropriate Timeouts

GCP Managed Kafka has different latency characteristics than self-hosted Kafka. Recommended Timeouts:
spring:
  kafka:
    properties:
      session.timeout.ms: 300000        # 5 minutes
      max.poll.interval.ms: 300000      # 5 minutes
      rebalance.timeout.ms: 300000      # 5 minutes
      request.timeout.ms: 30000         # 30 seconds
Why Higher Timeouts:
  • Multi-zone replication may introduce latency
  • OAuth token acquisition overhead
  • Managed service network path

Monitor Consumer Lag

Regularly monitor consumer lag to detect processing issues:
# Check lag via gcloud
gcloud managed-kafka consumer-groups describe hub-prod \
  --cluster=zenoo-hub-kafka \
  --location=us-central1

# Or use Cloud Monitoring
gcloud monitoring time-series list \
  --filter='metric.type="kafka.googleapis.com/consumer/lag"'
Set up Cloud Monitoring alerts for high lag:
gcloud alpha monitoring policies create \
  --display-name="High Kafka Consumer Lag" \
  --condition-display-name="Lag > 1000 messages" \
  --condition-threshold-value=1000 \
  --condition-threshold-duration=300s

Enable Topic Auto-Creation

Let Hub auto-create topics on startup:
spring:
  profiles:
    include:
      - create-topics    # Auto-create topics
This ensures topics exist before processing starts, preventing startup errors.

Use Separate Clusters per Environment

Create dedicated Managed Kafka clusters for each environment:
  • hub-dev-kafka - Development
  • hub-staging-kafka - Staging
  • hub-prod-kafka - Production
Benefits:
  • Isolation prevents cross-environment issues
  • Cost optimization (smaller dev clusters)
  • Independent scaling and configuration

Secure Bootstrap Address

Store bootstrap address as environment variable, not hardcoded:
# Bad - hardcoded
spring.kafka.bootstrap-servers: bootstrap.cluster.us-central1.managedkafka.project.cloud.goog:9092

# Good - environment variable
spring.kafka.bootstrap-servers: ${SPRING_KAFKA_BOOTSTRAP_SERVERS}
Set via:
  • GCE metadata
  • Kubernetes ConfigMap/Secret
  • Cloud Run environment variables

Regular IAM Audits

Quarterly review IAM permissions:
# List all service accounts with Managed Kafka access
gcloud projects get-iam-policy ${PROJECT_ID} \
  --flatten="bindings[].members" \
  --filter="bindings.role:roles/managedkafka.*" \
  --format="table(bindings.members, bindings.role)"
Remove unused service accounts and roles.

See Also