Postgres Flex Alpha Instance Creation Fails with 503 Service Unavailable While Portal Shows READY #64

Closed
opened 2026-02-16 07:22:39 +00:00 by sven.schmidt · 1 comment

Description

Creating a PostgreSQL Flex Alpha instance via the Terraform provider intermittently fails with a 503 Service Unavailable error during the wait phase.

terraform apply aborts on this error even though the instance is shown as READY in the STACKIT portal.

This suggests that the provider does not properly handle transient API failures or upstream connection resets during the polling/wait operation.

Steps to reproduce

/*Copyright 2025 STACKIT GmbH & Co. KG <maintainer.email@stackit.cloud>

Use of this source code is governed by an MIT-style
license that can be found in the LICENSE file or at
https://opensource.org/licenses/MIT.*/

resource "stackitprivatepreview_postgresflexalpha_instance" "tcc_test_instance" {
  project_id      = var.tcc_project_id
  name            = "schmidtsv-tcc-pgflex"
  backup_schedule = "0 2 * * *"
  encryption = {
    kek_key_version = "1"
    kek_key_id      = stackit_kms_key.key.key_id
    kek_key_ring_id = stackit_kms_keyring.tcc_keyring.keyring_id
    service_account = var.tcc_service_account_email
  }
  storage = {
    performance_class = "premium-perf6-stackit"
    size              = 5
  }
  flavor_id      = "2.16"
  retention_days = 55
  replicas       = 1
  network = {
    access_scope = "SNA"
    acl          = ["10.77.0.0/16"]
  }
  version = 14
}


output "postgres_instance_id" {
  value = stackitprivatepreview_postgresflexalpha_instance.tcc_test_instance.instance_id
}

  1. Run terraform init
  2. Run terraform apply
  3. Wait for instance creation
  4. Observe failure during wait handler

Actual behavior

stackitprivatepreview_sqlserverflexbeta_instance.tcc_test_instance: Still creating... [05m20s elapsed]
stackitprivatepreview_sqlserverflexbeta_instance.tcc_test_instance: Still creating... [05m30s elapsed]
stackitprivatepreview_sqlserverflexbeta_instance.tcc_test_instance: Creation complete after 5m33s [id=c4c357a4-9ac6-4343-a385-3ebbe929eec5]
stackit_routing_table_route.tcc_rt_sql_server: Creating...
stackit_routing_table_route.tcc_rt_sql_server: Creation complete after 0s [id=f7aaa814-a87a-4acd-9708-5ed910145922,***,06aed55a-6ab1-4e89-a95c-6cd16045cee7,27c8b797-7e5b-4acf-aed4-b049b19eb871,3fad59ea-6a7b-400c-9159-0ea90d88633b]
╷
│ Warning: stackit_routing_table_route is part of the routing-tables experiment.
│ 
│   with stackit_routing_table_route.tcc_rt_sql_server,
│   on 04-routing-tables.tf line 7, in resource "stackit_routing_table_route" "tcc_rt_sql_server":
│    7: resource "stackit_routing_table_route" "tcc_rt_sql_server" {
│ 
│ This resource is part of the routing-tables experiment and is likely going
│ to undergo significant changes or be removed in the future. Use it at your
│ own discretion.
│ 
│ (and one more similar warning elsewhere)
╵
╷
│ Error: Error creating instance
│ 
│   with stackitprivatepreview_postgresflexalpha_instance.tcc_test_instance,
│   on 03-postgresql.tf line 7, in resource "stackitprivatepreview_postgresflexalpha_instance" "tcc_test_instance":
│    7: resource "stackitprivatepreview_postgresflexalpha_instance" "tcc_test_instance" {
│ 
│ Wait handler error: 503 Service Unavailable, status code 503, Body:
│ upstream connect error or disconnect/reset before headers. reset reason:
│ connection termination
│ 
│ Trace ID: "2c93f2ccef97d4642fe216476ed738e8"

Expected behavior

The provider should:

  • Treat transient 503 errors during polling as retryable
  • Implement proper exponential backoff and retry logic
  • Continue waiting instead of aborting immediately
  • Only fail if the operation definitively fails

If the instance ultimately reaches READY, Terraform should complete successfully.

andre.harms was assigned by marcel.henselin 2026-02-16 07:37:13 +00:00

resolved with #64

Reference: stackit-dev-tools/terraform-provider-stackitprivatepreview#64