
golang panic on failover #242

@vmercierfr

Description


Describe the bug

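aws-advanced-go-wrapper crashes with a Go fatal runtime error after a cluster failover. The cause is that the driver (awssql module, efm plugin) accesses a map from several goroutines without synchronization while it monitors hosts during failover. This is a well-known class of Go bug ("concurrent map iteration and map write"). Unlike a regular panic, it cannot be caught with recover() and always terminates the process. Possible remedies are upgrading to a driver release that fixes it (if one is available), disabling the efm (Enhanced Failure Monitoring) plugin, or patching the code so that access to the map is synchronized.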
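Below is a minimal sketch of the synchronization remedy. It assumes a map keyed by host name that one goroutine iterates while another goroutine writes to it, which matches the situation in the backtrace in this report. The type hostStateMap and its methods are hypothetical and do not mirror the wrapper's internals. Every write and every iteration goes through the same sync.RWMutex, so a write can never overlap an iteration.

```go
package main

import (
	"fmt"
	"sync"
)

// hostStateMap is a hypothetical stand-in for a map shared by monitor
// goroutines; it is NOT the wrapper's real type.
type hostStateMap struct {
	mu     sync.RWMutex
	states map[string]bool // host name -> healthy?
}

func newHostStateMap() *hostStateMap {
	return &hostStateMap{states: make(map[string]bool)}
}

// Set records a host's state; writers take the exclusive lock.
func (m *hostStateMap) Set(host string, healthy bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.states[host] = healthy
}

// CountHealthy iterates under the read lock, so no writer can modify the
// map while the range loop is running.
func (m *hostStateMap) CountHealthy() int {
	m.mu.RLock()
	defer m.mu.RUnlock()
	n := 0
	for _, healthy := range m.states {
		if healthy {
			n++
		}
	}
	return n
}

func main() {
	m := newHostStateMap()
	var wg sync.WaitGroup
	wg.Add(2)
	// Writer goroutine: keeps updating host states.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			m.Set(fmt.Sprintf("db%d", i%10), i%2 == 0)
		}
	}()
	// Reader goroutine: keeps iterating, as newStateRun does in the trace.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = m.CountHealthy()
		}
	}()
	wg.Wait()
	// Both goroutines have finished, so reading len(m.states) is safe here.
	fmt.Println("hosts tracked:", len(m.states), "healthy:", m.CountHealthy()) // prints "hosts tracked: 10 healthy: 5"
}
```

Running this with go run -race should report no data races. Removing the locks can reproduce the same "concurrent map iteration and map write" fatal error.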

Expected Behavior

aws-advanced-go-wrapper should not crash with a Go fatal error when events such as a failover occur in the Aurora cluster.

What plugins are used? What other connection properties were set?

failover,efm

Current Behavior

aws-advanced-go-wrapper crashes with a Go fatal runtime error after a cluster failover.

Backtrace:

fatal error: concurrent map iteration and map write

goroutine 486 [running]:
internal/runtime/maps.fatal({0x638be6?, 0x57e0c0?})
	/usr/local/go/src/runtime/panic.go:1046 +0x20
internal/runtime/maps.(*Iter).Next(0x4000099f40?)
	/usr/local/go/src/internal/runtime/maps/table.go:792 +0x98
github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.(*MonitorImpl).newStateRun(0x40001d0f70)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:129 +0x224
created by github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.NewMonitorImpl in goroutine 1
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:73 +0x294

goroutine 1 [runnable]:
regexp/syntax.(*parser).checkSize(0x4000446a28?, 0x1fa1f8?)
	/usr/local/go/src/regexp/syntax/parse.go:195 +0xf0
regexp/syntax.(*parser).checkLimits(0x4000510600, 0x4000505e30)
	/usr/local/go/src/regexp/syntax/parse.go:166 +0x34
regexp/syntax.(*parser).push(0x4000510600, 0x4000505e30)
	/usr/local/go/src/regexp/syntax/parse.go:326 +0x308
regexp/syntax.(*parser).alternate(0x4000510600)
	/usr/local/go/src/regexp/syntax/parse.go:519 +0x164
regexp/syntax.parse({0x622eca, 0x3}, 0xd4)
	/usr/local/go/src/regexp/syntax/parse.go:1090 +0xcb4
regexp/syntax.Parse(...)
	/usr/local/go/src/regexp/syntax/parse.go:888
regexp.compile({0x622eca, 0x3}, 0x0?, 0x0)
	/usr/local/go/src/regexp/regexp.go:168 +0x30
regexp.Compile(...)
	/usr/local/go/src/regexp/regexp.go:131
regexp.MustCompile({0x622eca, 0x3})
	/usr/local/go/src/regexp/regexp.go:311 +0x30
github.com/aws/aws-advanced-go-wrapper/awssql/utils.parseMultiStatementQueries({0x644b3a, 0x41})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sql_method_utils.go:81 +0x38
github.com/aws/aws-advanced-go-wrapper/awssql/utils.GetSeparateSqlStatements({0x644b3a?, 0x4f7b69?})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sql_method_utils.go:60 +0x24
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginServiceImpl).UpdateState(0x40001c8000, {0x644b3a?, 0x4000446ea8?}, {0x0?, 0x702a60?, 0xb869e0?})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_service.go:535 +0x84
github.com/aws/aws-advanced-go-wrapper/awssql/driver.(*AwsWrapperConn).QueryContext.func1()
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver/driver.go:219 +0x64
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).Execute.func2()
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:260 +0xec
github.com/aws/aws-advanced-go-wrapper/awssql/plugins.(*DefaultPlugin).Execute(0x40000b3dc0, {0x700010, 0x400030c480}, {0x62985c, 0x11}, 0x0?, {0x4000496540, 0x1, 0x1})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/default_plugin.go:55 +0x44
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).Execute.func1({0x705f08, 0x40000b3dc0}, 0x40004aa600)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:250 +0x160
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).makePluginChain.(*PluginChain).ExecAddToHead.func1(0x0?, 0x40002d6d80?)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:57 +0x34
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).makePluginChain.(*PluginChain).ExecAddToHead.func2.1()
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:62 +0x28
github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.(*HostMonitorConnectionPlugin).Execute(0x40001727e0, {0x0?, 0x0?}, {0x62985c, 0x11}, 0x4000482480, {0x0?, 0x136668?, 0x4000447338?})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/host_monitoring_plugin.go:144 +0x2b8
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).Execute.func1({0x706040, 0x40001727e0}, 0x4000482480)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:250 +0x160
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).makePluginChain.(*PluginChain).ExecAddToHead.func2(0x40000f6af0, 0x40004aa600)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:62 +0x9c
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).makePluginChain.(*PluginChain).ExecAddToHead.func2.1()
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:62 +0x28
github.com/aws/aws-advanced-go-wrapper/awssql/plugins.(*FailoverPlugin).Execute(0x40000fa0f0, {0x0?, 0x0?}, {0x62985c?, 0x4000447528?}, 0x0?, {0x0?, 0x62985c?, 0x4000447528?})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/failover_plugin.go:255 +0x78
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).Execute.func1({0x705f70, 0x40000fa0f0}, 0x4000482460)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:250 +0x160
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).makePluginChain.(*PluginChain).ExecAddToHead.func2(0x40000f6af0, 0x40004aa600)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:62 +0x9c
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginChain).Execute(0x57cc80?, 0x40001772c0?, 0x62985c?)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:93 +0x118
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).executeWithSubscribedPlugins(0x40000f0240, {0x62985c, 0x11}, 0x40000f6af0, 0x40004aa600)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:274 +0xc8
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).Execute(0x40000f0240, {0x700010, 0x400030c480}, {0x62985c, 0x11}, 0x40000f6aa0, {0x4000496540, 0x1, 0x1})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:262 +0x1b0
github.com/aws/aws-advanced-go-wrapper/awssql/driver.ExecuteWithPlugins({0x700010, 0x400030c480}, {0x707758, 0x40000f0240}, {0x62985c, 0x11}, 0x40000f6aa0, {0x4000496540, 0x1, 0x1})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver/wrapper_utils.go:46 +0x194
github.com/aws/aws-advanced-go-wrapper/awssql/driver.queryWithPlugins({0x700010, 0x400030c480}, {0x707758, 0x40000f0240}, {0x62985c?, 0x40?}, 0x4000061808?, {0x622af1, 0x2}, {0x4000496540?, ...})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver/wrapper_utils.go:63 +0x4c
github.com/aws/aws-advanced-go-wrapper/awssql/driver.(*AwsWrapperConn).QueryContext(0x40001e03c0, {0x701080, 0xb869e0}, {0x644b3a, 0x41}, {0xb869e0, 0x0, 0x0})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver/driver.go:227 +0x144
database/sql.ctxDriverQuery({0x701080?, 0xb869e0?}, {0xffff5e0cf900?, 0x40001e03c0?}, {0x0?, 0x0?}, {0x644b3a?, 0x40001c8000?}, {0xb869e0?, 0x5f14c0?, ...})
	/usr/local/go/src/database/sql/ctxutil.go:48 +0xac
database/sql.(*DB).queryDC.func1()
	/usr/local/go/src/database/sql/sql.go:1786 +0xe0
database/sql.withLock({0x6fd5b8, 0x40001b6880}, 0x4000447c98)
	/usr/local/go/src/database/sql/sql.go:3572 +0x74
database/sql.(*DB).queryDC(0x4000510580?, {0x701080, 0xb869e0}, {0x701248, 0x40000f6a50}, 0x40001b6880, 0x4000496530, {0x644b3a, 0x41}, {0x0, ...})
	/usr/local/go/src/database/sql/sql.go:1781 +0x11c
database/sql.(*Tx).QueryContext(0x4000510580, {0x701080, 0xb869e0}, {0x644b3a, 0x41}, {0x0, 0x0, 0x0})
	/usr/local/go/src/database/sql/sql.go:2535 +0x90
database/sql.(*Tx).QueryRowContext(...)
	/usr/local/go/src/database/sql/sql.go:2553
database/sql.(*Tx).QueryRow(0x5f5e100?, {0x644b3a?, 0xb869e0?}, {0x0?, 0x0?, 0x0?})
	/usr/local/go/src/database/sql/sql.go:2567 +0x48
main.main()
	/home/ssm-user/lab/main.go:69 +0x2a8

goroutine 22 [select, 1 minutes]:
database/sql.(*DB).connectionOpener(0x4000175520, {0x701248, 0x40000f6640})
	/usr/local/go/src/database/sql/sql.go:1261 +0x80
created by database/sql.OpenDB in goroutine 1
	/usr/local/go/src/database/sql/sql.go:841 +0x114

goroutine 23 [sleep, 1 minutes]:
time.Sleep(0x8bb2c97000)
	/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/utils.(*SlidingExpirationCache[...]).cleanupExpiredItems(0x70d9e0, {0x701248, 0x40000f6730})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sliding_expiration_cache.go:186 +0x70
created by github.com/aws/aws-advanced-go-wrapper/awssql/utils.NewSlidingExpirationCache[...] in goroutine 1
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sliding_expiration_cache.go:61 +0x264

goroutine 32 [sleep, 1 minutes]:
time.Sleep(0x8bb2c97000)
	/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/utils.(*SlidingExpirationCache[...]).cleanupExpiredItems(0x70db60, {0x701248, 0x40000f6c80})
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sliding_expiration_cache.go:186 +0x70
created by github.com/aws/aws-advanced-go-wrapper/awssql/utils.NewSlidingExpirationCache[...] in goroutine 1
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sliding_expiration_cache.go:61 +0x264

goroutine 8 [sleep]:
time.Sleep(0x2faf080)
	/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/driver_infrastructure.(*ClusterTopologyMonitorImpl).delay(0x4000208a20, 0x0?)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver_infrastructure/cluster_topology_monitor.go:336 +0xd4
github.com/aws/aws-advanced-go-wrapper/awssql/driver_infrastructure.(*ClusterTopologyMonitorImpl).Run(0x4000208a20, 0xb86e80)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver_infrastructure/cluster_topology_monitor.go:514 +0x960
created by github.com/aws/aws-advanced-go-wrapper/awssql/driver_infrastructure.(*ClusterTopologyMonitorImpl).Start in goroutine 1
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver_infrastructure/cluster_topology_monitor.go:121 +0x108

goroutine 15 [sleep]:
time.Sleep(0x3b9aca00)
	/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.(*MonitorImpl).newStateRun(0x40002165b0)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:146 +0x10c
created by github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.NewMonitorImpl in goroutine 1
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:73 +0x294

goroutine 16 [sleep]:
time.Sleep(0x5f5e100)
	/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.(*MonitorImpl).run(0x40002165b0)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:158 +0x128
created by github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.NewMonitorImpl in goroutine 1
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:74 +0x2d4

goroutine 487 [sleep]:
time.Sleep(0x5f5e100)
	/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.(*MonitorImpl).run(0x40001d0f70)
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:158 +0x128
created by github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.NewMonitorImpl in goroutine 1
	/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:74 +0x2d4

goroutine 1515 [runnable]:
database/sql.(*DB).beginDC.gowrap1()
	/usr/local/go/src/database/sql/sql.go:1925
runtime.goexit({})
	/usr/local/go/src/runtime/asm_arm64.s:1268 +0x4
created by database/sql.(*DB).beginDC in goroutine 1
	/usr/local/go/src/database/sql/sql.go:1925 +0x174
exit status 2

Full trace:

very_simplified_trace.txt

Reproduction Steps

  1. Create an Aurora cluster (serverless for convenience)
# Set parameters
MASTER_USER_PASSWORD=TO_BE_REPLACED
VPC_SECURITY_GROUP_IDS=TO_BE_REPLACED
DB_SUBNET_GROUP_NAME=TO_BE_REPLACED

# Create Aurora cluster
aws rds create-db-cluster \
  --db-cluster-identifier ${DB_CLUSTER_IDENTIFIER-cluster1} \
  --engine aurora-postgresql \
  --engine-version 16.8 \
  --db-subnet-group-name ${DB_SUBNET_GROUP_NAME} \
  --vpc-security-group-ids ${VPC_SECURITY_GROUP_IDS} \
  --master-username ${MASTER_USER_NAME-admin} \
  --master-user-password ${MASTER_USER_PASSWORD} \
  --storage-encrypted \
  --backup-retention-period 7 \
  --no-deletion-protection \
  --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=4

# Create Aurora instances
aws rds create-db-instance \
  --db-cluster-identifier ${DB_CLUSTER_IDENTIFIER-cluster1} \
  --db-instance-identifier db1 \
  --engine aurora-postgresql \
  --engine-version 16.8 \
  --db-instance-class db.serverless

aws rds create-db-instance \
  --db-cluster-identifier ${DB_CLUSTER_IDENTIFIER-cluster1} \
  --db-instance-identifier db2 \
  --engine aurora-postgresql \
  --engine-version 16.8 \
  --db-instance-class db.serverless
  2. Create the PostgreSQL table
psql -h <AURORA ENDPOINT> -U admin postgres -c "CREATE TABLE test (id bigserial, created_at timestamp);"
  3. Initialize the Go project
go mod init lab
go get github.com/aws/aws-advanced-go-wrapper/pgx-driver@latest
go get github.com/lib/pq
  4. Add the sample code (main.go)
package main

import (
	"database/sql"
	"flag"
	"fmt"
	"log"
	"log/slog"
	"time"

	_ "github.com/aws/aws-advanced-go-wrapper/pgx-driver"
	_ "github.com/lib/pq"
)

func main() {
	slog.SetLogLoggerLevel(slog.LevelDebug)

	dsnFlag := flag.String("dsn", "", "PostgreSQL connection string")
	flag.Parse()

	var pgDsn string
	if *dsnFlag != "" {
		pgDsn = *dsnFlag
	} else {
		log.Fatalf("No DSN provided. Use -dsn flag to provide a DSN.")
	}

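	// Open a connection through the AWS wrapper (driver name "awssql-pgx").
	// To compare with a plain PostgreSQL driver, use the commented line instead (github.com/lib/pq).
	//db, err := sql.Open("postgres", pgDsn)
	db, err := sql.Open("awssql-pgx", pgDsn)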
	if err != nil {
		log.Fatalf("failed to open PostgreSQL connection: %v", err)
	}
	defer db.Close()

	// Verify connection
	if err := db.Ping(); err != nil {
		log.Fatalf("failed to ping PostgreSQL: %v", err)
	}
	fmt.Println("Connected to PostgreSQL!")

	// Insert data continuously
	for {
		time.Sleep(time.Millisecond * 100)

		var now string

		tx, err := db.Begin()
		if err != nil {
			slog.Error("failed to begin transaction", "error", err)
			continue
		}

		err = tx.QueryRow("INSERT INTO test (created_at) VALUES (now()) RETURNING created_at").Scan(&now)
		if err != nil {
			tx.Rollback()
			slog.Error("insert failed", "error", err)
			continue
		}

		if err := tx.Commit(); err != nil {
			// The transaction is already finished after Commit, so Rollback is not needed here.
			slog.Error("failed to commit transaction", "error", err)
			continue
		}

		fmt.Printf(".")
	}
}
  5. Launch the application in a shell
go run . -dsn  "host=<AURORA Endpoint> port=5432 user=admin password=${MASTER_USER_PASSWORD} dbname=postgres"
  6. Trigger an Aurora failover from a second shell

Note: The issue appears to be a race condition, so you may need to repeat this step several times.

aws rds failover-db-cluster --db-cluster-identifier cluster1 --target-db-instance-identifier db2
  7. Look at the backtrace

  8. Clean up

# Delete Aurora cluster
aws rds delete-db-instance --db-instance-identifier db1 --skip-final-snapshot
aws rds delete-db-instance --db-instance-identifier db2 --skip-final-snapshot
aws rds delete-db-cluster --db-cluster-identifier cluster1 --skip-final-snapshot

Possible Solution

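Make the map iteration concurrency-safe. Protect the map that efm.(*MonitorImpl).newStateRun iterates over (monitor.go:129) with a mutex shared by every goroutine that reads or writes it. Alternatively, replace the map with a concurrency-safe structure such as sync.Map.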
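One variant of the mutex approach avoids holding the lock during long-running work. The map is copied under the lock, the lock is released, and the code then iterates over the copy. This helps if the monitor loop performs slow work, such as network health checks, while it iterates. That is an assumption about the monitor and is not confirmed by the trace. The names monitorContexts, Add, Remove, and Snapshot are hypothetical and are used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// monitorContexts is a hypothetical registry of monitoring contexts keyed by
// ID; it illustrates the pattern and is not the wrapper's actual code.
type monitorContexts struct {
	mu   sync.Mutex
	ctxs map[int]string // context ID -> host name
}

func (r *monitorContexts) Add(id int, host string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ctxs[id] = host
}

func (r *monitorContexts) Remove(id int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.ctxs, id)
}

// Snapshot copies the entries while holding the lock and returns the copy.
// Callers can iterate over it, and do slow work, without blocking writers.
func (r *monitorContexts) Snapshot() map[int]string {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make(map[int]string, len(r.ctxs))
	for id, host := range r.ctxs {
		out[id] = host
	}
	return out
}

func main() {
	r := &monitorContexts{ctxs: make(map[int]string)}
	r.Add(1, "db1")
	r.Add(2, "db2")

	snap := r.Snapshot()
	r.Remove(1) // safe: the snapshot is an independent map

	ids := make([]int, 0, len(snap))
	for id := range snap {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	fmt.Println(ids) // prints [1 2]
}
```

The trade-off is that the loop may act on an entry that was removed just after the snapshot was taken. A monitor using this pattern would need to tolerate that.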

Additional Information/Context

Timeline

  • 08:54:41 → Initial connection to Aurora PostgreSQL cluster succeeds. Monitoring routines for topology are started (FailoverPlugin, ClusterTopologyMonitor, etc.).
  • 08:55:04 → A failover occurs:
    • Writer db1 goes down, client detects unexpected EOF.
    • Plugin starts writer failover procedure, finds that db2 has become the new writer.
    • Transaction rollback fails (tx is closed).
  • 08:55:05 → New writer db2 is confirmed. Monitoring routines are restarted.
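  • 08:55:59 → While the application is running a query (AwsWrapperConn.QueryContext, goroutine 1), the EFM monitor goroutine (efm.(*MonitorImpl).newStateRun, goroutine 486) iterates over a map inside efm.MonitorImpl that is being written concurrently.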

This triggered the fatal runtime error. It cannot be recovered, and it terminated the process with exit status 2.

Investigation

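  • The AWS Advanced Go Wrapper runs several background goroutines. Each efm.MonitorImpl (Enhanced Failure Monitoring plugin) starts two of them in NewMonitorImpl: newStateRun and run. Cluster topology is monitored separately by ClusterTopologyMonitorImpl.Run.
  • efm.(*MonitorImpl).newStateRun iterates over a map (monitor.go:129) that appears to track monitored hosts or monitoring states. Other goroutines also write to this map.
  • Access to that map does not appear to be synchronized.
  • During failover, monitoring is restarted and the map is updated frequently. One goroutine writes to the map while another is iterating over it, and the Go runtime aborts the process.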

The AWS Advanced Go Wrapper version used

1.0.0

Go version used

1.25.1

Operating System and version

Linux 6.1.128-136.201.amzn2023.aarch64

Metadata


Assignees

No one assigned

    Labels

    bug: Something isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
