Description
Describe the bug
aws-advanced-go-wrapper crashes with a fatal Go runtime error after a cluster failover because the driver (awssql, efm plugin) accesses a map from multiple goroutines without synchronization during failover monitoring. This is a known class of bug (concurrent map iteration and map write). Possible fixes are upgrading to a newer driver release, disabling the efm plugin, or patching the code with synchronization.
Expected Behavior
aws-advanced-go-wrapper should not crash with a Go runtime error when failover events occur in the Aurora cluster.
What plugins are used? What other connection properties were set?
failover,efm
Current Behavior
aws-advanced-go-wrapper crashes with a fatal Go runtime error after a cluster failover.
Backtrace:
fatal error: concurrent map iteration and map write
goroutine 486 [running]:
internal/runtime/maps.fatal({0x638be6?, 0x57e0c0?})
/usr/local/go/src/runtime/panic.go:1046 +0x20
internal/runtime/maps.(*Iter).Next(0x4000099f40?)
/usr/local/go/src/internal/runtime/maps/table.go:792 +0x98
github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.(*MonitorImpl).newStateRun(0x40001d0f70)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:129 +0x224
created by github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.NewMonitorImpl in goroutine 1
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:73 +0x294
goroutine 1 [runnable]:
regexp/syntax.(*parser).checkSize(0x4000446a28?, 0x1fa1f8?)
/usr/local/go/src/regexp/syntax/parse.go:195 +0xf0
regexp/syntax.(*parser).checkLimits(0x4000510600, 0x4000505e30)
/usr/local/go/src/regexp/syntax/parse.go:166 +0x34
regexp/syntax.(*parser).push(0x4000510600, 0x4000505e30)
/usr/local/go/src/regexp/syntax/parse.go:326 +0x308
regexp/syntax.(*parser).alternate(0x4000510600)
/usr/local/go/src/regexp/syntax/parse.go:519 +0x164
regexp/syntax.parse({0x622eca, 0x3}, 0xd4)
/usr/local/go/src/regexp/syntax/parse.go:1090 +0xcb4
regexp/syntax.Parse(...)
/usr/local/go/src/regexp/syntax/parse.go:888
regexp.compile({0x622eca, 0x3}, 0x0?, 0x0)
/usr/local/go/src/regexp/regexp.go:168 +0x30
regexp.Compile(...)
/usr/local/go/src/regexp/regexp.go:131
regexp.MustCompile({0x622eca, 0x3})
/usr/local/go/src/regexp/regexp.go:311 +0x30
github.com/aws/aws-advanced-go-wrapper/awssql/utils.parseMultiStatementQueries({0x644b3a, 0x41})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sql_method_utils.go:81 +0x38
github.com/aws/aws-advanced-go-wrapper/awssql/utils.GetSeparateSqlStatements({0x644b3a?, 0x4f7b69?})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sql_method_utils.go:60 +0x24
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginServiceImpl).UpdateState(0x40001c8000, {0x644b3a?, 0x4000446ea8?}, {0x0?, 0x702a60?, 0xb869e0?})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_service.go:535 +0x84
github.com/aws/aws-advanced-go-wrapper/awssql/driver.(*AwsWrapperConn).QueryContext.func1()
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver/driver.go:219 +0x64
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).Execute.func2()
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:260 +0xec
github.com/aws/aws-advanced-go-wrapper/awssql/plugins.(*DefaultPlugin).Execute(0x40000b3dc0, {0x700010, 0x400030c480}, {0x62985c, 0x11}, 0x0?, {0x4000496540, 0x1, 0x1})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/default_plugin.go:55 +0x44
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).Execute.func1({0x705f08, 0x40000b3dc0}, 0x40004aa600)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:250 +0x160
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).makePluginChain.(*PluginChain).ExecAddToHead.func1(0x0?, 0x40002d6d80?)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:57 +0x34
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).makePluginChain.(*PluginChain).ExecAddToHead.func2.1()
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:62 +0x28
github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.(*HostMonitorConnectionPlugin).Execute(0x40001727e0, {0x0?, 0x0?}, {0x62985c, 0x11}, 0x4000482480, {0x0?, 0x136668?, 0x4000447338?})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/host_monitoring_plugin.go:144 +0x2b8
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).Execute.func1({0x706040, 0x40001727e0}, 0x4000482480)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:250 +0x160
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).makePluginChain.(*PluginChain).ExecAddToHead.func2(0x40000f6af0, 0x40004aa600)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:62 +0x9c
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).makePluginChain.(*PluginChain).ExecAddToHead.func2.1()
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:62 +0x28
github.com/aws/aws-advanced-go-wrapper/awssql/plugins.(*FailoverPlugin).Execute(0x40000fa0f0, {0x0?, 0x0?}, {0x62985c?, 0x4000447528?}, 0x0?, {0x0?, 0x62985c?, 0x4000447528?})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/failover_plugin.go:255 +0x78
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).Execute.func1({0x705f70, 0x40000fa0f0}, 0x4000482460)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:250 +0x160
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).makePluginChain.(*PluginChain).ExecAddToHead.func2(0x40000f6af0, 0x40004aa600)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:62 +0x9c
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginChain).Execute(0x57cc80?, 0x40001772c0?, 0x62985c?)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:93 +0x118
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).executeWithSubscribedPlugins(0x40000f0240, {0x62985c, 0x11}, 0x40000f6af0, 0x40004aa600)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:274 +0xc8
github.com/aws/aws-advanced-go-wrapper/awssql/plugin_helpers.(*PluginManagerImpl).Execute(0x40000f0240, {0x700010, 0x400030c480}, {0x62985c, 0x11}, 0x40000f6aa0, {0x4000496540, 0x1, 0x1})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugin_helpers/plugin_manager.go:262 +0x1b0
github.com/aws/aws-advanced-go-wrapper/awssql/driver.ExecuteWithPlugins({0x700010, 0x400030c480}, {0x707758, 0x40000f0240}, {0x62985c, 0x11}, 0x40000f6aa0, {0x4000496540, 0x1, 0x1})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver/wrapper_utils.go:46 +0x194
github.com/aws/aws-advanced-go-wrapper/awssql/driver.queryWithPlugins({0x700010, 0x400030c480}, {0x707758, 0x40000f0240}, {0x62985c?, 0x40?}, 0x4000061808?, {0x622af1, 0x2}, {0x4000496540?, ...})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver/wrapper_utils.go:63 +0x4c
github.com/aws/aws-advanced-go-wrapper/awssql/driver.(*AwsWrapperConn).QueryContext(0x40001e03c0, {0x701080, 0xb869e0}, {0x644b3a, 0x41}, {0xb869e0, 0x0, 0x0})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver/driver.go:227 +0x144
database/sql.ctxDriverQuery({0x701080?, 0xb869e0?}, {0xffff5e0cf900?, 0x40001e03c0?}, {0x0?, 0x0?}, {0x644b3a?, 0x40001c8000?}, {0xb869e0?, 0x5f14c0?, ...})
/usr/local/go/src/database/sql/ctxutil.go:48 +0xac
database/sql.(*DB).queryDC.func1()
/usr/local/go/src/database/sql/sql.go:1786 +0xe0
database/sql.withLock({0x6fd5b8, 0x40001b6880}, 0x4000447c98)
/usr/local/go/src/database/sql/sql.go:3572 +0x74
database/sql.(*DB).queryDC(0x4000510580?, {0x701080, 0xb869e0}, {0x701248, 0x40000f6a50}, 0x40001b6880, 0x4000496530, {0x644b3a, 0x41}, {0x0, ...})
/usr/local/go/src/database/sql/sql.go:1781 +0x11c
database/sql.(*Tx).QueryContext(0x4000510580, {0x701080, 0xb869e0}, {0x644b3a, 0x41}, {0x0, 0x0, 0x0})
/usr/local/go/src/database/sql/sql.go:2535 +0x90
database/sql.(*Tx).QueryRowContext(...)
/usr/local/go/src/database/sql/sql.go:2553
database/sql.(*Tx).QueryRow(0x5f5e100?, {0x644b3a?, 0xb869e0?}, {0x0?, 0x0?, 0x0?})
/usr/local/go/src/database/sql/sql.go:2567 +0x48
main.main()
/home/ssm-user/lab/main.go:69 +0x2a8
goroutine 22 [select, 1 minutes]:
database/sql.(*DB).connectionOpener(0x4000175520, {0x701248, 0x40000f6640})
/usr/local/go/src/database/sql/sql.go:1261 +0x80
created by database/sql.OpenDB in goroutine 1
/usr/local/go/src/database/sql/sql.go:841 +0x114
goroutine 23 [sleep, 1 minutes]:
time.Sleep(0x8bb2c97000)
/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/utils.(*SlidingExpirationCache[...]).cleanupExpiredItems(0x70d9e0, {0x701248, 0x40000f6730})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sliding_expiration_cache.go:186 +0x70
created by github.com/aws/aws-advanced-go-wrapper/awssql/utils.NewSlidingExpirationCache[...] in goroutine 1
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sliding_expiration_cache.go:61 +0x264
goroutine 32 [sleep, 1 minutes]:
time.Sleep(0x8bb2c97000)
/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/utils.(*SlidingExpirationCache[...]).cleanupExpiredItems(0x70db60, {0x701248, 0x40000f6c80})
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sliding_expiration_cache.go:186 +0x70
created by github.com/aws/aws-advanced-go-wrapper/awssql/utils.NewSlidingExpirationCache[...] in goroutine 1
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/utils/sliding_expiration_cache.go:61 +0x264
goroutine 8 [sleep]:
time.Sleep(0x2faf080)
/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/driver_infrastructure.(*ClusterTopologyMonitorImpl).delay(0x4000208a20, 0x0?)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver_infrastructure/cluster_topology_monitor.go:336 +0xd4
github.com/aws/aws-advanced-go-wrapper/awssql/driver_infrastructure.(*ClusterTopologyMonitorImpl).Run(0x4000208a20, 0xb86e80)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver_infrastructure/cluster_topology_monitor.go:514 +0x960
created by github.com/aws/aws-advanced-go-wrapper/awssql/driver_infrastructure.(*ClusterTopologyMonitorImpl).Start in goroutine 1
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/driver_infrastructure/cluster_topology_monitor.go:121 +0x108
goroutine 15 [sleep]:
time.Sleep(0x3b9aca00)
/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.(*MonitorImpl).newStateRun(0x40002165b0)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:146 +0x10c
created by github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.NewMonitorImpl in goroutine 1
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:73 +0x294
goroutine 16 [sleep]:
time.Sleep(0x5f5e100)
/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.(*MonitorImpl).run(0x40002165b0)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:158 +0x128
created by github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.NewMonitorImpl in goroutine 1
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:74 +0x2d4
goroutine 487 [sleep]:
time.Sleep(0x5f5e100)
/usr/local/go/src/runtime/time.go:363 +0x150
github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.(*MonitorImpl).run(0x40001d0f70)
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:158 +0x128
created by github.com/aws/aws-advanced-go-wrapper/awssql/plugins/efm.NewMonitorImpl in goroutine 1
/home/ssm-user/go/pkg/mod/github.com/aws/aws-advanced-go-wrapper/awssql@v1.0.0/plugins/efm/monitor.go:74 +0x2d4
goroutine 1515 [runnable]:
database/sql.(*DB).beginDC.gowrap1()
/usr/local/go/src/database/sql/sql.go:1925
runtime.goexit({})
/usr/local/go/src/runtime/asm_arm64.s:1268 +0x4
created by database/sql.(*DB).beginDC in goroutine 1
/usr/local/go/src/database/sql/sql.go:1925 +0x174
exit status 2
Full trace:
Reproduction Steps
- Create an Aurora cluster (serverless for convenience)
# Set parameters
MASTER_USER_PASSWORD=TO_BE_REPLACED
VPC_SECURITY_GROUP_IDS=TO_BE_REPLACED
DB_SUBNET_GROUP_NAME=TO_BE_REPLACED
# Create Aurora cluster
aws rds create-db-cluster \
--db-cluster-identifier ${DB_CLUSTER_IDENTIFIER-cluster1} \
--engine aurora-postgresql \
--engine-version 16.8 \
--db-subnet-group-name ${DB_SUBNET_GROUP_NAME} \
--vpc-security-group-ids ${VPC_SECURITY_GROUP_IDS} \
--master-username ${MASTER_USER_NAME-admin} \
--master-user-password ${MASTER_USER_PASSWORD} \
--storage-encrypted \
--backup-retention-period 7 \
--no-deletion-protection \
--serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=4
# Create Aurora instances
aws rds create-db-instance \
--db-cluster-identifier ${DB_CLUSTER_IDENTIFIER-cluster1} \
--db-instance-identifier db1 \
--engine aurora-postgresql \
--engine-version 16.8 \
--db-instance-class db.serverless
aws rds create-db-instance \
--db-cluster-identifier ${DB_CLUSTER_IDENTIFIER-cluster1} \
--db-instance-identifier db2 \
--engine aurora-postgresql \
--engine-version 16.8 \
--db-instance-class db.serverless
- Create PostgreSQL table
psql -h <AURORA ENDPOINT> -U admin postgres -c "CREATE TABLE test (id bigserial, created_at timestamp);"
- Init Golang project
go mod init lab
go get github.com/aws/aws-advanced-go-wrapper/pgx-driver@latest
- Add sample code
package main
import (
"database/sql"
"flag"
"fmt"
"log"
"log/slog"
"time"
_ "github.com/aws/aws-advanced-go-wrapper/pgx-driver"
_ "github.com/lib/pq"
)
func main() {
slog.SetLogLoggerLevel(slog.LevelDebug)
dsnFlag := flag.String("dsn", "", "PostgreSQL connection string")
flag.Parse()
var pgDsn string
if *dsnFlag != "" {
pgDsn = *dsnFlag
} else {
log.Fatalf("No DSN provided. Use -dsn flag to provide a DSN.")
}
// Open connection
//db, err := sql.Open("postgres", pgDsn)
db, err := sql.Open("awssql-pgx", pgDsn)
if err != nil {
log.Fatalf("failed to open PostgreSQL connection: %v", err)
}
defer db.Close()
// Verify connection
if err := db.Ping(); err != nil {
log.Fatalf("failed to ping PostgreSQL: %v", err)
}
fmt.Println("Connected to PostgreSQL!")
// Insert data continuously
for {
time.Sleep(time.Millisecond * 100)
var now string
tx, err := db.Begin()
if err != nil {
slog.Error("failed to begin transaction", "error", err)
continue
}
err = tx.QueryRow("INSERT INTO test (created_at) VALUES (now()) RETURNING created_at").Scan(&now)
if err != nil {
tx.Rollback()
slog.Error("insert failed", "error", err)
continue
}
if err := tx.Commit(); err != nil {
tx.Rollback()
slog.Error("failed to commit transaction", "error", err)
continue
}
fmt.Printf(".")
}
}
- Launch application in a shell
go run . -dsn "host=<AURORA Endpoint> port=5432 user=admin password=${MASTER_USER_PASSWORD} dbname=postgres"
- Generate Aurora failover in a second shell
Note: The issue appears to be a race condition, so you may need to repeat this step.
aws rds failover-db-cluster --db-cluster-identifier cluster1 --target-db-instance-identifier db2
- Look at backtrace
- Cleanup
# Delete Aurora cluster
aws rds delete-db-instance --db-instance-identifier db1 --skip-final-snapshot
aws rds delete-db-instance --db-instance-identifier db2 --skip-final-snapshot
aws rds delete-db-cluster --db-cluster-identifier cluster1 --skip-final-snapshot
Possible Solution
Implement concurrency-safe map iteration
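One way this could look is sketched below: guard the shared map with a sync.RWMutex and iterate over a copy taken under the read lock. The hostRegistry type and its methods are hypothetical names for illustration only, not the driver's actual types, and the real fix in monitor.go may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// hostRegistry is a hypothetical stand-in for the shared map inside the
// monitor. It is NOT the driver's actual type.
type hostRegistry struct {
	mu     sync.RWMutex
	states map[string]bool // host name -> healthy flag
}

func newHostRegistry() *hostRegistry {
	return &hostRegistry{states: make(map[string]bool)}
}

// set records a host state while holding the write lock.
func (r *hostRegistry) set(host string, healthy bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.states[host] = healthy
}

// snapshot copies the map under the read lock so that callers can iterate
// over the copy without holding the lock and without racing against writers.
func (r *hostRegistry) snapshot() map[string]bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make(map[string]bool, len(r.states))
	for host, healthy := range r.states {
		out[host] = healthy
	}
	return out
}

func main() {
	reg := newHostRegistry()
	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: simulates state updates arriving during a failover.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			reg.set(fmt.Sprintf("db%d", i%4), i%2 == 0)
		}
	}()

	// Reader: simulates a monitoring loop iterating over the hosts.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			for range reg.snapshot() {
			}
		}
	}()

	wg.Wait()
	fmt.Println("hosts tracked:", len(reg.snapshot()))
}
```

Copying under the lock keeps the critical section short, at the cost of one allocation per iteration pass. If the map is large or iterated very frequently, holding the read lock for the duration of the loop is an alternative.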
Additional Information/Context
Timeline
- 08:54:41 → Initial connection to Aurora PostgreSQL cluster succeeds. Monitoring routines for topology are started (FailoverPlugin, ClusterTopologyMonitor, etc.).
- 08:55:04 → A failover occurs:
  - Writer db1 goes down, client detects unexpected EOF.
  - Plugin starts writer failover procedure, finds that db2 has become the new writer.
  - Transaction rollback fails (tx is closed).
- 08:55:05 → New writer db2 is confirmed. Monitoring routines are restarted.
- 08:55:59 → While a query is passing through the plugin chain (Conn.QueryContext), a concurrent map iteration and map write happens inside efm.MonitorImpl, which triggers the fatal runtime error.
Investigation
- The AWS Advanced Go Wrapper, specifically the Enhanced Failure Monitoring (EFM) plugin, spawns multiple goroutines for monitoring (efm.MonitorImpl, ClusterTopologyMonitorImpl).
- These goroutines share a map that tracks hosts / states.
- They don’t synchronize access to it properly.
- During failover (a stressful moment with frequent updates), one goroutine writes to the map while another iterates over it (efm.(*MonitorImpl).newStateRun at monitor.go:129 in the backtrace), and the Go runtime aborts the process.
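The iterate-while-writing pattern described above can also be made safe without an explicit lock by using sync.Map, whose Range method is documented as safe to call concurrently with Store. The sketch below is a hypothetical illustration: newStates and countHealthy are made-up helpers, and nothing here reflects how the driver actually stores its monitoring state.

```go
package main

import (
	"fmt"
	"sync"
)

// newStates builds a sync.Map from a plain map (hypothetical helper).
func newStates(hosts map[string]bool) *sync.Map {
	m := &sync.Map{}
	for host, healthy := range hosts {
		m.Store(host, healthy)
	}
	return m
}

// countHealthy walks the host -> healthy table. Range may run while other
// goroutines call Store without crashing the process.
func countHealthy(states *sync.Map) int {
	n := 0
	states.Range(func(_, v any) bool {
		if healthy, ok := v.(bool); ok && healthy {
			n++
		}
		return true // keep iterating
	})
	return n
}

func main() {
	states := newStates(nil) // hypothetical host-state table, initially empty
	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: keeps updating host states, as happens during a failover.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			states.Store(fmt.Sprintf("db%d", i%2+1), true)
		}
	}()

	// Reader: iterates concurrently with the writer.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = countHealthy(states)
		}
	}()

	wg.Wait()
	fmt.Println("healthy hosts:", countHealthy(states))
}
```

Note that Range does not provide a consistent snapshot: an entry stored during the walk may or may not be visited. If the monitor needs a point-in-time view of all hosts, a mutex-guarded map with a copy is the better fit.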
The AWS Advanced Go Wrapper version used
1.0.0
Go version used
1.25.1
Operating System and version
Linux 6.1.128-136.201.amzn2023.aarch64