Profiling a Go Terraform Provider
TL;DR: Profiling a Terraform provider revealed a performance bottleneck caused by repeated d.Get("task") calls inside a DiffSuppressFunc. The fix, implemented in PR #44543, introduces a singleTask flag to avoid repeated expensive calls, reducing plan time from ~15 minutes to seconds.
A colleague told me she had issues deploying aws_appflow_flow, a Terraform resource for AWS.
Every time she ran it, Terraform indicated that all task blocks were being modified, even though nothing had changed.
She shared a truncated Terraform plan, but running a full plan would take ~15 minutes — which was clearly unusual.
I tried to replicate the problem with a minimal resource.
Even without a pre-existing resource, terraform plan was extremely slow when there were hundreds of tasks.
Verifying the provider is at fault
Since Terraform runs each provider as a separate process, a process monitor shows which one is burning CPU. Here the provider process was using more than a full core during the plan:
```
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1687211 matthew   20   0 2625140 338656 219444 S 113.3   5.6   1:02.02 /home/matthew/.terraform.d/plugins/terraform-provider-aws
```
A standard provider resource has four main functions:
- Read
- Create
- Update
- Delete
During a plain plan with no pre-existing resource, none of these should be called, yet the plan was still extremely slow.
The culprit: d.Get() in DiffSuppressFunc
Looking at the resource:
```go
"source_fields": {
	Type:     schema.TypeList,
	Optional: true,
	Computed: true,
	Elem: &schema.Schema{
		Type:         schema.TypeString,
		ValidateFunc: validation.StringLenBetween(0, 2048),
	},
	DiffSuppressFunc: func(k, oldValue, newValue string, d *schema.ResourceData) bool {
		// d.Get("task") is evaluated on every invocation of this function.
		if v, ok := d.Get("task").(*schema.Set); ok && v.Len() == 1 {
			if tl, ok := v.List()[0].(map[string]any); ok && len(tl) > 0 {
				if sf, ok := tl["source_fields"].([]any); ok && len(sf) == 1 {
					if sf[0] == "" {
						return oldValue == "0" && newValue == "1"
					}
				}
			}
		}
		return false
	},
},
```
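Each call to d.Get("task") reconstructs the entire task set.
The DiffSuppressFunc is evaluated for the source_fields of every task, so with n tasks the provider rebuilds an n-element set roughly n times. That is O(n²) work, which becomes painful once there are hundreds of tasks.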
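To make the quadratic cost concrete, here is a minimal sketch that counts how many task elements get copied. It does not use the Terraform SDK: `fakeResourceData`, `visitsPerCall`, and `visitsCached` are hypothetical stand-ins, and the assumption that one `Get` copies every element once is a simplified cost model rather than a measurement of the real `d.Get("task")`.

```go
package main

import "fmt"

// fakeResourceData is a hypothetical stand-in for *schema.ResourceData.
// It only models the cost of rebuilding the task collection.
type fakeResourceData struct {
	tasks   []string
	visited int // number of task elements copied by Get so far
}

func newFakeResourceData(n int) *fakeResourceData {
	return &fakeResourceData{tasks: make([]string, n)}
}

// Get copies the whole task slice on every call (assumed cost model).
func (d *fakeResourceData) Get() []string {
	out := make([]string, len(d.tasks))
	for i, t := range d.tasks {
		out[i] = t
		d.visited++
	}
	return out
}

// visitsPerCall simulates a suppress function that calls Get once per task.
func visitsPerCall(n int) int {
	d := newFakeResourceData(n)
	for i := 0; i < n; i++ {
		_ = len(d.Get()) == 1 // the check repeated for every task
	}
	return d.visited
}

// visitsCached computes the answer once and reuses the boolean afterwards.
func visitsCached(n int) int {
	d := newFakeResourceData(n)
	single := len(d.Get()) == 1
	for i := 0; i < n; i++ {
		_ = single // each task only reads the cached flag
	}
	return d.visited
}

func main() {
	n := 300
	// prints "n=300 per-call=90000 cached=300"
	fmt.Printf("n=%d per-call=%d cached=%d\n", n, visitsPerCall(n), visitsCached(n))
}
```

Under this model, 300 tasks cost 90,000 element copies when the lookup happens per call, against 300 when it happens once.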
Profiling
I added a small HTTP server for Go’s pprof:
```diff
+import (
+	"net/http"
+	_ "net/http/pprof"
+)

+go func() {
+	log.Println(http.ListenAndServe("0.0.0.0:6060", nil))
+}()
```
Running:
```
go tool pprof -http=:8081 'http://localhost:6060/debug/pprof/profile?seconds=30'
```
confirmed that d.Get() inside DiffSuppressFunc was dominating CPU usage.
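One caveat about the server snippet: listening on 0.0.0.0 exposes the profiling endpoints on every network interface. The sketch below is a variation rather than what I actually ran. It registers the standard net/http/pprof handlers on a dedicated mux and binds to loopback only. `newPprofMux` and `indexStatus` are hypothetical helper names, and the port simply mirrors the one used above.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPprofMux registers the standard pprof handlers on its own mux, so the
// listener serving it exposes only these routes.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// indexStatus sends an in-memory GET to the pprof index page and returns the
// HTTP status code, which is a quick check that the handlers are wired up.
func indexStatus(mux *http.ServeMux) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil)
	mux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := newPprofMux()

	// Serve in the background, bound to loopback only. A real provider
	// process keeps running; this demo exits right after the self-check.
	go func() {
		log.Println(http.ListenAndServe("127.0.0.1:6060", mux))
	}()

	fmt.Println("pprof index status:", indexStatus(mux))
}
```

The same `go tool pprof` command works against this server, since it still answers on localhost:6060.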
The fix: singleTask flag (PR #44543)
The PR introduced a simple flag to ensure the expensive operation only happens once per plan evaluation.
Before: repeated d.Get() for each task
```go
DiffSuppressFunc: func(k, oldValue, newValue string, d *schema.ResourceData) bool {
	if v, ok := d.Get("task").(*schema.Set); ok && v.Len() == 1 {
		if tl, ok := v.List()[0].(map[string]any); ok && len(tl) > 0 {
			if sf, ok := tl["source_fields"].([]any); ok && len(sf) == 1 {
				if sf[0] == "" {
					return oldValue == "0" && newValue == "1"
				}
			}
		}
	}
	return false
}
```
After: use singleTask flag
I added a flag that is calculated once per resource, and the field's diff suppression simply references it:
```go
DiffSuppressFunc: func(k, oldValue, newValue string, d *schema.ResourceData) bool {
	// Read the precomputed flag instead of rebuilding the task set.
	if !d.Get("single_task_flag").(bool) {
		return false
	}

	return oldValue == "0" && newValue == "1"
}
```
With this change, execution time dropped dramatically from ~15 minutes to seconds for hundreds of tasks.
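A refactor like this is only safe if the flag-based check makes the same suppression decisions as the original. The following sketch checks that on plain Go values, without the Terraform SDK. The `task` type and all function names are hypothetical stand-ins, and the condition folded into `computeSingleTask` is my reading of the original nested checks, not code taken from the PR.

```go
package main

import "fmt"

// task is a hypothetical stand-in for one element of the "task" set.
type task struct {
	SourceFields []string
}

// suppressPerCall mirrors the shape of the original check: it inspects the
// task collection every time it is asked about a single attribute.
func suppressPerCall(tasks []task, oldValue, newValue string) bool {
	if len(tasks) == 1 {
		if sf := tasks[0].SourceFields; len(sf) == 1 && sf[0] == "" {
			return oldValue == "0" && newValue == "1"
		}
	}
	return false
}

// computeSingleTask evaluates the same condition once, up front.
func computeSingleTask(tasks []task) bool {
	return len(tasks) == 1 &&
		len(tasks[0].SourceFields) == 1 &&
		tasks[0].SourceFields[0] == ""
}

// suppressWithFlag only consults the precomputed boolean.
func suppressWithFlag(singleTask bool, oldValue, newValue string) bool {
	if !singleTask {
		return false
	}
	return oldValue == "0" && newValue == "1"
}

// agree reports whether both variants make the same decision for every
// old/new pair drawn from a small sample of count values.
func agree(tasks []task) bool {
	flag := computeSingleTask(tasks)
	counts := []string{"0", "1", "2"}
	for _, o := range counts {
		for _, n := range counts {
			if suppressPerCall(tasks, o, n) != suppressWithFlag(flag, o, n) {
				return false
			}
		}
	}
	return true
}

func main() {
	cases := []struct {
		name  string
		tasks []task
	}{
		{"no tasks", nil},
		{"single task, one empty source field", []task{{SourceFields: []string{""}}}},
		{"single task, named source field", []task{{SourceFields: []string{"id"}}}},
		{"two tasks", []task{{SourceFields: []string{""}}, {SourceFields: []string{""}}}},
	}
	for _, c := range cases {
		fmt.Printf("%s: variants agree = %v\n", c.name, agree(c.tasks))
	}
}
```

The design choice is to move the collection-wide question ("is there exactly one task with a single empty source field?") out of the per-attribute callback, leaving the callback with a constant-time comparison.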
Lessons Learned
- Avoid repeated calls to d.Get() for large nested resources.
- Profiling Go providers with pprof is simple and powerful.
- Diff logic in Terraform providers runs even during a plan with no resource changes.
- Using a simple flag (singleTask) to cache or guard expensive operations can make a huge difference.
Conclusion
Golang makes it easy to profile and optimize Terraform providers. With minimal setup, you can identify expensive calls, implement small optimizations, and see massive performance improvements — as demonstrated in PR #44543.
For further reading: