Occasional blog posts from a random systems engineer

Profiling a Go Terraform Provider

· Read in about 3 min · (610 Words)

TL;DR: Profiling a Terraform provider revealed a performance bottleneck caused by repeated d.Get("task") calls inside a DiffSuppressFunc. The fix, implemented in PR #44543, introduces a singleTask flag to avoid repeated expensive calls, reducing plan time from ~15 minutes to seconds.


A colleague told me she had issues deploying aws_appflow_flow, a Terraform resource for AWS. Every time she ran it, Terraform indicated that all task blocks were being modified, even though nothing had changed. She shared a truncated Terraform plan, but running a full plan would take ~15 minutes — which was clearly unusual.

I tried to replicate the problem with a minimal resource. Even without a pre-existing resource, terraform plan was extremely slow when there were hundreds of tasks.


Verifying the provider is at fault

Since Terraform runs providers in separate processes, we can confirm the provider is causing the slowdown:

```
    PID USER     PR NI    VIRT    RES    SHR S  %CPU %MEM   TIME+   COMMAND
1687211 matthew  20  0 2625140 338656 219444 S 113.3  5.6  1:02.02  /home/matthew/.terraform.d/plugins/terraform-provider-aws
```

A standard provider resource has four main functions:

  • Read
  • Create
  • Update
  • Delete

During a plain plan with no pre-existing resource, none of these should have been called — yet the plan was still extremely slow.


The culprit: d.Get() in DiffSuppressFunc

Looking at the resource:

```go
"source_fields": {
	Type:     schema.TypeList,
	Optional: true,
	Computed: true,
	Elem: &schema.Schema{
		Type:         schema.TypeString,
		ValidateFunc: validation.StringLenBetween(0, 2048),
	},
	DiffSuppressFunc: func(k, oldValue, newValue string, d *schema.ResourceData) bool {
		if v, ok := d.Get("task").(*schema.Set); ok && v.Len() == 1 {
			if tl, ok := v.List()[0].(map[string]any); ok && len(tl) > 0 {
				if sf, ok := tl["source_fields"].([]any); ok && len(sf) == 1 {
					if sf[0] == "" {
						return oldValue == "0" && newValue == "1"
					}
				}
			}
		}
		return false
	},
},
```

Each call to d.Get("task") reconstructs the entire task set. With hundreds of tasks, calling this repeatedly inside DiffSuppressFunc creates an O(n²) performance problem.
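To see why this is quadratic, here is a minimal, self-contained sketch (not provider code): a counter stands in for the reconstruction cost that `d.Get("task")` pays on every call. Diffing n fields, each of which re-fetches an n-element set, does n full rebuilds; fetching once and reusing the result does one.

```go
package main

import "fmt"

// rebuilds counts how many times the task set is reconstructed,
// standing in for the cost of each d.Get("task") call.
var rebuilds int

// getTasks simulates a getter that rebuilds the whole task list on every call.
func getTasks(n int) []int {
	rebuilds++
	return make([]int, n)
}

// naivePlan calls the getter once per field diff: n calls, each doing an
// n-element rebuild, for O(n^2) total work.
func naivePlan(n int) int {
	rebuilds = 0
	for i := 0; i < n; i++ {
		_ = getTasks(n)
	}
	return rebuilds
}

// cachedPlan fetches the tasks once and reuses the result: O(n) total work.
func cachedPlan(n int) int {
	rebuilds = 0
	tasks := getTasks(n)
	for i := 0; i < n; i++ {
		_ = tasks
	}
	return rebuilds
}

func main() {
	fmt.Println(naivePlan(500))  // 500 rebuilds, each O(n)
	fmt.Println(cachedPlan(500)) // 1 rebuild
}
```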


Profiling

I added a small HTTP server for Go’s pprof:

```go
import (
	"net/http"
	_ "net/http/pprof"
)

go func() {
	log.Println(http.ListenAndServe("0.0.0.0:6060", nil))
}()
```

Running:

```
go tool pprof -http=:8081 "http://localhost:6060/debug/pprof/profile?seconds=30"
```

confirmed that d.Get() inside DiffSuppressFunc was dominating CPU usage.


The fix: singleTask flag (PR #44543)

The PR introduced a simple flag to ensure the expensive operation only happens once per plan evaluation.

Before: repeated d.Get() for each task

```go
DiffSuppressFunc: func(k, oldValue, newValue string, d *schema.ResourceData) bool {
	if v, ok := d.Get("task").(*schema.Set); ok && v.Len() == 1 {
		if tl, ok := v.List()[0].(map[string]any); ok && len(tl) > 0 {
			if sf, ok := tl["source_fields"].([]any); ok && len(sf) == 1 {
				if sf[0] == "" {
					return oldValue == "0" && newValue == "1"
				}
			}
		}
	}
	return false
}
```

After: use the singleTask flag

I added a flag that is calculated once per resource; the field's diff suppression then simply references it:

```go
DiffSuppressFunc: func(k, oldValue, newValue string, d *schema.ResourceData) bool {
	if !d.Get("single_task_flag").(bool) {
		return false
	}

	return oldValue == "0" && newValue == "1"
},
```
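The same compute-once idea can be expressed as a plain Go closure, independent of the SDK. This is only an illustrative sketch, not the PR's code: `makeSuppressFunc` and its `singleTask` parameter are hypothetical names, and the actual fix wires the flag through the resource data rather than a closure.

```go
package main

import "fmt"

// makeSuppressFunc captures the result of an expensive check once (here,
// whether the resource has exactly one task), so the returned suppression
// function can be called per-field without repeating that check.
func makeSuppressFunc(singleTask bool) func(k, oldValue, newValue string) bool {
	return func(k, oldValue, newValue string) bool {
		if !singleTask {
			return false
		}
		// Suppress the spurious 0 -> 1 length diff on source_fields.
		return oldValue == "0" && newValue == "1"
	}
}

func main() {
	suppress := makeSuppressFunc(true)
	fmt.Println(suppress("task.0.source_fields.#", "0", "1")) // true
	fmt.Println(suppress("task.0.source_fields.#", "1", "2")) // false
}
```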

With this change, execution time dropped dramatically from ~15 minutes to seconds for hundreds of tasks.


Lessons Learned

  • Avoid repeated calls to d.Get() for large nested resources.
  • Profiling Go providers with pprof is simple and powerful.
  • Diff logic in Terraform providers runs even during a plan with no resource changes.
  • Using a simple flag (singleTask) to cache or guard expensive operations can make a huge difference.

Conclusion

Golang makes it easy to profile and optimize Terraform providers. With minimal setup, you can identify expensive calls, implement small optimizations, and see massive performance improvements — as demonstrated in PR #44543.

For further reading: