Profiling Golang for Terraform Provider
A colleague told me that she was having trouble deploying aws_appflow_flow, a Terraform resource for AWS.
She told me that every time she ran it, Terraform showed that all the task blocks were being modified, even though they hadn’t changed.
She showed me a truncated Terraform plan (due to terminal history size) but said it would take 15 minutes to run another plan — this was odd!
I took a cut-down version of her resource and tried to replicate the problem. After resolving her original issue (which I may touch on in this post), I found that the provider’s performance was abysmal: even without a resource present, the plan would take around 15 minutes to run!
From my testing, a small number of tasks worked fine, but as soon as we reached around 300–400, the time increased enormously — perhaps implying some operation of O(n²), since the time for ~200 tasks was relatively short, and 400 was closer to 5 minutes.
Terraform runs each provider as a separate process, so the output of top makes it easy to confirm that the provider itself is at fault:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1687211 matthew 20 0 2625140 338656 219444 S 113.3 5.6 1:02.02 /home/matthew/.terraform.d/plugins/terraform-provider-aws
A standard provider resource has four main functions:
- Read
- Create
- Update
- Delete
From testing, I could see that the slowdown occurred even when performing a Terraform “plan” without a pre-existing resource — meaning that none of these should have been called.
Looking at the resource, we can see that task is just a set of objects:
"task": {
	Type: schema.TypeSet,
	Required: true,
	Elem: &schema.Resource{
		Schema: map[string]*schema.Schema{
			"connector_operator": {
				Type: schema.TypeList,
				Optional: true,
				Elem: &schema.Resource{
					Schema: map[string]*schema.Schema{
						"amplitude": {
							Type: schema.TypeString,
							Optional: true,
							ValidateDiagFunc: enum.Validate[types.AmplitudeConnectorOperator](),
						},
Right off the bat, one thing looks interesting:
"source_fields": {
	Type: schema.TypeList,
	Optional: true,
	Computed: true,
	Elem: &schema.Schema{
		Type: schema.TypeString,
		ValidateFunc: validation.StringLenBetween(0, 2048),
	},
	DiffSuppressFunc: func(k, oldValue, newValue string, d *schema.ResourceData) bool {
		if v, ok := d.Get("task").(*schema.Set); ok && v.Len() == 1 {
			if tl, ok := v.List()[0].(map[string]any); ok && len(tl) > 0 {
				if sf, ok := tl["source_fields"].([]any); ok && len(sf) == 1 {
					if sf[0] == "" {
						return oldValue == "0" && newValue == "1"
					}
				}
			}
		}
		return false
	},
},
Profiling
I decided to try profiling the provider to see what it was doing — not something I’d done in Golang before.
I read the package documentation and an article:
- https://go.dev/doc/diagnostics
- https://medium.com/@jhathnagoda/go-profiling-with-pprof-a-step-by-step-guide-a62323915cb0
I then added a basic HTTP server in main.go:
diff --git a/main.go b/main.go
index 92ee54dd321..444d1dee983 100644
--- a/main.go
+++ b/main.go
@@ -7,6 +7,8 @@ import (
 	"context"
 	"flag"
 	"log"
+	"net/http"
+	_ "net/http/pprof"
 	"runtime/debug"
 	"github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server"
@@ -15,6 +17,10 @@ import (
 )
 func main() {
+	go func() {
+		log.Println(http.ListenAndServe("0.0.0.0:6060", nil))
+	}()
+
 	debugFlag := flag.Bool("debug", false, "Start provider in debug mode.")
 	flag.Parse()
As a side note, the underscore import is interesting:
In Go, the underscore (_) is used as a blank identifier, which allows you to import packages solely for their side effects without using them directly in your code.
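Not only does the pprof package add profiling functionality, it also registers its profiling endpoints on the default HTTP mux (http.DefaultServeMux) just by being imported, which is why passing nil as the handler to http.ListenAndServe is enough. That might be some interesting code to read through!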
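As a rough illustration of that side effect (a sketch, not the provider's code), the program below imports net/http/pprof with the blank identifier, never references it, and still gets a response from /debug/pprof/ on the default mux. The helper name pprofIndexStatus and the use of httptest for a throwaway server are assumptions made for the demonstration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // imported only for its init(), which registers the /debug/pprof/ handlers
)

// pprofIndexStatus (hypothetical helper) starts a throwaway server backed by
// http.DefaultServeMux and returns the status code of GET /debug/pprof/.
func pprofIndexStatus() (int, error) {
	srv := httptest.NewServer(http.DefaultServeMux)
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/debug/pprof/")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	status, err := pprofIndexStatus()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("GET /debug/pprof/ returned status", status)
}
```

Nothing in this program registers a handler explicitly; the endpoints exist only because of the blank import.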
Building Terraform Providers
I’m including this because a couple of years ago, when I looked for this information, I found conflicting advice. After building the binary for a provider, I:
- Copy the binary into a plugins directory — this can be anywhere:
cp ./main ~/.terraform.d/plugins/terraform-provider-aws
- Update ~/.terraformrc with the following example:
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
disable_checkpoint = true
provider_installation {
  dev_overrides {
    "hashicorp/aws" = "/home/ME/.terraform.d/plugins"
  }
  # For all other providers, install them directly from their origin provider
  # registries as normal. If you omit this, Terraform will _only_ use
  # the dev_overrides block, and so no other providers will be available.
  direct {}
}
- Remove the .terraform directory in the project you want to test
- Run terraform init, which should show something like this:
➜ test-tf terraform-1.5.7 init
Initializing the backend...
Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
- Using hashicorp/aws v6.15.0 from the shared cache directory
╷
│ Warning: Provider development overrides are in effect
│
│ The following provider development overrides are set in the CLI configuration:
│ - hashicorp/aws in /home/ME/.terraform.d/plugins
│
│ Skip terraform init when using provider development overrides. It is not necessary and may error unexpectedly.
╵
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Running the Profiling
I ran terraform plan and, once it had got stuck in the loop, ran:
go tool pprof -http=:8081 http://localhost:6060/debug/pprof/profile\?seconds\=30
The result was the following:

Aha! We can see immediately that the diff suppression function is indeed on the path to the most expensive call. Digging in, there is an anonymous function, appflow.resourceFlow.func1, which is likely our DiffSuppressFunc and which in turn calls d.Get().
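Clearly, d.Get is very expensive because it has to generate the objects it returns, and the suppress function asks it for the whole task set on every invocation. If that happens roughly once per task, the total work grows with the square of the number of tasks, which would fit the timings I saw earlier. The only way around it is to avoid calling d.Get repeatedly.
With some rework, storing the number of tasks in a custom attribute that is calculated with one call to d.Get() rather than once per task, the execution time decreased from 15 minutes (or much more!) to seconds, which is fantastic.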
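To make the difference concrete, here is a minimal simulation of the two access patterns. It is a sketch under stated assumptions, not the actual provider fix: the store type is a hypothetical stand-in for schema.ResourceData, its get method rebuilds the whole task list on each call much as d.Get("task") appears to, and perTaskLookup and cachedLookup are made-up names.

```go
package main

import "fmt"

// task is a hypothetical stand-in for one decoded "task" block.
type task struct{ sourceFields []string }

// store is a hypothetical stand-in for schema.ResourceData. get rebuilds the
// whole task list on every call and counts how many tasks it has built.
type store struct {
	raw   []task
	built int
}

func (s *store) get() []task {
	out := make([]task, len(s.raw))
	for i, t := range s.raw {
		out[i] = t
		s.built++
	}
	return out
}

// perTaskLookup mimics a suppress function that calls get once per task,
// and returns the total number of tasks built along the way.
func perTaskLookup(n int) int {
	s := &store{raw: make([]task, n)}
	for i := 0; i < n; i++ {
		_ = len(s.get()) == 1 // the same "exactly one task?" check, recomputed each time
	}
	return s.built
}

// cachedLookup calls get once, keeps the task count, and reuses it per task.
func cachedLookup(n int) int {
	s := &store{raw: make([]task, n)}
	count := len(s.get())
	for i := 0; i < n; i++ {
		_ = count == 1 // the check now uses the stored count
	}
	return s.built
}

func main() {
	for _, n := range []int{200, 400} {
		fmt.Printf("n=%d per-task=%d cached=%d\n", n, perTaskLookup(n), cachedLookup(n))
	}
}
```

In this model the per-task version builds n × n objects while the cached version builds n, so doubling the number of tasks quadruples the work in the first case and only doubles it in the second.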
Conclusion
I’ve touched on profiling previously, but in other languages, where the combination of setting up the profiler, handling multiple threads, and finding different applications to interpret the profile data normally takes quite a while. To me, Golang has made this astoundingly easy, and I thoroughly recommend it to anyone who might benefit from profiling but fears getting too deep into profiling tools!