A First Look at Golang Code Obfuscation

Published:  at  03:22 PM

Translated from the Chinese version by Claude Opus 4.6

This article was originally published on Seebug

In recent years, Golang has surged in popularity. Thanks to its excellent performance, high development efficiency, and cross-platform capabilities, it has been widely adopted in software development. While enjoying the conveniences Golang brings, developers also need to think about how to protect their code and increase the difficulty of reverse engineering.

Due to mechanisms like reflection in Golang, a large amount of information such as file paths and function names must be packed into the binary. This information cannot be stripped, so we consider obfuscating the code to raise the bar for reverse engineering.

This article primarily explores Golang code obfuscation techniques by analyzing the implementation of the burrowers/garble project. Due to the scarcity of related resources, most of the content here is based on source code analysis. If there are any errors, please feel free to point them out in the comments or via email.

Prerequisites

The Compilation Process

Go’s compilation process can be abstracted as:

  1. Lexical analysis: converting a character sequence into a token sequence
  2. Syntax analysis: parsing tokens into an AST
  3. Type checking
  4. Generating intermediate code
  5. Generating machine code
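
As a small illustration of the first step, the sketch below runs the standard library's go/scanner over a one-line snippet and prints the resulting token sequence. The tokenize helper and the file name demo.go are names made up for this example. The real compiler uses its own lexer under src/cmd/compile, so this only shows the idea of lexical analysis, not the compiler's implementation.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenize runs the go/scanner lexer over src and returns a printable form of
// each token. It is a hypothetical helper written for this illustration.
func tokenize(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("demo.go", fset.Base(), len(src))

	var s scanner.Scanner
	// nil error handler and mode 0: errors are only counted, comments are skipped.
	s.Init(file, []byte(src), nil, 0)

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.SEMICOLON && lit == "\n" {
			continue // semicolon inserted automatically by the scanner
		}
		if tok.IsLiteral() {
			// Identifiers and basic literals carry their source text.
			out = append(out, fmt.Sprintf("%s(%s)", tok, lit))
		} else {
			out = append(out, tok.String())
		}
	}
	return out
}

func main() {
	for _, t := range tokenize("x := 1 + y") {
		fmt.Println(t)
	}
}
```

Each printed line is one token: identifiers and literals appear with their text, while operators appear as their symbol.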

This article will not delve into compiler theory in detail. For further reading, I recommend Go Language Design and Implementation - Compilation Principles and Introduction to the Go compiler.

Let’s explore the compilation process more intuitively from the source code perspective. The implementation of go build is in src/cmd/go/internal/work/build.go. Ignoring the handling of compiler type selection, environment information, etc., we focus only on the core part:

func runBuild(ctx context.Context, cmd *base.Command, args []string) {
	...
	var b Builder
	...
	pkgs := load.PackagesAndErrors(ctx, args)
	...
	a := &Action{Mode: "go build"}
	for _, p := range pkgs {
		a.Deps = append(a.Deps, b.AutoAction(ModeBuild, depMode, p))
	}
	...
	b.Do(ctx, a)
}

The Action struct here represents a single action. Each action has a description, an associated package, dependencies (Deps), and other information. All related actions together form an action graph.

// An Action represents a single action in the action graph.
type Action struct {
	Mode     string         // description of action operation
	Package  *load.Package  // the package this action works on
	Deps     []*Action      // actions that must happen before this one
	Func     func(*Builder, context.Context, *Action) error // the action itself (nil = no-op)
	...
}

After creating action a as the “root vertex,” it iterates over the packages specified for compilation, creating an action for each one. This creation process is recursive: during creation, it analyzes each package’s dependencies and creates actions for them as well. For example, in the (b *Builder) CompileAction method in src/cmd/go/internal/work/action.go:

for _, p1 := range p.Internal.Imports {
	a.Deps = append(a.Deps, b.CompileAction(depMode, depMode, p1))
}

The final a.Deps serves as the “starting points” of the action graph. Once the action graph is constructed, action a is used as the “root” for a depth-first traversal, where dependent actions are sequentially added to the task queue and then executed concurrently via action.Func.
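
The traversal described above can be sketched with a much smaller model. The action type and schedule function below are hypothetical stand-ins for work.Action and the scheduling logic in the go command. They run sequentially and only show the ordering property: every action is emitted after all of its dependencies. The real implementation executes ready actions concurrently.

```go
package main

import "fmt"

// action is a simplified, hypothetical stand-in for work.Action.
type action struct {
	mode string
	deps []*action
}

// schedule walks the graph depth-first from root and returns the actions in
// post-order, so every action appears after all of its dependencies.
func schedule(root *action) []*action {
	var order []*action
	seen := map[*action]bool{}
	var walk func(a *action)
	walk = func(a *action) {
		if seen[a] {
			return // shared dependencies are scheduled only once
		}
		seen[a] = true
		for _, d := range a.deps {
			walk(d)
		}
		order = append(order, a)
	}
	walk(root)
	return order
}

func main() {
	runtimePkg := &action{mode: "build runtime"}
	fmtPkg := &action{mode: "build fmt", deps: []*action{runtimePkg}}
	mainPkg := &action{mode: "build main", deps: []*action{fmtPkg, runtimePkg}}
	link := &action{mode: "link", deps: []*action{mainPkg}}
	root := &action{mode: "go build", deps: []*action{link}}

	for _, a := range schedule(root) {
		fmt.Println(a.mode)
	}
}
```

The package names and modes in main are invented for the example. The point is that the root action comes out last and leaf dependencies come out first.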

Each type of action has a designated method for its Func, which is the core part of the action. For example:

a := &Action{
  Mode: "build",
  Func: (*Builder).build,
  ...
}

a := &Action{
  Mode: "link",
  Func: (*Builder).link,
  ...
}
...

Digging further, you’ll find that aside from some necessary preprocessing, (*Builder).link calls the BuildToolchain.ld method, and (*Builder).build calls methods like BuildToolchain.symabis, BuildToolchain.gc, BuildToolchain.asm, and BuildToolchain.pack to implement the core functionality. BuildToolchain is of the toolchain interface type, which defines the following methods:

// src/cmd/go/internal/work/exec.go
type toolchain interface {
	// gc runs the compiler in a specific directory on a set of files
	// and returns the name of the generated output file.
	gc(b *Builder, a *Action, archive string, importcfg, embedcfg []byte, symabis string, asmhdr bool, gofiles []string) (ofile string, out []byte, err error)
	// cc runs the toolchain's C compiler in a directory on a C file
	// to produce an output file.
	cc(b *Builder, a *Action, ofile, cfile string) error
	// asm runs the assembler in a specific directory on specific files
	// and returns a list of named output files.
	asm(b *Builder, a *Action, sfiles []string) ([]string, error)
	// symabis scans the symbol ABIs from sfiles and returns the
	// path to the output symbol ABIs file, or "" if none.
	symabis(b *Builder, a *Action, sfiles []string) (string, error)
	// pack runs the archive packer in a specific directory to create
	// an archive from a set of object files.
	// typically it is run in the object directory.
	pack(b *Builder, a *Action, afile string, ofiles []string) error
	// ld runs the linker to create an executable starting at mainpkg.
	ld(b *Builder, root *Action, out, importcfg, mainpkg string) error
	// ldShared runs the linker to create a shared library containing the pkgs built by toplevelactions
	ldShared(b *Builder, root *Action, toplevelactions []*Action, out, importcfg string, allactions []*Action) error

	compiler() string
	linker() string
}

Go implements this interface separately for the gc and gccgo compilers. go build selects between them during program initialization:

func init() {
	switch build.Default.Compiler {
	case "gc", "gccgo":
		buildCompiler{}.Set(build.Default.Compiler)
	}
}

func (c buildCompiler) Set(value string) error {
	switch value {
	case "gc":
		BuildToolchain = gcToolchain{}
	case "gccgo":
		BuildToolchain = gccgoToolchain{}
	...
	}
	...
}

Here we only look at the gc compiler portion in src/cmd/go/internal/work/gc.go. Taking the gc method as an example:

func (gcToolchain) gc(b *Builder, a *Action, archive string, importcfg, embedcfg []byte, symabis string, asmhdr bool, gofiles []string) (ofile string, output []byte, err error) {
	// ...
	// Assemble arguments
	// ...

	args := []interface{}{cfg.BuildToolexec, base.Tool("compile"), "-o", ofile, "-trimpath", a.trimpath(), gcflags, gcargs, "-D", p.Internal.LocalPrefix}

	// ...

	output, err = b.runOut(a, base.Cwd, nil, args...)
	return ofile, output, err
}

At a high level, the gc method doesn’t actually perform the compilation work itself. Its main role is to assemble a command that invokes the binary located at base.Tool("compile"). These programs can be called Go compilation tools, located in the pkg/tool directory with source code in src/cmd. Similarly, the other methods also call their respective compilation tools to perform the actual compilation work.

Attentive readers may notice an interesting detail: the actual executable in the assembled command is not the compilation tool itself, but cfg.BuildToolexec. Tracing this to its definition reveals it is set by the go build -toolexec flag. The official description is:

-toolexec 'cmd args'
  a program to use to invoke toolchain programs like vet and asm.
  For example, instead of running asm, the go command will run
  'cmd args /path/to/asm <arguments for asm>'.

In other words, -toolexec specifies a program to run the compilation tools. This can be thought of as a hook mechanism — by using this flag to specify our own program, we can intervene in the compilation process by having our program invoke the compilation tools. The garble project analyzed below uses exactly this approach. Here’s a command excerpt from the compilation process (go build -n outputs the executed commands) to help illustrate. For example, if we specify -toolexec=/home/atom/go/bin/garble, then the actual command executed during compilation is:

/home/atom/go/bin/garble /usr/local/go/pkg/tool/linux_amd64/compile -o $WORK/b016/_pkg_.a -trimpath "/usr/local/go/src/sync=>sync;$WORK/b016=>" -p sync -std -buildid FRNt7EHDh77qHujLKnmK/FRNt7EHDh77qHujLKnmK -goversion go1.16.4 -D "" -importcfg $WORK/b016/importcfg -pack -c=4 /usr/local/go/src/sync/cond.go /usr/local/go/src/sync/map.go /usr/local/go/src/sync/mutex.go /usr/local/go/src/sync/once.go /usr/local/go/src/sync/pool.go /usr/local/go/src/sync/poolqueue.go /usr/local/go/src/sync/runtime.go /usr/local/go/src/sync/runtime2.go /usr/local/go/src/sync/rwmutex.go /usr/local/go/src/sync/waitgroup.go

To summarize, go build invokes compilation tools like compile by assembling commands, and we can use the go build -toolexec flag to specify a program that “intervenes” in the compilation process.
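
To make the hook mechanism concrete, here is a minimal sketch of a -toolexec wrapper. It is not garble's code: toolName and transform are hypothetical helpers, and transform deliberately changes nothing. The sketch only relies on the documented contract that the go command runs the wrapper with the tool path as the first argument, followed by the tool's own arguments.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// toolName extracts the tool's base name (for example "compile") from its
// path, dropping a ".exe" suffix if present.
func toolName(path string) string {
	return strings.TrimSuffix(filepath.Base(path), ".exe")
}

// transform is a hypothetical hook: an obfuscator would rewrite the tool's
// arguments (or the source files they name) here. This sketch changes nothing.
func transform(tool string, args []string) []string {
	return args
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: wrapper /path/to/tool [args...]")
		return
	}
	tool := os.Args[1]
	args := transform(toolName(tool), os.Args[2:])

	// Run the real tool and pass its standard streams through untouched.
	cmd := exec.Command(tool, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			os.Exit(exitErr.ExitCode()) // propagate the tool's exit status
		}
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

After building this program, it could be tried with go build -toolexec=/path/to/wrapper . (the path is a placeholder). Because transform is the identity function, the build should behave as if no wrapper were present.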

go/ast

In Golang, AST types and methods are defined by the go/ast standard library. The garble project analyzed later involves extensive type assertions and type switches with go/ast, so it’s important to have a general understanding of these types. Most types are defined in src/go/ast/ast.go, where the comments are quite detailed. For convenience, I’ve put together a relationship diagram. The branches in the diagram represent which interface each type implements (Go has no class-style inheritance), and all types are based on the Node interface:

[Figure: go/ast types]

This article doesn’t intend to dive deep into ASTs, but I believe a basic understanding should be sufficient for the rest of this article. If you find it difficult to follow, I recommend reading Introduction to Go Syntax Trees — A Journey into Building Your Own Programming Language and Compiler! to fill in any gaps, or using the online tool goast-viewer to visualize ASTs for analysis.
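
For a quick look at these types without an external tool, the following sketch parses a tiny file and lists the concrete type of every node that ast.Inspect visits. The nodeKinds helper and the sample source are assumptions made for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// nodeKinds parses src and returns the concrete type of every AST node in the
// order ast.Inspect visits them (depth-first).
func nodeKinds(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var kinds []string
	ast.Inspect(file, func(n ast.Node) bool {
		if n != nil { // Inspect also calls the function with nil after each subtree
			kinds = append(kinds, fmt.Sprintf("%T", n))
		}
		return true
	})
	return kinds, nil
}

func main() {
	kinds, err := nodeKinds("package p\n\nvar x = 1 + 2\n")
	if err != nil {
		panic(err)
	}
	for _, k := range kinds {
		fmt.Println(k)
	}
}
```

Replacing the Inspect callback with ast.Print(fset, file) prints the full tree with field names, which is often the easier view when working out which type assertion a piece of code needs.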

Tool Analysis

Among open-source Go code obfuscation projects, the two with the most stars are burrowers/garble and unixpickle/gobfuscate. The former has more up-to-date features, so this article primarily analyzes garble, version 8edde922ee5189f1d049edb9487e6090dd9d45bd.

Features

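At a high level, garble can be divided into two modes: active mode, which constructs the target list and saves configuration before executing the assembled build command, and launcher mode, which intercepts the compile, asm, and link tools. Both are analyzed in the sections below.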

Fetching and modifying arguments takes up a significant amount of code. For easier analysis, later sections will gloss over these details. Interested readers can consult the official documentation to learn about each argument’s purpose.

Constructing the Target List

The target list is constructed in active mode. Here’s an excerpt of the key code:

// listedPackage contains the 'go list -json -export' fields obtained by the
// root process, shared with all garble sub-processes via a file.
type listedPackage struct {
	Name       string
	ImportPath string
	ForTest    string
	Export     string
	BuildID    string
	Deps       []string
	ImportMap  map[string]string
	Standard   bool

	Dir     string
	GoFiles []string

	// The fields below are not part of 'go list', but are still reused
	// between garble processes. Use "Garble" as a prefix to ensure no
	// collisions with the JSON fields from 'go list'.

	GarbleActionID []byte

	Private bool
}

func setListedPackages(patterns []string) error {
  args := []string{"list", "-json", "-deps", "-export", "-trimpath"}
  args = append(args, cache.BuildFlags...)
  args = append(args, patterns...)
  cmd := exec.Command("go", args...)
  ...
  cache.ListedPackages = make(map[string]*listedPackage)
  for ...{
    var pkg listedPackage
    ...
    cache.ListedPackages[pkg.ImportPath] = &pkg
    ...
  }
}

The core mechanism uses the go list command, where the -deps flag is officially described as:

The -deps flag causes list to iterate over not just the named packages but also all their dependencies. It visits them in a depth-first post-order traversal, so that a package is listed only after all its dependencies. Packages not explicitly listed on the command line will have the DepOnly field set to true.

This traversal is actually quite similar to how go build creates actions, as analyzed earlier. Through this command, garble can obtain all dependency information for the project (including transitive dependencies), iterating over and storing them in cache.ListedPackages. Additionally, it marks whether each dependency package matches env.GOPRIVATE, which is a comma-separated list of glob patterns of module path prefixes rather than a directory. Only matching packages will be obfuscated (with the exception that some parts of runtime are processed when the -tiny flag is used). You can set the environment variable GOPRIVATE="*" to expand the scope for better obfuscation results. Regarding the scope of obfuscation, garble’s author is also working on improvements: idea: break away from GOPRIVATE? #276.

At this point, the obfuscation targets have been identified. Along with some configuration-saving operations, the active mode’s tasks are essentially complete, and it can then execute the assembled command, leading to launcher mode.

In launcher mode, the three compilation tools — compile/asm/link — are intercepted to “intervene in the compilation process.” The quotes are intentional because garble doesn’t actually perform any compilation work itself. Like go build, it acts as a middleman, modifying source code or the arguments passed to the compilation tools, ultimately relying on these three tools to do the actual compilation. Let’s analyze each one.

compile

The implementation is in the main.go transformCompile function. Its main job is processing Go files and modifying command arguments. The go build -n flag outputs the executed commands, and we can pass this flag when using garble to get a more intuitive view of the compilation process. Here’s an excerpt:

/home/atom/go/bin/garble /usr/local/go/pkg/tool/linux_amd64/compile -o $WORK/b016/_pkg_.a -trimpath "/usr/local/go/src/sync=>sync;$WORK/b016=>" -p sync -std -buildid FRNt7EHDh77qHujLKnmK/FRNt7EHDh77qHujLKnmK -goversion go1.16.4 -D "" -importcfg $WORK/b016/importcfg -pack -c=4 /usr/local/go/src/sync/cond.go /usr/local/go/src/sync/map.go /usr/local/go/src/sync/mutex.go /usr/local/go/src/sync/once.go /usr/local/go/src/sync/pool.go /usr/local/go/src/sync/poolqueue.go /usr/local/go/src/sync/runtime.go /usr/local/go/src/sync/runtime2.go /usr/local/go/src/sync/rwmutex.go /usr/local/go/src/sync/waitgroup.go

This command uses the compile tool to compile files like cond.go into intermediate code. When garble detects that the current compilation tool is compile, it “intercepts” it and performs obfuscation and other tasks before the tool runs. Let’s analyze the key parts.

First, the input Go files are parsed into ASTs:

var files []*ast.File
for _, path := range paths {
  file, err := parser.ParseFile(fset, path, nil, parser.ParseComments)
  if err != nil {
    return nil, err
  }
  files = append(files, file)
}

Then type checking is performed — this is also a step in normal compilation. If type checking fails, it means the files cannot be compiled successfully, and the program exits.
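
The type-checking step can be reproduced with the standard go/types package. The sketch below is an assumption-laden simplification: typeCheck is a hypothetical helper, it handles a single file, and it configures no importer, so the checked source must not import other packages.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// typeCheck parses src as a single-file package and type-checks it.
// The source must not import other packages, since no importer is configured.
func typeCheck(src string) error {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return err
	}
	conf := types.Config{}
	// Passing a nil *types.Info is allowed when only the error is of interest.
	_, err = conf.Check("demo", fset, []*ast.File{file}, nil)
	return err
}

func main() {
	// Well-typed source: the returned error is nil.
	fmt.Println(typeCheck("package demo\n\nvar n int = 1\n"))
	// Ill-typed source: a string constant cannot initialize an int variable.
	fmt.Println(typeCheck("package demo\n\nvar n int = \"one\"\n"))
}
```

A tool that rewrites source before compilation benefits from failing here, because an error at this stage means the compile tool would reject the files anyway.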

Since the type names of nodes involved in reflection (reflect.ValueOf / reflect.TypeOf) may be used in subsequent logic, their names cannot be obfuscated:

if fnType.Pkg().Path() == "reflect" && (fnType.Name() == "TypeOf" || fnType.Name() == "ValueOf") {
  for _, arg := range call.Args {
    argType := tf.info.TypeOf(arg)
    tf.recordIgnore(argType, tf.pkg.Path())
  }
}

This introduces an important map that persists throughout each compile lifecycle, recording all objects that cannot be obfuscated: types used in reflection arguments, identifiers used in constant expressions and go:linkname, and variables and types imported from unobfuscated packages:

// ignoreObjects records all the objects we cannot obfuscate. An object
// is any named entity, such as a declared variable or type.
//
// So far, this map records:
//
//  * Types which are used for reflection; see recordReflectArgs.
//  * Identifiers used in constant expressions; see RecordUsedAsConstants.
//  * Identifiers used in go:linkname directives; see handleDirectives.
//  * Types or variables from external packages which were not
//    obfuscated, for caching reasons; see transformGo.
ignoreObjects map[types.Object]bool

Let’s use the case of identifying “identifiers used in constant expressions” with the ast.GenDecl type as an example:

// RecordUsedAsConstants records identifiers used in constant expressions.
func RecordUsedAsConstants(node ast.Node, info *types.Info, ignoreObj map[types.Object]bool) {
	visit := func(node ast.Node) bool {
		ident, ok := node.(*ast.Ident)
		if !ok {
			return true
		}

		// Only record *types.Const objects.
		// Other objects, such as builtins or type names,
		// must not be recorded as they would be false positives.
		obj := info.ObjectOf(ident)
		if _, ok := obj.(*types.Const); ok {
			ignoreObj[obj] = true
		}

		return true
	}

	switch x := node.(type) {
	...
	// in a const declaration all values must be constant representable
	case *ast.GenDecl:
		if x.Tok != token.CONST {
			break
		}
		for _, spec := range x.Specs {
			spec := spec.(*ast.ValueSpec)

			for _, val := range spec.Values {
				ast.Inspect(val, visit)
			}
		}
	}
}

Suppose the code to be obfuscated is:

package obfuscate

const (
	H2 string = "a"
	H4 string = "a" + H2
	H3 int    = 123
	H5 string = "a"
)

We can see that the identifier used in a constant expression is H2. Let’s walk through the determination process in the code. First, the entire const block matches the *ast.GenDecl type. Then it iterates over its Specs (each definition), and for each spec, iterates over its Values (the expressions on the right side of the equals sign). It then uses ast.Inspect() to traverse each element in val, executing visit(). If an element node’s type is *ast.Ident and the object it points to is of type *types.Const, that object is recorded in the ignoreObjects map (passed in here as ignoreObj). It’s a bit convoluted, so let’s print the AST:

[Figure: ignoreObjects-example]

We can clearly see that H2 in H4 string = "a" + H2 fully meets the criteria and should be recorded in ignoreObjects. The upcoming analysis will involve many type assertions and type switches, which may look complex but are fundamentally similar to the process we just analyzed. We just need to write a demo and print the AST to understand it easily.
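
The walkthrough above can be checked with a short standalone program. This is a sketch written for this article, not garble's code: constIdents is a hypothetical helper that repeats the same GenDecl, ValueSpec, and Ident checks on the example const block and returns the names it would record.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

const src = `package obfuscate

const (
	H2 string = "a"
	H4 string = "a" + H2
	H3 int    = 123
	H5 string = "a"
)
`

// constIdents returns the names of constants referenced inside the value
// expressions of const declarations in src.
func constIdents(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{
		Defs: make(map[*ast.Ident]types.Object),
		Uses: make(map[*ast.Ident]types.Object),
	}
	if _, err := (&types.Config{}).Check("obfuscate", fset, []*ast.File{file}, info); err != nil {
		return nil, err
	}

	var names []string
	for _, decl := range file.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.CONST {
			continue
		}
		for _, spec := range gen.Specs {
			for _, val := range spec.(*ast.ValueSpec).Values {
				ast.Inspect(val, func(n ast.Node) bool {
					if id, ok := n.(*ast.Ident); ok {
						// Only identifiers that resolve to constants count.
						if _, isConst := info.ObjectOf(id).(*types.Const); isConst {
							names = append(names, id.Name)
						}
					}
					return true
				})
			}
		}
	}
	return names, nil
}

func main() {
	names, err := constIdents(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [H2]
}
```

Only the right-hand sides are inspected, so the declared names H2, H4, H3, and H5 and the type names string and int are never visited. H2 is reported because it appears inside the value expression of H4.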

Back to main.go transformCompile. Next, the current package name is obfuscated and written into the command arguments and source files, provided the package is neither main nor outside the scope matched by env.GOPRIVATE. The next step processes comments and source code. There’s special handling for runtime and CGO here, which we can safely ignore, and look directly at the handling for regular Go code:

// transformGo obfuscates the provided Go syntax file.
func (tf *transformer) transformGo(file *ast.File) *ast.File {
	if opts.GarbleLiterals {
		file = literals.Obfuscate(file, tf.info, fset, tf.ignoreObjects)
	}

	pre := func(cursor *astutil.Cursor) bool {...}
	post := func(cursor *astutil.Cursor) bool {...}

	return astutil.Apply(file, pre, post).(*ast.File)
}

First it obfuscates literals, then recursively processes each node of the AST, and finally returns the processed AST. These parts share a similar approach, all using astutil.Apply(file, pre, post) for recursive AST processing, where pre and post functions are called before and after visiting child nodes, respectively. Much of this code consists of rather tedious filtering operations, so here’s just a brief analysis:

At this point, the source code obfuscation is complete. All that remains is to write the new code to a temporary directory and splice the address into the command to replace the original file paths. A new compile command is now ready, and executing it compiles the obfuscated code using the compilation tools.

asm

This is relatively simple and only applies to private packages. The core operations are:

link

This is also relatively simple. The core operations are:

Obfuscation Results

Let’s write a small piece of code and compile it twice: once with go build . and once with go env -w GOPRIVATE="*" && garble -literals build . (the trailing dot in each command is the package path). As you can see, the simple code on the left becomes much harder to read after obfuscation:

[Figure: obfuscated-show-1]

[Figure: obfuscated-show-2]

Let’s also load them into IDA and parse with go_parser. In the unobfuscated file, information like file names and function names is clearly visible, and the code logic is fairly clean:

[Figure: obfuscated-show-ida-1]

After obfuscation, function names and other information are replaced with garbled text. Moreover, since strings have been replaced with anonymous functions, the code logic is much more confusing:

[Figure: obfuscated-show-ida-2]

When dealing with larger projects with more dependencies, the chaos introduced by code obfuscation becomes even more severe. Since third-party dependency packages are also obfuscated, reverse engineers can no longer guess the code logic based on imported third-party packages.

Conclusion

This article explored the general workflow of how Golang’s compilation process invokes the toolchain, as well as the burrowers/garble project, from a source code implementation perspective. We learned how to use go/ast to perform code obfuscation. Through obfuscation, the code’s logical structure and the information retained in the binary become much harder to read, significantly increasing the difficulty of reverse engineering.


