<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>/home/rook1e</title><description>Rook1e&apos;s home directory.</description><link>https://rook1e.com/</link><item><title>AI Coding Notes 1</title><link>https://rook1e.com/en/posts/ai-coding-1/</link><guid isPermaLink="true">https://rook1e.com/en/posts/ai-coding-1/</guid><description>Reflections on my recent experience coding with AI.</description><pubDate>Sat, 03 Jan 2026 08:08:49 GMT</pubDate><content:encoded>&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Outsource the grunt work of CRUD to a Coding Agent&lt;/strong&gt;, freeing up your energy for discussing solutions, breaking down tasks, reviewing code, and tackling hard problems. Enjoy the pure joy of creation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The loom of software development has arrived&lt;/strong&gt;. AI Agents won&apos;t fully replace programmers, and probably won&apos;t reduce the number of jobs in the long run either, but they will reshape how the entire industry operates. We may see software engineering practices, programming languages, and system architectures better suited for AI Agents, driving digitalization across all aspects of society at greater scale, lower cost, and higher efficiency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Burnout comes easier&lt;/strong&gt;. AI thinks and generates far faster than human eyes can read and minds can process. Auditing every line of AI-generated code and context-switching between multiple task sessions is mentally exhausting.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude Code + Opus/Sonnet 4.5 is still SOTA&lt;/strong&gt;, but the moat isn&apos;t that deep.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GLM 4.6 is an excellent workhorse model&lt;/strong&gt;: fast, cheap, and generates at about 80% quality — perfect as an executor after you&apos;ve designed the approach/plan with a SOTA model.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/sst/opencode&quot;&gt;OpenCode&lt;/a&gt; is a solid open-source alternative to Claude Code&lt;/strong&gt;. OpenCode has many strengths, such as built-in LSP, easy model switching, and a built-in HTTP API. But the downsides are equally significant — it&apos;s not stable enough, and the system prompts still need polish. Promising for the future.&lt;/li&gt;
&lt;li&gt;Stick with mainstream tech stacks.&lt;/li&gt;
&lt;li&gt;Don&apos;t buy annual subscriptions for AI products.&lt;/li&gt;
&lt;/ol&gt;
</content:encoded></item><item><title>Indie Hacking Memo</title><link>https://rook1e.com/en/posts/indie-hacking-memo/</link><guid isPermaLink="true">https://rook1e.com/en/posts/indie-hacking-memo/</guid><description>Over a year into indie hacking, I haven&apos;t launched a successful product, but I&apos;ve learned a lot.</description><pubDate>Sun, 03 Aug 2025 14:11:13 GMT</pubDate><content:encoded>&lt;p&gt;Over a year into indie hacking, I haven&apos;t launched a successful product, but I&apos;ve learned a lot.&lt;/p&gt;
&lt;h2&gt;Use Boring Tech Stacks&lt;/h2&gt;
&lt;p&gt;Most indie hackers I&apos;ve observed come from a programmer background, and &amp;quot;the best tech stack for indie hacking&amp;quot; is a recurring topic in various communities.&lt;/p&gt;
&lt;p&gt;In this circle, the cool kids on the block use Next.js, like Next.js + Prisma + Shadcn UI + NextAuth + Supabase. They&apos;ll share how smooth the DX is and how quickly they can build a beautiful UI, but beneath the surface:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;These so-called &amp;quot;full-stack frameworks&amp;quot; are built for the frontend, with extremely limited backend capabilities.&lt;/li&gt;
&lt;li&gt;These tech stacks iterate very quickly, unnecessarily increasing learning and migration costs.&lt;/li&gt;
&lt;li&gt;A huge number of JS dependencies are like ticking time bombs.&lt;/li&gt;
&lt;li&gt;Serverless architecture is restrictive; even scheduled tasks might require workarounds (I know Vercel has this feature, but even Pro accounts have quantity limits).&lt;/li&gt;
&lt;li&gt;The &lt;a href=&quot;https://x.com/zemotion/status/1798558292681343039&quot;&gt;surprise&lt;/a&gt; from Vercel bills.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, users don&apos;t care about your code. Simple tech stacks can build very successful projects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pieter Levels likes to put all his frontend and backend code into &lt;a href=&quot;https://x.com/levelsio/status/1308406118314635266&quot;&gt;one huge index.php&lt;/a&gt;, but his projects print money.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.levels.fyi/blog/scaling-to-millions-with-google-sheets.html&quot;&gt;Levels.fyi uses Google Sheets as a backend to serve millions of users&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, if your technical background leans towards the backend, you don&apos;t have to use Shadcn UI. Just stick to a backend-first approach and &lt;a href=&quot;https://boringtechnology.club/&quot;&gt;use boring technology&lt;/a&gt;: use the backend tech stack you&apos;re most comfortable with, use a templating engine for server-side rendering, and deploy to a VPS (remember to put it behind Cloudflare CDN).&lt;/p&gt;
&lt;p&gt;After all, writing code is just the first step.&lt;/p&gt;
&lt;h2&gt;MVP Should Only Include One Core Feature&lt;/h2&gt;
&lt;p&gt;An MVP, as the name suggests, must be &lt;strong&gt;minimal&lt;/strong&gt;, built with the least possible effort:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Solve only one pain point at a time.&lt;/li&gt;
&lt;li&gt;Focus only on the core feature; it doesn&apos;t even necessarily require writing code.&lt;/li&gt;
&lt;li&gt;The UI can be rough; simple and elegant is enough.&lt;/li&gt;
&lt;li&gt;Don&apos;t design caching, message queues, etc., and don&apos;t use K8s.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Performance, stability, and a refined UI are sweet problems for the future. Refactoring the codebase can wait until your MRR meets expectations.&lt;/p&gt;
&lt;h2&gt;Charge From Day One&lt;/h2&gt;
&lt;p&gt;Pricing strategy is also a long-debated, subjective topic.&lt;/p&gt;
&lt;p&gt;Offering a free trial to lower the barrier to entry seems reasonable, but in practice, it&apos;s a different story:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It attracts customers who only want freebies.&lt;/li&gt;
&lt;li&gt;It adds an extra conversion step: traffic -&amp;gt; &lt;strong&gt;trial users&lt;/strong&gt; -&amp;gt; paying users.&lt;/li&gt;
&lt;li&gt;People don&apos;t value free things.&lt;/li&gt;
&lt;li&gt;Suggestions from free users may be less valuable.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Instead of offering a free trial, charge from day one, but offer a money-back guarantee if the user is not satisfied within XX days:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It conveys confidence to the user (&amp;quot;This product will definitely solve your pain point&amp;quot;) and provides a safety net (&amp;quot;Even if you don&apos;t like it, you can get an unconditional refund within 14 days&amp;quot;).&lt;/li&gt;
&lt;li&gt;It pre-filters high-risk users through payment channels like Stripe.&lt;/li&gt;
&lt;li&gt;If no one pays, it indicates a false demand or that you haven&apos;t found your niche yet.&lt;/li&gt;
&lt;li&gt;Genuinely ask for user feedback during refunds; this feedback is more valuable.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Regarding pricing, I prefer Tibo&apos;s &lt;a href=&quot;https://www.tmaker.io/what-is-the-ideal-pricing-for-a-saas&quot;&gt;perspective&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;A low price DOESN’T compensate for delivering LOW value.&lt;/li&gt;
&lt;li&gt;I price my SaaS in a range: $29-$99, and decide what to build, and how to build it based on that.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Fail Fast, Grow Fast&lt;/h2&gt;
&lt;p&gt;One of the advantages of indie hacking is the low cost of experimentation. And indie hacking itself has a very high failure rate.&lt;/p&gt;
&lt;p&gt;Even Pieter Levels only made money from &lt;a href=&quot;https://x.com/levelsio/status/1457315274466594817&quot;&gt;4 out of his first 70 projects&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Be a Salesperson, a Founder, Not Just a Developer&lt;/h2&gt;
&lt;p&gt;Writing code is the simplest part of indie hacking because the input-output is stable and predictable, especially with AI boosting efficiency now. Beyond coding, how to build connections and trust with customers and eventually get them to pay is a difficult question with no standard answer.&lt;/p&gt;
&lt;p&gt;After launching the product, day after day you need to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reply to customer emails and DMs.&lt;/li&gt;
&lt;li&gt;Manage personal and product social media, newsletters, etc.&lt;/li&gt;
&lt;li&gt;Optimize cold reach content strategies and discover new potential user groups.&lt;/li&gt;
&lt;li&gt;Do SEO.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are all things that might not generate significant revenue for months but are essential, and they are also things that most tech people are not good at. Not to mention subsequent tasks like company registration, taxes, and data compliance.&lt;/p&gt;
&lt;p&gt;So don&apos;t limit yourself. Not only should you maintain the code as a developer, but you should also manage your business as an entrepreneur.&lt;/p&gt;
</content:encoded></item><item><title>RawWeb Updates: SimHash and Meilisearch</title><link>https://rook1e.com/en/posts/rawweb-updates-simhash-meilisearch/</link><guid isPermaLink="true">https://rook1e.com/en/posts/rawweb-updates-simhash-meilisearch/</guid><description>Over the past two weeks, I&apos;ve made two significant changes to RawWeb: introduced SimHash for document deduplication and migrated from Elasticsearch to Meilisearch to lower operational costs. The migration and cleanup of 56k similar documents went smoothly, but I also ran into some memory and performance challenges with Meilisearch.</description><pubDate>Mon, 14 Apr 2025 03:29:48 GMT</pubDate><content:encoded>&lt;p&gt;Over the past two weeks, I&apos;ve made two significant changes to &lt;a href=&quot;https://rawweb.org/&quot;&gt;RawWeb&lt;/a&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Introduced SimHash for document deduplication.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Migrated from Elasticsearch to Meilisearch&lt;/strong&gt; to lower operational costs.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The implementation went smoothly. I successfully migrated and cleaned up 56k similar documents, but I also ran into some memory and performance challenges with Meilisearch.&lt;/p&gt;
&lt;h2&gt;Document Deduplication&lt;/h2&gt;
&lt;p&gt;Previously, I simply used URLs as the unique constraint, which often led to discovering a large number of duplicate documents during maintenance.&lt;/p&gt;
&lt;p&gt;Common reasons for duplication:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Non-standardized URLs
&lt;ul&gt;
&lt;li&gt;Inconsistent case sensitivity&lt;/li&gt;
&lt;li&gt;Inconsistent trailing slashes&lt;/li&gt;
&lt;li&gt;Useless query parameters&lt;/li&gt;
&lt;li&gt;Different number or order of query parameters&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A single blog having multiple domains&lt;/li&gt;
&lt;li&gt;Blogs changing their path structure, e.g., from &lt;code&gt;/blog/1&lt;/code&gt; to &lt;code&gt;/posts/1&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
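&lt;p&gt;As an illustration (this is not RawWeb&apos;s actual code), a minimal normalization pass can remove much of the URL-level duplication; the helper name and the tracking-parameter list below are hypothetical:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// trackingParams lists common query parameters that don't affect content.
// Illustrative set only; a real crawler would maintain a longer list.
var trackingParams = map[string]bool{
	"utm_source": true, "utm_medium": true, "utm_campaign": true,
	"ref": true, "fbclid": true,
}

// NormalizeURL applies the kinds of rules described above: lowercase the
// host, drop the trailing slash, strip tracking parameters, and re-encode
// the remaining query so parameter order no longer matters.
func NormalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Host = strings.ToLower(u.Host)
	u.Path = strings.TrimSuffix(u.Path, "/")

	q := u.Query()
	for k := range q {
		if trackingParams[k] {
			q.Del(k)
		}
	}
	// Encode() sorts keys, fixing inconsistent parameter order.
	u.RawQuery = q.Encode()
	u.Fragment = ""
	return u.String(), nil
}

func main() {
	s, _ := NormalizeURL("https://Example.com/posts/1/?utm_source=x&b=2&a=1")
	fmt.Println(s) // prints https://example.com/posts/1?a=1&b=2
}
```

&lt;p&gt;With this, &lt;code&gt;?b=2&amp;amp;a=1&lt;/code&gt; and &lt;code&gt;?a=1&amp;amp;b=2&lt;/code&gt; normalize to the same key.&lt;/p&gt;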
&lt;p&gt;URL-based deduplication is simple to implement but has significant limitations and can&apos;t cover all edge cases. To solve the duplication problem fundamentally, we need to work with the content itself.&lt;/p&gt;
&lt;h3&gt;SimHash&lt;/h3&gt;
&lt;p&gt;SimHash is a locality-sensitive hashing algorithm. It reflects the features of different parts of a text and allows for efficient similarity assessment using the &lt;a href=&quot;https://en.wikipedia.org/wiki/Hamming_distance&quot;&gt;Hamming distance&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For example, consider the following two strings. Their SimHash values differ by only a few bits, while their md5 hashes are completely different.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;data&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;simhash&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;md5&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;i think &lt;strong&gt;a&lt;/strong&gt; is the best&lt;/td&gt;
&lt;td&gt;10000110110000000000000000101001&lt;/td&gt;
&lt;td&gt;846ff6bebe901ead008e9c0e01a87470&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;i think &lt;strong&gt;b&lt;/strong&gt; is the best&lt;/td&gt;
&lt;td&gt;10000010110000000000100001101010&lt;/td&gt;
&lt;td&gt;ba1d2dc00d0a23dbb2001d570f03fb19&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Compared to other text-similarity methods, SimHash&apos;s advantages are lightweight computation, storage, and comparison. It also has a strong endorsement: Google&apos;s web crawler uses SimHash to identify near-duplicate web pages.&lt;/p&gt;
&lt;h3&gt;Calculating Hash and Hamming Distance&lt;/h3&gt;
&lt;p&gt;Implementing SimHash is simple. I relied entirely on Claude Sonnet 3.5 to implement the basic SimHash and Hamming distance calculations, then used test cases to evaluate the results.&lt;/p&gt;
&lt;p&gt;Key points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a 64-bit hash value. This is a balanced length.&lt;/li&gt;
&lt;li&gt;Use the fnv hash algorithm. It&apos;s simple, efficient, and distributes well.&lt;/li&gt;
&lt;li&gt;Split the hash value into four 16-bit segments for database storage. By the pigeonhole principle, two 64-bit hashes within Hamming distance 3 must agree on at least one 16-bit segment, so an indexed equality match on the segments can pre-filter candidates before computing the full distance.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package simhash

import (
	&amp;quot;backend/core/pkgs/simhash/tokenizer&amp;quot;
	&amp;quot;hash/fnv&amp;quot;
)

const (
	// HASH_BITS represents the number of bits in a SimHash value
	HASH_BITS = 64
)

// CalculateSimHash calculates the SimHash value of a text
// SimHash is a hashing algorithm used for calculating text similarity, effectively detecting the degree of similarity between texts
// Algorithm steps:
// 1. Tokenize the text
// 2. Calculate the hash value for each token
// 3. Merge all token hash values into a feature vector
// 4. Derive the final SimHash value from the feature vector
func CalculateSimHash(text string) uint64 {
	if text == &amp;quot;&amp;quot; {
		return 0
	}

	words := tokenizer.Tokenize(text)

	weights := make([]int, HASH_BITS)

	// Calculate hash for each token and update weights
	for _, word := range words {
		hash := getHash(word)
		// Update weights based on each bit position
		for i := 0; i &amp;lt; HASH_BITS; i++ {
			if (hash &amp;amp; (1 &amp;lt;&amp;lt; uint(i))) != 0 {
				weights[i]++
			} else {
				weights[i]--
			}
		}
	}

	// Generate the final simhash value
	var simhash uint64
	for i := 0; i &amp;lt; HASH_BITS; i++ {
		if weights[i] &amp;gt; 0 {
			simhash |= (1 &amp;lt;&amp;lt; uint(i))
		}
	}

	return simhash
}

// getHash calculates the hash value of a string
func getHash(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// HammingDistance calculates the Hamming distance between two simhash values
// Hamming distance is the number of different characters at corresponding positions in two equal-length strings
// In SimHash, a smaller Hamming distance indicates higher similarity between two texts
func HammingDistance(hash1, hash2 uint64) int {
	xor := hash1 ^ hash2
	distance := 0

	// Count the number of different bits (Brian Kernighan algorithm, better performance)
	for xor != 0 {
		distance++
		xor &amp;amp;= xor - 1
	}

	return distance
}

// SplitSimHash splits a simhash value into four 16-bit parts
func SplitSimHash(hash uint64) [4]uint16 {
	return [4]uint16{
		uint16((hash &amp;gt;&amp;gt; 48) &amp;amp; 0xFFFF),
		uint16((hash &amp;gt;&amp;gt; 32) &amp;amp; 0xFFFF),
		uint16((hash &amp;gt;&amp;gt; 16) &amp;amp; 0xFFFF),
		uint16(hash &amp;amp; 0xFFFF),
	}
}

// MergeSimHash merges four 16-bit parts into a single simhash value
func MergeSimHash(parts [4]uint16) uint64 {
	return (uint64(parts[0]) &amp;lt;&amp;lt; 48) | (uint64(parts[1]) &amp;lt;&amp;lt; 32) | (uint64(parts[2]) &amp;lt;&amp;lt; 16) | uint64(parts[3])
}

// IsSimilar determines whether two texts are similar
// threshold represents the similarity threshold, typically a value between 3-10
// Returns true if the two texts are similar, false if not
func IsSimilar(hash1, hash2 uint64, threshold int) bool {
	return HammingDistance(hash1, hash2) &amp;lt;= threshold
}
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- pgsql

CREATE OR REPLACE FUNCTION hamming_distance(
    simhash1_1 SMALLINT,
    simhash1_2 SMALLINT,
    simhash1_3 SMALLINT,
    simhash1_4 SMALLINT,
    simhash2_1 SMALLINT,
    simhash2_2 SMALLINT,
    simhash2_3 SMALLINT,
    simhash2_4 SMALLINT
) RETURNS INTEGER
	PARALLEL SAFE
AS $$
BEGIN
    RETURN bit_count((simhash1_1 # simhash2_1)::BIT(16)) +
           bit_count((simhash1_2 # simhash2_2)::BIT(16)) +
           bit_count((simhash1_3 # simhash2_3)::BIT(16)) +
           bit_count((simhash1_4 # simhash2_4)::BIT(16));
END;
$$ LANGUAGE plpgsql IMMUTABLE;
&lt;/code&gt;&lt;/pre&gt;
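&lt;p&gt;A usage sketch of this function (the table and column names are hypothetical, not RawWeb&apos;s actual schema):&lt;/p&gt;

```sql
-- Hypothetical schema: documents(id, simhash_1..simhash_4 SMALLINT).
-- Find candidates within Hamming distance 3 of a new document's segments.
SELECT id
FROM documents
WHERE hamming_distance(
        simhash_1, simhash_2, simhash_3, simhash_4,
        :new_1, :new_2, :new_3, :new_4
      ) <= 3;
```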
&lt;h3&gt;Further Optimizing Tokenization&lt;/h3&gt;
&lt;p&gt;The tokens generated by the tokenizer are the basic units for calculating SimHash. The higher the quality of tokenization, the more accurately SimHash can reflect the document&apos;s features.&lt;/p&gt;
&lt;p&gt;I tested several types of tokenizers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Based on language features
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/meilisearch/charabia&quot;&gt;Charabia&lt;/a&gt; works quite well, maintained by the Meilisearch team.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/go-ego/gse&quot;&gt;gse&lt;/a&gt; seems sufficient for Chinese tokenization based on test cases, but the overall experience isn&apos;t as good as Charabia.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Whitespace: Suitable for languages like English that use spaces as separators, but requires additional implementation for normalization, stopword removal, etc.&lt;/li&gt;
&lt;li&gt;Unicode: Intended as a tokenizer for CJK languages, but the tokenization quality was not ideal.&lt;/li&gt;
&lt;li&gt;N-gram: Considered as a general-purpose tokenizer, but the quality fluctuates significantly.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Overall, Charabia produced the best results. However, it&apos;s a Rust project, while the RawWeb backend stack is Go. This requires using CGO to call Charabia (calling via Go&apos;s &lt;code&gt;exec&lt;/code&gt; package is at least 10x slower than calling via CGO), which introduces cross-compilation complexity.&lt;/p&gt;
&lt;p&gt;I&apos;m not familiar with Rust or CGO, so most of the following code was generated by Claude Sonnet 3.5/3.7, with some adjustments based on the actual situation.&lt;/p&gt;
&lt;p&gt;First, expose Charabia&apos;s &lt;code&gt;Tokenize&lt;/code&gt; method with a simple Rust function:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;use charabia::Tokenize;
use libc::{c_char};
use serde_json::json;
use std::ffi::{CStr, CString};
use std::ptr;

fn tokenize_string(input: &amp;amp;str) -&amp;gt; Vec&amp;lt;String&amp;gt; {
    input
        .tokenize()
        .filter(|token| token.is_word())
        .map(|token| token.lemma().to_string().trim().to_string())
        .filter(|token| !token.is_empty())
        .collect()
}

/// Tokenizes the input string and returns a JSON string containing the tokens
///
/// # Safety
///
/// This function is unsafe because it deals with raw pointers
#[no_mangle]
pub unsafe extern &amp;quot;C&amp;quot; fn tokenize(input: *const c_char) -&amp;gt; *mut c_char {
	// C stuff ...

    // Tokenize the input
    let tokens = tokenize_string(input_str);

	// C stuff ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Cargo.toml:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-toml&quot;&gt;# ...

[lib]
name = &amp;quot;charabia_rs&amp;quot;
crate-type = [&amp;quot;cdylib&amp;quot;, &amp;quot;staticlib&amp;quot;]

[dependencies]
charabia = { version = &amp;quot;0.9.3&amp;quot;, default-features = false, features = [
    &amp;quot;chinese-segmentation&amp;quot;, # disable chinese-normalization (https://github.com/meilisearch/charabia/issues/331)
    #&amp;quot;german-segmentation&amp;quot;,
    &amp;quot;japanese&amp;quot;,
] }

# ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For local testing, just run &lt;code&gt;cargo build --release&lt;/code&gt;. But cross-platform compilation is much more complicated. Fortunately, the &lt;a href=&quot;https://ziglang.org/&quot;&gt;Zig&lt;/a&gt; toolchain greatly simplifies C cross-compilation, eliminating the need for musl libc!&lt;/p&gt;
&lt;p&gt;Install Zig and &lt;a href=&quot;https://github.com/rust-cross/cargo-zigbuild&quot;&gt;zigbuild&lt;/a&gt;, then compile:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;cargo zigbuild --release --target aarch64-unknown-linux-gnu
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After compiling the Rust code into a &lt;code&gt;.so&lt;/code&gt; file, call its exported method in RawWeb. We need to link against the correct &lt;code&gt;.so&lt;/code&gt; during cross-compilation, and load the &lt;code&gt;.so&lt;/code&gt; file from &lt;code&gt;./lib&lt;/code&gt; when the application starts in production:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;// #cgo linux,amd64 LDFLAGS: -L${SRCDIR}/charabia-rs/target/x86_64-unknown-linux-gnu/release -lcharabia_rs -Wl,-rpath,./lib
// #cgo linux,arm64 LDFLAGS: -L${SRCDIR}/charabia-rs/target/aarch64-unknown-linux-gnu/release -lcharabia_rs -Wl,-rpath,./lib
// #cgo LDFLAGS: -L${SRCDIR}/charabia-rs/target/release -lcharabia_rs
// #include &amp;lt;stdlib.h&amp;gt;
// #include &amp;lt;stdint.h&amp;gt;
//
// typedef void* charabia_result_t;
//
// extern char* tokenize(const char* input);
// extern void free_tokenize_result(char* ptr);
import &amp;quot;C&amp;quot;

// Tokenize tokenizes the given text using the Rust implementation via cgo
func Tokenize(text string) []string {
	// C stuff ...

	cResult := C.tokenize(cText)

	// C stuff ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Also using Zig for cross-compiling the Go code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;CGO_ENABLED=1 GOOS=linux GOARCH=arm64 CC=&amp;quot;zig cc -target aarch64-linux&amp;quot; CXX=&amp;quot;zig c++ -target aarch64-linux&amp;quot; go build ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, during deployment, place &lt;code&gt;libcharabia_rs.so&lt;/code&gt; into the &lt;code&gt;./lib/&lt;/code&gt; directory so it can be loaded.&lt;/p&gt;
&lt;h3&gt;Filtering Similar Content&lt;/h3&gt;
&lt;p&gt;According to the literature (e.g. Google&apos;s near-duplicate detection paper below), a Hamming distance of at most 3 on a 64-bit SimHash can generally identify similar content.&lt;/p&gt;
&lt;p&gt;However, due to limitations in tokenization quality, content length, etc., I observed false positives even with a Hamming distance threshold of 1 in my test cases. Additionally, my server has low specs. Calculating and comparing the Hamming distance for one document against 700k records takes about 1.2 seconds. At this rate, a full comparison would take 10 days, which is unacceptable.&lt;/p&gt;
&lt;p&gt;Therefore, for now, I only filtered documents with identical hash values. This avoids calculating Hamming distance and allows the search to use database indexes, making it very fast. Ultimately, I cleaned up 56,000 similar documents. This number was much higher than I expected. Given that I encountered SimHash collisions during testing, I reasonably suspect there might be quite a few false positives among them. Further optimization of tokenization and token weighting is needed.&lt;/p&gt;
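&lt;p&gt;Filtering identical hashes then becomes a plain indexed lookup; a sketch with a hypothetical schema:&lt;/p&gt;

```sql
-- Hypothetical schema. With a composite index on the four segments,
-- exact-duplicate groups are found without any Hamming-distance call.
SELECT simhash_1, simhash_2, simhash_3, simhash_4,
       array_agg(id) AS duplicate_ids
FROM documents
GROUP BY simhash_1, simhash_2, simhash_3, simhash_4
HAVING count(*) > 1;
```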
&lt;h3&gt;References&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://moz.com/devblog/near-duplicate-detection&quot;&gt;Near-Duplicate Detection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://ben-whitmore.com/simhash-and-solving-the-hamming-distance-problem-explained/&quot;&gt;SimHash and solving the hamming distance problem: explained&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/33026.pdf&quot;&gt;Detecting Near-Duplicates for Web Crawling - Google Research&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Migrating to Meilisearch&lt;/h2&gt;
&lt;p&gt;The full-text search engine I previously used was Elasticsearch. It&apos;s feature-rich and battle-tested in countless production environments.&lt;/p&gt;
&lt;p&gt;As RawWeb&apos;s features and data volume stabilized, I realized I wasn&apos;t using most of Elasticsearch&apos;s capabilities, yet I still had to bear the extra operational costs (actually, it had been very stable since deployment, but if such a behemoth encountered problems one day, I wasn&apos;t sure if I had the ability or energy to fix them). Also, the &lt;code&gt;elasticsearch-go&lt;/code&gt; client is very difficult to use.&lt;/p&gt;
&lt;p&gt;Meilisearch is a more lightweight alternative, with most features working out of the box. The migration process was very smooth, although a few unexpected issues popped up.&lt;/p&gt;
&lt;h3&gt;Multilingual Documents&lt;/h3&gt;
&lt;p&gt;Referencing &lt;a href=&quot;https://w3techs.com/technologies/overview/content_language&quot;&gt;W3Techs&apos; statistics on content languages on the internet&lt;/a&gt;, RawWeb specifically tags content in English, Chinese, German, French, Spanish, Russian, and Japanese to enable filtering search results by language.&lt;/p&gt;
&lt;p&gt;In Elasticsearch, I used separate fields like &lt;code&gt;content_en&lt;/code&gt;, &lt;code&gt;content_zh&lt;/code&gt;, etc., with dedicated tokenizers. Theoretically, this step could be simplified in Meilisearch because it can automatically detect content language. However, I ended up splitting the content into multiple indexes because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;RawWeb&apos;s existing natural-language-detection module samples text based on document title and content length and switches precision modes automatically, which is more efficient than full-text detection.&lt;/li&gt;
&lt;li&gt;To filter search results by language, I need to add a &lt;code&gt;lang&lt;/code&gt; field in Meilisearch to mark the document&apos;s language. So, besides Meilisearch, I still need to perform natural language detection once.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Splitting different languages into separate indexes also aligns with Meilisearch&apos;s official recommendations.&lt;/p&gt;
&lt;h3&gt;Issue 1: High Storage Space Usage&lt;/h3&gt;
&lt;p&gt;The PostgreSQL database size is about 2.4GB. After importing documents, the Meilisearch database size grew to about 23GB (with &lt;code&gt;searchableAttributes&lt;/code&gt; and &lt;code&gt;filterableAttributes&lt;/code&gt; configured correctly).&lt;/p&gt;
&lt;p&gt;Initially, I didn&apos;t realize the implication about disk usage mentioned in the &lt;a href=&quot;https://www.meilisearch.com/docs/learn/engine/storage#measured-disk-usage&quot;&gt;documentation&lt;/a&gt;, which led to the hard drive filling up. Fortunately, hard drive space is the cheapest cloud resource, so expanding it wasn&apos;t expensive.&lt;/p&gt;
&lt;p&gt;Besides this, there&apos;s another potential issue: Meilisearch doesn&apos;t release disk space after deleting documents (&lt;a href=&quot;https://www.meilisearch.com/docs/learn/engine/storage#database-size&quot;&gt;docs&lt;/a&gt;). Reclaiming space might require using snapshots (&lt;a href=&quot;https://github.com/meilisearch/meilisearch/discussions/3156#discussioncomment-4262368&quot;&gt;related discussion&lt;/a&gt;).&lt;/p&gt;
&lt;h3&gt;Issue 2: Memory Usage Limit Ineffective&lt;/h3&gt;
&lt;p&gt;Meilisearch is deployed on a low-spec server with 2 vCPUs and 4GB RAM. Since Elasticsearch previously ran fine on a server with the same configuration, I assumed Meilisearch would be smooth sailing too. After indexing all documents, I went to sleep peacefully (I later realized they were probably just queued in Meilisearch&apos;s task queue at that time).&lt;/p&gt;
&lt;p&gt;I woke up to find the server&apos;s CPU maxed out and disk read speeds exceeding 1GB/s, causing the entire system to freeze. After a forced reboot, I checked the system logs and the only abnormality found was an OOM error from the Meilisearch container. I then used &lt;code&gt;MEILI_MAX_INDEXING_MEMORY&lt;/code&gt; to limit indexing memory usage to 2GB. However, the next day, it experienced OOM and maxed-out CPU again.&lt;/p&gt;
&lt;p&gt;Looking through the documentation, I found the &lt;code&gt;MEILI_EXPERIMENTAL_REDUCE_INDEXING_MEMORY_USAGE&lt;/code&gt; parameter. Although experimental, I tried it and found it worked really well. CPU and disk I/O were no longer aggressive.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/24/meilisearch.png&quot; alt=&quot;monitor&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Issue 3: Very Slow Document Deletion Leading to Task Backlog&lt;/h3&gt;
&lt;p&gt;To clean up documents in Meilisearch that were deleted from the database, the operation for each batch during synchronization was:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Delete documents in Meilisearch within the range &lt;code&gt;id &amp;gt;= ? AND id &amp;lt;= ?&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Add new documents.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After deploying, data synchronization ran into problems. Investigation revealed that Meilisearch had accumulated over 13k tasks:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;//GET /tasks?statuses=enqueued,processing

{&amp;quot;results&amp;quot;:[{&amp;quot;uid&amp;quot;:16354,&amp;quot;batchUid&amp;quot;:null,&amp;quot;indexUid&amp;quot;:&amp;quot;items_es&amp;quot;,&amp;quot;status&amp;quot;:&amp;quot;enqueued&amp;quot;,&amp;quot;type&amp;quot;:&amp;quot;documentAdditionOrUpdate&amp;quot;,&amp;quot;canceledBy&amp;quot;:null,&amp;quot;details&amp;quot;:{&amp;quot;receivedDocuments&amp;quot;:7,&amp;quot;indexedDocuments&amp;quot;:null},&amp;quot;error&amp;quot;:null,&amp;quot;duration&amp;quot;:null,&amp;quot;enqueuedAt&amp;quot;:&amp;quot;2025-04-12T04:59:27.183657254Z&amp;quot;,&amp;quot;startedAt&amp;quot;:null,&amp;quot;finishedAt&amp;quot;:null},...],&amp;quot;total&amp;quot;:13385,&amp;quot;limit&amp;quot;:20,&amp;quot;from&amp;quot;:16354,&amp;quot;next&amp;quot;:16334}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Observing task execution, I found that document deletion operations were extremely slow. A range deletion involving up to 1,000 documents took nearly 20 minutes. Furthermore, because deletion and addition operations were interspersed during synchronization, Meilisearch couldn&apos;t automatically merge adjacent tasks.&lt;/p&gt;
&lt;p&gt;I couldn&apos;t find similar issues online; this seems to be a problem unique to my setup. I suspect it&apos;s because of the limited memory and the fact that Hetzner&apos;s expanded storage performance is only 1/10th of the original disk performance (&lt;a href=&quot;https://pcr.cloud-mercato.com/providers/hetzner/flavors/cpx11/performance/storage-bandwidth&quot;&gt;benchmark&lt;/a&gt;). I&apos;ll retest this once I get sponsorship funds to upgrade to a higher-spec server.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The main goals of this update have been achieved. I will continue debugging and optimizing.&lt;/p&gt;
&lt;p&gt;Additionally, the implementation process exposed some operational issues, such as excessive downtime and lack of server resource alerts (I removed New Relic monitoring during the last refactor). These will be targets for the next round of optimization.&lt;/p&gt;
</content:encoded></item><item><title>The first three iterations of RawWeb.org&apos;s tech stack</title><link>https://rook1e.com/en/posts/the-first-three-iterations-of-rawweb/</link><guid isPermaLink="true">https://rook1e.com/en/posts/the-first-three-iterations-of-rawweb/</guid><description>RawWeb.org is a search engine project I launched in 2024-08. The initial goal was to help more people discover personal digital gardens that are often overlooked by mainstream search engines, while also exploring some tech stacks I was interested in through hands-on practice.</description><pubDate>Sat, 08 Feb 2025 04:25:26 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://rawweb.org/&quot;&gt;RawWeb.org&lt;/a&gt; is a search engine project I launched in 2024-08. The initial goal was to help more people discover personal digital gardens that are often overlooked by mainstream search engines. I also wanted to explore some interesting tech stacks through practical implementation.&lt;/p&gt;
&lt;p&gt;Currently, it has indexed 17k sites and 615k articles. Feel free to &lt;a href=&quot;https://rawweb.org/feeds&quot;&gt;submit&lt;/a&gt; your favorite independent blogs.&lt;/p&gt;
&lt;p&gt;This article only represents my personal experience and views.&lt;/p&gt;
&lt;h2&gt;Middleware&lt;/h2&gt;
&lt;p&gt;PostgreSQL is used as the database instead of SQLite, because I might need Pg&apos;s rich extensions in the future. Redis is used for caching, and RabbitMQ as the message queue.&lt;/p&gt;
&lt;p&gt;Additionally, a search engine requires crawler and full-text search capabilities.&lt;/p&gt;
&lt;p&gt;Elasticsearch is used for full-text search. The reason for not implementing inverted indexing myself or using lightweight solutions like Meilisearch is that ES has better Chinese tokenizers.&lt;/p&gt;
&lt;p&gt;To reduce potential risks and development complexity, the crawler only obtains data from websites&apos; RSS feeds. Therefore, the crawler is simply implemented as an HTTP requester and RSS parser.&lt;/p&gt;
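Stripped down, that crawler is little more than an HTTP fetch plus XML decoding. A minimal sketch of the parsing half in Go, using only the standard library (the struct shape and field names here are illustrative, not the actual RawWeb code; a real crawler also needs Atom support and HTTP timeouts):

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Minimal RSS 2.0 item shape; real feeds carry more fields.
type item struct {
	Title   string `xml:"title"`
	Link    string `xml:"link"`
	PubDate string `xml:"pubDate"`
}

type rss struct {
	Channel struct {
		Title string `xml:"title"`
		Items []item `xml:"item"`
	} `xml:"channel"`
}

// parseFeed decodes a raw RSS document into its items.
func parseFeed(data []byte) ([]item, error) {
	var doc rss
	if err := xml.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	return doc.Channel.Items, nil
}

func main() {
	feed := []byte(`<rss version="2.0"><channel><title>blog</title>
<item><title>Hello</title><link>https://example.com/a</link></item>
</channel></rss>`)
	items, err := parseFeed(feed)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(items), items[0].Title)
}
```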
&lt;p&gt;To keep things simple, all of the above components are deployed as single nodes, without any tuning tricks (I wouldn&apos;t know how anyway).&lt;/p&gt;
&lt;h2&gt;Multi-language Content Support&lt;/h2&gt;
&lt;p&gt;This is a search engine capable of indexing content in multiple languages, where tokenization quality determines search result quality.&lt;/p&gt;
&lt;p&gt;To configure specialized tokenizers for different languages, multiple fields are set up in Elasticsearch, such as &lt;code&gt;content-en&lt;/code&gt;, &lt;code&gt;content-zh&lt;/code&gt;, to store content in different languages.&lt;/p&gt;
&lt;p&gt;This involves:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Natural language detection&lt;/li&gt;
&lt;li&gt;Routing content to dedicated fields with specialized tokenizers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;First, clean the raw content:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Parse HTML, remove useless tags like style, script;&lt;/li&gt;
&lt;li&gt;Remove code, URLs, and other content as much as possible to avoid affecting language detection accuracy;&lt;/li&gt;
&lt;li&gt;Remove HTML and XML tags to get plain text;&lt;/li&gt;
&lt;li&gt;Remove excess whitespace characters.&lt;/li&gt;
&lt;/ol&gt;
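The cleaning steps above can be sketched with a few stdlib regexes (a rough illustration under my own assumptions, not the actual implementation; a real pipeline would use a proper HTML parser rather than regexes):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Drop whole elements whose contents would skew language detection
	// (styles, scripts, code blocks).
	reBlocks = regexp.MustCompile(`(?is)<(script|style|pre|code)\b.*?</\s*(script|style|pre|code)\s*>`)
	reTags   = regexp.MustCompile(`(?s)<[^>]*>`)  // any remaining HTML/XML tag
	reURLs   = regexp.MustCompile(`https?://\S+`) // bare URLs in the text
	reSpace  = regexp.MustCompile(`\s+`)          // runs of whitespace
)

// cleanForDetection reduces raw HTML to plain text suitable as
// language-detection input.
func cleanForDetection(html string) string {
	s := reBlocks.ReplaceAllString(html, " ")
	s = reTags.ReplaceAllString(s, " ")
	s = reURLs.ReplaceAllString(s, " ")
	return strings.TrimSpace(reSpace.ReplaceAllString(s, " "))
}

func main() {
	in := `<p>Hello <b>world</b></p><style>p{color:red}</style> see https://example.com`
	fmt.Println(cleanForDetection(in))
}
```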
&lt;p&gt;Then identify the content&apos;s language. There are two approaches:&lt;/p&gt;
&lt;p&gt;The first is &lt;a href=&quot;https://github.com/pemistahl/lingua-go&quot;&gt;lingua&lt;/a&gt;, which has implementations in Python, Go, and other languages. It has excellent performance and accuracy, and allows selective loading of language models. The downside is that it increases the executable size by about 100MB.&lt;/p&gt;
&lt;p&gt;The second is Elasticsearch&apos;s built-in &lt;a href=&quot;https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-lang-ident.html&quot;&gt;lang_ident_model_1&lt;/a&gt;, which requires creating a pipeline to call. In testing, the accuracy was good but performance was an issue. With the same data, it was &lt;a href=&quot;https://x.com/rook1e_stdout/status/1830871816279335045&quot;&gt;4 times slower&lt;/a&gt; than the Python version of lingua running on lower-spec hardware. I suspect this is because lang_ident_model_1 needs to test all supported languages, while lingua only needs to load a few language models.&lt;/p&gt;
&lt;p&gt;Considering performance and flexibility, lingua was ultimately chosen. Lingua has high- and low-accuracy modes; low accuracy offers roughly 2x the throughput with no significant accuracy loss for inputs over 120 characters. So currently a hybrid of the two modes is used, with the title plus a sample of the content as input. In actual testing, detecting one article takes only about 100μs.&lt;/p&gt;
&lt;p&gt;Once the content&apos;s language is determined, the best tokenizer can be chosen for it. Based on W3Techs&apos; estimate of internet content distribution, dedicated tokenizers are configured for the most mainstream languages - Chinese, English, Spanish, Russian, German, French, and Japanese - while other languages use the default tokenizer.&lt;/p&gt;
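The routing step then reduces to a small lookup. A sketch (the post names only content-en and content-zh; the other field names and the default-field fallback are my assumptions, following the same pattern):

```go
package main

import "fmt"

// Per-language Elasticsearch fields. Only content-en and content-zh
// appear in the post; the rest are assumed to follow the same pattern.
var langFields = map[string]string{
	"zh": "content-zh",
	"en": "content-en",
	"es": "content-es",
	"ru": "content-ru",
	"de": "content-de",
	"fr": "content-fr",
	"ja": "content-ja",
}

// fieldFor routes a detected ISO 639-1 code to its dedicated field,
// falling back to a default-tokenizer field for every other language.
func fieldFor(lang string) string {
	if f, ok := langFields[lang]; ok {
		return f
	}
	return "content" // assumed name for the default-tokenizer field
}

func main() {
	fmt.Println(fieldFor("zh"), fieldFor("ko"))
}
```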
&lt;h2&gt;Backend&lt;/h2&gt;
&lt;p&gt;The crawler is a simple Go program. The main backend went through three iterations with Django, Nest.js, and Go.&lt;/p&gt;
&lt;h3&gt;v1 - Django&lt;/h3&gt;
&lt;p&gt;Tech stack:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Django v5&lt;/li&gt;
&lt;li&gt;django-ninja as API endpoint&lt;/li&gt;
&lt;li&gt;huey as task queue, though I only used it for managing scheduled tasks&lt;/li&gt;
&lt;li&gt;uv as package manager&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Django had been recommended to me multiple times before, and I wanted to learn a batteries-included framework through this project. Since it ships with Django Admin, I used it for prototype development.&lt;/p&gt;
&lt;p&gt;Django&apos;s documentation quality is among the best I&apos;ve seen, making it very pleasant to read. But since the project has a separate frontend and backend and doesn&apos;t use built-in components like auth and views, Django&apos;s &amp;quot;batteries&amp;quot; didn&apos;t reduce my workload, and the overall development experience wasn&apos;t particularly exciting.&lt;/p&gt;
&lt;p&gt;Given the framework&apos;s stability and its thriving community, I would probably like Django if I were a dynamic language enthusiast. Unfortunately, I&apos;ve been deeply influenced by Go&apos;s philosophy, and Django&apos;s level of &amp;quot;magic&amp;quot; exceeded my comfort zone, like building query conditions out of field name + double underscore + method name. BTW, it&apos;s hard to imagine I once wanted to learn RoR.&lt;/p&gt;
&lt;p&gt;Finally, after development was complete, the load test results were far below my expectations, even with all built-in plugins disabled, async used wherever possible, and Uvicorn serving the app. So I started looking into rebuilding with Node.js.&lt;/p&gt;
&lt;h3&gt;v2 - Nest.js&lt;/h3&gt;
&lt;p&gt;Tech stack:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;TypeScript&lt;/li&gt;
&lt;li&gt;Components wrapped by Nest&lt;/li&gt;
&lt;li&gt;Prisma as ORM&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since the main latency in a search request comes from waiting for Elasticsearch, with the web service mainly acting as a request forwarder, this I/O-intensive scenario is very suitable for Node.js.&lt;/p&gt;
&lt;p&gt;Popular frameworks include Nest.js and Adonis.js, and I ultimately chose the more popular Nest. Don&apos;t ask why not Express or Fastify - they&apos;re not full-fledged frameworks.&lt;/p&gt;
&lt;p&gt;Nest seems more like a dependency injector plus multiple officially maintained components (modules). Although it includes common components like cache and message queue, from my observation, most are wrappers around third-party libraries, so Nest users don&apos;t need to piece things together themselves. However, even with official wrappers, I was still unfortunately affected by underlying library changes (&lt;a href=&quot;https://github.com/nestjs/cache-manager/issues/516&quot;&gt;cache-manager@6&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;For developers with Java/Spring background, Nest might be great. But for me, Nest&apos;s various decorators, pipes, and other concepts created a heavy mental burden. When switching back to a Nest project after two or three months, I needed to review the documentation to confirm their usage.&lt;/p&gt;
&lt;p&gt;Additionally, while the documentation appears comprehensive, its quality is far below Django&apos;s. For example, I couldn&apos;t understand the module lifecycle part from the documentation alone, and finally had to rely on an article analyzing the source code to roughly figure it out.&lt;/p&gt;
&lt;p&gt;Exploring new technology is always good, but choosing Nest for this project was a mistake because the project&apos;s complexity was even less than the complexity Nest introduced.&lt;/p&gt;
&lt;h3&gt;v3 - Go&lt;/h3&gt;
&lt;p&gt;Tech stack:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Echo as API endpoint&lt;/li&gt;
&lt;li&gt;GORM Gen as ORM&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After those two experiences, batteries-included frameworks have lost their mystique for me, at least for now. Having come full circle, I found my true love was still the original one: Go.&lt;/p&gt;
&lt;p&gt;I previously had two main complaints about web development with Go:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The syntax is too basic, making CRUD uncomfortable&lt;/li&gt;
&lt;li&gt;Lack of good ORM or SQL builder&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fortunately, both issues have been largely resolved.&lt;/p&gt;
&lt;p&gt;Thanks to the development of LLMs and AI IDEs, Go&apos;s basic syntax is no longer a disadvantage but has become somewhat of an advantage (to me): LLMs can very easily understand the code, and AI completion is very accurate.&lt;/p&gt;
&lt;p&gt;Regarding ORM, a quite popular opinion in the Go community is that &amp;quot;ORM is harmful,&amp;quot; preferring approaches like sqlc generating Go code from SQL, or sqlx directly using SQL. ORM indeed sometimes makes simple things complex - for example, Prisma only recently started supporting &lt;a href=&quot;https://www.prisma.io/blog/prisma-6-better-performance-more-flexibility-and-type-safe-sql#pick-the-best-join-strategy&quot;&gt;true JOIN&lt;/a&gt;. However, a well-designed, type-safe ORM can greatly improve CRUD experience.&lt;/p&gt;
&lt;p&gt;GORM Gen made me fall in love with GORM again. Through code generation, it not only achieves type safety but, more importantly, can generate Go code from custom SQL, meaning I have almost full SQL capabilities.&lt;/p&gt;
&lt;p&gt;Thus, this code refactoring with Go was very enjoyable, except for the disastrous official Elasticsearch SDK.&lt;/p&gt;
&lt;p&gt;Go also reduced the infra burden: no more multi-stage builds in the Dockerfile (with no CI server or GitHub Actions, the previous two tech stacks required building Docker images on the production server after pushing code).&lt;/p&gt;
&lt;p&gt;To keep things simple, I also removed RabbitMQ, instead using a database table to store tasks and providing an API for the crawler to sync data. Since Redis might be simplified away in the future, I didn&apos;t use it as the message queue here.&lt;/p&gt;
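Conceptually, the replacement is just a table of task rows that workers claim atomically. A rough in-memory sketch of the pattern (illustrative only, not RawWeb's actual schema; with Postgres the same claim would typically be a single query guarded by SELECT ... FOR UPDATE SKIP LOCKED so concurrent workers never grab the same row):

```go
package main

import (
	"fmt"
	"sync"
)

// Task mirrors a row in the hypothetical tasks table.
type Task struct {
	ID     int
	Status string // "pending" -> "running" -> "done"
}

// store stands in for the database table; the mutex plays the role
// that row locking plays in Postgres.
type store struct {
	mu    sync.Mutex
	tasks []*Task
}

// ClaimNext atomically takes the oldest pending task, or nil if none.
func (s *store) ClaimNext() *Task {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, t := range s.tasks {
		if t.Status == "pending" {
			t.Status = "running"
			return t
		}
	}
	return nil
}

func main() {
	s := &store{tasks: []*Task{
		{ID: 1, Status: "pending"},
		{ID: 2, Status: "pending"},
	}}
	fmt.Println(s.ClaimNext().ID, s.ClaimNext().ID, s.ClaimNext() == nil)
}
```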
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;There are some interesting options I passed on but might try in the future:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;C# &amp;amp; .NET. I&apos;ve heard C# is very enjoyable to write, and .NET is a great enterprise framework. But I&apos;m not interested in OOP, and I&apos;m worried Microsoft might again make risky moves in its .NET open source work (&lt;a href=&quot;https://github.com/dotnet/sdk/issues/22247&quot;&gt;Hot Reload removed from dotnet watch - Why?&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Elixir &amp;amp; Phoenix. Elixir&apos;s features seem very suitable for high-concurrency scenarios, and the development experience is very good. But I currently don&apos;t have the energy to learn functional programming.&lt;/li&gt;
&lt;/ul&gt;
&lt;details&gt;
&lt;summary&gt;Easter egg&lt;/summary&gt;
&lt;p&gt;Are you looking for Rust? Haha, I&apos;ll never learn it for web development.&lt;/p&gt;
&lt;/details&gt;
&lt;h2&gt;Frontend&lt;/h2&gt;
&lt;p&gt;The frontend uses my favorite SvelteKit, compiled into hybrid SSG and SPA pages. UI components are from shadcn-svelte.&lt;/p&gt;
&lt;p&gt;React is good, but I equally dislike most things in its ecosystem, especially Next.js. I don&apos;t understand why the community keeps getting &amp;quot;richer&amp;quot; while making developers more miserable. Svelte is currently my painkiller, and I recommend you try it too.&lt;/p&gt;
&lt;h2&gt;Infrastructure&lt;/h2&gt;
&lt;p&gt;To avoid vendor lock-in, only generic infra technologies are used:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Backend services are orchestrated with Docker Compose, compiled and deployed to VPS by a simple Shell script&lt;/li&gt;
&lt;li&gt;Main backend services run on Hetzner&apos;s Arm VPS, currently two Debian instances with 2 vCPU + 4G RAM (great value for money; feel free to register via my &lt;a href=&quot;https://hetzner.cloud/?ref=YVojp0f9kRad&quot;&gt;aff&lt;/a&gt; link and you&apos;ll get €20 in credit)&lt;/li&gt;
&lt;li&gt;Crawler service is on another budget VPS&lt;/li&gt;
&lt;li&gt;Web pages, CDN, DNS are on Cloudflare&lt;/li&gt;
&lt;li&gt;Monitoring service uses self-hosted Uptime Kuma, and some services are connected to New Relic&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Future plans include setting up a Prometheus + Grafana observability system to visualize metrics like search volume and new indexing volume.&lt;/p&gt;
</content:encoded></item><item><title>Setting Up a Lightweight Remote Linux Dev Environment (Fedora 38)</title><link>https://rook1e.com/en/posts/lightweight-remote-linux-devenv-fedora38/</link><guid isPermaLink="true">https://rook1e.com/en/posts/lightweight-remote-linux-devenv-fedora38/</guid><description>Tips for configuring and optimizing Fedora Workstation 38 as a remote development environment, including remote desktop setup and disabling GNOME to save resources.</description><pubDate>Fri, 14 Jul 2023 09:02:18 GMT</pubDate><content:encoded>&lt;p&gt;First, pick a distro. It needs comprehensive and up-to-date package repositories, but since I won&apos;t necessarily use it every day, I&apos;d prefer to avoid rolling releases. This time I went with Fedora Workstation 38, then trimmed down GNOME and other services to achieve:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Idle memory usage under 300MB&lt;/li&gt;
&lt;li&gt;Desktop environment available on demand&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Basic Setup&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Enable sshd:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;sudo systemctl start sshd
sudo systemctl enable sshd
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Rename home directory folders to English:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;export LANG=en_US
xdg-user-dirs-gtk-update
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After the conversion, set the system language back to &lt;code&gt;zh_CN&lt;/code&gt;, restart, and when prompted at login, choose not to convert and not to ask again.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Remove unnecessary &lt;a href=&quot;https://docs.fedoraproject.org/en-US/workstation-working-group/third-party-repos/#_included_software&quot;&gt;third-party repositories&lt;/a&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;sudo rm /etc/yum.repos.d/_copr\:copr.fedorainfracloud.org\:phracek\:PyCharm.repo
sudo rm /etc/yum.repos.d/rpmfusion-nonfree-steam.repo
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Install Go + nvim development tools:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;sudo dnf install vim neovim go gcc-c++
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Remove Software Update Services&lt;/h2&gt;
&lt;p&gt;Right after booting, memory usage was already at 1.5GB, with packagekitd taking up a large chunk.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.freedesktop.org/software/PackageKit/pk-intro.html&quot;&gt;PackageKit&lt;/a&gt; is a generic abstraction layer over package managers like dnf and apt. But since I only use dnf and don&apos;t need &amp;quot;advanced&amp;quot; tools like gnome-software -- not to mention its &lt;a href=&quot;https://www.reddit.com/r/Fedora/comments/ts5tgd/is_fedora_doing_something_to_reduce_packagekits/&quot;&gt;heavy resource consumption&lt;/a&gt; -- I removed both components:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;sudo dnf remove gnome-software PackageKit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This saved 600MB of memory.&lt;/p&gt;
&lt;h2&gt;Disable GNOME Desktop Environment&lt;/h2&gt;
&lt;p&gt;Most of the time I connect via SSH, and only use RDP remote desktop on rare occasions. So I disabled &lt;a href=&quot;https://en.wikipedia.org/wiki/GNOME_Display_Manager&quot;&gt;gdm (GNOME Display Manager)&lt;/a&gt; to save resources (&lt;a href=&quot;https://superuser.com/a/444003&quot;&gt;reference&lt;/a&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;sudo systemctl stop gdm
sudo systemctl disable gdm
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Start it manually when needed:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;sudo systemctl start gdm
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This saved another 600MB+ of memory.&lt;/p&gt;
&lt;h2&gt;Remote Desktop&lt;/h2&gt;
&lt;p&gt;When you need the desktop environment, start gdm first, then connect remotely.&lt;/p&gt;
&lt;h3&gt;Auto-login + Unlocking Remote Login Password&lt;/h3&gt;
&lt;p&gt;In this version, remote desktop is built into GNOME and requires an active user session, so you&apos;d need to log in via VNC first.&lt;/p&gt;
&lt;p&gt;For convenience, you can set up GNOME auto-login (&lt;a href=&quot;https://help.gnome.org/admin/system-admin-guide/stable/login-automatic.html.en&quot;&gt;documentation&lt;/a&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;# /etc/gdm/custom.conf

[daemon]
AutomaticLoginEnable=True
AutomaticLogin={{ username }}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, you&apos;ll find that after each auto-login, the remote desktop password gets reset to a random one. This happens because service passwords on the system are stored encrypted in a keyring. The default keyring&apos;s unlock password is the login password, and it gets unlocked together during a normal login. But with auto-login, no password is entered to unlock it, so GNOME can&apos;t read the encrypted remote login password and generates a new random one instead.&lt;/p&gt;
&lt;p&gt;Setting the default keyring password to empty would leave all stored passwords in plaintext, which is insecure.&lt;/p&gt;
&lt;p&gt;Following a &lt;a href=&quot;https://askubuntu.com/a/1409857&quot;&gt;community solution&lt;/a&gt;, create a password-free insecure keyring specifically for storing the RDP password:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install the keyring management tool. After installation, you can find the &amp;quot;Passwords and Keys&amp;quot; application:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;sudo dnf install seahorse
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;In the tool, check the default &amp;quot;Login&amp;quot; keyring. It should already contain a remote desktop password entry named &amp;quot;GNOME Remote Desktop RDP credentials&amp;quot;. Delete this entry.&lt;/li&gt;
&lt;li&gt;Create a new keyring with an empty password, and set it as the default keyring.&lt;/li&gt;
&lt;li&gt;Restart the system to apply the new default keyring.&lt;/li&gt;
&lt;li&gt;Set the remote desktop password. Check the newly created keyring again -- it should now contain the &amp;quot;GNOME Remote Desktop RDP credentials&amp;quot; entry. From now on, remote desktop will use this keyring to read and set passwords.&lt;/li&gt;
&lt;li&gt;Restore the default keyring back to the original &amp;quot;Login&amp;quot; keyring, then restart the system.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note: You need to change the default keyring so that GNOME creates the password entry. Such entries appear as &amp;quot;Password or Key&amp;quot;, whereas manually created password entries show up as &amp;quot;Stored Note&amp;quot; and won&apos;t be used by GNOME.&lt;/p&gt;
&lt;h3&gt;Using Remote Desktop While Locked&lt;/h3&gt;
&lt;p&gt;By design, GNOME remote desktop mirrors the local screen -- when the local screen is locked, the remote desktop connection is dropped.&lt;/p&gt;
&lt;p&gt;This differs from Windows, where connecting via remote desktop automatically locks the local desktop. The GNOME team hasn&apos;t officially responded to this &lt;a href=&quot;https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/3212&quot;&gt;feature request&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For now, I&apos;ve simply disabled the auto-lock screen.&lt;/p&gt;
</content:encoded></item><item><title>Deep Dive into Linux TProxy</title><link>https://rook1e.com/en/posts/linux-tproxy/</link><guid isPermaLink="true">https://rook1e.com/en/posts/linux-tproxy/</guid><description>TProxy (Transparent Proxy) is a kernel-supported transparent proxying mechanism introduced in Linux 2.6.28. Unlike NAT, which modifies the packet&apos;s destination address for redirection, TProxy merely replaces the socket held by the packet&apos;s skb, without modifying packet headers.</description><pubDate>Fri, 23 Jun 2023 09:06:25 GMT</pubDate><content:encoded>&lt;p&gt;This was my first foray into the kernel networking stack. If you spot any errors, feel free to let me know (&lt;a href=&quot;mailto:rook1e404@outlook.com&quot;&gt;email&lt;/a&gt;) and I will annotate corrections in the article.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;TProxy (&lt;strong&gt;T&lt;/strong&gt;ransparent &lt;strong&gt;Proxy&lt;/strong&gt;) is a kernel-supported transparent proxying mechanism introduced in Linux &lt;a href=&quot;https://kernelnewbies.org/Linux_2_6_28#Network:_Transparent_proxying.2C_new_drivers.2C_DSA...&quot;&gt;2.6.28&lt;/a&gt;. Unlike NAT, which modifies the packet&apos;s destination address for redirection, TProxy merely replaces the socket held by the packet&apos;s &lt;a href=&quot;https://elixir.bootlin.com/linux/v6.1.34/source/include/linux/skbuff.h#L692&quot;&gt;skb&lt;/a&gt;, without modifying packet headers.&lt;/p&gt;
&lt;p&gt;Terminology note: TProxy is the general name for the feature, while TPROXY is the name of an iptables extension.&lt;/p&gt;
&lt;h2&gt;IP_TRANSPARENT&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;IP_TRANSPARENT&lt;/code&gt; option allows a socket to treat any non-local address as a local address, enabling it to bind to non-local addresses and masquerade as a non-local address when sending and receiving data.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;int opt = 1;
setsockopt(sockfd, SOL_IP, IP_TRANSPARENT, &amp;amp;opt, sizeof(opt));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, a gateway (&lt;code&gt;192.168.0.1&lt;/code&gt; / &lt;code&gt;123.x.x.94&lt;/code&gt;) acting as a transparent proxy intercepts the connection between a client (&lt;code&gt;192.168.0.200&lt;/code&gt;) and a remote server (&lt;code&gt;157.x.x.149&lt;/code&gt;). It connects to the remote server on behalf of the client, while also masquerading as the remote server when communicating with the client:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;$ netstat -atunp
Proto Recv-Q Send-Q Local Address           Foreign Address            State       PID/Program name
tcp        0      0 123.x.x.94:37338        157.x.x.149:443            ESTABLISHED 2904/proxy
tcp        0      0 ::ffff:157.x.x.149:443  ::ffff:192.168.0.200:56418 ESTABLISHED 2904/proxy
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Inbound Redirection&lt;/h2&gt;
&lt;h3&gt;Why Replace the Socket&lt;/h3&gt;
&lt;p&gt;When the kernel networking stack receives a packet, it looks up the most closely matching socket from the corresponding protocol&apos;s hash table based on the packet&apos;s 5-tuple, then places the packet into that socket&apos;s receive queue. Taking UDP as an example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;// https://elixir.bootlin.com/linux/v6.1.34/source/net/ipv4/udp.c#L2405
int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
		   int proto)
{
	// ...
	sk = skb_steal_sock(skb, &amp;amp;refcounted);
	if (sk) {
		// ...
		ret = udp_unicast_rcv_skb(sk, skb, uh);
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;static inline struct sock *
skb_steal_sock(struct sk_buff *skb, bool *refcounted)
{
	if (skb-&amp;gt;sk) {
		struct sock *sk = skb-&amp;gt;sk;
		// ...
		return sk;
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;static int udp_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
			       struct udphdr *uh)
{
	// ...
	ret = udp_queue_rcv_skb(sk, skb);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Netfilter hooks execute before the protocol stack, so modifying &lt;code&gt;skb-&amp;gt;sk&lt;/code&gt; in netfilter determines which socket&apos;s receive queue the packet will ultimately be placed into.&lt;/p&gt;
&lt;h3&gt;Kernel Implementation&lt;/h3&gt;
&lt;p&gt;Based on kernel v6.1.34, using the iptables TPROXY module implementation as an example. The nftables &lt;a href=&quot;https://elixir.bootlin.com/linux/v6.1.34/source/net/netfilter/nft_tproxy.c#L21&quot;&gt;implementation&lt;/a&gt; is essentially the same.&lt;/p&gt;
&lt;h4&gt;Core Logic&lt;/h4&gt;
&lt;p&gt;The main processing flow is in &lt;code&gt;tproxy_tg4()&lt;/code&gt; from &lt;a href=&quot;https://elixir.bootlin.com/linux/v6.1.34/source/net/netfilter/xt_TPROXY.c&quot;&gt;&lt;code&gt;net/netfilter/xt_TPROXY.c&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Extract headers from the skb:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;static unsigned int
tproxy_tg4(struct net *net, struct sk_buff *skb, __be32 laddr, __be16 lport,
	   u_int32_t mark_mask, u_int32_t mark_value)
{
	const struct iphdr *iph = ip_hdr(skb);
	struct udphdr _hdr, *hp;
	struct sock *sk;

	hp = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(_hdr), &amp;amp;_hdr);
	if (hp == NULL)
		return NF_DROP;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then begin searching for a socket (&lt;code&gt;sk&lt;/code&gt; in the code) to replace the packet skb&apos;s original socket.&lt;/p&gt;
&lt;p&gt;If a previous packet with the same 4-tuple was already redirected, then the proxy should have already established a connection with the client, and the current packet should also be redirected to that connection:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;	/* check if there&apos;s an ongoing connection on the packet
	 * addresses, this happens if the redirect already happened
	 * and the current packet belongs to an already established
	 * connection */
	sk = nf_tproxy_get_sock_v4(net, skb, iph-&amp;gt;protocol,
				   iph-&amp;gt;saddr, iph-&amp;gt;daddr,
				   hp-&amp;gt;source, hp-&amp;gt;dest,
				   skb-&amp;gt;dev, NF_TPROXY_LOOKUP_ESTABLISHED);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Set the default redirection destination — unprocessed packets should all be redirected here. The rule-specified address takes priority; otherwise, the primary address of the receiving network device is used:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;	laddr = nf_tproxy_laddr4(skb, laddr, iph-&amp;gt;daddr);
	if (!lport)
		lport = hp-&amp;gt;dest;
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;__be32 nf_tproxy_laddr4(struct sk_buff *skb, __be32 user_laddr, __be32 daddr)
{
	const struct in_ifaddr *ifa;
	struct in_device *indev;
	__be32 laddr;

	if (user_laddr)
		return user_laddr;

	laddr = 0;
	indev = __in_dev_get_rcu(skb-&amp;gt;dev);

	in_dev_for_each_ifa_rcu(ifa, indev) {
		if (ifa-&amp;gt;ifa_flags &amp;amp; IFA_F_SECONDARY)
			continue;

		laddr = ifa-&amp;gt;ifa_local;
		break;
	}

	return laddr ? laddr : daddr;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Forward SYN packets to the proxy to establish new connections instead of reusing TIME_WAIT connections. My guess is that this allows the proxy to more easily synchronize the state of both sides of the connection (client &amp;lt;-&amp;gt; proxy &amp;lt;-&amp;gt; remote):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;	/* UDP has no TCP_TIME_WAIT state, so we never enter here */
	if (sk &amp;amp;&amp;amp; sk-&amp;gt;sk_state == TCP_TIME_WAIT)
		/* reopening a TIME_WAIT connection needs special handling */
		sk = nf_tproxy_handle_time_wait4(net, skb, laddr, lport, sk);
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;/**
 * nf_tproxy_handle_time_wait4 - handle IPv4 TCP TIME_WAIT reopen redirections
 * @skb:	The skb being processed.
 * @laddr:	IPv4 address to redirect to or zero.
 * @lport:	TCP port to redirect to or zero.
 * @sk:		The TIME_WAIT TCP socket found by the lookup.
 *
 * We have to handle SYN packets arriving to TIME_WAIT sockets
 * differently: instead of reopening the connection we should rather
 * redirect the new connection to the proxy if there&apos;s a listener
 * socket present.
 *
 * nf_tproxy_handle_time_wait4() consumes the socket reference passed in.
 *
 * Returns the listener socket if there&apos;s one, the TIME_WAIT socket if
 * no such listener is found, or NULL if the TCP header is incomplete.
 */
struct sock *
nf_tproxy_handle_time_wait4(struct net *net, struct sk_buff *skb,
			 __be32 laddr, __be16 lport, struct sock *sk)
{
	const struct iphdr *iph = ip_hdr(skb);
	struct tcphdr _hdr, *hp;

	hp = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(_hdr), &amp;amp;_hdr);
	if (hp == NULL) {
		inet_twsk_put(inet_twsk(sk));
		return NULL;
	}

	if (hp-&amp;gt;syn &amp;amp;&amp;amp; !hp-&amp;gt;rst &amp;amp;&amp;amp; !hp-&amp;gt;ack &amp;amp;&amp;amp; !hp-&amp;gt;fin) {
		/* SYN to a TIME_WAIT socket, we&apos;d rather redirect it
		 * to a listener socket if there&apos;s one */
		struct sock *sk2;

		sk2 = nf_tproxy_get_sock_v4(net, skb, iph-&amp;gt;protocol,
					    iph-&amp;gt;saddr, laddr ? laddr : iph-&amp;gt;daddr,
					    hp-&amp;gt;source, lport ? lport : hp-&amp;gt;dest,
					    skb-&amp;gt;dev, NF_TPROXY_LOOKUP_LISTENER);
		if (sk2) {
			nf_tproxy_twsk_deschedule_put(inet_twsk(sk));
			sk = sk2;
		}
	}

	return sk;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If no established connection was matched, use the listening-state redirection destination socket:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;	else if (!sk)
		/* no, there&apos;s no established connection, check if
		 * there&apos;s a listener on the redirected addr/port */
		sk = nf_tproxy_get_sock_v4(net, skb, iph-&amp;gt;protocol,
					   iph-&amp;gt;saddr, laddr,
					   hp-&amp;gt;source, lport,
					   skb-&amp;gt;dev, NF_TPROXY_LOOKUP_LISTENER);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, verify that the new socket meets the transparent proxy requirements, then replace the packet skb&apos;s original socket:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;	/* NOTE: assign_sock consumes our sk reference */
	if (sk &amp;amp;&amp;amp; nf_tproxy_sk_is_transparent(sk)) {
		/* This should be in a separate target, but we don&apos;t do multiple
		   targets on the same rule yet */
		skb-&amp;gt;mark = (skb-&amp;gt;mark &amp;amp; ~mark_mask) ^ mark_value;
		nf_tproxy_assign_sock(skb, sk);
		return NF_ACCEPT;
	}

	return NF_DROP;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;/* assign a socket to the skb -- consumes sk */
static inline void nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk)
{
	skb_orphan(skb);
	skb-&amp;gt;sk = sk;
	skb-&amp;gt;destructor = sock_edemux;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Socket Matching&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;nf_tproxy_get_sock_v4()&lt;/code&gt; is a simple wrapper around the generic TCP/UDP socket matching methods.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;// https://elixir.bootlin.com/linux/v6.1.34/source/net/ipv4/netfilter/nf_tproxy_ipv4.c#L75
/*
 * This is used when the user wants to intercept a connection matching
 * an explicit iptables rule. In this case the sockets are assumed
 * matching in preference order:
 *
 *   - match: if there&apos;s a fully established connection matching the
 *     _packet_ tuple, it is returned, assuming the redirection
 *     already took place and we process a packet belonging to an
 *     established connection
 *
 *   - match: if there&apos;s a listening socket matching the redirection
 *     (e.g. on-port &amp;amp; on-ip of the connection), it is returned,
 *     regardless if it was bound to 0.0.0.0 or an explicit
 *     address. The reasoning is that if there&apos;s an explicit rule, it
 *     does not really matter if the listener is bound to an interface
 *     or to 0. The user already stated that he wants redirection
 *     (since he added the rule).
 *
 * Please note that there&apos;s an overlap between what a TPROXY target
 * and a socket match will match. Normally if you have both rules the
 * &amp;quot;socket&amp;quot; match will be the first one, effectively all packets
 * belonging to established connections going through that one.
 */
struct sock *
nf_tproxy_get_sock_v4(struct net *net, struct sk_buff *skb,
		      const u8 protocol,
		      const __be32 saddr, const __be32 daddr,
		      const __be16 sport, const __be16 dport,
		      const struct net_device *in,
		      const enum nf_tproxy_lookup_t lookup_type)
{
	struct inet_hashinfo *hinfo = net-&amp;gt;ipv4.tcp_death_row.hashinfo;
	struct sock *sk;
	switch (protocol) {
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;TCP has corresponding matching methods for both states. The only extra step is incrementing the reference count on listening-state sockets, to prevent them from being freed while the packet is still being processed:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;	case IPPROTO_TCP: {
		struct tcphdr _hdr, *hp;

		hp = skb_header_pointer(skb, ip_hdrlen(skb),
					sizeof(struct tcphdr), &amp;amp;_hdr);
		if (hp == NULL)
			return NULL;

		switch (lookup_type) {
		case NF_TPROXY_LOOKUP_LISTENER:
			sk = inet_lookup_listener(net, hinfo, skb,
						  ip_hdrlen(skb) + __tcp_hdrlen(hp),
						  saddr, sport, daddr, dport,
						  in-&amp;gt;ifindex, 0);

			if (sk &amp;amp;&amp;amp; !refcount_inc_not_zero(&amp;amp;sk-&amp;gt;sk_refcnt))
				sk = NULL;
			/* NOTE: we return listeners even if bound to
			 * 0.0.0.0, those are filtered out in
			 * xt_socket, since xt_TPROXY needs 0 bound
			 * listeners too
			 */
			break;
		case NF_TPROXY_LOOKUP_ESTABLISHED:
			sk = inet_lookup_established(net, hinfo, saddr, sport,
						     daddr, dport, in-&amp;gt;ifindex);
			break;
		default:
			BUG();
		}
		break;
		}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;UDP requires additional checks to determine whether the match result is usable:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;	case IPPROTO_UDP:
		sk = udp4_lib_lookup(net, saddr, sport, daddr, dport,
				     in-&amp;gt;ifindex);
		if (sk) {
			int connected = (sk-&amp;gt;sk_state == TCP_ESTABLISHED);
			int wildcard = (inet_sk(sk)-&amp;gt;inet_rcv_saddr == 0);

			/* NOTE: we return listeners even if bound to
			 * 0.0.0.0, those are filtered out in
			 * xt_socket, since xt_TPROXY needs 0 bound
			 * listeners too
			 */
			if ((lookup_type == NF_TPROXY_LOOKUP_ESTABLISHED &amp;amp;&amp;amp;
			      (!connected || wildcard)) ||
			    (lookup_type == NF_TPROXY_LOOKUP_LISTENER &amp;amp;&amp;amp; connected)) {
				sock_put(sk);
				sk = NULL;
			}
		}
		break;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are two qualifying conditions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;connected&lt;/code&gt; indicates whether the socket is &amp;quot;connected&amp;quot;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;wildcard&lt;/code&gt; indicates whether the &lt;a href=&quot;https://elixir.bootlin.com/linux/v6.1.34/source/include/net/inet_sock.h#L193&quot;&gt;bind address&lt;/a&gt; is INADDR_ANY (&lt;code&gt;0.0.0.0&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, the condition &lt;code&gt;!connected || wildcard&lt;/code&gt; is puzzling, because when &lt;code&gt;connected&lt;/code&gt; is true, &lt;code&gt;wildcard&lt;/code&gt; is necessarily false, making &lt;code&gt;|| wildcard&lt;/code&gt; redundant.&lt;/p&gt;
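&lt;p&gt;A quick truth-table check confirms this (a standalone Go sketch, not kernel code): under the invariant that a connected UDP socket is never wildcard-bound, &lt;code&gt;!connected || wildcard&lt;/code&gt; always evaluates to the same value as plain &lt;code&gt;!connected&lt;/code&gt;.&lt;/p&gt;

```go
package main

import "fmt"

// dropCheckFull is the condition as written in the kernel;
// dropCheckReduced drops the "|| wildcard" term.
func dropCheckFull(connected, wildcard bool) bool    { return !connected || wildcard }
func dropCheckReduced(connected, wildcard bool) bool { return !connected }

func main() {
	// All states allowed by the invariant: connected implies !wildcard.
	states := [][2]bool{
		{false, false}, // unconnected, bound to an exact IP
		{false, true},  // unconnected, bound to 0.0.0.0
		{true, false},  // connected, never wildcard
	}
	for _, s := range states {
		full := dropCheckFull(s[0], s[1])
		reduced := dropCheckReduced(s[0], s[1])
		fmt.Printf("connected=%v wildcard=%v: full=%v reduced=%v\n", s[0], s[1], full, reduced)
		if full != reduced {
			panic("conditions differ")
		}
	}
}
```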
&lt;p&gt;When a UDP socket &lt;code&gt;connect()&lt;/code&gt;s to a target, it enters the connected state. If it was not previously bound to an exact local IP, then during &lt;code&gt;connect()&lt;/code&gt; a route lookup selects a local address to serve as both the source address of outgoing packets and the local bind address, and assigns it to the &lt;code&gt;inet_rcv_saddr&lt;/code&gt; field. Only a disconnect sets &lt;code&gt;inet_rcv_saddr&lt;/code&gt; back to INADDR_ANY:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;// https://elixir.bootlin.com/linux/v6.1.34/source/net/ipv4/datagram.c#L64
int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
{
	//...

	if (!inet-&amp;gt;inet_saddr)
		inet-&amp;gt;inet_saddr = fl4-&amp;gt;saddr;	/* Update source address */
	if (!inet-&amp;gt;inet_rcv_saddr) {
		inet-&amp;gt;inet_rcv_saddr = fl4-&amp;gt;saddr;
		if (sk-&amp;gt;sk_prot-&amp;gt;rehash)
			sk-&amp;gt;sk_prot-&amp;gt;rehash(sk);
	}

	// ...

	sk-&amp;gt;sk_state = TCP_ESTABLISHED;

	// ...
}

int __udp_disconnect(struct sock *sk, int flags)
{
	struct inet_sock *inet = inet_sk(sk);
	/*
	 *	1003.1g - break association.
	 */

	sk-&amp;gt;sk_state = TCP_CLOSE;

	// ...

	if (!(sk-&amp;gt;sk_userlocks &amp;amp; SOCK_BINDADDR_LOCK)) {
		inet_reset_saddr(sk);

	// ...
}

static __inline__ void inet_reset_saddr(struct sock *sk)
{
	inet_sk(sk)-&amp;gt;inet_rcv_saddr = inet_sk(sk)-&amp;gt;inet_saddr = 0;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Therefore, a connected UDP socket&apos;s &lt;code&gt;inet_rcv_saddr&lt;/code&gt; is always an exact IP address and can never be INADDR_ANY.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6006db84a91838813cdad8a6622a4e39efe9ea47&quot;&gt;commit&lt;/a&gt; that added these qualifying conditions mentions that &lt;code&gt;nf_tproxy_get_sock_v4()&lt;/code&gt; is also used by the iptables socket extension. I suspect this might be a historical artifact.&lt;/p&gt;
&lt;h3&gt;Usage&lt;/h3&gt;
&lt;p&gt;Using the iptables &lt;a href=&quot;https://ipset.netfilter.org/iptables-extensions.man.html#lbDW&quot;&gt;TPROXY extension&lt;/a&gt; as an example:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Specify the redirection destination with &lt;code&gt;--on-port&lt;/code&gt;/&lt;code&gt;--on-ip&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Since the packet&apos;s destination address is not modified, the routing decision after PREROUTING will still forward the packet to the FORWARD chain because the destination is not a local address. Therefore, policy routing is needed to steer the packet into the INPUT chain&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;ip rule add fwmark 0x233 table 100
ip route add local default dev lo table 100

iptables -t mangle -A PREROUTING -p udp -j TPROXY --on-ip 127.0.0.1 --on-port 10000 --tproxy-mark 0x233
iptables -t mangle -A PREROUTING -p tcp -j TPROXY --on-ip 127.0.0.1 --on-port 10000 --tproxy-mark 0x233
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This assigns the socket bound to &lt;code&gt;:10000&lt;/code&gt; to matching packets, while also setting the 0x233 fwmark. Policy routing is configured so that all packets carrying the 0x233 fwmark use routing table 100. Per the &lt;a href=&quot;https://www.man7.org/linux/man-pages/man8/ip-route.8.html&quot;&gt;documentation&lt;/a&gt;, the &lt;code&gt;local&lt;/code&gt; route type in table 100 means &amp;quot;the destinations are assigned to this host. The packets are looped back and delivered locally&amp;quot;, &lt;s&gt;and packets sent from the loopback device are all treated as destined for the local host&lt;/s&gt;, thereby preventing them from being forwarded out.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;2025/03/31 Update:&lt;/p&gt;
&lt;p&gt;&amp;quot;Packets sent from the loopback device are all treated as destined for the local host&amp;quot; is incorrect. The real key is the routing rule &lt;code&gt;ip route add local default dev lo table 100&lt;/code&gt;, where &lt;code&gt;local&lt;/code&gt; forces the packet to be received locally. So when the packet comes back out of lo and reaches the Routing decision after PREROUTING again, it is considered destined for the local host and is delivered to INPUT.&lt;/p&gt;
&lt;p&gt;Therefore, the inbound/outbound flow works like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Inbound traffic -&amp;gt; PREROUTING, fwmark is added to the packet -&amp;gt; Routing decision finds the fwmark matches a routing rule, &lt;code&gt;local&lt;/code&gt; forces local delivery -&amp;gt; forwarded to lo -&amp;gt; comes back out of lo as inbound traffic again -&amp;gt; PREROUTING -&amp;gt; Routing decision determines the packet is destined for the local host -&amp;gt; INPUT&lt;/li&gt;
&lt;li&gt;Outbound traffic -&amp;gt; OUTPUT, fwmark is added -&amp;gt; Routing decision finds the fwmark matches a routing rule... (the rest follows the same flow as inbound)&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;Using &lt;code&gt;-m socket&lt;/code&gt; for Traffic Splitting to Improve Performance&lt;/h3&gt;
&lt;p&gt;I have not found a definitive explanation for this; what follows is my own understanding and speculation.&lt;/p&gt;
&lt;p&gt;The comment in &lt;code&gt;nf_tproxy_get_sock_v4()&lt;/code&gt; mentions this point:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;/*
 * Please note that there&apos;s an overlap between what a TPROXY target
 * and a socket match will match. Normally if you have both rules the
 * &amp;quot;socket&amp;quot; match will be the first one, effectively all packets
 * belonging to established connections going through that one.
*/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After a packet redirected by TProxy establishes a connection, the networking stack holds a mapping between the packet&apos;s original 5-tuple and the socket. Subsequent packets for that connection match the socket through the stack&apos;s normal processing. This is the same socket that TPROXY&apos;s &lt;code&gt;sk = nf_tproxy_get_sock_v4(..., NF_TPROXY_LOOKUP_ESTABLISHED)&lt;/code&gt; would match, and it is already the redirected one, so replacing it again is unnecessary.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;2024/06/17 Update&lt;/strong&gt;: Analysis of the performance difference.&lt;/p&gt;
&lt;p&gt;In TProxy, &lt;code&gt;nf_tproxy_assign_sock&lt;/code&gt; is executed to replace the sk. The &lt;code&gt;skb_orphan&lt;/code&gt; call within it invokes the skb destructor &lt;code&gt;sock_edemux&lt;/code&gt;, which calls &lt;a href=&quot;https://github.com/torvalds/linux/blob/v6.0/net/ipv4/inet_hashtables.c#L335&quot;&gt;&lt;code&gt;sock_gen_put&lt;/code&gt;&lt;/a&gt; to decrement the sk&apos;s reference count. But for &amp;quot;already-redirected connections,&amp;quot; this is entirely redundant, because the old and new sk are the same.&lt;/p&gt;
&lt;p&gt;In contrast, the socket module only needs to call &lt;code&gt;sock_gen_put&lt;/code&gt; when the found sk differs from the one associated with the skb.&lt;/p&gt;
&lt;p&gt;Therefore, the redundant and frequent invocations of &lt;code&gt;sock_gen_put&lt;/code&gt; in TProxy can impact performance to some degree.&lt;/p&gt;
&lt;p&gt;Additionally, since TProxy and socket were &lt;a href=&quot;https://github.com/torvalds/linux/commits/d2f26037a38ada4a5d40d1cf0b32bc5289f50312/&quot;&gt;committed together&lt;/a&gt;, I speculate that the developers intended transparent proxying to be a collaborative effort between these two modules: socket handles established connections, while TProxy handles new connections. This also explains why TProxy does not check &lt;code&gt;sk != skb-&amp;gt;sk&lt;/code&gt; when replacing the sk — perhaps precisely because the developers assumed that TProxy mostly handles new connections that have not been redirected yet, and the established connection check is just a safety fallback.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;It is relatively uncommon for proxy programs to &lt;code&gt;connect()&lt;/code&gt; to the client for UDP, so only TCP is used as an example here:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;iptables -t mangle -N tproxy_divert
iptables -t mangle -A tproxy_divert -j MARK --set-mark 0x233
iptables -t mangle -A tproxy_divert -j ACCEPT

iptables -t mangle -A PREROUTING -p tcp -m socket -j tproxy_divert
iptables -t mangle -A PREROUTING -p tcp -j TPROXY --on-port 10000 --on-ip 127.0.0.1 --tproxy-mark 0x233
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Retrieving the Original Destination Address&lt;/h2&gt;
&lt;h3&gt;TCP&lt;/h3&gt;
&lt;p&gt;Use &lt;code&gt;getsockname()&lt;/code&gt; to obtain the &amp;quot;local&amp;quot; address of the client socket, which is the packet&apos;s original destination address:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;client_fd = accept(server_fd, (struct sockaddr *)&amp;amp;client_addr, &amp;amp;addr_len);

getsockname(client_fd, (struct sockaddr *)orig_dst, &amp;amp;addr_len);
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;UDP&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Use &lt;code&gt;setsockopt(..., SOL_IP, IP_RECVORIGDSTADDR, ...)&lt;/code&gt; to set the socket option so that &lt;code&gt;recvmsg()&lt;/code&gt; provides IP_ORIGDSTADDR ancillary data containing the packet&apos;s original destination address. Because TProxy does not modify the packet, this information is taken directly from the IP header:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;// /net/ipv4/ip_sockglue.c
static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct sk_buff *skb)
{
	struct sockaddr_in sin;
	const struct iphdr *iph = ip_hdr(skb);
	__be16 *ports = (__be16 *)skb_transport_header(skb);

	if (skb_transport_offset(skb) + 4 &amp;gt; (int)skb-&amp;gt;len)
		return;

	/* All current transport protocols have the port numbers in the
	 * first four bytes of the transport header and this function is
	 * written with this assumption in mind.
	 */

	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = iph-&amp;gt;daddr;
	sin.sin_port = ports[1];
	memset(sin.sin_zero, 0, sizeof(sin.sin_zero));

	put_cmsg(msg, SOL_IP, IP_ORIGDSTADDR, sizeof(sin), &amp;amp;sin);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Use &lt;code&gt;recvmsg()&lt;/code&gt; to read the packet and its ancillary data&lt;/li&gt;
&lt;li&gt;The ancillary data with level SOL_IP and type IP_ORIGDSTADDR contains the original destination address&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Complete example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;arpa/inet.h&amp;gt;
#include &amp;lt;netinet/in.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

#define MAX_BUF_SIZE 1024
#define SRC_ADDR INADDR_ANY
#define SRC_PORT 9999

int main() {
  int sockfd;
  struct sockaddr_in bind_addr, client_addr;
  char buffer[MAX_BUF_SIZE];

  if ((sockfd = socket(AF_INET, SOCK_DGRAM, 0)) &amp;lt; 0) {
    perror(&amp;quot;socket&amp;quot;);
    exit(EXIT_FAILURE);
  }

  int opt = 1;
  if (setsockopt(sockfd, SOL_IP, IP_TRANSPARENT, &amp;amp;opt, sizeof(opt)) &amp;lt; 0) {
    perror(&amp;quot;IP_TRANSPARENT&amp;quot;);
    exit(EXIT_FAILURE);
  }

  // bind
  memset(&amp;amp;bind_addr, 0, sizeof(bind_addr));
  bind_addr.sin_family = AF_INET;
  bind_addr.sin_addr.s_addr = htonl(SRC_ADDR);
  bind_addr.sin_port = htons(SRC_PORT);
  if (bind(sockfd, (struct sockaddr *)&amp;amp;bind_addr, sizeof(bind_addr)) &amp;lt; 0) {
    perror(&amp;quot;bind&amp;quot;);
    exit(EXIT_FAILURE);
  }

  // recvmsg
  if (setsockopt(sockfd, SOL_IP, IP_RECVORIGDSTADDR, &amp;amp;opt, sizeof(opt)) &amp;lt; 0) {
    perror(&amp;quot;IP_RECVORIGDSTADDR&amp;quot;);
    exit(EXIT_FAILURE);
  }
  while (1) {
    memset(buffer, 0, sizeof(buffer));
    struct msghdr msgh = {0};
    struct iovec iov[1];
    iov[0].iov_base = buffer;
    iov[0].iov_len = sizeof(buffer);
    msgh.msg_iov = iov;
    msgh.msg_iovlen = 1;
    msgh.msg_name = &amp;amp;client_addr;
    msgh.msg_namelen = sizeof(client_addr);
    char cmsgbuf[CMSG_SPACE(sizeof(struct sockaddr_in))]; // must fit a sockaddr_in, not just an int
    msgh.msg_control = cmsgbuf;
    msgh.msg_controllen = sizeof(cmsgbuf);
    if (recvmsg(sockfd, &amp;amp;msgh, 0) &amp;lt; 0) {
      perror(&amp;quot;recvmsg&amp;quot;);
      continue;
    }

    struct cmsghdr *cmsg;
    for (cmsg = CMSG_FIRSTHDR(&amp;amp;msgh); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&amp;amp;msgh, cmsg)) {
      if (cmsg-&amp;gt;cmsg_level == IPPROTO_IP &amp;amp;&amp;amp; cmsg-&amp;gt;cmsg_type == IP_ORIGDSTADDR) {
        struct sockaddr_in *addr = (struct sockaddr_in *)CMSG_DATA(cmsg);
        printf(&amp;quot;Original DST ADDR: %s\n&amp;quot;, inet_ntoa(addr-&amp;gt;sin_addr));
        break;
      }
    }
    printf(&amp;quot;Data: %s\n&amp;quot;, buffer);
  }

  close(sockfd);

  return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.kernel.org/networking/tproxy.html&quot;&gt;Official documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://powerdns.org/tproxydoc/tproxy.md.html&quot;&gt;Linux transparent proxy support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.mmf.moe/post/tproxy-investigation/&quot;&gt;TProxy Investigation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.cloudflare.com/how-we-built-spectrum/#revealingthemagictrick&quot;&gt;Abusing Linux&apos;s firewall: the hack that allowed us to build Spectrum&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ovear.info/post/509&quot;&gt;What is the socket module in iptables-extensions?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://vvl.me/2018/06/from-ss-redir-to-linux-nat/&quot;&gt;From ss-redir&apos;s Implementation to Linux NAT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://arthurchiao.art/blog/linux-net-stack-implementation-rx-zh&quot;&gt;Linux Networking Stack: Receiving Data (RX): Principles and Kernel Implementation (2022)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;https://stackoverflow.com/a/5814636/12812480&lt;/li&gt;
&lt;li&gt;https://stackoverflow.com/a/44206723/12812480&lt;/li&gt;
&lt;li&gt;https://github.com/kristrev/tproxy-example&lt;/li&gt;
&lt;li&gt;https://github.com/KatelynHaworth/go-tproxy&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>Hijacking Golang Compilation</title><link>https://rook1e.com/en/posts/go-build-hijacking/</link><guid isPermaLink="true">https://rook1e.com/en/posts/go-build-hijacking/</guid><description>This article briefly analyzes the Go compilation process and demonstrates a build hijacking attack based on the go build --toolexec mechanism.</description><pubDate>Wed, 03 Nov 2021 08:22:46 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;This article was originally published on &lt;a href=&quot;https://paper.seebug.org/1749/&quot;&gt;Seebug&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A while ago, I studied 0x7F&apos;s &amp;quot;&lt;a href=&quot;https://paper.seebug.org/1713/&quot;&gt;DLL Hijacking and Its Applications&lt;/a&gt;&amp;quot;, which mentioned using DLL hijacking to hijack compilers for supply chain attacks. This reminded me that certain mechanisms in Go could also be leveraged to achieve build hijacking, so I did some research and testing.&lt;/p&gt;
&lt;h2&gt;The Compilation Process&lt;/h2&gt;
&lt;p&gt;First, let&apos;s understand what &lt;code&gt;go build&lt;/code&gt; does.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package main

func main() {
	print(&amp;quot;i&apos;m testapp!&amp;quot;)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using this simple program as an example, &lt;code&gt;go build -x main.go&lt;/code&gt; compiles and prints the compilation process (due to space constraints, the most basic dependencies are not force-recompiled):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/14/build-cmd.png&quot; alt=&quot;go build cmd&quot;&gt;&lt;/p&gt;
&lt;p&gt;The above commands can be summarized as:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a temporary directory&lt;/li&gt;
&lt;li&gt;Generate configuration files needed by compile, run compile to produce &lt;a href=&quot;https://en.wikipedia.org/wiki/Object_file&quot;&gt;object files&lt;/a&gt; &lt;code&gt;***.a&lt;/code&gt; (other build tools perform similar operations)&lt;/li&gt;
&lt;li&gt;Write the build ID&lt;/li&gt;
&lt;li&gt;Repeat steps 2 and 3 to compile all dependencies&lt;/li&gt;
&lt;li&gt;Generate configuration files needed by link, run link to combine the object files into an executable&lt;/li&gt;
&lt;li&gt;Write the build ID&lt;/li&gt;
&lt;li&gt;Move the linked executable to the current directory and delete the temporary directory&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A few interesting things can be observed from these commands.&lt;/p&gt;
&lt;p&gt;Each compilation stage is handled by a separate &lt;a href=&quot;https://pkg.go.dev/cmd&quot;&gt;tool program&lt;/a&gt;, such as compile, link, and asm. These tool programs can be accessed via &lt;code&gt;go tool&lt;/code&gt;, and I&apos;ll refer to the ones used for compilation as build tools.&lt;/p&gt;
&lt;p&gt;The commands contain large sections of &lt;code&gt;packagefile xxx/xxx=xxx.a&lt;/code&gt; entries that specify the mapping between code dependencies and object files. These mappings are written into &lt;code&gt;importcfg/importcfg.link&lt;/code&gt; as configuration files for compile/link.&lt;/p&gt;
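&lt;p&gt;For reference, an importcfg is just a list of directives mapping import paths to object files; a minimal sketch might look like this (the paths are illustrative, not taken from a real build):&lt;/p&gt;

```text
# import config
packagefile fmt=$WORK/b002/_pkg_.a
packagefile os/exec=$WORK/b031/_pkg_.a
packagefile runtime=$WORK/b009/_pkg_.a
```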
&lt;p&gt;Additionally, temporary directories in the form of &lt;code&gt;$WORK/b001&lt;/code&gt; are created. Before running the build tools, &lt;code&gt;go build&lt;/code&gt; resolves all dependency relationships, creates corresponding actions for each package based on those dependencies, and ultimately forms an action graph. Executing these actions in order completes the compilation, with each action corresponding to a temporary directory. For example, compiling a program with &lt;code&gt;go build -a -work&lt;/code&gt; (&lt;code&gt;-a&lt;/code&gt; forces recompilation of everything, &lt;code&gt;-work&lt;/code&gt; preserves temporary directories):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/14/build-temp.png&quot; alt=&quot;build temp&quot;&gt;&lt;/p&gt;
&lt;p&gt;The figure shows the temporary directories used by each action. For instance, b062 contains the compilation configuration file &lt;code&gt;importcfg&lt;/code&gt; and the compiled object file &lt;code&gt;_pkg_.a&lt;/code&gt;, while the last action&apos;s directory b001 contains not only compilation artifacts but also the link configuration &lt;code&gt;importcfg.link&lt;/code&gt; and the link result &lt;code&gt;exe/a.out&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In summary, here are the key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The main work of &lt;code&gt;go build&lt;/code&gt;: analyze dependencies, compile source code into object files, and link object files into an executable&lt;/li&gt;
&lt;li&gt;Object files and configuration files are stored in temporary directories (b001 is the last one and where the executable is produced); temporary directories can be preserved with the &lt;code&gt;-work&lt;/code&gt; flag&lt;/li&gt;
&lt;li&gt;Build tools are invoked to handle different stages of compilation&lt;/li&gt;
&lt;li&gt;Later actions depend on the results of earlier actions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The compilation process is quite &amp;quot;decentralized,&amp;quot; which creates opportunities for us:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The build tools are &lt;a href=&quot;https://github.com/golang/go/tree/master/src/cmd&quot;&gt;open source&lt;/a&gt;, so they can be modified and replaced in the &lt;code&gt;go env GOTOOLDIR&lt;/code&gt; directory&lt;/li&gt;
&lt;li&gt;Leveraging the &lt;code&gt;go build -toolexec&lt;/code&gt; mechanism&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both approaches share a similar idea. This article explores the second approach.&lt;/p&gt;
&lt;h2&gt;Hijacking the Build&lt;/h2&gt;
&lt;p&gt;While researching &lt;a href=&quot;https://paper.seebug.org/1586/&quot;&gt;code obfuscation&lt;/a&gt; some time ago, I learned about the &lt;code&gt;-toolexec&lt;/code&gt; mechanism of &lt;code&gt;go build&lt;/code&gt;. Here&apos;s the relevant excerpt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Keen readers may have noticed an interesting detail: the actual target in the assembled command is not the build tool itself, but &lt;code&gt;cfg.BuildToolexec&lt;/code&gt;. Tracing back to its definition reveals that it&apos;s set by the &lt;code&gt;go build -toolexec&lt;/code&gt; parameter. The official description is:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;-toolexec &apos;cmd args&apos;
a program to use to invoke toolchain programs like vet and asm.
For example, instead of running asm, the go command will run
  &apos;cmd args /path/to/asm &amp;lt;arguments for asm&amp;gt;&apos;.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That is, the program specified by &lt;code&gt;-toolexec&lt;/code&gt; is used to run the build tools. This can essentially be seen as a hook mechanism — by specifying our own program with this parameter, we can invoke the build tools through it during compilation, thereby intervening in the build process.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So our goal is to implement a tool similar to garble, which I&apos;ll call a wrapper. By inserting &lt;code&gt;-toolexec &amp;quot;/path/to/wrapper&amp;quot;&lt;/code&gt; into the project&apos;s build script or wherever build commands exist, the wrapper will find a suitable location (tentatively the top of &lt;code&gt;main.main()&lt;/code&gt;) to insert the payload when the build command runs.&lt;/p&gt;
&lt;p&gt;First, we need to locate the target source file.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;/path/to/wrapper /opt/homebrew/Cellar/go/1.17.2/libexec/pkg/tool/darwin_arm64/compile -o $WORK/b042/_pkg_.a -trimpath &amp;quot;$WORK/b042=&amp;gt;&amp;quot; -shared -p strings -std -complete -buildid ygbMG98G6g0UHH5pai26/ygbMG98G6g0UHH5pai26 -goversion go1.17.2 -importcfg $WORK/b042/importcfg -pack /opt/homebrew/Cellar/go/1.17.2/libexec/src/strings/builder.go /opt/homebrew/Cellar/go/1.17.2/libexec/src/strings/compare.go
...(omitted)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a command executed by &lt;code&gt;go build -toolexec &amp;quot;/path/to/wrapper&amp;quot;&lt;/code&gt;, where the target source file paths for compile are appended at the end. After extracting the file paths, we determine whether a file contains &lt;code&gt;main.main()&lt;/code&gt; based on its content. There are many ways to do this — for instance, simply checking if it starts with &lt;code&gt;package main&lt;/code&gt; and contains &lt;code&gt;func main(){&lt;/code&gt;, or more rigorously by parsing the AST and checking the following characteristics:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/14/main-ast.png&quot; alt=&quot;main.main() ast&quot;&gt;&lt;/p&gt;
&lt;p&gt;Since all files in a single compile command belong to the same package, we can skip the remaining files as soon as one doesn&apos;t meet the criteria.&lt;/p&gt;
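&lt;p&gt;The wrapper&apos;s first step boils down to plain argument inspection (a sketch with hypothetical helper names; a real wrapper would afterwards exec the original command): the first argument is the path to the real tool, and the trailing &lt;code&gt;.go&lt;/code&gt; arguments are the compile targets.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isCompile reports whether a toolexec invocation is running the
// compile tool (args[0] is the path to the real tool binary).
func isCompile(args []string) bool {
	return len(args) > 0 && filepath.Base(args[0]) == "compile"
}

// goFiles extracts the .go source files passed to compile.
func goFiles(args []string) []string {
	var files []string
	for _, a := range args[1:] {
		if strings.HasSuffix(a, ".go") {
			files = append(files, a)
		}
	}
	return files
}

func main() {
	// Example argv as the wrapper would receive it via os.Args[1:].
	args := []string{
		"/usr/local/go/pkg/tool/linux_amd64/compile",
		"-o", "/tmp/b001/_pkg_.a", "-p", "main",
		"-importcfg", "/tmp/b001/importcfg", "-pack",
		"main.go",
	}
	fmt.Println(isCompile(args), goFiles(args))
	// After inspection, a real wrapper would run the original
	// command, e.g. exec.Command(args[0], args[1:]...).
}
```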
&lt;p&gt;In summary, the first step filters by the following conditions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The invoked tool is compile&lt;/li&gt;
&lt;li&gt;The file has a &lt;code&gt;.go&lt;/code&gt; extension&lt;/li&gt;
&lt;li&gt;The AST shows the package name is main, and there exists an &lt;code&gt;ast.FuncDecl&lt;/code&gt; named main in Decls&lt;/li&gt;
&lt;/ol&gt;
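&lt;p&gt;Condition 3 can be checked with &lt;code&gt;go/parser&lt;/code&gt; and &lt;code&gt;go/ast&lt;/code&gt; (a sketch; the helper name &lt;code&gt;hasMainMain&lt;/code&gt; is made up for illustration):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// hasMainMain reports whether src declares package main with a
// top-level func main() that has no receiver.
func hasMainMain(src string) bool {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil || f.Name.Name != "main" {
		return false
	}
	for _, decl := range f.Decls {
		if fd, ok := decl.(*ast.FuncDecl); ok &&
			fd.Name.Name == "main" && fd.Recv == nil {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasMainMain("package main\nfunc main() {}"))   // true
	fmt.Println(hasMainMain("package util\nfunc main() {}"))   // false
	fmt.Println(hasMainMain("package main\nfunc helper() {}")) // false
}
```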
&lt;p&gt;With the target source file located, the next step is to insert the payload by modifying the AST.&lt;/p&gt;
&lt;p&gt;Based on the AST diagram from the previous step, each statement in &lt;code&gt;main()&lt;/code&gt; is parsed as an &lt;code&gt;ast.Stmt&lt;/code&gt; interface type, stored in &lt;code&gt;Body.List&lt;/code&gt;. So we construct AST nodes following the format of concrete statements, such as:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;var cmd = `exec.Command(&amp;quot;open&amp;quot;, &amp;quot;/System/Applications/Calculator.app&amp;quot;).Run()`
payloadExpr, err := parser.ParseExpr(cmd)
// handle err
payloadExprStmt := &amp;amp;ast.ExprStmt{
  X: payloadExpr,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Insert the payload node into &lt;code&gt;main()&lt;/code&gt;&apos;s &lt;code&gt;Body.List&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;// Method 1
ast.Inspect(f, func(n ast.Node) bool {
  switch x := n.(type) {
  case *ast.FuncDecl:
    if x.Name.Name == &amp;quot;main&amp;quot; &amp;amp;&amp;amp; x.Recv == nil {
      stmts := make([]ast.Stmt, 0, len(x.Body.List)+1)
      stmts = append(stmts, payloadExprStmt)
      stmts = append(stmts, x.Body.List...)
      x.Body.List = stmts
      return false
    }
  }
  return true
})

// Method 2
pre := func(cursor *astutil.Cursor) bool {
  switch cursor.Node().(type) {
  case *ast.FuncDecl:
    if fd := cursor.Node().(*ast.FuncDecl); fd.Name.Name == &amp;quot;main&amp;quot; &amp;amp;&amp;amp; fd.Recv == nil {
      return true
    }
    return false
  case *ast.BlockStmt:
    return true
  case ast.Stmt:
    if _, ok := cursor.Parent().(*ast.BlockStmt); ok {
      cursor.InsertBefore(payloadExprStmt)
    }
  }
  return true
}
post := func(cursor *astutil.Cursor) bool {
  if _, ok := cursor.Parent().(*ast.BlockStmt); ok {
    return false
  }
  return true
}
f = astutil.Apply(f, pre, post).(*ast.File)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, save the modified AST as a file, replace the file path in the original compile command, and execute the command.&lt;/p&gt;
&lt;p&gt;Simple enough — it seems like everything works smoothly at this point. However, testing reveals an error: &lt;code&gt;os/exec&lt;/code&gt; cannot be found:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;/var/folders/z5/1_qfr0f55x97c63p412hprzw0000gn/T/gobuild_cache_1747406166/main.go:5:2: could not import &amp;quot;os/exec&amp;quot;: open : no such file or directory
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Recall the &amp;quot;Compilation Process&amp;quot; section above: both the compilation and linking stages require the object files that were compiled earlier for their dependencies. Moreover, the dependency analysis and action graph construction are completed by &lt;code&gt;go build&lt;/code&gt; before running the build tools and cannot be hijacked via &lt;code&gt;-toolexec&lt;/code&gt;. So inserting a dependency into the AST&apos;s import nodes doesn&apos;t modify the existing dependency relationships or action graph, meaning there&apos;s no object file available for &lt;code&gt;os/exec&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Since the action graph is missing &lt;code&gt;os/exec&lt;/code&gt; and its dependencies, we can complete the missing actions ourselves — that is, compile the corresponding object files and add them to importcfg.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/14/exec-importcfg-diff.png&quot; alt=&quot;exec-package-diff&quot;&gt;&lt;/p&gt;
&lt;p&gt;Comparing the importcfg files reveals that there are more transitive dependencies than expected. Fortunately, they&apos;re all recorded in importcfg, so we run a separate &lt;code&gt;go build&lt;/code&gt; on a simplified payload program:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import &amp;quot;os/exec&amp;quot;

func main() {
	exec.Command(&amp;quot;xxx&amp;quot;).Run()
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By adding the &lt;code&gt;-work&lt;/code&gt; flag to preserve the temporary directory from this build, we can read the importcfg in temporary directory b001 to obtain the object file paths for &lt;code&gt;os/exec&lt;/code&gt;&apos;s dependencies, and then append these configuration entries to the original importcfg as needed.&lt;/p&gt;
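&lt;p&gt;Supplementing the importcfg can be sketched as simple line merging (the helper &lt;code&gt;mergeImportcfg&lt;/code&gt; is hypothetical and the paths illustrative): append any &lt;code&gt;packagefile&lt;/code&gt; entries from the payload build&apos;s importcfg that the original one lacks.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// mergeImportcfg appends "packagefile" entries from extra that are
// missing from orig, keyed by import path (the part before '=').
func mergeImportcfg(orig, extra string) string {
	have := map[string]bool{}
	for _, line := range strings.Split(orig, "\n") {
		if strings.HasPrefix(line, "packagefile ") {
			path, _, _ := strings.Cut(strings.TrimPrefix(line, "packagefile "), "=")
			have[path] = true
		}
	}
	out := strings.TrimRight(orig, "\n")
	for _, line := range strings.Split(extra, "\n") {
		if !strings.HasPrefix(line, "packagefile ") {
			continue
		}
		path, _, _ := strings.Cut(strings.TrimPrefix(line, "packagefile "), "=")
		if !have[path] {
			out += "\n" + line
			have[path] = true
		}
	}
	return out + "\n"
}

func main() {
	orig := "# import config\npackagefile fmt=/tmp/b002/_pkg_.a\n"
	extra := "packagefile fmt=/tmp/c002/_pkg_.a\npackagefile os/exec=/tmp/c031/_pkg_.a\n"
	fmt.Print(mergeImportcfg(orig, extra))
}
```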
&lt;p&gt;Trying again, we can see the payload is successfully inserted.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/14/toolexec.gif&quot; alt=&quot;wrapper demo&quot;&gt;&lt;/p&gt;
&lt;p&gt;Additionally, you may notice that the tests above all use the &lt;code&gt;-a&lt;/code&gt; flag. This is because &lt;code&gt;go build&lt;/code&gt; has caching and incremental compilation mechanisms — a normal &lt;code&gt;go build&lt;/code&gt; might hit the cache and not invoke the tools at all. So we need to add the &lt;code&gt;-a&lt;/code&gt; flag to force recompilation of all dependencies, or run &lt;code&gt;go clean -cache&lt;/code&gt; before building to clear the cache, or change the GOCACHE environment variable to point to a new directory.&lt;/p&gt;
&lt;p&gt;Finally, let&apos;s recap the steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;During compile:
&lt;ol&gt;
&lt;li&gt;Locate the target file&lt;/li&gt;
&lt;li&gt;Compile a simplified payload to obtain importcfg and its dependency artifacts&lt;/li&gt;
&lt;li&gt;Supplement &lt;code&gt;importcfg&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Insert the payload into the AST and save to a temporary file&lt;/li&gt;
&lt;li&gt;Modify the file path in the original compile command and execute it&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;During link:
&lt;ol&gt;
&lt;li&gt;Locate the target file&lt;/li&gt;
&lt;li&gt;Supplement &lt;code&gt;importcfg.link&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Execute the link command&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The approach demonstrated in this article leverages the &lt;code&gt;-toolexec&lt;/code&gt; mechanism of &lt;code&gt;go build&lt;/code&gt; to let a tool intervene in the compilation process and insert a payload into temporary files.&lt;/p&gt;
&lt;p&gt;From a practical standpoint, many challenges remain — for example, how to covertly insert &lt;code&gt;-toolexec&lt;/code&gt; and &lt;code&gt;-a&lt;/code&gt; into build scripts. Without suitable camouflage techniques, directly modifying or replacing the build tools &lt;code&gt;compile&lt;/code&gt; and &lt;code&gt;link&lt;/code&gt; themselves, following the approach described in this article, might be a better choice.&lt;/p&gt;
&lt;p&gt;The code related to this article is available at &lt;a href=&quot;https://github.com/0x2E/go-build-hijacking&quot;&gt;go-build-hijacking&lt;/a&gt;. I&apos;ll continue to add improvements as new ideas come up. Feel free to reach out via issues or email.&lt;/p&gt;
&lt;h2&gt;Ref&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://maori.geek.nz/how-go-build-works-750bb2ba6d8e&quot;&gt;How &amp;quot;go build&amp;quot; Works&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://hao.io/2020/01/golang%E7%BC%96%E8%AF%91%E5%99%A8%E6%BC%AB%E8%B0%88%EF%BC%881%EF%BC%89%E7%BC%96%E8%AF%91%E5%99%A8%E5%92%8C%E8%BF%9E%E6%8E%A5%E5%99%A8&quot;&gt;golang Compiler Essentials (1): Compiler and Linker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://xiaomi-info.github.io/2019/11/13/golang-compiler-principle/&quot;&gt;Inside Golang&apos;s Compiler Principles&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>A First Look at Golang Code Obfuscation</title><link>https://rook1e.com/en/posts/golang-obfuscation/</link><guid isPermaLink="true">https://rook1e.com/en/posts/golang-obfuscation/</guid><description>This article explores Golang code obfuscation techniques by analyzing the implementation of the burrowers/garble project. Due to the scarcity of related resources, most of the content is based on source code analysis.</description><pubDate>Wed, 19 May 2021 07:22:36 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;This article was originally published on &lt;a href=&quot;https://paper.seebug.org/1586/&quot;&gt;Seebug&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In recent years, Golang has surged in popularity. Thanks to its excellent performance, high development efficiency, and cross-platform capabilities, it has been widely adopted in software development. While enjoying the conveniences Golang brings, developers also need to think about how to protect their code and increase the difficulty of reverse engineering.&lt;/p&gt;
&lt;p&gt;Due to mechanisms like reflection in Golang, a large amount of information such as file paths and function names must be packed into the binary. This information cannot be stripped, so we consider obfuscating the code to raise the bar for reverse engineering.&lt;/p&gt;
&lt;p&gt;This article primarily explores Golang code obfuscation techniques by analyzing the implementation of the &lt;a href=&quot;https://github.com/burrowers/garble&quot;&gt;burrowers/garble&lt;/a&gt; project. Due to the scarcity of related resources, most of the content here is based on source code analysis. If there are any errors, please feel free to point them out in the comments or via email.&lt;/p&gt;
&lt;h2&gt;Prerequisites&lt;/h2&gt;
&lt;h3&gt;The Compilation Process&lt;/h3&gt;
&lt;p&gt;Go&apos;s compilation process can be abstracted as:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Lexical analysis: converting a character sequence into a token sequence&lt;/li&gt;
&lt;li&gt;Syntax analysis: parsing tokens into an AST&lt;/li&gt;
&lt;li&gt;Type checking&lt;/li&gt;
&lt;li&gt;Generating intermediate code&lt;/li&gt;
&lt;li&gt;Generating machine code&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This article will not delve into compiler theory in detail. For further reading, I recommend &lt;a href=&quot;https://draveness.me/golang/docs/part1-prerequisite/ch02-compile/golang-compile-intro/&quot;&gt;Go Language Design and Implementation - Compilation Principles&lt;/a&gt; and &lt;a href=&quot;https://github.com/golang/go/tree/master/src/cmd/compile&quot;&gt;Introduction to the Go compiler&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let&apos;s explore the compilation process more intuitively from the source code perspective. The implementation of &lt;code&gt;go build&lt;/code&gt; is in &lt;code&gt;src/cmd/go/internal/work/build.go&lt;/code&gt;. Ignoring the handling of compiler type selection, environment information, etc., we focus only on the core part:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;func runBuild(ctx context.Context, cmd *base.Command, args []string) {
	...
  var b Builder
  ...
  pkgs := load.PackagesAndErrors(ctx, args)
  ...
	a := &amp;amp;Action{Mode: &amp;quot;go build&amp;quot;}
	for _, p := range pkgs {
		a.Deps = append(a.Deps, b.AutoAction(ModeBuild, depMode, p))
	}
	...
	b.Do(ctx, a)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;Action&lt;/code&gt; struct here represents a single action. Each action has a description, an associated package, dependencies (Deps), and other information. All related actions together form an action graph.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;// An Action represents a single action in the action graph.
type Action struct {
	Mode     string         // description of action operation
	Package  *load.Package  // the package this action works on
	Deps     []*Action      // actions that must happen before this one
	Func     func(*Builder, context.Context, *Action) error // the action itself (nil = no-op)
	...
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After creating action &lt;code&gt;a&lt;/code&gt; as the &amp;quot;root vertex,&amp;quot; it iterates over the packages specified for compilation, creating an action for each one. This creation process is recursive — during creation, it analyzes each package&apos;s dependencies and creates actions for them as well. For example, the &lt;code&gt;src/cmd/go/internal/work/action.go (b *Builder) CompileAction&lt;/code&gt; method:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;for _, p1 := range p.Internal.Imports {
	a.Deps = append(a.Deps, b.CompileAction(depMode, depMode, p1))
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The final &lt;code&gt;a.Deps&lt;/code&gt; serves as the &amp;quot;starting points&amp;quot; of the action graph. Once the action graph is constructed, action &lt;code&gt;a&lt;/code&gt; is used as the &amp;quot;root&amp;quot; for a depth-first traversal, where dependent actions are sequentially added to the task queue and then executed concurrently via &lt;code&gt;action.Func&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Each type of action has a designated method for its &lt;code&gt;Func&lt;/code&gt;, which is the core part of the action. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;a := &amp;amp;Action{
  Mode: &amp;quot;build&amp;quot;,
  Func: (*Builder).build,
  ...
}

a := &amp;amp;Action{
  Mode: &amp;quot;link&amp;quot;,
  Func: (*Builder).link,
  ...
}
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Digging further, you&apos;ll find that aside from some necessary preprocessing, &lt;code&gt;(*Builder).link&lt;/code&gt; calls the &lt;code&gt;BuildToolchain.ld&lt;/code&gt; method, and &lt;code&gt;(*Builder).build&lt;/code&gt; calls methods like &lt;code&gt;BuildToolchain.symabis&lt;/code&gt;, &lt;code&gt;BuildToolchain.gc&lt;/code&gt;, &lt;code&gt;BuildToolchain.asm&lt;/code&gt;, and &lt;code&gt;BuildToolchain.pack&lt;/code&gt; to implement the core functionality. &lt;code&gt;BuildToolchain&lt;/code&gt; is of the &lt;code&gt;toolchain&lt;/code&gt; interface type, which defines the following methods:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;// src/cmd/go/internal/work/exec.go
type toolchain interface {
	// gc runs the compiler in a specific directory on a set of files
	// and returns the name of the generated output file.
	gc(b *Builder, a *Action, archive string, importcfg, embedcfg []byte, symabis string, asmhdr bool, gofiles []string) (ofile string, out []byte, err error)
	// cc runs the toolchain&apos;s C compiler in a directory on a C file
	// to produce an output file.
	cc(b *Builder, a *Action, ofile, cfile string) error
	// asm runs the assembler in a specific directory on specific files
	// and returns a list of named output files.
	asm(b *Builder, a *Action, sfiles []string) ([]string, error)
	// symabis scans the symbol ABIs from sfiles and returns the
	// path to the output symbol ABIs file, or &amp;quot;&amp;quot; if none.
	symabis(b *Builder, a *Action, sfiles []string) (string, error)
	// pack runs the archive packer in a specific directory to create
	// an archive from a set of object files.
	// typically it is run in the object directory.
	pack(b *Builder, a *Action, afile string, ofiles []string) error
	// ld runs the linker to create an executable starting at mainpkg.
	ld(b *Builder, root *Action, out, importcfg, mainpkg string) error
	// ldShared runs the linker to create a shared library containing the pkgs built by toplevelactions
	ldShared(b *Builder, root *Action, toplevelactions []*Action, out, importcfg string, allactions []*Action) error

	compiler() string
	linker() string
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Go implements this interface separately for the gc and gccgo compilers. &lt;code&gt;go build&lt;/code&gt; selects between them during program initialization:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;func init() {
	switch build.Default.Compiler {
	case &amp;quot;gc&amp;quot;, &amp;quot;gccgo&amp;quot;:
		buildCompiler{}.Set(build.Default.Compiler)
	}
}

func (c buildCompiler) Set(value string) error {
	switch value {
	case &amp;quot;gc&amp;quot;:
		BuildToolchain = gcToolchain{}
	case &amp;quot;gccgo&amp;quot;:
		BuildToolchain = gccgoToolchain{}
  ...
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we only look at the gc compiler portion in &lt;code&gt;src/cmd/go/internal/work/gc.go&lt;/code&gt;. Taking the &lt;code&gt;gc&lt;/code&gt; method as an example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;func (gcToolchain) gc(b *Builder, a *Action, archive string, importcfg, embedcfg []byte, symabis string, asmhdr bool, gofiles []string) (ofile string, output []byte, err error) {
	// ...
	// Assemble arguments
	// ...

	args := []interface{}{cfg.BuildToolexec, base.Tool(&amp;quot;compile&amp;quot;), &amp;quot;-o&amp;quot;, ofile, &amp;quot;-trimpath&amp;quot;, a.trimpath(), gcflags, gcargs, &amp;quot;-D&amp;quot;, p.Internal.LocalPrefix}

	// ...

	output, err = b.runOut(a, base.Cwd, nil, args...)
	return ofile, output, err
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At a high level, the &lt;code&gt;gc&lt;/code&gt; method doesn&apos;t actually perform the compilation work itself. Its main role is to assemble a command that invokes the binary located at &lt;code&gt;base.Tool(&amp;quot;compile&amp;quot;)&lt;/code&gt;. These programs can be called Go compilation tools, located in the &lt;code&gt;pkg/tool&lt;/code&gt; directory with source code in &lt;code&gt;src/cmd&lt;/code&gt;. Similarly, the other methods also call their respective compilation tools to perform the actual compilation work.&lt;/p&gt;
&lt;p&gt;Attentive readers may notice an interesting detail: the actual executable in the assembled command is not the compilation tool itself, but &lt;code&gt;cfg.BuildToolexec&lt;/code&gt;. Tracing this to its definition reveals it is set by the &lt;code&gt;go build -toolexec&lt;/code&gt; flag. The official description is:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;-toolexec &apos;cmd args&apos;
  a program to use to invoke toolchain programs like vet and asm.
  For example, instead of running asm, the go command will run
  &apos;cmd args /path/to/asm &amp;lt;arguments for asm&amp;gt;&apos;.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In other words, &lt;code&gt;-toolexec&lt;/code&gt; specifies a program to run the compilation tools. This can be thought of as a hook mechanism — by using this flag to specify our own program, we can intervene in the compilation process by having our program invoke the compilation tools. The garble project analyzed below uses exactly this approach. Here&apos;s a command excerpt from the compilation process (&lt;code&gt;go build -n&lt;/code&gt; outputs the executed commands) to help illustrate. For example, if we specify &lt;code&gt;-toolexec=/home/atom/go/bin/garble&lt;/code&gt;, then the actual command executed during compilation is:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;/home/atom/go/bin/garble /usr/local/go/pkg/tool/linux_amd64/compile -o $WORK/b016/_pkg_.a -trimpath &amp;quot;/usr/local/go/src/sync=&amp;gt;sync;$WORK/b016=&amp;gt;&amp;quot; -p sync -std -buildid FRNt7EHDh77qHujLKnmK/FRNt7EHDh77qHujLKnmK -goversion go1.16.4 -D &amp;quot;&amp;quot; -importcfg $WORK/b016/importcfg -pack -c=4 /usr/local/go/src/sync/cond.go /usr/local/go/src/sync/map.go /usr/local/go/src/sync/mutex.go /usr/local/go/src/sync/once.go /usr/local/go/src/sync/pool.go /usr/local/go/src/sync/poolqueue.go /usr/local/go/src/sync/runtime.go /usr/local/go/src/sync/runtime2.go /usr/local/go/src/sync/rwmutex.go /usr/local/go/src/sync/waitgroup.go
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To summarize, &lt;code&gt;go build&lt;/code&gt; invokes compilation tools like &lt;code&gt;compile&lt;/code&gt; by assembling commands, and we can use the &lt;code&gt;go build -toolexec&lt;/code&gt; flag to specify a program that &amp;quot;intervenes&amp;quot; in the compilation process.&lt;/p&gt;
&lt;h3&gt;go/ast&lt;/h3&gt;
&lt;p&gt;In Golang, AST types and methods are defined by the &lt;code&gt;go/ast&lt;/code&gt; standard library. The garble project analyzed later involves extensive type assertions and type switches with &lt;code&gt;go/ast&lt;/code&gt;, so it&apos;s important to have a general understanding of these types. Most types are defined in &lt;code&gt;src/go/ast/ast.go&lt;/code&gt;, where the comments are quite detailed. For convenience, I&apos;ve put together a relationship diagram. The branches in the diagram represent inheritance relationships, and all types are based on the &lt;code&gt;Node&lt;/code&gt; interface:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/13/go-ast-types.png&quot; alt=&quot;go/ast types&quot;&gt;&lt;/p&gt;
&lt;p&gt;This article doesn&apos;t intend to dive deep into ASTs, but I believe a basic understanding should be sufficient for the rest of this article. If you find it difficult to follow, I recommend reading &lt;a href=&quot;https://github.com/chai2010/go-ast-book/&quot;&gt;Introduction to Go Syntax Trees — A Journey into Building Your Own Programming Language and Compiler!&lt;/a&gt; to fill in any gaps, or using the online tool &lt;a href=&quot;https://yuroyoro.github.io/goast-viewer/index.html&quot;&gt;goast-viewer&lt;/a&gt; to visualize ASTs for analysis.&lt;/p&gt;
&lt;h2&gt;Tool Analysis&lt;/h2&gt;
&lt;p&gt;Among open-source Go code obfuscation projects, the two with the most stars are &lt;a href=&quot;https://github.com/burrowers/garble&quot;&gt;burrowers/garble&lt;/a&gt; and &lt;a href=&quot;https://github.com/unixpickle/gobfuscate&quot;&gt;unixpickle/gobfuscate&lt;/a&gt;. The former has more up-to-date features, so this article primarily analyzes garble, version &lt;a href=&quot;https://github.com/burrowers/garble/tree/8edde922ee5189f1d049edb9487e6090dd9d45bd&quot;&gt;8edde922ee5189f1d049edb9487e6090dd9d45bd&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Features&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Supports modules, Go 1.16+&lt;/li&gt;
&lt;li&gt;Does not handle the following cases:
&lt;ul&gt;
&lt;li&gt;CGO&lt;/li&gt;
&lt;li&gt;Items marked as &lt;code&gt;ignoreObjects&lt;/code&gt;:
&lt;ul&gt;
&lt;li&gt;Types of arguments passed to &lt;code&gt;reflect.ValueOf&lt;/code&gt; or &lt;code&gt;reflect.TypeOf&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Functions used in &lt;code&gt;go:linkname&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Exported methods&lt;/li&gt;
&lt;li&gt;Types and variables imported from unobfuscated packages&lt;/li&gt;
&lt;li&gt;Constants&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The runtime package and its dependencies (&lt;a href=&quot;https://github.com/burrowers/garble/issues/193&quot;&gt;support obfuscating the runtime package #193&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Go plugins&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Hashes the names of eligible packages, functions, variables, types, etc.&lt;/li&gt;
&lt;li&gt;Replaces strings with anonymous functions&lt;/li&gt;
&lt;li&gt;Removes debug information and symbol tables&lt;/li&gt;
&lt;li&gt;Can output obfuscated Go code via the &lt;code&gt;-debugdir&lt;/code&gt; option&lt;/li&gt;
&lt;li&gt;Can specify different seeds to produce different obfuscation results&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At a high level, garble can be divided into two modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Active mode&lt;/strong&gt;: When the first command argument matches one of garble&apos;s presets, it means garble was invoked directly by the user. In this phase, it configures settings based on arguments, retrieves dependency package information, and then persists the configuration. If the command is &lt;code&gt;build&lt;/code&gt; or &lt;code&gt;test&lt;/code&gt;, it adds &lt;code&gt;-toolexec=path/to/garble&lt;/code&gt; to set itself as the launcher for compilation tools, leading to launcher mode.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Launcher mode&lt;/strong&gt;: It &amp;quot;intercepts&amp;quot; the three tools — compile/asm/link — performing source code obfuscation and modifying runtime arguments before the compilation tools run, then finally runs the tools to compile the obfuscated code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fetching and modifying arguments takes up a significant amount of code. For easier analysis, later sections will gloss over these details. Interested readers can consult the official documentation to learn about each argument&apos;s purpose.&lt;/p&gt;
&lt;h3&gt;Constructing the Target List&lt;/h3&gt;
&lt;p&gt;The target list is constructed in active mode. Here&apos;s an excerpt of the key code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;// listedPackage contains the &apos;go list -json -export&apos; fields obtained by the
// root process, shared with all garble sub-processes via a file.
type listedPackage struct {
	Name       string
	ImportPath string
	ForTest    string
	Export     string
	BuildID    string
	Deps       []string
	ImportMap  map[string]string
	Standard   bool

	Dir     string
	GoFiles []string

	// The fields below are not part of &apos;go list&apos;, but are still reused
	// between garble processes. Use &amp;quot;Garble&amp;quot; as a prefix to ensure no
	// collisions with the JSON fields from &apos;go list&apos;.

	GarbleActionID []byte

	Private bool
}

func setListedPackages(patterns []string) error {
  args := []string{&amp;quot;list&amp;quot;, &amp;quot;-json&amp;quot;, &amp;quot;-deps&amp;quot;, &amp;quot;-export&amp;quot;, &amp;quot;-trimpath&amp;quot;}
  args = append(args, cache.BuildFlags...)
  args = append(args, patterns...)
  cmd := exec.Command(&amp;quot;go&amp;quot;, args...)
  ...
  cache.ListedPackages = make(map[string]*listedPackage)
  for ...{
    var pkg listedPackage
    ...
    cache.ListedPackages[pkg.ImportPath] = &amp;amp;pkg
    ...
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The core mechanism uses the &lt;code&gt;go list&lt;/code&gt; command, where the &lt;code&gt;-deps&lt;/code&gt; flag is officially described as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The -deps flag causes list to iterate over not just the named packages but also all their dependencies. It visits them in a depth-first post-order traversal, so that a package is listed only after all its dependencies. Packages not explicitly listed on the command line will have the DepOnly field set to true.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This traversal is actually quite similar to how &lt;code&gt;go build&lt;/code&gt; creates actions, as analyzed earlier. Through this command, garble can obtain all dependency information for the project (including transitive dependencies), iterating over and storing them in &lt;code&gt;cache.ListedPackages&lt;/code&gt;. Additionally, it marks whether each dependency package is under the &lt;code&gt;env.GOPRIVATE&lt;/code&gt; directory — only files under this directory will be obfuscated (with the exception that some parts of runtime are processed when the &lt;code&gt;-tiny&lt;/code&gt; flag is used). You can set the environment variable &lt;code&gt;GOPRIVATE=&amp;quot;*&amp;quot;&lt;/code&gt; to expand the scope for better obfuscation results. Regarding the scope of obfuscation, garble&apos;s author is also working on improvements: &lt;a href=&quot;https://github.com/burrowers/garble/issues/276&quot;&gt;idea: break away from GOPRIVATE? #276&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At this point, the obfuscation targets have been identified. Along with some configuration-saving operations, the active mode&apos;s tasks are essentially complete, and it can then execute the assembled command, leading to launcher mode.&lt;/p&gt;
&lt;p&gt;In launcher mode, the three compilation tools — compile/asm/link — are intercepted to &amp;quot;intervene in the compilation process.&amp;quot; The quotes are intentional because garble doesn&apos;t actually perform any compilation work itself. Like &lt;code&gt;go build&lt;/code&gt;, it acts as a middleman, modifying source code or the arguments passed to the compilation tools, ultimately relying on these three tools to do the actual compilation. Let&apos;s analyze each one.&lt;/p&gt;
&lt;h3&gt;compile&lt;/h3&gt;
&lt;p&gt;The implementation is in the &lt;code&gt;main.go transformCompile&lt;/code&gt; function. Its main job is processing Go files and modifying command arguments. The &lt;code&gt;go build -n&lt;/code&gt; flag outputs the executed commands, and we can pass this flag when using garble to get a more intuitive view of the compilation process. Here&apos;s an excerpt:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;/home/atom/go/bin/garble /usr/local/go/pkg/tool/linux_amd64/compile -o $WORK/b016/_pkg_.a -trimpath &amp;quot;/usr/local/go/src/sync=&amp;gt;sync;$WORK/b016=&amp;gt;&amp;quot; -p sync -std -buildid FRNt7EHDh77qHujLKnmK/FRNt7EHDh77qHujLKnmK -goversion go1.16.4 -D &amp;quot;&amp;quot; -importcfg $WORK/b016/importcfg -pack -c=4 /usr/local/go/src/sync/cond.go /usr/local/go/src/sync/map.go /usr/local/go/src/sync/mutex.go /usr/local/go/src/sync/once.go /usr/local/go/src/sync/pool.go /usr/local/go/src/sync/poolqueue.go /usr/local/go/src/sync/runtime.go /usr/local/go/src/sync/runtime2.go /usr/local/go/src/sync/rwmutex.go /usr/local/go/src/sync/waitgroup.go
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This command uses the &lt;code&gt;compile&lt;/code&gt; tool to compile files like &lt;code&gt;cond.go&lt;/code&gt; into intermediate code. When garble detects that the current compilation tool is &lt;code&gt;compile&lt;/code&gt;, it &amp;quot;intercepts&amp;quot; it and performs obfuscation and other tasks before the tool runs. Let&apos;s analyze the key parts.&lt;/p&gt;
&lt;p&gt;First, the input Go files are parsed into ASTs:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;var files []*ast.File
for _, path := range paths {
  file, err := parser.ParseFile(fset, path, nil, parser.ParseComments)
  if err != nil {
    return nil, err
  }
  files = append(files, file)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then type checking is performed — this is also a step in normal compilation. If type checking fails, it means the files cannot be compiled successfully, and the program exits.&lt;/p&gt;
&lt;p&gt;Since the type names of nodes involved in reflection (&lt;code&gt;reflect.ValueOf&lt;/code&gt; / &lt;code&gt;reflect.TypeOf&lt;/code&gt;) may be used in subsequent logic, their names cannot be obfuscated:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;if fnType.Pkg().Path() == &amp;quot;reflect&amp;quot; &amp;amp;&amp;amp; (fnType.Name() == &amp;quot;TypeOf&amp;quot; || fnType.Name() == &amp;quot;ValueOf&amp;quot;) {
  for _, arg := range call.Args {
    argType := tf.info.TypeOf(arg)
    tf.recordIgnore(argType, tf.pkg.Path())
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This introduces an important map that persists throughout each compile lifecycle, recording all objects that cannot be obfuscated: types used in reflection arguments, identifiers used in constant expressions and &lt;code&gt;go:linkname&lt;/code&gt;, and variables and types imported from unobfuscated packages:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;// ignoreObjects records all the objects we cannot obfuscate. An object
// is any named entity, such as a declared variable or type.
//
// So far, this map records:
//
//  * Types which are used for reflection; see recordReflectArgs.
//  * Identifiers used in constant expressions; see RecordUsedAsConstants.
//  * Identifiers used in go:linkname directives; see handleDirectives.
//  * Types or variables from external packages which were not
//    obfuscated, for caching reasons; see transformGo.
ignoreObjects map[types.Object]bool
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&apos;s use the case of identifying &amp;quot;identifiers used in constant expressions&amp;quot; with the &lt;code&gt;ast.GenDecl&lt;/code&gt; type as an example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;// RecordUsedAsConstants records identifiers used in constant expressions.
func RecordUsedAsConstants(node ast.Node, info *types.Info, ignoreObj map[types.Object]bool) {
	visit := func(node ast.Node) bool {
		ident, ok := node.(*ast.Ident)
		if !ok {
			return true
		}

		// Only record *types.Const objects.
		// Other objects, such as builtins or type names,
		// must not be recorded as they would be false positives.
		obj := info.ObjectOf(ident)
		if _, ok := obj.(*types.Const); ok {
			ignoreObj[obj] = true
		}

		return true
	}

	switch x := node.(type) {
	...
	// in a const declaration all values must be constant representable
	case *ast.GenDecl:
		if x.Tok != token.CONST {
			break
		}
		for _, spec := range x.Specs {
			spec := spec.(*ast.ValueSpec)

			for _, val := range spec.Values {
				ast.Inspect(val, visit)
			}
		}
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Suppose the code to be obfuscated is:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package obfuscate

const (
	H2 string = &amp;quot;a&amp;quot;
	H4 string = &amp;quot;a&amp;quot; + H2
	H3 int    = 123
	H5 string = &amp;quot;a&amp;quot;
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can see that the identifier used in a constant expression is &lt;code&gt;H2&lt;/code&gt;. Let&apos;s walk through the determination process in the code. First, the entire &lt;code&gt;const&lt;/code&gt; block matches the &lt;code&gt;ast.GenDecl&lt;/code&gt; type. Then it iterates over its Specs (each definition), and for each spec, iterates over its Values (the expressions on the right side of the equals sign). It then uses &lt;code&gt;ast.Inspect()&lt;/code&gt; to traverse each element in &lt;code&gt;val&lt;/code&gt;, executing &lt;code&gt;visit()&lt;/code&gt;. If an element node&apos;s type is &lt;code&gt;ast.Ident&lt;/code&gt; and the object it points to is of type &lt;code&gt;types.Const&lt;/code&gt;, that object is recorded in &lt;code&gt;tf.recordIgnore&lt;/code&gt;. It&apos;s a bit convoluted, so let&apos;s print the AST:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/13/ignoreObjects-example.png&quot; alt=&quot;ignoreObjects-example&quot;&gt;&lt;/p&gt;
&lt;p&gt;We can clearly see that &lt;code&gt;H2&lt;/code&gt; in &lt;code&gt;H4 string = &amp;quot;a&amp;quot; + H2&lt;/code&gt; fully meets the criteria and should be recorded in &lt;code&gt;tf.recordIgnore&lt;/code&gt;. The upcoming analysis will involve many type assertions and type switches, which may look complex but are fundamentally similar to the process we just analyzed — we just need to write a demo and print the AST to understand it easily.&lt;/p&gt;
&lt;p&gt;Back to &lt;code&gt;main.go transformCompile&lt;/code&gt;. Next, the current package name is obfuscated and written into the command arguments and source files, provided the package is not &lt;code&gt;main&lt;/code&gt; and lies within the &lt;code&gt;env.GOPRIVATE&lt;/code&gt; scope. The next step processes comments and source code. There&apos;s special handling for runtime and CGO here, which we can safely ignore; let&apos;s look directly at the handling for regular Go code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;// transformGo obfuscates the provided Go syntax file.
func (tf *transformer) transformGo(file *ast.File) *ast.File {
	if opts.GarbleLiterals {
		file = literals.Obfuscate(file, tf.info, fset, tf.ignoreObjects)
	}

	pre := func(cursor *astutil.Cursor) bool {...}
	post := func(cursor *astutil.Cursor) bool {...}

	return astutil.Apply(file, pre, post).(*ast.File)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First it obfuscates literals, then recursively processes each node of the AST, and finally returns the processed AST. These parts share a similar approach, all using &lt;code&gt;astutil.Apply(file, pre, post)&lt;/code&gt; for recursive AST processing, where &lt;code&gt;pre&lt;/code&gt; and &lt;code&gt;post&lt;/code&gt; functions are called before and after visiting child nodes, respectively. Much of this code consists of rather tedious filtering operations, so here&apos;s just a brief analysis:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;literals.Obfuscate pre&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Skips the following cases: values that need to be inferred, those containing non-basic types, types that need to be inferred (implicit type definitions), and constants marked in &lt;code&gt;ignoreObj&lt;/code&gt;. For constants that pass the filter, their token is changed from &lt;code&gt;const&lt;/code&gt; to &lt;code&gt;var&lt;/code&gt; to facilitate later replacement with anonymous functions. However, if any single constant in a &lt;code&gt;const&lt;/code&gt; block cannot be changed to &lt;code&gt;var&lt;/code&gt;, the entire block remains unmodified.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;literals.Obfuscate post&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Replaces string, byte slice, or array values with anonymous functions. The effect is shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/13/obfuscated-literals.png&quot; alt=&quot;obfuscated-literals&quot;&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;transformGo pre&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Skips nodes with names containing &lt;code&gt;_&lt;/code&gt; (unnamed) or &lt;code&gt;_C / _cgo&lt;/code&gt; (cgo code). For embedded fields, it finds the actual object to process, then further filters based on the object&apos;s type:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;types.Var&lt;/code&gt;: Skips non-global variables. For fields, the struct&apos;s type name is used as a hash salt. If the field&apos;s parent struct is unobfuscated, it&apos;s recorded in &lt;code&gt;tf.ignoreObjects&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;types.TypeName&lt;/code&gt;: Skips non-global types. If the type was not obfuscated at its definition site, it&apos;s skipped.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;types.Func&lt;/code&gt;: Skips exported methods, &lt;code&gt;main&lt;/code&gt;/&lt;code&gt;init&lt;/code&gt;/&lt;code&gt;TestMain&lt;/code&gt; functions, and test functions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If a node passes the filter, its name is hashed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;transformGo post&lt;/code&gt;: Hashes import paths.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At this point, source code obfuscation is complete. All that remains is to write the new code to a temporary directory and substitute the temporary file paths into the command in place of the originals. The new compile command is now ready; executing it compiles the obfuscated code with the standard compilation tools.&lt;/p&gt;
&lt;h3&gt;asm&lt;/h3&gt;
&lt;p&gt;This is relatively simple and only applies to private packages. The core operations are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Adding the temporary directory path to the beginning of the &lt;code&gt;-trimpath&lt;/code&gt; argument&lt;/li&gt;
&lt;li&gt;Replacing called function names with their obfuscated versions. In Go assembly files, called function names are preceded by &lt;code&gt;·&lt;/code&gt;, which is used as the search pattern.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;link&lt;/h3&gt;
&lt;p&gt;This is also relatively simple. The core operations are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Replacing the package name (&lt;code&gt;pkg&lt;/code&gt;) and variable name (&lt;code&gt;name&lt;/code&gt;) marked by the &lt;code&gt;-X pkg.name=str&lt;/code&gt; argument with their obfuscated versions&lt;/li&gt;
&lt;li&gt;Clearing the &lt;code&gt;-buildid&lt;/code&gt; argument to prevent build ID leakage&lt;/li&gt;
&lt;li&gt;Adding the &lt;code&gt;-w -s&lt;/code&gt; flags to remove debug information, the symbol table, and the DWARF symbol table&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Obfuscation Results&lt;/h2&gt;
&lt;p&gt;Let&apos;s write a small piece of code and compile it twice: once with &lt;code&gt;go build .&lt;/code&gt; and once with &lt;code&gt;go env -w GOPRIVATE=&amp;quot;*&amp;quot; &amp;amp;&amp;amp; garble -literals build .&lt;/code&gt;. As you can see, the simple code on the left becomes much harder to read after obfuscation:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/13/obfuscated-show-1.png&quot; alt=&quot;obfuscated-show-1&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/13/obfuscated-show-2.png&quot; alt=&quot;obfuscated-show-2&quot;&gt;&lt;/p&gt;
&lt;p&gt;Let&apos;s also load them into IDA and parse with &lt;a href=&quot;https://github.com/0xjiayu/go_parser&quot;&gt;go_parser&lt;/a&gt;. In the unobfuscated file, information like file names and function names is clearly visible, and the code logic is fairly clean:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/13/obfuscated-show-ida-1.png&quot; alt=&quot;obfuscated-show-ida-1&quot;&gt;&lt;/p&gt;
&lt;p&gt;After obfuscation, function names and other information are replaced with garbled text. Moreover, since strings have been replaced with anonymous functions, the code logic is much more confusing:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.rook1e.com/posts/13/obfuscated-show-ida-2.png&quot; alt=&quot;obfuscated-show-ida-2&quot;&gt;&lt;/p&gt;
&lt;p&gt;For larger projects with more dependencies, the confusion introduced by obfuscation is even greater. Since third-party dependency packages are obfuscated as well, reverse engineers can no longer infer the code&apos;s logic from recognizable imported packages.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article explored, from a source code perspective, how Go&apos;s build process invokes the compiler toolchain, and walked through the &lt;a href=&quot;https://github.com/burrowers/garble&quot;&gt;burrowers/garble&lt;/a&gt; project to see how &lt;code&gt;go/ast&lt;/code&gt; can be used to obfuscate code. After obfuscation, the code&apos;s logical structure and the information retained in the binary become much harder to read, significantly raising the bar for reverse engineering.&lt;/p&gt;
</content:encoded></item></channel></rss>