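2026 Tencent Game Security Technology Competition - Android Client Security - Finals Writeup

Preface

What? How did you know I took third place in the Android client finals of the 2026 Tencent Game Security Technology Competition? Hehe 🤭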

image-20260424150212264

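The finals wore me out. To make up for the last day, which I spent at a software security contest, I pulled two all-nighters, roughly 48 hours in total. Afterwards I saw write-ups on Kanxue that were far more complete than mine and figured I was done for. After a few anxious days, I was lucky enough to place anyway! ( •̀ ω •́ )y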

*.gdc file decryption and restoration

*.gdc file decryption

As before, analyze libgodot_android.so: searching for PackedSourcePCK leads to function sub_3804BEC

image-20260418001238548

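The key is the same as in the preliminary round. I again let the AI do the analysis; the encryption analysis below is the AI's output

1 Key

The 32-byte hard-coded key is stored at byte_400EF18 (.bss segment):

CE 4D F8 75 3B 59 A5 A3 9A DE 58 AC 07 EF 94 7A
3D A3 9F 2A F7 5E 32 84 D5 12 17 C0 4D 49 A0 61

In sub_3804BEC the key is copied byte by byte into a heap buffer and then passed to sub_38013D0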

2 AES implementation details

S-Box generation (sub_197C090 @ 0x197C118)

// Standard AES S-Box construction
byte_4045038[0] = 0x63; // sbox[0] = 99 (standard value)
for (i = 1; i < 256; i++) {
    inv = log_table[~exp_table[i+1] + 256]; // finite-field inverse
    sbox[i] = affine(inv) ^ 0x63; // affine transformation
}

Affine transformation: ((x>>7)|(2*x)) ^ ((x>>4)|(16*x)) ^ x ^ ((x>>6)|(4*x)) ^ ((x>>5)|(8*x)), identical to standard AES.
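As a cross-check on the "identical to standard AES" claim, here is a minimal sketch that rebuilds the S-box from a GF(2^8) inverse followed by the rotate-and-XOR affine step quoted above. The helper names (gf_mul, gf_inv, rotl8, affine) are my own and do not come from the binary; the inverse is computed by exponentiation rather than by the log/exp tables the library uses.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(x: int) -> int:
    """Multiplicative inverse as x^254; 0 maps to 0 by AES convention."""
    if x == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def affine(x: int) -> int:
    """x ^ rotl1 ^ rotl2 ^ rotl3 ^ rotl4, the expression seen in the decompilation."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4)


SBOX = [affine(gf_inv(i)) ^ 0x63 for i in range(256)]

if __name__ == "__main__":
    print(" ".join(f"{b:02x}" for b in SBOX[:16]))
```

Comparing a dump of byte_4045038 against SBOX would confirm (or refute) that the table in the binary is unmodified.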

Round constants

xmmword_5128E0 = 01 00 00 00 02 00 00 00 04 00 00 00 08 00 00 00
xmmword_512C30 = 10 00 00 00 20 00 00 00 40 00 00 00 80 00 00 00
qword_4045030 = 1B 00 00 00 36 00 00 00

That is, Rcon = {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36}, the standard AES round constants
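A small sketch to double-check this: decode the three dumped constants as little-endian 32-bit words and compare them with round constants generated by repeated doubling in GF(2^8). The function name xtime and the variable names are illustrative assumptions, not symbols from the library.

```python
import struct

# Raw bytes of xmmword_5128E0, xmmword_512C30 and qword_4045030 as dumped above.
DUMP = bytes.fromhex(
    "01000000020000000400000008000000"
    "10000000200000004000000080000000"
    "1B00000036000000"
)


def xtime(b: int) -> int:
    """Multiply by x (i.e. by 2) in GF(2^8) modulo 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


# Round constants as stored in the binary: ten little-endian uint32 values.
dumped_rcon = list(struct.unpack("<10I", DUMP))

# Round constants as defined by AES: 1, then successive doublings.
expected_rcon = [1]
for _ in range(9):
    expected_rcon.append(xtime(expected_rcon[-1]))

if __name__ == "__main__":
    print([hex(v) for v in dumped_rcon])
    print(dumped_rcon == expected_rcon)
```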

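Key expansion (sub_197C090)
  • key_bits=256 → rounds=14 (*a1 = 14)
  • Standard AES-256 key schedule: RotWord + SubWord + Rcon every 8 words, SubWord alone every 4 words
ECB block encryption (sub_197CC38 software path / sub_197E104 hardware path)
  • Software path: T-table implementation (dword_4046238 / dword_4046A38), the standard 4-table lookup
  • Hardware path: ARM64 AES instructions (AESE / AESMC), with hardware support detected via getauxval(AT_HWCAP)
  • byte_400B798 = 0xFF (initial "undecided" state); detection runs on the first call and the result is cached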

3 CFB mode implementation (sub_197DB14)

Key loop in the disassembly (decryption direction, 0x197DC24):

loc_197DC24:
    LDRB W8, [X21], #1 ; W8 = *ciphertext++
    LDRB W9, [X22, X25] ; W9 = keystream[offset]
    ADD W10, W25, #1
    SUB X23, X23, #1 ; length--
    EOR W9, W9, W8 ; plaintext = keystream[offset] ^ ciphertext
    STRB W9, [X20], #1 ; *output++ = plaintext
    STRB W8, [X22, X25] ; IV[offset] = ciphertext (CFB feedback)
    AND X25, X10, #0xF ; offset = (offset + 1) & 0xF
    CBZ X23, exit

Equivalent decryption logic:

for each ciphertext_byte:
    if offset == 0:
        state = AES_ECB_encrypt(state) # encrypt the IV block
    plaintext = state[offset] ^ ciphertext_byte
    state[offset] = ciphertext_byte # CFB feedback
    offset = (offset + 1) % 16

The encryption direction works the same way; only the operands of the XOR differ:

ciphertext = state[offset] ^ plaintext
state[offset] = ciphertext # feed back the ciphertext

Both directions run AES in the forward (ENCRYPT) direction on the IV block, which is standard CFB behaviour.
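The byte-wise loop above can be sketched in plain Python. To keep the example self-contained, the block primitive here is a toy stand-in built from SHA-256 (an assumption purely for illustration, not AES); CFB only ever calls the forward block function, so any keyed 16-byte function shows the structure. The names toy_block_encrypt and cfb128 are hypothetical.

```python
import hashlib


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES_ECB_encrypt: a keyed 16-byte -> 16-byte function."""
    return hashlib.sha256(key + block).digest()[:16]


def cfb128(key: bytes, iv: bytes, data: bytes, decrypt: bool) -> bytes:
    """Byte-at-a-time CFB with a 16-byte shift register, mirroring the loop at 0x197DC24."""
    state = bytearray(iv)
    out = bytearray()
    offset = 0
    for byte in data:
        if offset == 0:
            # A full block of feedback has accumulated: turn it into keystream.
            state[:] = toy_block_encrypt(key, bytes(state))
        result = state[offset] ^ byte
        out.append(result)
        # The feedback is always the ciphertext byte, in both directions.
        state[offset] = byte if decrypt else result
        offset = (offset + 1) & 0xF
    return bytes(out)


if __name__ == "__main__":
    key, iv = b"k" * 32, bytes(range(16))
    message = b"CFB needs no padding, any length works."
    ciphertext = cfb128(key, iv, message, decrypt=False)
    print(cfb128(key, iv, ciphertext, decrypt=True) == message)
```

Swapping toy_block_encrypt for a real AES-256 single-block encryption should reproduce what AES.MODE_CFB with segment_size=128 computes, which is what the decryption script relies on.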

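4 Integrity check: standard MD5

MD5 initialization vector (sub_196FB84):

xmmword_510EE0 = 01 23 45 67 89 AB CD EF FE DC BA 98 76 54 32 10

That is, {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476}, the standard MD5 IV

  • Processes data in 64-byte blocks
  • Standard MD5 padding (0x80 + zero padding + 64-bit bit length)
  • sub_196EF98 is the MD5 compression function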
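A quick sketch of how these two observations can be checked: the 16 dumped bytes read as little-endian words give the four MD5 initial state values, and the padding rule always brings the message up to a multiple of 64 bytes. The function name md5_padding is an illustrative assumption.

```python
import struct

# Bytes of xmmword_510EE0 as dumped above.
IV_BYTES = bytes.fromhex("0123456789ABCDEFFEDCBA9876543210")

# Interpreted as four little-endian uint32 words: the MD5 initial state A, B, C, D.
iv_words = struct.unpack("<4I", IV_BYTES)


def md5_padding(message_len: int) -> bytes:
    """Return the MD5 padding for a message of message_len bytes."""
    zero_count = (55 - message_len) % 64
    bit_length = (message_len * 8) & 0xFFFFFFFFFFFFFFFF
    return b"\x80" + b"\x00" * zero_count + struct.pack("<Q", bit_length)


if __name__ == "__main__":
    print([hex(w) for w in iv_words])
    print(len(md5_padding(3)))
```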

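5 Conclusion: standard AES-256-CFB (128-bit segments)

Instruction-by-instruction analysis at the assembly level confirms that standard AES-256-CFB is used, with no custom modifications.

Decryption

The final decryption script is as follows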

"""Decrypt Godot encrypted assets (.gdc / .gdextension / .scn / .tscn) and PCK index.

Algorithm: AES-256-CFB (128-bit segments) + MD5 integrity check.
Key: hardcoded 32-byte key from libgodot_android.so @ byte_400EF18.

Individual encrypted file format:
    [0:16]   MD5 digest of plaintext
    [16:24]  plaintext length (LE uint64)
    [24:40]  AES IV (16 bytes)
    [40:]    ciphertext (AES-256-CFB encrypted)

PCK (assets.sparsepck) index format:
    [0:4]    magic "GDPC"
    [4:20]   version (4 x uint32: major, minor, patch, revision)
    [20:24]  flags (uint32, bit0=encrypted, bit1=signed, bit2=extra)
    [24:32]  padding (uint64)
    [32:40]  seek_offset (uint64) -- for version 3
    [seek:]  nfiles (uint32)
    [+4:]    MD5(16) + plain_len(8) + IV(16) + encrypted_index
"""
from __future__ import annotations

import argparse
import hashlib
import struct
from pathlib import Path

from Crypto.Cipher import AES

KEY_HEX = "ce4df8753b59a5a39ade58ac07ef947a3da39f2af75e3284d51217c04d49a061"
ENCRYPTED_SUFFIXES = {".gdc", ".scn", ".gdextension", ".tscn"}


def aes256_cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Standard AES-256-CFB decryption with 128-bit segments."""
    cipher = AES.new(key, AES.MODE_CFB, iv=iv, segment_size=128)
    return cipher.decrypt(ciphertext)


def decrypt_godot_file(src: Path, dst: Path, key: bytes) -> tuple[bool, str]:
    """Decrypt a single encrypted Godot file."""
    data = src.read_bytes()
    if len(data) < 40:
        return False, "too short"

    md5_expected = data[:16]
    plain_len = struct.unpack("<Q", data[16:24])[0]
    iv = data[24:40]
    ciphertext = data[40:]

    if plain_len > len(ciphertext) + 16 or plain_len > 100 * 1024 * 1024:
        return False, "invalid plaintext length"

    plaintext = aes256_cfb_decrypt(key, iv, ciphertext)[:plain_len]

    if hashlib.md5(plaintext).digest() != md5_expected:
        return False, "md5 mismatch"

    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_bytes(plaintext)
    return True, f"{plain_len} bytes"


def decrypt_pck_index(pck_path: Path, key: bytes) -> list[dict]:
    """Decrypt and parse a PCK file index, returning list of file entries."""
    data = pck_path.read_bytes()
    off = 0

    magic = data[off:off + 4]; off += 4
    if magic != b"GDPC":
        raise ValueError(f"Invalid magic: {magic}")

    major, minor, patch, rev = struct.unpack_from("<IIII", data, off); off += 16
    flags = struct.unpack_from("<I", data, off)[0]; off += 4

    if not (flags & 1):
        print("PCK index is not encrypted")
        return []

    _padding = struct.unpack_from("<Q", data, off)[0]; off += 8

    if major == 3:
        seek_offset = struct.unpack_from("<Q", data, off)[0]; off += 8
        off = seek_offset

    nfiles = struct.unpack_from("<I", data, off)[0]; off += 4

    md5_expected = data[off:off + 16]; off += 16
    plain_len = struct.unpack_from("<Q", data, off)[0]; off += 8
    iv = data[off:off + 16]; off += 16
    ciphertext = data[off:off + plain_len]

    plaintext = aes256_cfb_decrypt(key, iv, ciphertext)[:plain_len]

    if hashlib.md5(plaintext).digest() != md5_expected:
        raise ValueError("PCK index MD5 mismatch")

    entries = []
    idx = 0
    for _ in range(nfiles):
        name_len = struct.unpack_from("<I", plaintext, idx)[0]; idx += 4
        name = plaintext[idx:idx + name_len].rstrip(b"\x00").decode("utf-8"); idx += name_len
        file_offset = struct.unpack_from("<Q", plaintext, idx)[0]; idx += 8
        file_size = struct.unpack_from("<Q", plaintext, idx)[0]; idx += 8
        file_iv = plaintext[idx:idx + 16]; idx += 16
        file_flags = struct.unpack_from("<I", plaintext, idx)[0]; idx += 4
        entries.append({
            "name": name,
            "offset": file_offset,
            "size": file_size,
            "iv": file_iv.hex(),
            "flags": file_flags,
            "encrypted": bool(file_flags & 1),
        })

    return entries


def main() -> int:
    parser = argparse.ArgumentParser(description="Decrypt Godot encrypted assets.")
    parser.add_argument("--input", default="final/assets",
                        help="Encrypted assets root directory.")
    parser.add_argument("--output", default="final/assets_decrypted",
                        help="Output directory for decrypted files.")
    parser.add_argument("--pck", default=None,
                        help="Optional: path to .sparsepck to dump index.")
    args = parser.parse_args()

    key = bytes.fromhex(KEY_HEX)

    if args.pck:
        print(f"=== Decrypting PCK index: {args.pck} ===")
        entries = decrypt_pck_index(Path(args.pck), key)
        for e in entries:
            enc_flag = " [ENC]" if e["encrypted"] else ""
            print(f" {e['name']}: offset={e['offset']}, size={e['size']}{enc_flag}")
        print(f"Total entries: {len(entries)}\n")

    input_root = Path(args.input)
    output_root = Path(args.output)
    total = decrypted = skipped = 0

    for src in sorted(input_root.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in ENCRYPTED_SUFFIXES:
            continue

        total += 1
        rel = src.relative_to(input_root)
        ok, reason = decrypt_godot_file(src, output_root / rel, key)
        if ok:
            decrypted += 1
            print(f"[ok] {rel} -> {reason}")
        else:
            skipped += 1
            print(f"[skip] {rel} -> {reason}")

    print(f"\ntotal={total} decrypted={decrypted} skipped={skipped}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

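Just when I thought the finals were not going to give me a hard time over getting the GDScript source, the output recovered by gdre_tools slapped me in the face

*.gd file restoration

Feeding the .gdc files decrypted in the previous step straight into gdre_tools gives the result shown below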

image-20260418002418617

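It is all garbage, which forced me to go back and look for another layer of encryption or some other transformation. I was stuck here for a long time, convinced the decryption itself was wrong, until it turned out that the token list inside the .gdc files had been obfuscated

I had the AI parse the bytecode structure and extract the token list, with the following result:

Token type values versus the standard values

Comparing the token type values in the game's .gdc files with the standard values gdre_tools expects:

Value in game file   gdre_tools reads it as   Should actually be
11                   AMPERSAND                ANNOTATION (@)
12                   IDENTIFIER               IDENTIFIER
51                   CF_IF                    CF_IF
1                    ANNOTATION               NEWLINE
3                    LITERAL                  INDENT
2                    IDENTIFIER               DEDENT

As the table shows:

  • The main tokens such as IDENTIFIER (12) and CF_IF (51) appear to be shifted up by about 10-11
  • NEWLINE/INDENT/DEDENT have been moved to positions 1/3/2
  • ANNOTATION has moved from its standard value 1 to 11

This confirms it: the game engine changed how the GDScript token type enum values are assigned

Token encoding format

Each token occupies 8 bytes: the first 4 bytes are the token value, the last 4 bytes the line number.

Encoding of the token value: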

bit 31                   bit 8   bit 7   bit 6             bit 0
┌──────────────────────────┬──────┬──────────────────────┐
│ data (24 bits)           │ 0x80 │ token_type (7 bits)  │
└──────────────────────────┴──────┴──────────────────────┘
  • bit 7 is always 1 (the 0x80 flag bit)
  • bits 0-6: token type (0-100, 101 types in total)
  • bits 8-31: extra data
    • for TK_IDENTIFIER: index of the identifier in the identifier table
    • for TK_CONSTANT: index of the constant in the constant table
    • for TK_BUILT_IN_FUNC and similar: built-in function number
    • for TK_NEWLINE: empty (data field is 0)

How to extract the token type:

token_type = token_value & 0x7F       # low 7 bits
data = (token_value >> 8)             # high 24 bits (with the 0x80 flag shifted out)
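To make the layout concrete, here is a minimal sketch that splits one 8-byte token record into its fields. It assumes the record is two little-endian uint32 values (token value, then line number), as described above; decode_token and the sample record are illustrative, not taken from a real file.

```python
import struct


def decode_token(record: bytes) -> dict:
    """Split one 8-byte token record into type, flag bit, data and line number."""
    value, line = struct.unpack("<II", record)
    return {
        "type": value & 0x7F,          # bits 0-6
        "flag": bool(value & 0x80),    # bit 7, expected to be set
        "data": value >> 8,            # bits 8-31
        "line": line,
    }


if __name__ == "__main__":
    # Hypothetical record: game-enum IDENTIFIER (12), identifier index 5, line 3.
    sample = struct.pack("<II", (5 << 8) | 0x80 | 12, 3)
    print(decode_token(sample))
```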

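libgodot_android.so, revisited

Searching for the string GDScriptParser, the first thing that stands out is off_3EC2548. I had seen it early on but paid it little attention. Pitfalls everywhere.

So I had the AI follow this lead; its analysis is below

Keyword scanner analysis (sub_14773E0)

Going by its address range (0x14773E0 - 0x147B8B4, i.e. 0x44D4 bytes), this function is about 17 KB long and is the core of the GDScript tokenizer.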

Overall structure

// Pseudocode
void scan_keyword_or_identifier(tokenizer_t *tok) {
    // 1. Read the current character
    char first = current_char(tok);

    // 2. Collect the complete identifier string
    while (is_identifier_char(next_char)) { ... }

    // 3. Match keywords with a switch-case on the first letter
    switch (first) {
    case 'a': // and, as, assert, await
        if (match("and")) { set_token(tok, 0x14); return; }
        if (match("as")) { set_token(tok, 0x3E); return; }
        if (match("assert")) { set_token(tok, 0x3F); return; }
        if (match("await")) { set_token(tok, 0x40); return; }
        break;
    case 'b': // break, breakpoint
        if (match("break")) { set_token(tok, 0x38); return; }
        if (match("breakpoint")) { set_token(tok, 0x41); return; }
        break;
    // ... more first-letter branches
    }

    // 4. No keyword matched → identifier
    set_token(tok, 0x0C); // IDENTIFIER

    // 5. Special constants true/false/null → constant
    // set_token(tok, 0x0D); // CONSTANT
}
Keyword matching pattern in the ARM64 disassembly

Taking the while keyword as an example:

; Check first_char == 'w' (0x77)
CMP W8, #0x77
B.NE next_case

; Compare the remaining characters "hile"
LDR X9, [X19, #identifier_buf]
LDR W10, ='h' | ('i'<<8) | ('l'<<16) | ('e'<<24)
LDR W11, [X9, #1]
CMP W10, W11
B.NE check_when

; Match → set_token(tok, 0x37)
MOV W1, #0x37 ; token type = 55 (game enum)
MOV X0, X19
BL sub_14763A8 ; set_token()

Checking this against the standard token enum:

  • TK_CF_WHILE = 44 (standard value)
  • In the game, while → 0x37 = 55
  • 55 - 44 = 11 → offset +11

This is consistent with every token in the std 40-87 range being shifted by +11.

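Full keyword mapping extraction

By going through every MOV W1, #imm; BL sub_14763A8 pattern in sub_14773E0 one by one, the complete keyword → game token value mapping was extracted:

Keyword  Game value (hex)  Game value (dec)  Standard token  Standard value  Offset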
and 0x14 20 TK_OP_AND 10 +10
or 0x15 21 TK_OP_OR 11 +10
not 0x16 22 TK_OP_NOT 12 +10
in 0x48 72 TK_OP_IN 61 +11
is 0x49 73 TK_OP_IS 62 +11
as 0x3E 62 TK_CF_AS 51 +11
if 0x33 51 TK_CF_IF 40 +11
elif 0x34 52 TK_CF_ELIF 41 +11
else 0x35 53 TK_CF_ELSE 42 +11
for 0x36 54 TK_CF_FOR 43 +11
while 0x37 55 TK_CF_WHILE 44 +11
break 0x38 56 TK_CF_BREAK 45 +11
continue 0x39 57 TK_CF_CONTINUE 46 +11
pass 0x3A 58 TK_CF_PASS 47 +11
return 0x3B 59 TK_CF_RETURN 48 +11
match 0x3C 60 TK_CF_MATCH 49 +11
when 0x3D 61 TK_CF_WHEN 50 +11
assert 0x3F 63 TK_PR_ASSERT 52 +11
await 0x40 64 TK_PR_AWAIT 53 +11
breakpoint 0x41 65 TK_PR_BREAKPOINT 54 +11
class 0x42 66 TK_PR_CLASS 55 +11
class_name 0x43 67 TK_PR_CLASS_NAME 56 +11
const 0x44 68 TK_PR_CONST 57 +11
enum 0x45 69 TK_PR_ENUM 58 +11
extends 0x46 70 TK_PR_EXTENDS 59 +11
func 0x47 71 TK_PR_FUNC 60 +11
namespace 0x4A 74 TK_PR_NAMESPACE 63 +11
preload 0x4B 75 TK_PR_PRELOAD 64 +11
self 0x4C 76 TK_PR_SELF 65 +11
signal 0x4D 77 TK_PR_SIGNAL 66 +11
static 0x4E 78 TK_PR_STATIC 67 +11
super 0x4F 79 TK_PR_SUPER 68 +11
trait 0x50 80 TK_PR_TRAIT 69 +11
var 0x51 81 TK_PR_VAR 70 +11
void 0x52 82 TK_PR_VOID 71 +11
yield 0x53 83 TK_PR_YIELD 72 +11
PI 0x04 4 TK_CONST_PI 91 -87
TAU 0x05 5 TK_CONST_TAU 92 -87
INF 0x06 6 TK_CONST_INF 93 -87
NAN 0x07 7 TK_CONST_NAN 94 -87
identifier 0x0C 12 TK_IDENTIFIER 2 +10
true/false/null 0x0D 13 TK_LITERAL 3 +10

Operator/punctuation scanner analysis (sub_147B8B4)

This function is about 79 KB (0x147B8B4 to roughly 0x148FF00) and handles every token that does not start with a letter.

Structure

void scan_operator(tokenizer_t *tok) {
    char ch = current_char(tok);
    switch (ch) {
    case '\n': handle_newline(); break;
    case '@': handle_annotation(); break;
    case '0'...'9': parse_number(); break;
    case '"': case '\'': parse_string(); break;
    case '+': case '-': case '*': case '/': // ... operators
    case '(': case ')': case '[': case ']': // ... brackets
    // ...
    }
}
Key mapping examples

Taking ! and != as an example:

; Read the character '!'
CMP W8, #0x21 ; '!'
B.NE next_op

; Check whether the next character is '='
LDR W9, [X19, #next_char]
CMP W9, #0x3D ; '='
B.EQ handle_not_equal

; A lone '!' → BANG token
MOV W1, #0x13 ; game value 19
BL sub_14763A8
B done

handle_not_equal:
; '!=' → NOT_EQUAL token
MOV W1, #0x1B ; game value 27
BL sub_14763A8

Standard values: TK_OP_BANG = 9, TK_OP_NOT_EQUAL = 17
Game values: 19 (9+10), 27 (17+10)
Offset: +10 in both cases

Special token handlers

Newline handling (sub_1475690)

; Handle '\n' → NEWLINE token
MOV W1, #1 ; game value = 1
BL sub_14763A8

Standard TK_NEWLINE = 88, game value 1.

Annotation handling (sub_14770DC)

; Handle '@' → ANNOTATION token
MOV W1, #0x0B ; game value = 11
BL sub_14763A8

Standard TK_ANNOTATION = 1, game value 11.

Number parser (sub_147802C)

; Numeric constant → CONSTANT token
MOV W1, #0x0D ; game value = 13
BL sub_14763A8

; Number parse error → ERROR token
MOV W1, #0x32 ; game value = 50
BL sub_14763A8

Standard TK_LITERAL = 3 (game 13, offset +10)
Standard TK_ERROR = 98 (game 50)

Error handling (sub_1476720)

; Every error path uses
MOV W1, #0x32 ; game value = 50
BL sub_14763A8

Standard TK_ERROR = 98 always maps to game value 50.

Complete mapping

Putting together everything extracted from the .so gives the complete game value → standard value mapping table:

Game value  Standard value  Token name               Group
0           0               TK_EMPTY                 unchanged
1           88              TK_NEWLINE               relocated
2           90              TK_DEDENT                relocated
3           89              TK_INDENT                relocated
4           91              TK_CONST_PI              relocated
5           92              TK_CONST_TAU             relocated
6           93              TK_CONST_INF             relocated
7           94              TK_CONST_NAN             relocated
8           95              TK_VCS_CONFLICT_MARKER   relocated
9           96              TK_BACKTICK              relocated
10          97              TK_QUESTION_MARK         relocated
11          1               TK_ANNOTATION            relocated
12          2               TK_IDENTIFIER            +10 range
13          3               TK_LITERAL               +10 range
14          4               TK_SELF                  +10 range
...         ...             ...                      +10 range
49          39              TK_OP_ASSIGN_BIT_XOR     +10 range
50          98              TK_ERROR                 relocated
51          40              TK_CF_IF                 +11 range
52          41              TK_CF_ELIF               +11 range
...         ...             ...                      +11 range
98          87              TK_UNDERSCORE            +11 range
99          99              TK_EOF                   unchanged
100         100             TK_MAX                   unchanged
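Before trusting a table like this, it is worth checking that it is a proper permutation: every game value 0-100 must map to exactly one standard value and vice versa, otherwise some token would be lost or duplicated during remapping. A minimal sketch, with the dictionary built from the groups in the table (the name game_to_std is just a convenient label):

```python
# Build the game -> standard mapping from the groups listed in the table.
game_to_std = {0: 0, 11: 1, 50: 98, 99: 99, 100: 100}

# +10 range: std 2-39 live at game 12-49.
for std in range(2, 40):
    game_to_std[std + 10] = std

# +11 range: std 40-87 live at game 51-98.
for std in range(40, 88):
    game_to_std[std + 11] = std

# Relocated block: NEWLINE, DEDENT, INDENT, PI, TAU, INF, NAN, VCS marker, backtick, '?'.
relocated = {1: 88, 2: 90, 3: 89, 4: 91, 5: 92, 6: 93, 7: 94, 8: 95, 9: 96, 10: 97}
game_to_std.update(relocated)

# A valid remap must be a bijection on 0..100.
is_bijection = (
    sorted(game_to_std.keys()) == list(range(101))
    and sorted(game_to_std.values()) == list(range(101))
)

if __name__ == "__main__":
    print("entries:", len(game_to_std), "bijection:", is_bijection)
    print("while: game 55 -> std", game_to_std[55])
```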

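Note that none of the code analysis above is actually hard (sub_14773E0, for example, has almost no obfuscation), but for work this tedious, handing it to the AI is just too convenient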

image-20260418111621250

Restoring the *.gdc files

Based on the analysis above, the final restoration code is as follows

"""
Remap shuffled token types in game's .gdc files to standard Godot 4.5 token types.

The game's modified Godot engine uses a shuffled token enum. This script converts
game bytecode values → standard Godot 4.5 bytecode values so gdre_tools can decompile correctly.

Mapping:
- std 0-1 → game 0, 11 (EMPTY, ANNOTATION)
- std 2-39 → game 12-49 (shift +10: IDENTIFIER through ASSIGN_BIT_XOR)
- std 40-87 → game 51-98 (shift +11: keywords, brackets, punctuation, UNDERSCORE)
- std 88 (NEWLINE) → game 1
- std 89 (INDENT) → game 3
- std 90 (DEDENT) → game 2
- std 91-94 (PI/TAU/INF/NAN) → game 4-7
- std 95 (VCS_CONFLICT) → game 8
- std 96 (BACKTICK) → game 9
- std 97 (QUESTION_MARK) → game 10
- std 98 (ERROR) → game 50
- std 99 (EOF) → game 99
- std 100 (MAX) → game 100
"""

import struct
import zstandard
import os
import sys

# Build game_to_std mapping
game_to_std = {}

# std 0 (TK_EMPTY) → game 0
game_to_std[0] = 0

# std 1 (TK_ANNOTATION) → game 11
game_to_std[11] = 1

# std 2-39 → game 12-49 (shift +10)
for std in range(2, 40):
    game_to_std[std + 10] = std

# std 40-87 → game 51-98 (shift +11)
for std in range(40, 88):
    game_to_std[std + 11] = std

# Special relocations
game_to_std[1] = 88  # TK_NEWLINE
game_to_std[2] = 90  # TK_DEDENT
game_to_std[3] = 89  # TK_INDENT
game_to_std[4] = 91  # TK_CONST_PI
game_to_std[5] = 92  # TK_CONST_TAU
game_to_std[6] = 93  # TK_CONST_INF
game_to_std[7] = 94  # TK_CONST_NAN
game_to_std[8] = 95  # TK_VCS_CONFLICT_MARKER
game_to_std[9] = 96  # TK_BACKTICK
game_to_std[10] = 97  # TK_QUESTION_MARK
game_to_std[50] = 98  # TK_ERROR
game_to_std[99] = 99  # TK_EOF
game_to_std[100] = 100  # TK_MAX

ENCODE_FLAG_64 = 0x10000


def parse_and_remap_gdc(input_path, output_path):
    with open(input_path, 'rb') as f:
        data = f.read()

    # Parse header
    magic = data[:4]
    assert magic == b'GDSC', f"Bad magic: {magic}"
    version = struct.unpack_from('<I', data, 4)[0]
    uncomp_size = struct.unpack_from('<I', data, 8)[0]
    compressed = data[12:]

    # Decompress
    decompressed = bytearray(zstandard.ZstdDecompressor().decompress(compressed, max_output_size=uncomp_size))

    # Parse structure header
    off = 0
    n_idents, n_consts, n_lines, n_tokens = struct.unpack_from('<4I', decompressed, off)

    # Compute token section offset from the end (avoids parsing idents/consts)
    # Layout: [header 16B] [idents] [consts] [line_starts n_lines*8] [column_info n_lines*8] [tokens n_tokens*8]
    token_section_size = n_tokens * 8
    off = len(decompressed) - token_section_size
    remapped_count = 0
    unknown_types = set()

    for i in range(n_tokens):
        tok_val = struct.unpack_from('<I', decompressed, off)[0]
        tok_type = tok_val & 0x7F
        tok_upper = tok_val & ~0x7F  # preserve data and flag bits

        if tok_type in game_to_std:
            std_type = game_to_std[tok_type]
            new_val = tok_upper | (std_type & 0x7F)
            struct.pack_into('<I', decompressed, off, new_val)
            if tok_type != std_type:
                remapped_count += 1
        else:
            unknown_types.add(tok_type)

        off += 8  # token_val + line_number

    # Recompress
    compressed_out = zstandard.ZstdCompressor().compress(bytes(decompressed))

    # Write output
    out_data = magic + struct.pack('<I', version) + struct.pack('<I', uncomp_size) + compressed_out
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, 'wb') as f:
        f.write(out_data)

    return remapped_count, unknown_types


def main():
    base = os.path.dirname(os.path.abspath(__file__))
    input_dir = os.path.join(base, "final", "assets_decrypted")
    output_dir = os.path.join(base, "final", "assets_remapped")

    if len(sys.argv) > 1:
        input_dir = sys.argv[1]
    if len(sys.argv) > 2:
        output_dir = sys.argv[2]

    total_files = 0
    total_remapped = 0

    for root, dirs, files in os.walk(input_dir):
        for f in files:
            if f.endswith('.gdc'):
                in_path = os.path.join(root, f)
                rel_path = os.path.relpath(in_path, input_dir)
                out_path = os.path.join(output_dir, rel_path)

                remapped, unknown = parse_and_remap_gdc(in_path, out_path)
                total_files += 1
                total_remapped += remapped
                status = f" {rel_path}: {remapped} tokens remapped"
                if unknown:
                    status += f" (UNKNOWN types: {unknown})"
                print(status)

    print(f"\nDone: {total_files} files, {total_remapped} total tokens remapped")


if __name__ == "__main__":
    main()

With that, gdre_tools can finally recover the source!

*.gd file analysis

Trigger1 - sample flag (PART0)

func _kc():
    var _bp = get_overlapping_bodies()
    if _bp.size() < 1: return
    if str(get_path()) != "/root/TownScene/Trigger1": return

    var _lb = get_node("/root/TownScene/Label2")
    _lb.text = "flag{sec2026_PART0_example}"

A hard-coded sample, shown only when the vehicle collides with Trigger1. _process calls _kc() every frame to check whether any body overlaps.

Note: unlike the other triggers, Trigger1 polls in _process instead of using the body_entered signal.

It also instantiates GameExtension.new() and calls _gx.Tick() every frame, but neither is used to compute the flag.

Trigger2 - Feistel cipher (PART1)

This is the most important flag-generation logic, and it is implemented entirely in GDScript.

Overall flow

Token hex (8 chars, e.g. "a3f7b2c1")
    ↓ _h2b()
PackedByteArray [0xa3, 0xf7, 0xb2, 0xc1] (4 bytes)
    ↓ _fe()
Feistel encryption (8 rounds)
    ↓ _b2h()
hex string → "flag{sec2026_PART1_XXXXXXXX}"

The encryption is clearly invertible, so it is easy to write both the PART1 flag generation and its inverse, as follows

token2flag1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

// Implementation of the round function _rf
void round_function(uint8_t* block, const char* key, int key_len, int rn, uint8_t* out) {
    for (int j = 0; j < 2; j++) {
        uint8_t v = block[j] ^ (uint8_t)key[(j + rn) % key_len];
        v = (v * 7 + rn) & 0xFF;
        v = ((v << 3) | (v >> 5)) & 0xFF; // rotate left by 3 bits
        out[j] = v;
    }
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        printf("Usage: %s <8-char-hex-token>\n", argv[0]);
        return 1;
    }

    // Parse the hex string from the command line (4 bytes)
    uint8_t data[4];
    for (int i = 0; i < 4; i++) {
        sscanf(argv[1] + (i * 2), "%2hhx", &data[i]);
    }

    uint8_t lo[2] = {data[0], data[1]};
    uint8_t hi[2] = {data[2], data[3]};
    const char* key = "Sec2026_Godot";
    int key_len = strlen(key);

    // 8-round Feistel network
    for (int rn = 0; rn < 8; rn++) {
        uint8_t fv[2];
        round_function(hi, key, key_len, rn, fv);

        uint8_t nr[2];
        nr[0] = lo[0] ^ fv[0];
        nr[1] = lo[1] ^ fv[1];

        // State update: lo = hi, hi = nr
        lo[0] = hi[0]; lo[1] = hi[1];
        hi[0] = nr[0]; hi[1] = nr[1];
    }

    printf("flag{sec2026_PART1_%02x%02x%02x%02x}\n", lo[0], lo[1], hi[0], hi[1]);
    return 0;
}

flag2token1.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

// Round function _rf (same as for encryption)
void round_function(uint8_t* block, const char* key, int key_len, int rn, uint8_t* out) {
    for (int j = 0; j < 2; j++) {
        uint8_t v = block[j] ^ (uint8_t)key[(j + rn) % key_len];
        v = (v * 7 + rn) & 0xFF;
        v = ((v << 3) | (v >> 5)) & 0xFF;
        out[j] = v;
    }
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        printf("Usage: %s <8-char-hex-from-flag>\n", argv[0]);
        return 1;
    }

    // Accept either the bare suffix (e.g. 9e03da07) or the full flag string
    char hex_input[9] = {0};
    if (strlen(argv[1]) > 8) {
        // If the input looks like flag{..._9e03da07}, take the 8 chars after the last '_'
        const char* last_underscore = strrchr(argv[1], '_');
        if (last_underscore) strncpy(hex_input, last_underscore + 1, 8);
        else {
            // Fall back to treating the input as the suffix itself
            strncpy(hex_input, argv[1], 8);
        }
    } else {
        strncpy(hex_input, argv[1], 8);
    }

    uint8_t data[4];
    for (int i = 0; i < 4; i++) {
        sscanf(hex_input + (i * 2), "%2hhx", &data[i]);
    }

    // Initial state L and R
    uint8_t lo_next[2] = {data[0], data[1]};
    uint8_t hi_next[2] = {data[2], data[3]};
    const char* key = "Sec2026_Godot";
    int key_len = strlen(key);

    // Run the 8-round Feistel network backwards (rn from 7 down to 0)
    for (int rn = 7; rn >= 0; rn--) {
        // Encryption: lo_next = hi_prev, hi_next = lo_prev ^ F(hi_prev)
        // Inverse:    hi_prev = lo_next, lo_prev = hi_next ^ F(lo_next)
        uint8_t hi_prev[2] = {lo_next[0], lo_next[1]};

        uint8_t fv[2];
        round_function(hi_prev, key, key_len, rn, fv);

        uint8_t lo_prev[2];
        lo_prev[0] = hi_next[0] ^ fv[0];
        lo_prev[1] = hi_next[1] ^ fv[1];

        // Update the state for the next inverse round
        lo_next[0] = lo_prev[0]; lo_next[1] = lo_prev[1];
        hi_next[0] = hi_prev[0]; hi_next[1] = hi_prev[1];
    }

    printf("Token: %02x%02x%02x%02x\n", lo_next[0], lo_next[1], hi_next[0], hi_next[1]);
    return 0;
}
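For quick experiments it can be handy to have both directions in one place. The following is a Python port of the two C programs above (a sketch, not the original GDScript): the key string and round function are copied from the C code, while the function names token_to_flag_hex and flag_hex_to_token are my own. Its main use is a round-trip check that the inverse really undoes the forward direction.

```python
KEY = b"Sec2026_Godot"


def round_function(half, rn):
    """Port of _rf: XOR with a key byte, multiply-add, rotate left by 3."""
    out = []
    for j in range(2):
        v = half[j] ^ KEY[(j + rn) % len(KEY)]
        v = (v * 7 + rn) & 0xFF
        v = ((v << 3) | (v >> 5)) & 0xFF
        out.append(v)
    return out


def token_to_flag_hex(token_hex: str) -> str:
    """Forward direction: 8 Feistel rounds over a 4-byte token."""
    data = bytes.fromhex(token_hex)
    lo, hi = list(data[:2]), list(data[2:])
    for rn in range(8):
        fv = round_function(hi, rn)
        lo, hi = hi, [lo[0] ^ fv[0], lo[1] ^ fv[1]]
    return bytes(lo + hi).hex()


def flag_hex_to_token(flag_hex: str) -> str:
    """Inverse direction: undo the rounds from rn=7 down to rn=0."""
    data = bytes.fromhex(flag_hex)
    lo, hi = list(data[:2]), list(data[2:])
    for rn in range(7, -1, -1):
        fv = round_function(lo, rn)  # lo currently holds the previous hi
        lo, hi = [hi[0] ^ fv[0], hi[1] ^ fv[1]], lo
    return bytes(lo + hi).hex()


if __name__ == "__main__":
    token = "a3f7b2c1"
    suffix = token_to_flag_hex(token)
    print("flag{sec2026_PART1_" + suffix + "}")
    print(flag_hex_to_token(suffix) == token)
```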

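Trigger3 - Feistel variant + native extension (PART2)

Differences from Trigger2

Trigger3 uses almost the same helper functions (_h2b, _b2h, _xb, _rf), but the round function is written slightly differently:

# _rf in Trigger2:
v = (v * 7 + _rn) & 255 # multiplier = 7

# _rf in Trigger3:
v = (v * (3 + 4) + _rn) & 255 # multiplier = 3 + 4 = 7 (actually the same)

3 + 4 is simply 7, so the two round functions are mathematically identical. This is probably deliberate obfuscation.

Native extension call

var _gx = GameExtension.new()

func _w7(_ar):
    var _raw = str(_lt.text).substr(7) # token hex
    var _buf = _raw.to_utf8_buffer() # convert to bytes
    var _rv = _gx.Process(_buf) # call the native method!
    _lb.text = "flag{sec2026_PART2_" + _rv + "}"

The key difference: although Trigger3 defines the Feistel helpers, the actual flag computation never calls _fe. It calls the native extension GameExtension.Process(_buf) directly

GameExtension is a GDExtension class registered through sec2026.gdextension, implemented in native code (in libgodot_android.so or a separate .so library). Process() takes the token's UTF-8 bytes and returns the processed string.

The Feistel functions in the GDScript are most likely a decoy; the real logic lives in native code.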

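Trigger4 - pure native extension (PART3)

extends Area3D

var _gx

func _ready():
    _gx = GameExtension.new()

func _process(_d):
    _gx.Tick()
    _rv = _d
    _tv += _d * 2.0
    _m3() # floating animation

The GDScript for Trigger4 contains no flag computation at all. It only:

  1. Creates a GameExtension instance
  2. Calls _gx.Tick() every frame
  3. Runs the float/rotate/scale animation of the MeshInstance3D

There is no body_entered connection, the token is never read, and the Label2 text is never set.

This means the whole PART3 flag generation is implemented inside the native extension GameExtension, which presumably detects the collision or some other condition in Tick() and manipulates the scene nodes directly. Analyzing PART3 requires reversing the GameExtension class in the native library.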

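Anti-debugging analysis

In the .init_array section we can locate function sub_99094, which creates 3 monitoring threads, all of them used for anti-debugging checks

image-20260417192759312

1. ptrace anti-debugging

The first function contains junk instructions (I am a bit rusty on ARM junk code); removing them as shown lets part of the code decompile normally

image-20260417193155130

With the junk removed, this turns out to be the classic fork + ptrace self-protection

image-20260417192652169

2. /proc scan + Frida detection

Again there are junk instructions that keep IDA from recognizing the function; just patch them one by one.

Something like the following. It may not be exactly right, but it is enough to let IDA decompile

image-20260417194949135

image-20260417195043815

The decompiled result is below. It shows very heavy OLLVM characteristics and is a complete mess; the IDA plugin D810 cannot clean it up and it is unreadable, so over to the AI it goes

image-20260417201106018

Pick one of the decryption functions and write some code to see what the decrypted string is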

import struct


def decrypt(encrypted_data, key_64bit, length):
    decrypted = bytearray(length)
    key_bytes = struct.pack("<Q", key_64bit)

    for i in range(length):
        # Core logic: data ^ index ^ cycling key byte
        decrypted[i] = encrypted_data[i] ^ (i & 0xFF) ^ key_bytes[i % 8]

    return decrypted


key = 0x83B3BBD45AD4B5D4
length = 16
data = [0xFB, 0xC4, 0xA4, 0x36, 0xB3, 0x91, 0xC6, 0xE1, 0xB0, 0xDA, 0xF1, 0x25, 0xB9, 0xC5, 0xD6, 0x8C]
encrypted_data = bytearray(data)
decrypted_data = decrypt(encrypted_data, key, length)
print("Decrypted Data:", decrypted_data)

This prints Decrypted Data: bytearray(b'/proc/self/task\x00')

The string decrypted in function sub_9A224 is /proc/self/task/%s/status

image-20260417201718592

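The string decrypted in function sub_9A9A8 is gmain

image-20260417201905813

From these it is easy to guess the two anti-debugging checks this function performs

  1. TracerPid check: iterate over /proc/self/task/*/status and parse the TracerPid field; a non-zero value means the process is being traced.

  2. Frida check: search the task list for a thread named gmain. Once injected, Frida starts a GLib main-loop thread whose default name is gmain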
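A rough Python sketch of what such a check amounts to, useful for seeing what the detector would observe on a given device. It is only an approximation of the inferred logic, not a decompilation: the function names are made up, and the real code is obfuscated native code.

```python
from pathlib import Path


def parse_status(text: str) -> dict:
    """Parse the 'Key:\\tvalue' lines of a /proc/<pid>/task/<tid>/status file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def looks_suspicious(status_text: str) -> bool:
    """True if the thread is traced or carries Frida's default GLib thread name."""
    fields = parse_status(status_text)
    traced = int(fields.get("TracerPid", "0")) != 0
    frida_thread = fields.get("Name") == "gmain"
    return traced or frida_thread


def scan_self() -> list:
    """Return the thread ids of the current process that look suspicious (Linux only)."""
    hits = []
    task_dir = Path("/proc/self/task")
    if task_dir.is_dir():
        for status in task_dir.glob("*/status"):
            try:
                if looks_suspicious(status.read_text()):
                    hits.append(status.parent.name)
            except OSError:
                pass  # the thread may have exited while we were scanning
    return hits


if __name__ == "__main__":
    print("suspicious threads:", scan_self())
```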

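3. Shared-library scan + timing check + CRC32 check

Junk instructions again, nothing more to say; after decompiling we get

image-20260417211549863

The interesting part is in function sub_96A00, which also has junk; removing it gives

image-20260417211833141

dl_iterate_phdr is a system library function that lets a program iterate over all the shared objects it has loaded.

sub_9EFB4 is the callback: for every loaded library the system finds (libc.so, libart.so, or the app's own .so), it calls this callback once.

Inside it is a string comparison, so it is not hard to guess that it is looking for frida or other suspicious strings.

image-20260417212049733

Next, go into function sub_9AF98, which also has junk; removing it gives

image-20260419195936574

Here it reads the time, converts seconds and nanoseconds to microseconds, and stores the result in the global variable qword_1834B8

Note the value 0xEDB88320 in the loop: this is the reflected form of the standard CRC-32 polynomial, so the loop is evidently walking the .text section and computing a CRC as an integrity check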
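For reference, this is what a bitwise CRC-32 loop built around 0xEDB88320 looks like. The sketch assumes the usual initial value and final XOR of 0xFFFFFFFF (the standard CRC-32 used by zlib); whether the binary uses exactly those parameters was not verified here, only the polynomial constant was observed.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC-32 with the reflected polynomial 0xEDB88320."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; fold in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    sample = b"123456789"
    print(hex(crc32_bitwise(sample)), hex(zlib.crc32(sample)))
```

A checksum like this over .text changes as soon as a single instruction is patched, which fits the observation that every .text-modifying bypass attempt ended in a crash.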

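4. GDExtension callback timing check

After patching out the three checks above, the app still started normally and then crashed after a while. This had me stuck for a long time, until I noticed that the global variable qword_1834B8 is referenced by more than just sub_9AF98, and I knew I was on the right track

image-20260417220104133

Go into function sub_9AD68 and remove the junk to get

image-20260417220357695

Sure enough, it reads the time and also computes a time difference

Following the cross-references upward leads to function sub_97B6C; with the junk removed we get

image-20260417221357438

The strings produced by its decryption functions are as follows

image-20260417222212543

Given the *.gd source recovered earlier and the experience from the preliminary round, this is very likely one of the steps in the PART2 flag generation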

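5. mprotect permission restore

I only found this one later: even after getting past all of the anti-debugging above, I still could not modify .text. I kept assuming there was some other check I had missed, and only noticed this while writing up the WP.

I noticed that the assembly of sub_9B7D8 is much longer than its pseudocode, so I scrolled further down and found this

image-20260419201421877

And the cross-reference to this code comes from here

image-20260419201854120

So that was it: this spot is supposed to dispatch through a function table, and because I had patched the br jump in order to see the pseudocode below it, this logic never showed up in the pseudocode.

Back to the point: this code restores the .text permissions to RX, i.e. drops write permission, whereas Frida's Interceptor.attach() first needs mprotect(RWX) before it can write its hook.

I originally thought this explained why writing a hook with Frida crashed, but the problem comes and goes, so I am not entirely sure.

Bypass code

I thought this would be simple, but because of the code integrity check it took ages, and even then the integrity check was bypassed only some of the time. In the end I settled on the code below, which modifies only the data segment and leaves .text untouched.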

// bypass_v9.js - zero code modification (data-segment-only approach)
//
// Crash history:
//   v3: hooked pthread_create only → sub_9AD68 timing check TRAPs after 12s
//   v5: NOPed the generic SVC wrapper → the game's read() broke → NULL mutex
//   v6: NOPed the inline SVC → CFF state machine took the wrong path → NULL mutex
//   v8: RET-patched sub_9AD68 → NULL mutex (same crash pattern as v5/v6)
//
// Key finding:
//   Every version that modified .text (v5/v6/v8) crashed at the same place (NULL mutex in Godot touch events),
//   while the hook-only v3 crashed somewhere completely different (sub_9AD68 itself).
//   → libsec2026.so has a code integrity check; once it detects a modified .text it corrupts Godot engine state
//
// v9 strategy: never modify the .text segment of libsec2026.so:
//   1. pthread_create hook blocks the 3 anti-debug threads (hooks libc.so, does not touch libsec2026.so)
//   2. Periodically refresh the timestamp in qword_1834B8 (data segment, not code)
//      Whenever Godot calls sub_9AD68, it reads a timestamp from about 2s ago
//      → delta < 10s → the check always passes and the function runs all of its logic (including game functionality)
//   3. No RET, no NOP, no Interceptor.replace: zero bytes modified
//
// Timestamp format (recovered from the disassembly):
//   clock_gettime(CLOCK_MONOTONIC) → tv_sec * 1000000 + tv_nsec / 1000
//   i.e. CLOCK_MONOTONIC in microseconds, stored in qword_1834B8
//
// Usage:
//   frida -U -f com.tencent.ACE.gamesec2026.final -l bypass_v9.js --no-pause

"use strict";

var LIB_NAME = "libsec2026.so";

var ANTI_DEBUG_THREAD_OFFSETS = [
    0x9C654, // Thread 1: fork + ptrace
    0x9CDC4, // Thread 2: /proc scan + Frida detection
    0x9B7D8, // Thread 3: timing + dl_iterate_phdr
];

// qword_1834B8: timestamp storage shared by sub_9AD68 and sub_9AF98 (data segment, not .text)
var TIMESTAMP_OFFSET = 0x1834B8;

var libBase = null;
var blockedCount = 0;

var dummyThread = new NativeCallback(function (_arg) {
    return ptr(0);
}, "pointer", ["pointer"]);

// ── Timestamp refresher ─────────────────────────────────────────
var clock_gettime_fn = new NativeFunction(
    Module.findExportByName("libc.so", "clock_gettime"),
    "int", ["int", "pointer"]
);
var tsBuf = Memory.alloc(16);

function updateTimestamp() {
    if (libBase === null) return;
    try {
        clock_gettime_fn(1, tsBuf); // CLOCK_MONOTONIC
        var sec = tsBuf.readU64().toNumber(); // tv_sec
        var nsec = tsBuf.add(8).readU64().toNumber(); // tv_nsec
        var us = sec * 1000000 + Math.floor(nsec / 1000); // microseconds
        libBase.add(TIMESTAMP_OFFSET).writeU64(us);
    } catch (e) { }
}

// ── Main logic ──────────────────────────────────────────────────
function main() {
    console.log("[*] bypass_v9: zero code modification (data-only)");

    // Refresh the timestamp every 2 seconds so the 10s timeout in sub_9AD68 never fires
    // (safety margin = 10 - 2 = 8 seconds)
    setInterval(updateTimestamp, 2000);

    var pthread_create = Module.findExportByName("libc.so", "pthread_create");
    Interceptor.attach(pthread_create, {
        onEnter: function (args) {
            var start_routine = args[2];

            // ── Fast path: libBase already known ──
            if (libBase !== null) {
                var offset = start_routine.sub(libBase).toInt32();
                if (offset > 0 && offset < 0x200000 &&
                    ANTI_DEBUG_THREAD_OFFSETS.indexOf(offset) !== -1) {
                    blockedCount++;
                    console.log("[pthread_create] BLOCKED #" + blockedCount +
                        " @ 0x" + offset.toString(16));
                    args[2] = dummyThread;
                    return;
                }
            }

            // ── Slow path: first sighting of libsec2026.so ──
            try {
                var mod = Process.findModuleByAddress(start_routine);
                if (mod !== null && mod.name === LIB_NAME) {
                    if (libBase === null) {
                        libBase = mod.base;
                        console.log("[*] " + LIB_NAME + " base: " + mod.base);

                        // Refresh the timestamp right away instead of waiting for setInterval
                        updateTimestamp();
                        console.log("[+] Timestamp keeper active (qword_1834B8, every 2s)");
                    }

                    var offset = start_routine.sub(mod.base).toInt32();
                    if (ANTI_DEBUG_THREAD_OFFSETS.indexOf(offset) !== -1) {
                        blockedCount++;
                        console.log("[pthread_create] BLOCKED #" + blockedCount +
                            " @ 0x" + offset.toString(16));
                        args[2] = dummyThread;
                    }
                }
            } catch (e) { }
        }
    });

    console.log("[+] Hooks ready, waiting for " + LIB_NAME + "...");
}

main();

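PART2 flag

While analyzing the fourth anti-debugging check above, we came across function sub_97B6C, shown below

image-20260418144958681

Inside this function the method bindings of GameExtension can be identified; Process is bound to sub_97704, so follow it

Go in and patch out the jump on the last line to get the following

image-20260418145448916

Continue into function sub_A936C; with the junk removed we get

image-20260418145603765

Two 16-byte constants are loaded here at the same time, which is very suspicious, so from here on it goes to the AI

sub_A936C (entry point)

The decompilation of sub_A936C is key: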

void __fastcall sub_A936C(__int64 input)
{
    v5 = 0;
    v6 = 0;
    v7 = 0;

    memcpy(&v5, input, 8);
    memcpy(&v6, input, 8);

    v2 = xmmword_58550;
    v3 = xmmword_58600;

    sub_A7900(ctx, &v2, &v3);
    sub_A7194(ctx, &v5, 16);

    // afterwards the 16-byte ciphertext is formatted as 32 hex characters
}

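In the corresponding assembly there are two back-to-back memcpy calls:

0xA93E4  memcpy(&v5, input, 8)
0xA93F8  memcpy(&v6, input, 8)

So the input block is not the hex-decoded value of the token. It is the token's ASCII bytes repeated twice:

block = token_ascii + token_ascii

For example:

token = "12345678"
block = 31 32 33 34 35 36 37 38 31 32 33 34 35 36 37 38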

It then loads two 16-byte constants:

0x58550  key = 2c 7e 15 16 18 ae c2 a1 ab f7 15 88 09 cf 4f 3c
0x58600  iv  = 2c 7e 15 16 1a 2d e4 71 cc ff 11 cf 04 88 2f 1d

Written as hex strings:

KEY = 2c7e151618aec2a1abf7158809cf4f3c
IV  = 2c7e15161a2de471ccff11cf04882f1d

The call sequence is therefore:

sub_A936C
    1. duplicate token ASCII bytes to 16 bytes
    2. sub_A7900(ctx, KEY, IV)
    3. sub_A7194(ctx, block, 16)
    4. output block as 32 lowercase hex chars

At this point we can form a working hypothesis:

PART2 = custom_encrypt(token_ascii || token_ascii)

What remains to be verified is what sub_A7900 initializes, what sub_A7194 does, and what kind of cipher core sub_A8D44 is.

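sub_A7194: not a plain CBC XOR

sub_A7194(ctx, block, 16) looks like a single-block CBC encryption. Internally it does three things:

prev = ctx + 0xc0
sub_AA9B0(block, prev)
sub_A8D44(block, ctx)
memcpy(ctx + 0xc0, block, 16)

ctx + 0xc0 is exactly where sub_A7900 stores the IV, so at first glance one would assume:

block ^= IV
block = encrypt(block)

That reading is incomplete, though. We have to keep going into sub_AA9B0.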

sub_AA9B0: verifying with controlled input

To pin down what sub_AA9B0(dst, src) really does, wrap it in a NativeFunction and call it on its own with known inputs:

dst = 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
src = 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f

The output is:

1f 10 1f 10 1f 10 1f 10 1f 10 1f 10 1f 10 1f 10

Working backwards byte by byte from

out[i] = dst[i] ^ src[index]

gives the index pattern:

i = 0  -> src[15]
i = 1  -> src[1]
i = 2  -> src[13]
i = 3  -> src[3]
i = 4  -> src[11]
i = 5  -> src[5]
...
i = 14 -> src[1]
i = 15 -> src[15]

In other words:

if i is even:
    block[i] ^= IV[15 - i]
else:
    block[i] ^= IV[i]

This is an easy-to-miss trap in PART2: it is not the standard block[i] ^= IV[i].

So the input pre-processing in sub_A7194 is:

raw = token_ascii + token_ascii
for i in range(16):
    if i & 1:
        block[i] = raw[i] ^ IV[i]
    else:
        block[i] = raw[i] ^ IV[15 - i]

Taking 12345678 as an example:

raw            = 31323334353637383132333435363738
pre_xor_result = 2c4cbb22fa1bc84940cd1efb23be4925
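
The pre-processing step can be checked offline in a few lines of Python. This is only a minimal sketch, assuming the even/odd index rule recovered above; the function name mix_iv is made up for illustration, and IV is the constant read from 0x58600.

```python
# Sketch of the sub_AA9B0 pre-processing; mix_iv is a hypothetical name.
IV = bytes.fromhex("2c7e15161a2de471ccff11cf04882f1d")

def mix_iv(raw: bytes, iv: bytes = IV) -> bytes:
    """XOR even positions with iv[15 - i] and odd positions with iv[i]."""
    return bytes(b ^ (iv[i] if i & 1 else iv[15 - i]) for i, b in enumerate(raw))

token = "12345678"
raw = token.encode("ascii") * 2      # token_ascii || token_ascii
print(raw.hex())                     # 31323334353637383132333435363738
print(mix_iv(raw).hex())             # 2c4cbb22fa1bc84940cd1efb23be4925
```

Calling mix_iv(bytes(range(16)), bytes(range(16, 32))) reproduces the 1f 10 1f 10 ... pattern from the single-function test, which is a quick way to confirm the index rule before moving on.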

sub_A7900: context, S-box, round keys

sub_A7900(ctx, key, iv) performs the initialization:

sub_A7DE8(ctx, key)          generates the 12 round keys and initializes the S-box
memcpy(ctx + 0xc0, iv, 16)

The layout of ctx can be read as:

ctx[0x000 : 0x0c0] = 12 * 16 bytes of round keys
ctx[0x0c0 : 0x0d0] = IV

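S-box generation

The S-box lives at this global address:

0x183700

In the static file this region does not yet hold the final S-box; it is only generated once sub_A9884 has run. Analyzing sub_A9884 -> sub_A8C8C -> sub_A7598 / sub_A88F4 / sub_A96F0 shows that:

  1. The finite-field reduction polynomial is 0x171, not AES's 0x11b.
  2. When xtime overflows the high bit it XORs in 0x71, because 0x171 & 0xff = 0x71.
  3. Non-zero bytes are first replaced by their multiplicative inverse, i.e. x^254.
  4. An AES-like rotate-and-XOR affine step follows, but with a different constant and an extra rotation at the end.

The formula:

inv = 0 if x == 0 else gf_pow(x, 254, mod=0x171)
y = inv ^ rol8(inv, 1) ^ rol8(inv, 2) ^ rol8(inv, 3) ^ rol8(inv, 4) ^ 0x8f
sbox[x] = rol8(y, 5)

The first 16 bytes of the S-box, for verification:

f1 12 5d c6 a7 8a 6a 48 da 0f 11 3b 3c b3 2d 27

If the S-box you generate does not start with these 16 bytes, everything downstream will be wrong.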
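
The construction above can be sketched in Python as follows. The helper names (gf_mul, gf_inv, make_sbox) are illustrative and not taken from the binary; the sketch assumes the 0x171 polynomial, the x^254 inversion, and the rotate-and-XOR step exactly as described.

```python
# Sketch of the S-box construction; helper names are illustrative.
POLY = 0x171  # reduction polynomial recovered from the binary (AES uses 0x11b)

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo POLY."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return out

def gf_inv(x: int) -> int:
    """Multiplicative inverse computed as x^254, with 0 mapped to 0."""
    if x == 0:
        return 0
    result, base, exp = 1, x, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result

def rol8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def make_sbox() -> list:
    sbox = []
    for x in range(256):
        inv = gf_inv(x)
        y = inv ^ rol8(inv, 1) ^ rol8(inv, 2) ^ rol8(inv, 3) ^ rol8(inv, 4) ^ 0x8F
        sbox.append(rol8(y, 5))
    return sbox

SBOX = make_sbox()
print(bytes(SBOX[:16]).hex(" "))  # compare against the 16 bytes listed above
```

Because the table is a permutation of 0..255, an inverse S-box for decryption can be built by filling inv_sbox[SBOX[x]] = x.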

Round key generation

The original key:

2c7e151618aec2a1abf7158809cf4f3c

There are 12 round keys of 16 bytes each. The first 16 bytes are the original key itself:

RK[0] = 2c7e151618aec2a1abf7158809cf4f3c

The RCON table is at 0x652c9:

a7 a6 a5 a3 af b7 87 e7 27 d6 45 fc 10 cd 36 47

The key schedule is an AES-128-style expansion over 4-byte words, but with obvious modifications:

rk[0:16] = KEY

for word_i in range(4, 48):
    prev = rk[(word_i - 1) * 4 : word_i * 4]

    if word_i % 4 == 0:
        b0, b1, b2, b3 = prev
        temp = [
            nibble_swap(SBOX[b3]) ^ 0xa7 ^ RCON[word_i // 4],
            nibble_swap(SBOX[b0]),
            nibble_swap(SBOX[b1]),
            nibble_swap(SBOX[b2]),
        ]
    else:
        temp = prev

    # word-wise XOR: word[word_i] = word[word_i - 4] ^ temp
    for j in range(4):
        rk[word_i * 4 + j] = rk[(word_i - 4) * 4 + j] ^ temp[j]

Here nibble_swap(x) is:

((x >> 4) | ((x << 4) & 0xff)) & 0xff

The first two round keys, for verification:

RK[0] = 2c7e151618aec2a1abf7158809cf4f3c
RK[1] = 838e2abf9b20e81e30d7fd963918b2aa

The full set of 12 round keys can be generated automatically by the script part2_algorithms.py; copying them by hand is not recommended.

sub_A8D44: the core block cipher

sub_A8D44(block, ctx) is the actual 16-byte block encryption.

Its structure is:

sub_A7944(block)                 pre-XOR mask
sub_AAB64(0, block, ctx)         AddRoundKey 0

for round = 1..10:
    sub_A82C8(block)             SubBytes
    sub_A8F00(block, round)      RoundTweak
    sub_AADE8(block)             ShiftRows
    sub_A6F20(block)             MixColumns
    sub_AAB64(round, block, ctx) AddRoundKey

round = 11:
    sub_A82C8(block)
    sub_A8F00(block, 11)
    sub_AADE8(block)
    sub_AAB64(11, block, ctx)

sub_A84A4(block)                 post-XOR mask

It looks a lot like AES, but every component has been modified.

1. sub_A7944 and sub_A84A4: pre- and post-XOR masks

With the input:

00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f

calling sub_A7944 on its own outputs:

de 4e 88 34 c5 6e 5f e5 7b a4 15 9f b4 0b db 4d

XORing the output with the input gives:

de 4f 8a 37 c1 6b 59 e2 73 ad 1f 94 b8 06 d5 42

That constant sits at 0x58510:

PRE_XOR = de4f8a37c16b59e273ad1f94b806d542

Likewise, sub_A84A4 is the post-XOR:

POST_XOR = 7ce32891a65df014bb6907d84a35ec80

Therefore:

state[i] ^= PRE_XOR[i]            # start of encryption
...
out[i] = state[i] ^ POST_XOR[i]   # end of encryption

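2. sub_AAB64: AddRoundKey is not a plain XOR

Intuitively, AddRoundKey should be:

state[i] ^= round_key[round][i]

But testing sub_AAB64(0, state, ctx) on its own with input 00..0f outputs:

2c 7e 15 16 1c aa c6 a5 a3 ff 1d 80 05 c3 43 30

XORing with RK[0] alone does not reproduce this. Reading further into the assembly reveals an extra XOR term:

state[i] ^= RK[round * 16 + i] ^ (((i % 4) + round * 0x5b) & 0xff)

Here RK is indexed as one flat array of 12 * 16 bytes. The operation is still its own inverse because it is only an XOR: when decrypting, simply apply the same AddRoundKey again.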
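
The round-0 test vector can be reproduced with a short sketch. It assumes the formula above and uses the original key as RK[0]; the function signature (passing the 16-byte round key explicitly) is a simplification chosen for illustration, not the native calling convention.

```python
# Sketch of sub_AAB64; here the 16-byte round key is passed in directly.
RK0 = bytes.fromhex("2c7e151618aec2a1abf7158809cf4f3c")  # RK[0] is the original key

def add_round_key(state: bytes, round_key: bytes, round_no: int) -> bytes:
    """XOR each byte with the round key and the extra (i % 4 + round * 0x5b) term."""
    return bytes(
        s ^ round_key[i] ^ (((i % 4) + round_no * 0x5B) & 0xFF)
        for i, s in enumerate(state)
    )

state = bytes(range(16))
out = add_round_key(state, RK0, 0)
print(out.hex(" "))   # 2c 7e 15 16 1c aa c6 a5 a3 ff 1d 80 05 c3 43 30
print(add_round_key(out, RK0, 0) == state)   # True: the step is its own inverse
```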

3. sub_A82C8: SubBytes

sub_A82C8 looks up SBOX[byte] for every byte.

Verification with 00..0f:

input  = 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
output = f1 12 5d c6 a7 8a 6a 48 da 0f 11 3b 3c b3 2d 27

This is exactly the first 16 entries of the S-box.

4. sub_A8F00: RoundTweak

This is another place that is easy to get wrong.

It first permutes the state as a matrix and then XORs in an LCG sequence. Verifying on its own with 00..0f and round 1:

native output = e8 d3 04 2f f1 2a dd b6 9a e1 36 bd a3 38 0f 44

The LCG's initial value:

x = (round * 0x9d + 0x47) & 0xff

It is updated after each byte:

x = (x * 0xc3 + 0x2f) & 0xff

The LCG sequence for round 1 is:

e4 db 00 2f fc 23 d8 b7 94 eb 30 bf ac 33 08 47

native_output ^ lcg_sequence gives:

0c 08 04 00 0d 09 05 01 0e 0a 06 02 0f 0b 07 03

This shows that the matrix permutation is not a plain column reversal, but:

tmp[c * 4 + r] = state[(3 - r) * 4 + c]

The full formula:

def round_tweak(state, round_no):
    tmp = [0] * 16
    for c in range(4):
        for r in range(4):
            tmp[c * 4 + r] = state[(3 - r) * 4 + c]

    x = (round_no * 0x9d + 0x47) & 0xff
    out = []
    for b in tmp:
        out.append(b ^ x)
        x = (x * 0xc3 + 0x2f) & 0xff
    return out

If this is written as state[(3 - c) * 4 + r] instead, every test vector will fail.
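
For decryption the tweak has to be undone as well. The sketch below restates the step with a lookup table, checks it against the native round-1 vector, and derives the inverse; the names PERM, lcg_stream and inv_round_tweak are illustrative only.

```python
# Sketch of RoundTweak and its inverse; names are illustrative.
def lcg_stream(round_no: int, n: int = 16) -> list:
    """The per-round mask bytes: x0 = round*0x9d + 0x47, x = x*0xc3 + 0x2f (mod 256)."""
    x = (round_no * 0x9D + 0x47) & 0xFF
    seq = []
    for _ in range(n):
        seq.append(x)
        x = (x * 0xC3 + 0x2F) & 0xFF
    return seq

# tmp[c*4 + r] = state[(3 - r)*4 + c], written as a source-index table
PERM = [(3 - r) * 4 + c for c in range(4) for r in range(4)]

def round_tweak(state: bytes, round_no: int) -> bytes:
    seq = lcg_stream(round_no)
    return bytes(state[PERM[i]] ^ seq[i] for i in range(16))

def inv_round_tweak(state: bytes, round_no: int) -> bytes:
    """Remove the mask first, then scatter each byte back to its source index."""
    seq = lcg_stream(round_no)
    out = bytearray(16)
    for i in range(16):
        out[PERM[i]] = state[i] ^ seq[i]
    return bytes(out)

tweaked = round_tweak(bytes(range(16)), 1)
print(tweaked.hex(" "))   # e8 d3 04 2f f1 2a dd b6 9a e1 36 bd a3 38 0f 44
print(inv_round_tweak(tweaked, 1) == bytes(range(16)))   # True
```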

5. sub_AADE8: ShiftRows

sub_AADE8 is the row shift, with the mapping:

s[13] = state[9]
s[9] = state[5]
s[5] = state[1]
s[1] = state[13]

s[2] = state[6]
s[6] = state[10]
s[10] = state[14]
s[14] = state[2]

s[3] = state[11]
s[11] = state[3]
s[7] = state[15]
s[15] = state[7]

Verification with 00..0f:

00 0d 06 0b 04 01 0a 0f 08 05 0e 03 0c 09 02 07
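
The mapping is easier to reuse as a single source-index table. In this sketch (SHIFT_SRC is a made-up name) row 0 stays put, row 1 takes its byte from the previous column rather than the next one as in AES, and rows 2 and 3 both move by two columns, which is why a stock AES ShiftRows does not match.

```python
# Sketch of sub_AADE8 as a lookup table: out[i] = state[SHIFT_SRC[i]].
SHIFT_SRC = [0, 13, 6, 11, 4, 1, 10, 15, 8, 5, 14, 3, 12, 9, 2, 7]

def shift_rows(state: bytes) -> bytes:
    return bytes(state[j] for j in SHIFT_SRC)

def inv_shift_rows(state: bytes) -> bytes:
    """Put every byte back at the index it was read from."""
    out = bytearray(16)
    for i, j in enumerate(SHIFT_SRC):
        out[j] = state[i]
    return bytes(out)

shifted = shift_rows(bytes(range(16)))
print(shifted.hex(" "))   # 00 0d 06 0b 04 01 0a 0f 08 05 0e 03 0c 09 02 07
print(inv_shift_rows(shifted) == bytes(range(16)))   # True
```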

6. sub_A6F20: MixColumns

sub_A6F20 performs a GF(2^8) matrix multiplication on each 4-byte column. The field is still the one defined by 0x171.

The matrix is:

[ 6 3 5 2 ]
[ 2 6 3 5 ]
[ 5 2 6 3 ]
[ 3 5 2 6 ]

In code form:

r0 = mul(s0,6) ^ mul(s1,3) ^ mul(s2,5) ^ mul(s3,2)
r1 = mul(s0,2) ^ mul(s1,6) ^ mul(s2,3) ^ mul(s3,5)
r2 = mul(s0,5) ^ mul(s1,2) ^ mul(s2,6) ^ mul(s3,3)
r3 = mul(s0,3) ^ mul(s1,5) ^ mul(s2,2) ^ mul(s3,6)

Verification with 00..0f:

0f 0f 0b 0b 07 07 03 03 1f 1f 1b 1b 17 17 13 13
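
A small sketch makes both the forward step and its inverse checkable. It assumes the 0x171 field and the matrix above; the inverse matrix is the one used by flag2token2.c later in this write-up, and the helper names are illustrative.

```python
# Sketch of sub_A6F20 and its inverse over GF(2^8) with polynomial 0x171.
POLY = 0x171

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo POLY."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return out

MIX = [[6, 3, 5, 2], [2, 6, 3, 5], [5, 2, 6, 3], [3, 5, 2, 6]]
INV_MIX = [[0x80, 0xF3, 0x64, 0xAF], [0xAF, 0x80, 0xF3, 0x64],
           [0x64, 0xAF, 0x80, 0xF3], [0xF3, 0x64, 0xAF, 0x80]]

def apply_matrix(state: bytes, mat) -> bytes:
    """Multiply each 4-byte column (bytes c*4 .. c*4+3) by the 4x4 matrix."""
    out = bytearray(16)
    for c in range(4):
        col = state[c * 4:c * 4 + 4]
        for r in range(4):
            v = 0
            for k in range(4):
                v ^= gf_mul(mat[r][k], col[k])
            out[c * 4 + r] = v
    return bytes(out)

mixed = apply_matrix(bytes(range(16)), MIX)
print(mixed.hex(" "))   # 0f 0f 0b 0b 07 07 03 03 1f 1f 1b 1b 17 17 13 13
print(apply_matrix(mixed, INV_MIX) == bytes(range(16)))   # True
```

The 00..0f vector never triggers a reduction, so it cannot distinguish 0x171 from 0x11b on its own; the round trip through INV_MIX on larger byte values is the check that actually exercises the polynomial.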

Algorithm summary

The PART2 suffix is generated as follows:

def part2_suffix(token):
    token_ascii = token.encode("ascii")
    raw = token_ascii + token_ascii

    # sub_AA9B0; note this is not a plain IV XOR
    block = []
    for i in range(16):
        if i & 1:
            block.append(raw[i] ^ IV[i])
        else:
            block.append(raw[i] ^ IV[15 - i])

    # sub_A8D44 (xor_bytes stands for a byte-wise XOR of two 16-byte sequences)
    state = xor_bytes(block, PRE_XOR)
    state = add_round_key(state, 0)

    for r in range(1, 11):
        state = sub_bytes(state)
        state = round_tweak(state, r)
        state = shift_rows(state)
        state = mix_columns(state)
        state = add_round_key(state, r)

    state = sub_bytes(state)
    state = round_tweak(state, 11)
    state = shift_rows(state)
    state = add_round_key(state, 11)
    state = xor_bytes(state, POST_XOR)

    return bytes(state).hex()

With that, we can write the PART2 flag generation logic and its inverse as follows:

token2flag2.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

// Constants recovered from libsec2026.so
uint8_t KEY[16] = {0x2c,0x7e,0x15,0x16,0x18,0xae,0xc2,0xa1,0xab,0xf7,0x15,0x88,0x09,0xcf,0x4f,0x3c};
uint8_t IV[16] = {0x2c,0x7e,0x15,0x16,0x1a,0x2d,0xe4,0x71,0xcc,0xff,0x11,0xcf,0x04,0x88,0x2f,0x1d};
uint8_t PRE_XOR[16] = {0xde,0x4f,0x8a,0x37,0xc1,0x6b,0x59,0xe2,0x73,0xad,0x1f,0x94,0xb8,0x06,0xd5,0x42};
uint8_t POST_XOR[16] = {0x7c,0xe3,0x28,0x91,0xa6,0x5d,0xf0,0x14,0xbb,0x69,0x07,0xd8,0x4a,0x35,0xec,0x80};
uint8_t RCON[16] = {0xa7,0xa6,0xa5,0xa3,0xaf,0xb7,0x87,0xe7,0x27,0xd6,0x45,0xfc,0x10,0xcd,0x36,0x47};

uint8_t SBOX[256];
uint8_t ROUND_KEYS[12 * 16];

// GF(2^8) arithmetic modulo the custom polynomial 0x171
uint8_t xtime(uint8_t x) {
    uint16_t res = (uint16_t)x << 1;
    if (res & 0x100) res ^= 0x171;
    return res & 0xFF;
}

uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) out ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return out;
}

uint8_t gf_pow(uint8_t a, uint8_t e) {
    uint8_t out = 1;
    while (e) {
        if (e & 1) out = gf_mul(out, a);
        a = gf_mul(a, a);
        e >>= 1;
    }
    return out;
}

uint8_t rol8(uint8_t x, int n) { return ((x << n) | (x >> (8 - n))) & 0xFF; }

// S-box: field inverse (x^254), rotate-and-XOR affine step, final rotate by 5
void make_sbox() {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = (x == 0) ? 0 : gf_pow(x, 254);
        uint8_t y = inv ^ rol8(inv, 1) ^ rol8(inv, 2) ^ rol8(inv, 3) ^ rol8(inv, 4) ^ 0x8F;
        SBOX[x] = rol8(y, 5);
    }
}

// Key schedule: 48 words; every 4th word goes through nibble_swap(SBOX[...]) and RCON
void make_round_keys() {
    memcpy(ROUND_KEYS, KEY, 16);
    for (int i = 4; i < 48; i++) {
        uint8_t temp[4];
        memcpy(temp, &ROUND_KEYS[(i - 1) * 4], 4);
        if (i % 4 == 0) {
            uint8_t b0 = temp[0], b1 = temp[1], b2 = temp[2], b3 = temp[3];
            temp[0] = ((SBOX[b3] >> 4) | (SBOX[b3] << 4)) ^ 0xA7 ^ RCON[i / 4];
            temp[1] = (SBOX[b0] >> 4) | (SBOX[b0] << 4);
            temp[2] = (SBOX[b1] >> 4) | (SBOX[b1] << 4);
            temp[3] = (SBOX[b2] >> 4) | (SBOX[b2] << 4);
        }
        for (int j = 0; j < 4; j++)
            ROUND_KEYS[i * 4 + j] = ROUND_KEYS[(i - 4) * 4 + j] ^ temp[j];
    }
}

// RoundTweak: matrix permutation followed by an LCG mask
void round_tweak(uint8_t *state, int rn) {
    uint8_t tmp[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            tmp[c * 4 + r] = state[(3 - r) * 4 + c];
    uint8_t x = (rn * 0x9D + 0x47) & 0xFF;
    for (int i = 0; i < 16; i++) {
        state[i] = tmp[i] ^ x;
        x = (x * 0xC3 + 0x2F) & 0xFF;
    }
}

void shift_rows(uint8_t *s) {
    uint8_t t[16]; memcpy(t, s, 16);
    s[13]=t[9];  s[9]=t[5];   s[5]=t[1];   s[1]=t[13];
    s[2]=t[6];   s[6]=t[10];  s[10]=t[14]; s[14]=t[2];
    s[3]=t[11];  s[11]=t[3];  s[7]=t[15];  s[15]=t[7];
}

void mix_columns(uint8_t *s) {
    uint8_t mat[4][4] = {{6,3,5,2},{2,6,3,5},{5,2,6,3},{3,5,2,6}};
    uint8_t tmp[16]; memcpy(tmp, s, 16);
    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++) {
            uint8_t v = 0;
            for (int k = 0; k < 4; k++) v ^= gf_mul(mat[r][k], tmp[c * 4 + k]);
            s[c * 4 + r] = v;
        }
    }
}

// AddRoundKey with the extra (i % 4 + round * 0x5B) term
void add_round_key(uint8_t *state, int rn) {
    for (int i = 0; i < 16; i++)
        state[i] ^= ROUND_KEYS[rn * 16 + i] ^ (((i & 3) + rn * 0x5B) & 0xFF);
}

int main(int argc, char *argv[]) {
    // Require an 8-character token so the copy below never reads past the argument
    if (argc < 2 || strlen(argv[1]) < 8) {
        fprintf(stderr, "Usage: %s <8-char-token>\n", argv[0]);
        return 1;
    }
    make_sbox(); make_round_keys();

    // block = token_ascii || token_ascii
    uint8_t block[16];
    for (int i = 0; i < 8; i++) {
        uint8_t c = (uint8_t)argv[1][i];
        block[i] = block[i+8] = c;
    }
    // sub_AA9B0: even bytes use IV[15 - i], odd bytes use IV[i]
    for (int i = 0; i < 16; i++)
        block[i] ^= (i & 1) ? IV[i] : IV[15 - i];

    // sub_A8D44
    for (int i = 0; i < 16; i++) block[i] ^= PRE_XOR[i];
    add_round_key(block, 0);
    for (int r = 1; r <= 10; r++) {
        for (int i = 0; i < 16; i++) block[i] = SBOX[block[i]];
        round_tweak(block, r); shift_rows(block); mix_columns(block); add_round_key(block, r);
    }
    for (int i = 0; i < 16; i++) block[i] = SBOX[block[i]];
    round_tweak(block, 11); shift_rows(block); add_round_key(block, 11);

    printf("flag{sec2026_PART2_");
    for (int i = 0; i < 16; i++) printf("%02x", block[i] ^ POST_XOR[i]);
    printf("}\n");
    return 0;
}

flag2token2.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

// --- Global constants ---
uint8_t KEY[16] = {0x2c,0x7e,0x15,0x16,0x18,0xae,0xc2,0xa1,0xab,0xf7,0x15,0x88,0x09,0xcf,0x4f,0x3c};
uint8_t IV[16] = {0x2c,0x7e,0x15,0x16,0x1a,0x2d,0xe4,0x71,0xcc,0xff,0x11,0xcf,0x04,0x88,0x2f,0x1d};
uint8_t PRE_XOR[16] = {0xde,0x4f,0x8a,0x37,0xc1,0x6b,0x59,0xe2,0x73,0xad,0x1f,0x94,0xb8,0x06,0xd5,0x42};
uint8_t POST_XOR[16] = {0x7c,0xe3,0x28,0x91,0xa6,0x5d,0xf0,0x14,0xbb,0x69,0x07,0xd8,0x4a,0x35,0xec,0x80};
uint8_t RCON[16] = {0xa7,0xa6,0xa5,0xa3,0xaf,0xb7,0x87,0xe7,0x27,0xd6,0x45,0xfc,0x10,0xcd,0x36,0x47};

uint8_t SBOX[256];
uint8_t INV_SBOX[256];
uint8_t ROUND_KEYS[12 * 16];

// --- Basic GF(2^8) arithmetic ---
uint8_t xtime(uint8_t x) {
    uint16_t res = (uint16_t)x << 1;
    if (res & 0x100) res ^= 0x171; // custom polynomial
    return res & 0xFF;
}

uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) out ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return out;
}

uint8_t gf_pow(uint8_t a, uint8_t e) {
    uint8_t out = 1;
    while (e) {
        if (e & 1) out = gf_mul(out, a);
        a = gf_mul(a, a);
        e >>= 1;
    }
    return out;
}

uint8_t rol8(uint8_t x, int n) {
    return ((x << n) | (x >> (8 - n))) & 0xFF;
}

// --- S-box initialization and key expansion ---
void make_sbox() {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = (x == 0) ? 0 : gf_pow(x, 254);
        uint8_t y = inv ^ rol8(inv, 1) ^ rol8(inv, 2) ^ rol8(inv, 3) ^ rol8(inv, 4) ^ 0x8F;
        SBOX[x] = rol8(y, 5);
        INV_SBOX[SBOX[x]] = x;
    }
}

void make_round_keys() {
    memcpy(ROUND_KEYS, KEY, 16);
    for (int i = 4; i < 48; i++) {
        uint8_t temp[4];
        memcpy(temp, &ROUND_KEYS[(i - 1) * 4], 4);
        if (i % 4 == 0) {
            uint8_t b0 = temp[0], b1 = temp[1], b2 = temp[2], b3 = temp[3];
            // nibble_swap(SBOX[...])
            temp[0] = ((SBOX[b3] >> 4) | (SBOX[b3] << 4)) ^ 0xA7 ^ RCON[i / 4];
            temp[1] = (SBOX[b0] >> 4) | (SBOX[b0] << 4);
            temp[2] = (SBOX[b1] >> 4) | (SBOX[b1] << 4);
            temp[3] = (SBOX[b2] >> 4) | (SBOX[b2] << 4);
        }
        for (int j = 0; j < 4; j++)
            ROUND_KEYS[i * 4 + j] = ROUND_KEYS[(i - 4) * 4 + j] ^ temp[j];
    }
}

// --- Inverse round transformations ---
// AddRoundKey is a pure XOR, so it is its own inverse
void add_round_key(uint8_t *state, int rn) {
    for (int i = 0; i < 16; i++)
        state[i] ^= ROUND_KEYS[rn * 16 + i] ^ (((i & 3) + rn * 0x5B) & 0xFF);
}

void inv_round_tweak(uint8_t *state, int rn) {
    uint8_t x = (rn * 0x9D + 0x47) & 0xFF;
    uint8_t seq[16];
    for (int i = 0; i < 16; i++) {
        seq[i] = x;
        x = (x * 0xC3 + 0x2F) & 0xFF;
    }
    uint8_t out[16];
    // Forward permutation: out[c*4+r] = state[(3-r)*4+c]
    // Inverse:             state_old[(3-r)*4+c] = state_new[c*4+r] ^ seq[c*4+r]
    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++) {
            out[(3 - r) * 4 + c] = state[c * 4 + r] ^ seq[c * 4 + r];
        }
    }
    memcpy(state, out, 16);
}

void inv_shift_rows(uint8_t *s) {
    uint8_t t[16]; memcpy(t, s, 16);
    s[1]=t[5];   s[5]=t[9];   s[9]=t[13];  s[13]=t[1];
    s[2]=t[14];  s[6]=t[2];   s[10]=t[6];  s[14]=t[10];
    s[3]=t[11];  s[7]=t[15];  s[11]=t[3];  s[15]=t[7];
}

// Inverse of the {6,3,5,2} circulant matrix over GF(2^8) mod 0x171
void inv_mix_columns(uint8_t *s) {
    uint8_t mat[4][4] = {{0x80,0xF3,0x64,0xAF},{0xAF,0x80,0xF3,0x64},{0x64,0xAF,0x80,0xF3},{0xF3,0x64,0xAF,0x80}};
    uint8_t tmp[16]; memcpy(tmp, s, 16);
    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++) {
            uint8_t v = 0;
            for (int k = 0; k < 4; k++) v ^= gf_mul(mat[r][k], tmp[c * 4 + k]);
            s[c * 4 + r] = v;
        }
    }
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <32-hex-suffix-or-flag>\n", argv[0]);
        return 1;
    }

    make_sbox();
    make_round_keys();

    // Accept either the bare 32-hex suffix or a full flag{sec2026_PART2_...}
    char *hex = argv[1];
    if (strlen(hex) > 32) {
        char *p = strstr(hex, "PART2_");
        if (p) hex = p + 6;
    }
    if (strlen(hex) < 32) {
        fprintf(stderr, "expected 32 hex characters\n");
        return 1;
    }

    uint8_t block[16];
    for (int i = 0; i < 16; i++) {
        unsigned int val;
        if (sscanf(hex + i * 2, "%2x", &val) != 1) {
            fprintf(stderr, "invalid hex at offset %d\n", i * 2);
            return 1;
        }
        block[i] = (uint8_t)val ^ POST_XOR[i];
    }

    // --- Decryption ---
    // Final round (round 11)
    add_round_key(block, 11);
    inv_shift_rows(block);
    inv_round_tweak(block, 11);
    for (int i = 0; i < 16; i++) block[i] = INV_SBOX[block[i]];

    // Main rounds (round 10 -> 1)
    for (int r = 10; r >= 1; r--) {
        add_round_key(block, r);
        inv_mix_columns(block);
        inv_shift_rows(block);
        inv_round_tweak(block, r);
        for (int i = 0; i < 16; i++) block[i] = INV_SBOX[block[i]];
    }

    // Initial round (round 0)
    add_round_key(block, 0);
    for (int i = 0; i < 16; i++) block[i] ^= PRE_XOR[i];

    // Undo the IV XOR
    for (int i = 0; i < 16; i++) {
        uint8_t mask = (i & 1) ? IV[i] : IV[15 - i];
        block[i] ^= mask;
    }

    // The first 8 bytes are the token (the second 8 are a copy of it)
    printf("Token: %.8s\n", (const char *)block);
    return 0;
}

PART3 flag

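Here I had the AI search for places that might generate the flag, and it eventually landed on the function sub_A9A7C. After stripping the junk instructions we get:

image-20260418180623787

The AI's reading of it is as follows: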

char *part3_generate(const char *token8) {
    pthread_mutex_lock(&stru_183470);

    ensure_global_output_buffer();
    memset(qword_183498, 0, dword_168310);

    memcpy(unk_1836C0, token8, 8);
    memcpy(unk_1836C0 + 8, token8, 8);

    // load the 0x1526 bytes of VM bytecode at 0x63DA0
    // register the VM callbacks e / f
    register_callback('e', VMEntry);
    register_callback('f', sub_AA758);

    run_vm_program();

    pthread_mutex_unlock(&stru_183470);
    return unk_1836E0;
}

The first callback is VMEntry; after stripping the junk instructions:

image-20260418180818227

The other is sub_AA758; after stripping the junk instructions:

image-20260418180845651

So the generation pipeline for part3/Trigger4 is:

8-byte ASCII token
  -> sub_A9A7C
  -> native VM bytecode: unk_63DA0, size 0x1526
  -> the VM reads token[i] through e(i)
  -> the VM computes two uint32 values
  -> f(x, y) formats them as a 16-character hex suffix
  -> flag{sec2026_PART3_<suffix>}
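
The last step of that pipeline can be sketched as below. This is only an illustration: it assumes f(x, y) behaves like a "%08x%08x" format with x first, and the two sample values are simply the halves of the suffix that the emulator's docstring lists for token 12345678, not values observed inside the VM. The name format_suffix is hypothetical.

```python
# Hypothetical sketch of the f(x, y) callback: two uint32 values -> 16 hex characters.
def format_suffix(x: int, y: int) -> str:
    """Format both values as zero-padded lowercase 32-bit hex, x first (assumed order)."""
    return f"{x & 0xFFFFFFFF:08x}{y & 0xFFFFFFFF:08x}"

suffix = format_suffix(0x7B84D211, 0x8E34500F)
print(f"flag{{sec2026_PART3_{suffix}}}")   # flag{sec2026_PART3_7b84d2118e34500f}
```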

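So the crucial piece is evidently unk_63DA0, the 0x1526-byte bytecode blob.

image-20260418223433877

My original idea was to trace the instruction stream with Frida, but because of the code integrity protection, my unfamiliarity with Godot's engine internals, and pseudocode that was simply too heavily obfuscated, I ended up abandoning that approach.

The approach I finally used was to first get sub_A9A7C running locally under Unicorn and then read out its registers, somewhat like mounting a side-channel attack. The implementation the AI came up with is described below.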

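Running the whole of sub_A9A7C under Unicorn

1.1 Memory layout

Unicorn is a bare CPU emulator, so everything has to be mapped by hand. The values below are the ones defined in the script's address map:

Name             Address     Size    Purpose
LIB_BASE         0x40000000  16 MB   libsec2026.so is loaded here
STACK_BASE       0x50000000  256 KB  stack (grows downwards)
HEAP_BASE        0x60000000  64 MB   backs malloc for the structures sub_A9A7C uses internally
TLS_BASE         0x70000000  4 KB    pthread TLS for TPIDR_EL0 (holds the stack canary)
IO_BASE          0x71000000  64 KB   input/output buffers
STUB_BASE        0x80000000  64 KB   fake libc stubs (memcpy/memset/...)
RETURN_SENTINEL  0x90000000  4 KB    return address used to detect the end of the call

The ELF is parsed with pyelftools, each PT_LOAD segment is copied to LIB_BASE + p_vaddr, and the relocations are then applied so that every GOT/PLT entry for an imported libc symbol points at a stub address inside the stub region.

1.2 FakeLibc stubs

Only the functions that are actually reached are implemented:

  • snprintf-style formatting: intercepted at sub_AA6AC, the library's variadic wrapper around __vsprintf_chk, with a small hand-written printf (enough for %08x) that writes the result back to buf and returns the number of bytes written.
  • memcpy / memset / strlen / strncmp and friends: read and write Unicorn's memory directly.
  • __cxa_atexit and __cxa_finalize simply return 0; __stack_chk_fail, abort and exit raise a Python exception so that a broken run is noticed immediately.

Unicorn's uc.hook_add(UC_HOOK_CODE, ...) is used to inspect pc before instructions execute. When pc lands in the stub region, the corresponding function is carried out in Python and pc is then set to LR with SP left unchanged, which is equivalent to a ret.

1.3 Calling convention

sub_A9A7C(this, token_ptr)

  • X0 = this (a state structure allocated on the heap and zeroed beforehand)
  • X1 = token_ptr (points to the 8-byte token)
  • LR = MAGIC_RETURN (a sentinel address watched by a hook, used to tell when the function has returned)

Once it runs, the 16 characters captured by the snprintf hook as they are written to buf are the flag suffix.

1.4 Verification

This yields the code below, which reproduces the token-to-flag generation entirely inside Unicorn: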

"""
Stage A: Unicorn-based offline emulator for sub_A9A7C in libsec2026.so.

Goal: given an 8-char hex token, call sub_A9A7C and return its 16-char
suffix, producing the same result as running the .so on-device. No Frida,
no phone, no anti-debug concerns.

Usage:
python emulate_sub_A9A7C.py 12345678
Expected outputs (validation set):
12345678 -> 7b84d2118e34500f
abcdef01 -> b9360ac9ff33cdc7
4aca4699 -> 224bbbfc6eadbf92
"""
import os
import sys
import struct
from unicorn import Uc, UC_ARCH_ARM64, UC_MODE_ARM, UcError
from unicorn import UC_PROT_READ, UC_PROT_WRITE, UC_PROT_EXEC
from unicorn import UC_HOOK_CODE, UC_HOOK_MEM_INVALID, UC_HOOK_MEM_READ_UNMAPPED, UC_HOOK_MEM_WRITE_UNMAPPED, UC_HOOK_INTR
from unicorn.arm64_const import *
from elftools.elf.elffile import ELFFile

HERE = os.path.dirname(os.path.abspath(__file__))
SO_PATH = os.path.join(HERE, "..", "final", "lib", "arm64-v8a", "libsec2026.so")
SO_PATH = os.path.abspath(SO_PATH)

# ---------- address map ----------
LIB_BASE = 0x40000000
STACK_BASE = 0x50000000
STACK_SIZE = 0x00040000 # 256 KB
STUB_BASE = 0x80000000 # fake libc stubs live here
STUB_SIZE = 0x00010000
HEAP_BASE = 0x60000000 # for malloc
HEAP_SIZE = 0x04000000 # 64 MB
TLS_BASE = 0x70000000 # pthread TLS for TPIDR_EL0
TLS_SIZE = 0x00001000
IO_BASE = 0x71000000 # input/output buffers
IO_SIZE = 0x00010000
RETURN_SENTINEL = 0x90000000

# offsets we need inside libsec2026.so
SUB_A9A7C = 0xA9A7C
SUB_AA6AC = 0xAA6AC # variadic wrapper around __vsprintf_chk

# Relocation types (AArch64)
R_AARCH64_ABS64 = 257
R_AARCH64_GLOB_DAT = 1025
R_AARCH64_JUMP_SLOT = 1026
R_AARCH64_RELATIVE = 1027

# ---------- helpers ----------
def page_align_up(x, page=0x1000): return (x + page - 1) & ~(page - 1)
def page_align_dn(x, page=0x1000): return x & ~(page - 1)

# ============================================================
class FakeLibc:
    """Stubs for the libc functions referenced by libsec2026.so."""
    def __init__(self, emu):
        self.emu = emu
        self.heap_ptr = HEAP_BASE
        # Each stub slot is 0x10 bytes and starts with a RET (0xd65f03c0,
        # little-endian). We don't actually rely on those bytes (we hook by
        # address), but a RET is a safe fallback if a hook ever misses.
        self.names = [
            "pthread_mutex_lock", "pthread_mutex_unlock",
            "pthread_create",
            "malloc", "free",
            "memcpy", "memset", "memmove", "strlen", "strcmp", "strncmp",
            "strcpy", "strncpy", "strchr",
            "__memcpy_chk", "__memset_chk", "__memmove_chk",
            "__strcpy_chk", "__strncpy_chk", "__strcat_chk",
            "__stack_chk_fail", "__cxa_finalize", "__cxa_atexit",
            "abort", "exit",
        ]
        self.addr_of = {}
        for i, n in enumerate(self.names):
            self.addr_of[n] = STUB_BASE + i * 0x10
        self.name_of = {v: k for k, v in self.addr_of.items()}

    def install(self, uc):
        # write a `RET` at each stub address as a safety net
        ret_code = struct.pack("<I", 0xD65F03C0)  # RET
        for addr in self.addr_of.values():
            uc.mem_write(addr, ret_code)

    def on_hit(self, uc, addr):
        """Called when PC enters a stub address. Simulate the libc call
        and return by setting PC = LR."""
        name = self.name_of[addr]
        x0 = uc.reg_read(UC_ARM64_REG_X0)
        x1 = uc.reg_read(UC_ARM64_REG_X1)
        x2 = uc.reg_read(UC_ARM64_REG_X2)
        x3 = uc.reg_read(UC_ARM64_REG_X3)
        lr = uc.reg_read(UC_ARM64_REG_LR)

        if name == "pthread_mutex_lock" or name == "pthread_mutex_unlock":
            uc.reg_write(UC_ARM64_REG_X0, 0)
        elif name == "pthread_create":
            # X0 = thread_t *out, X2 = start_routine, X3 = arg
            # Anti-debug threads: just pretend success without launching.
            # Write a fake thread handle to *X0.
            if x0:
                try: uc.mem_write(x0, b"\x00" * 8)
                except UcError: pass
            uc.reg_write(UC_ARM64_REG_X0, 0)
        elif name == "malloc":
            size = x0
            # 16-byte align
            size = (size + 15) & ~15
            p = self.heap_ptr
            self.heap_ptr += size
            if self.heap_ptr > HEAP_BASE + HEAP_SIZE:
                raise RuntimeError("heap exhausted")
            # zero it (malloc doesn't, but safer)
            uc.mem_write(p, b"\x00" * size)
            uc.reg_write(UC_ARM64_REG_X0, p)
        elif name == "free":
            pass  # no-op
        elif name == "memcpy" or name == "memmove" or name == "__memcpy_chk" or name == "__memmove_chk":
            dst, src, n = x0, x1, x2
            if n:
                data = uc.mem_read(src, n)
                uc.mem_write(dst, bytes(data))
            uc.reg_write(UC_ARM64_REG_X0, dst)
        elif name == "memset" or name == "__memset_chk":
            dst, c, n = x0, x1 & 0xFF, x2
            if n:
                uc.mem_write(dst, bytes([c]) * n)
            uc.reg_write(UC_ARM64_REG_X0, dst)
        elif name == "strcpy" or name == "__strcpy_chk":
            dst, src = x0, x1
            i = 0
            while True:
                b = bytes(uc.mem_read(src + i, 1))[0]
                uc.mem_write(dst + i, bytes([b]))
                i += 1
                if b == 0: break
                if i > 0x10000: break
            uc.reg_write(UC_ARM64_REG_X0, dst)
        elif name == "strncpy" or name == "__strncpy_chk":
            dst, src, n = x0, x1, x2
            i = 0
            hit = False
            while i < n:
                if not hit:
                    b = bytes(uc.mem_read(src + i, 1))[0]
                    if b == 0: hit = True
                else:
                    b = 0
                uc.mem_write(dst + i, bytes([b]))
                i += 1
            uc.reg_write(UC_ARM64_REG_X0, dst)
        elif name == "__strcat_chk":
            # dst, src, dst_buf_size
            dst, src = x0, x1
            # find end of dst
            dl = 0
            while bytes(uc.mem_read(dst + dl, 1))[0] != 0 and dl < 0x10000:
                dl += 1
            i = 0
            while True:
                b = bytes(uc.mem_read(src + i, 1))[0]
                uc.mem_write(dst + dl + i, bytes([b]))
                i += 1
                if b == 0: break
                if i > 0x10000: break
            uc.reg_write(UC_ARM64_REG_X0, dst)
        elif name == "strchr":
            s, c = x0, x1 & 0xFF
            i = 0
            found = 0
            while True:
                b = bytes(uc.mem_read(s + i, 1))[0]
                if b == c:
                    found = s + i; break
                if b == 0: break
                i += 1
                if i > 0x10000: break
            uc.reg_write(UC_ARM64_REG_X0, found)
        elif name == "strlen":
            s = x0
            L = 0
            while True:
                b = bytes(uc.mem_read(s + L, 1))[0]
                if b == 0: break
                L += 1
                if L > 0x10000: break
            uc.reg_write(UC_ARM64_REG_X0, L)
        elif name == "strcmp":
            s1, s2 = x0, x1
            for i in range(0x10000):
                a = bytes(uc.mem_read(s1 + i, 1))[0]
                b = bytes(uc.mem_read(s2 + i, 1))[0]
                if a != b:
                    uc.reg_write(UC_ARM64_REG_X0, (a - b) & 0xFFFFFFFFFFFFFFFF)
                    break
                if a == 0:
                    uc.reg_write(UC_ARM64_REG_X0, 0)
                    break
        elif name == "strncmp":
            s1, s2, n = x0, x1, x2
            rv = 0
            for i in range(n):
                a = bytes(uc.mem_read(s1 + i, 1))[0]
                b = bytes(uc.mem_read(s2 + i, 1))[0]
                if a != b:
                    rv = (a - b) & 0xFFFFFFFFFFFFFFFF; break
                if a == 0: break
            uc.reg_write(UC_ARM64_REG_X0, rv)
        elif name == "__stack_chk_fail":
            raise RuntimeError("__stack_chk_fail invoked: canary mismatch")
        elif name == "abort" or name == "exit":
            raise RuntimeError(f"{name}() invoked, LR=0x{lr:x}")
        elif name in ("__cxa_finalize", "__cxa_atexit"):
            uc.reg_write(UC_ARM64_REG_X0, 0)
        else:
            raise RuntimeError("unhandled stub: " + name)

        # return to the caller
        uc.reg_write(UC_ARM64_REG_PC, lr)


# ============================================================
class Emulator:
    def __init__(self, so_path, verbose=False):
        self.so_path = so_path
        self.verbose = verbose
        self.uc = Uc(UC_ARCH_ARM64, UC_MODE_ARM)
        self.libc = FakeLibc(self)
        self.code_hook_count = 0
        self.unresolved_stub_hits = {}

        self._map_memory()
        self._load_segments()
        self._apply_relocations()
        self._setup_tls_and_canary()
        self.libc.install(self.uc)
        self._install_hooks()

    # ---------- setup ----------
    def _map_memory(self):
        uc = self.uc
        # Library region: 16MB is plenty (file is ~1.7MB)
        self.lib_region_size = 0x01000000
        uc.mem_map(LIB_BASE, self.lib_region_size, UC_PROT_READ | UC_PROT_WRITE | UC_PROT_EXEC)
        uc.mem_map(STACK_BASE, STACK_SIZE, UC_PROT_READ | UC_PROT_WRITE)
        uc.mem_map(STUB_BASE, STUB_SIZE, UC_PROT_READ | UC_PROT_EXEC)
        uc.mem_map(HEAP_BASE, HEAP_SIZE, UC_PROT_READ | UC_PROT_WRITE)
        uc.mem_map(TLS_BASE, TLS_SIZE, UC_PROT_READ | UC_PROT_WRITE)
        uc.mem_map(IO_BASE, IO_SIZE, UC_PROT_READ | UC_PROT_WRITE)
        # return sentinel: one readable/executable page; reaching it means
        # the emulated function has returned
        uc.mem_map(RETURN_SENTINEL & ~0xFFF, 0x1000, UC_PROT_READ | UC_PROT_EXEC)
        # fill sentinel page with BRK instructions
        uc.mem_write(RETURN_SENTINEL & ~0xFFF, b"\x00\x00\x20\xD4" * (0x1000 // 4))  # BRK #0

    def _load_segments(self):
        uc = self.uc
        with open(self.so_path, "rb") as f:
            elf = ELFFile(f)
            self.dynsym = elf.get_section_by_name(".dynsym")
            # load PT_LOAD
            for seg in elf.iter_segments():
                if seg.header.p_type != "PT_LOAD":
                    continue
                va = seg.header.p_vaddr
                data = seg.data()
                memsz = seg.header.p_memsz
                # write file data
                uc.mem_write(LIB_BASE + va, data)
                # zero-fill up to memsz (BSS)
                if memsz > len(data):
                    uc.mem_write(LIB_BASE + va + len(data), b"\x00" * (memsz - len(data)))

    def _apply_relocations(self):
        uc = self.uc
        with open(self.so_path, "rb") as f:
            elf = ELFFile(f)
            dynsym = elf.get_section_by_name(".dynsym")
            total = applied = unresolved_funcs = 0
            for sec_name in (".rela.dyn", ".rela.plt"):
                sec = elf.get_section_by_name(sec_name)
                if sec is None: continue
                for rel in sec.iter_relocations():
                    total += 1
                    r_off = rel.entry.r_offset
                    r_type = rel.entry.r_info_type
                    r_addend = rel.entry.r_addend
                    r_sym = rel.entry.r_info_sym
                    addr = LIB_BASE + r_off
                    if r_type == R_AARCH64_RELATIVE:
                        val = LIB_BASE + r_addend
                        uc.mem_write(addr, struct.pack("<Q", val))
                        applied += 1
                    elif r_type in (R_AARCH64_GLOB_DAT, R_AARCH64_JUMP_SLOT, R_AARCH64_ABS64):
                        sym = dynsym.get_symbol(r_sym)
                        name = sym.name
                        if sym.entry.st_shndx != "SHN_UNDEF":
                            # internal symbol
                            val = LIB_BASE + sym.entry.st_value + r_addend
                        else:
                            if name in self.libc.addr_of:
                                val = self.libc.addr_of[name]
                            else:
                                # unresolved external: write a unique stub addr so we
                                # detect it if ever called.
                                val = STUB_BASE + 0x8000 + (r_sym * 8)
                                if name:
                                    self.unresolved_stub_hits[val] = name
                                unresolved_funcs += 1
                        uc.mem_write(addr, struct.pack("<Q", val))
                        applied += 1
                    else:
                        # unknown type: skip; not expected to matter for our target.
                        pass
            if self.verbose:
                print(f"[reloc] total={total} applied={applied} "
                      f"unresolved_ext_funcs={unresolved_funcs}")

    def _setup_tls_and_canary(self):
        # TPIDR_EL0 -> TLS_BASE; [TPIDR_EL0 + 0x28] = canary (fixed value).
        # bionic puts stack canary at TLS slot 5 (+0x28 on arm64).
        self.canary = 0xDEADBEEFCAFEBABE
        self.uc.mem_write(TLS_BASE + 0x28, struct.pack("<Q", self.canary))
        self.uc.reg_write(UC_ARM64_REG_TPIDR_EL0, TLS_BASE)

    # ---------- hooks ----------
    def _install_hooks(self):
        uc = self.uc
        # catch execution inside stub region
        uc.hook_add(UC_HOOK_CODE, self._hook_code,
                    begin=STUB_BASE, end=STUB_BASE + STUB_SIZE)
        # internal hook: intercept sub_AA6AC (variadic snprintf-wrapper)
        uc.hook_add(UC_HOOK_CODE, self._hook_sub_AA6AC,
                    begin=LIB_BASE + SUB_AA6AC, end=LIB_BASE + SUB_AA6AC + 4)
        uc.hook_add(UC_HOOK_MEM_INVALID, self._hook_mem_invalid)
        uc.hook_add(UC_HOOK_INTR, self._hook_intr)

    def _hook_sub_AA6AC(self, uc, address, size, user_data):
        """Emulate `sub_AA6AC(dst=X0, dst_size=X1, fmt=X2, ...varargs...)`
        as a snprintf-like call."""
        dst = uc.reg_read(UC_ARM64_REG_X0)
        dstsize = uc.reg_read(UC_ARM64_REG_X1) & 0xFFFFFFFF
        fmt_ptr = uc.reg_read(UC_ARM64_REG_X2)
        lr = uc.reg_read(UC_ARM64_REG_LR)
        # read up to 256 bytes of fmt
        raw = bytes(uc.mem_read(fmt_ptr, 256))
        nul = raw.find(b"\x00")
        if nul >= 0: raw = raw[:nul]
        fmt = raw.decode("latin-1", errors="replace")

        # collect a pool of integer varargs: X3..X7 then from stack
        ireg = [uc.reg_read(r) for r in (UC_ARM64_REG_X3, UC_ARM64_REG_X4,
                                         UC_ARM64_REG_X5, UC_ARM64_REG_X6,
                                         UC_ARM64_REG_X7)]
        # additional varargs on stack, if any
        sp = uc.reg_read(UC_ARM64_REG_SP)
        stack_vals = []
        try:
            stack_raw = bytes(uc.mem_read(sp, 0x80))
            for i in range(0, len(stack_raw), 8):
                stack_vals.append(struct.unpack_from("<Q", stack_raw, i)[0])
        except UcError:
            pass
        arg_pool = ireg + stack_vals

        # Minimal printf implementation handling %s, %d, %u, %x, %X, %02x,
        # %c, %% . No floats expected.
        out = []
        i = 0; ai = 0
        while i < len(fmt):
            c = fmt[i]; i += 1
            if c != '%':
                out.append(c); continue
            # parse flags/width/precision (width/prec only)
            spec = ''
            while i < len(fmt) and fmt[i] in '0123456789#-+ .':
                spec += fmt[i]; i += 1
            if i >= len(fmt): break
            conv = fmt[i]; i += 1
            if conv == 'l':  # skip length modifiers
                if i < len(fmt) and fmt[i] == 'l': i += 1
                conv = fmt[i]; i += 1
            if conv == '%':
                out.append('%'); continue
            if conv == 's':
                p = arg_pool[ai]; ai += 1
                if p == 0:
                    out.append("(null)")
                else:
                    buf = bytes(uc.mem_read(p, 0x200))
                    z = buf.find(b"\x00")
                    out.append(buf[:z if z >= 0 else 0x200].decode("latin-1", errors="replace"))
                continue
            if conv in ('d','i'):
                v = arg_pool[ai]; ai += 1
                # sign-extend 32-bit
                v32 = v & 0xFFFFFFFF
                if v32 & 0x80000000: v32 -= 0x100000000
                out.append(("%"+spec+"d") % v32); continue
            if conv in ('u',):
                v = arg_pool[ai]; ai += 1
                out.append(("%"+spec+"u") % (v & 0xFFFFFFFF)); continue
            if conv in ('x','X'):
                v = arg_pool[ai]; ai += 1
                out.append(("%"+spec+conv) % (v & 0xFFFFFFFF)); continue
            if conv == 'c':
                v = arg_pool[ai]; ai += 1
                out.append(chr(v & 0xFF)); continue
            if conv == 'p':
                v = arg_pool[ai]; ai += 1
                out.append("0x%x" % (v & 0xFFFFFFFFFFFFFFFF)); continue
            # unknown, emit literally
            out.append('%'+spec+conv)

        s = "".join(out)
        enc = s.encode("latin-1", errors="replace")
        enc = enc[:max(0, dstsize - 1)]
        try:
            uc.mem_write(dst, enc + b"\x00")
        except UcError as e:
            print(f"[sub_AA6AC hook] failed to write dst=0x{dst:x}: {e}")

        if self.verbose:
            print(f"[snprintf] dst=0x{dst:x} size={dstsize} fmt={fmt!r} -> {s!r}")

        uc.reg_write(UC_ARM64_REG_X0, len(enc))
        uc.reg_write(UC_ARM64_REG_PC, lr)

    def _hook_code(self, uc, address, size, user_data):
        if STUB_BASE <= address < STUB_BASE + STUB_SIZE:
            if address in self.libc.name_of:
                self.libc.on_hit(uc, address)
            else:
                name = self.unresolved_stub_hits.get(address, f"<stub@0x{address:x}>")
                # be permissive: print a warning, set X0=0, return
                lr = uc.reg_read(UC_ARM64_REG_LR)
                print(f"[stub MISS] name={name} from LR=0x{lr:x} "
                      f"(libOff=0x{lr-LIB_BASE:x}), returning 0")
                uc.reg_write(UC_ARM64_REG_X0, 0)
                uc.reg_write(UC_ARM64_REG_PC, lr)

    def _hook_mem_invalid(self, uc, access, address, size, value, user_data):
        pc = uc.reg_read(UC_ARM64_REG_PC)
        print(f"[MEM_INVALID] access={access} addr=0x{address:x} size={size} "
              f"val=0x{value:x} PC=0x{pc:x} (libOff=0x{pc-LIB_BASE:x})")
        return False  # stop

    def _hook_intr(self, uc, intno, user_data):
        pc = uc.reg_read(UC_ARM64_REG_PC)
        print(f"[INTR] intno={intno} PC=0x{pc:x} (libOff=0x{pc-LIB_BASE:x})")
        uc.emu_stop()

    # ---------- execute ----------
    def call_sub_A9A7C(self, token_bytes):
        uc = self.uc
        assert len(token_bytes) == 8
        # write input at IO_BASE+0x100, terminate with a lot of zeros
        in_addr = IO_BASE + 0x100
        uc.mem_write(in_addr, token_bytes + b"\x00" * 32)
        # set up registers
        sp = STACK_BASE + STACK_SIZE - 0x100
uc.reg_write(UC_ARM64_REG_SP, sp)
uc.reg_write(UC_ARM64_REG_X0, in_addr)
uc.reg_write(UC_ARM64_REG_LR, RETURN_SENTINEL)

start_pc = LIB_BASE + SUB_A9A7C
# run with a big instruction budget
try:
uc.emu_start(start_pc, RETURN_SENTINEL, timeout=0, count=0)
except UcError as e:
pc = uc.reg_read(UC_ARM64_REG_PC)
print(f"[UcError] {e} PC=0x{pc:x} libOff=0x{pc-LIB_BASE:x}")
raise

# X0 holds the return pointer — read a C-string from it
out_ptr = uc.reg_read(UC_ARM64_REG_X0)
if out_ptr == 0:
return None
# read up to 64 bytes
raw = bytes(uc.mem_read(out_ptr, 64))
nul = raw.find(b"\x00")
if nul >= 0:
raw = raw[:nul]
return raw.decode("latin-1", errors="replace")


# ============================================================
def main():
if len(sys.argv) < 2:
# default test vectors
tests = [("12345678", "7b84d2118e34500f"),
("abcdef01", "b9360ac9ff33cdc7"),
("4aca4699", "224bbbfc6eadbf92")]
else:
tests = [(sys.argv[1], None)]

emu = Emulator(SO_PATH, verbose=True)
for tok, expected in tests:
print(f"\n=== token={tok} expected={expected} ===")
res = emu.call_sub_A9A7C(tok.encode("ascii"))
ok = (expected is None) or (res == expected)
print(f" suffix = {res!r} {'OK' if ok else 'WRONG'}")


if __name__ == "__main__":
main()

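Cross-checking against the GDScript logic: feeding in the token given by the challenge yields a suffix which, concatenated with the constant hardcoded in the GDScript, matches the known flag exactly.

Stage 1 is complete.

So far, though, the function has only been run as a black box.

2. Stage 2: Identifying the VM structure

Being able to run it is not enough. The question to answer next: how exactly does sub_A9A7C interpret the bytecode?

2.1 Deriving the opcode encoding

The idea is to run the native code once and, before and after every bytecode instruction, snapshot the 32 VM registers (at heap+0x329da8) plus the whole 64 KB of VM memory. The events are then grouped by (op, sub), and a representative sample is listed for each opcode (operands + register diff + memory diff).

The following code is used:

opcode_effects.py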

"""
Derive full semantics of every (op,sub) pair by observing
register-file and memory diffs from a native trace.
For each opcode, record pre/post state of all 32 regs, all VM memory,
and the decoded operands. Then analyze.
"""
import os, sys, struct, json, pickle
from collections import defaultdict
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from emulate_sub_A9A7C import Emulator, SO_PATH, LIB_BASE, HEAP_BASE
from unicorn.arm64_const import *
from vm_disasm_v2 import disasm_one

BC_FILE_OFF = 0x63DA0
BC_LEN = 0x1526
RF_BASE = HEAP_BASE + 0x329da8
bc = open(os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "unk_63DA0_bytecode.bin"), "rb").read()


def run_and_log(token, max_steps=None):
    """Step one instruction at a time in native, snapshot regs+mem."""
    emu = Emulator(SO_PATH, verbose=False)
    h0 = [None]; events = []
    last_snapshot = [None]  # (pc, ins, rf, mem)

    orig = emu.libc.on_hit
    def patched(uc, addr):
        name = emu.libc.name_of[addr]
        if name in ("memcpy", "memmove", "__memcpy_chk", "__memmove_chk"):
            dst = uc.reg_read(UC_ARM64_REG_X0)
            src = uc.reg_read(UC_ARM64_REG_X1)
            n = uc.reg_read(UC_ARM64_REG_X2)
            # the first BC_LEN-sized copy from the .so is the bytecode load
            if n == BC_LEN and src == LIB_BASE + BC_FILE_OFF and h0[0] is None:
                h0[0] = dst
            # every later copy out of the bytecode buffer is an instruction fetch
            if h0[0] is not None and h0[0] <= src < h0[0] + BC_LEN:
                pc = src - h0[0]
                d = disasm_one(bc, pc)
                if d is None: orig(uc, addr); return
                ilen, ins = d
                ins["_len"] = ilen  # keep each instruction's own length with it
                # close previous event (post-state of last ins)
                if last_snapshot[0] is not None:
                    ppc, pins, prf, _ = last_snapshot[0]
                    post_rf = bytes(uc.mem_read(RF_BASE, 32*8))
                    # Only snapshot a window of VM mem that might change
                    post_mem = bytes(uc.mem_read(h0[0] + 0x10000, 0x10000))
                    events.append({"pc": ppc, "ins": pins, "len": pins.get("_len", ilen),
                                   "pre_rf": prf, "post_rf": post_rf,
                                   "pre_mem": last_snapshot[0][3], "post_mem": post_mem})
                    if max_steps and len(events) >= max_steps:
                        uc.emu_stop()
                        return
                pre_rf = bytes(uc.mem_read(RF_BASE, 32*8))
                pre_mem = bytes(uc.mem_read(h0[0] + 0x10000, 0x10000))
                last_snapshot[0] = (pc, ins, pre_rf, pre_mem)
        orig(uc, addr)
    emu.libc.on_hit = patched
    suf = emu.call_sub_A9A7C(token)
    return suf, events


def diff_rf(a, b):
    out = {}
    for i in range(32):
        va = struct.unpack_from("<Q", a, i*8)[0]
        vb = struct.unpack_from("<Q", b, i*8)[0]
        if va != vb:
            out[i] = (va, vb)
    return out


def diff_mem(a, b):
    out = []
    i = 0
    while i < len(a):
        if a[i] != b[i]:
            j = i
            while j < len(a) and a[j] != b[j]:
                j += 1
            out.append((0x10000 + i, a[i:j], b[i:j]))
            i = j
        else:
            i += 1
    return out


if __name__ == "__main__":
    print("Running native trace...")
    suf, events = run_and_log(b"12345678", max_steps=None)
    print(f"suffix={suf} events={len(events)}")
    # Group by opcode
    by_op = defaultdict(list)
    for e in events:
        key = (e["ins"]["op"], e["ins"]["sub"])
        by_op[key].append(e)

    print(f"\nUnique opcodes seen: {len(by_op)}")
    for key in sorted(by_op.keys()):
        # pick an example that has mem changes if possible
        examples = by_op[key]
        ex = next((e for e in examples if diff_mem(e["pre_mem"], e["post_mem"])), examples[0])
        ops = ex["ins"]["operands"]
        drf = diff_rf(ex["pre_rf"], ex["post_rf"])
        dmem = diff_mem(ex["pre_mem"], ex["post_mem"])
        # format pre-values of operand regs too
        pre_vals = {}
        for t, v in ops:
            if t == 'r':
                pre_vals[v] = struct.unpack_from("<Q", ex["pre_rf"], v*8)[0]
        rf_str = ", ".join(f"r{i:02x}:{a:x}->{b:x}" for i, (a, b) in drf.items() if i != 0x12)
        mem_str = " | ".join(f"@{a:05x}:{pb.hex()}->{nb.hex()}" for a, pb, nb in dmem[:3])
        print(f"\n OP_{key[0]:02x}_{key[1]:02x} count={len(by_op[key])} len={ex['ins'].get('_len','?')}")
        print(f" ex-operands: {ops} pre-vals: {pre_vals}")
        print(f" pc=0x{ex['pc']:04x} rf_diff: {rf_str}")
        if dmem: print(f" mem_diff: {mem_str}")

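The semantics are inferred by studying the diffs. The instruction format that finally emerges:

[op:1][sub:1][flag:1][n_operands:1][operand]*
operand = 0x01 + reg(1B)    // register reference
        | 0x04 + imm(4B)    // immediate

The flag byte is either 0x00 or 0x20. It does not affect the semantics, but the decoder must let it through.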
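As a quick sanity check of this layout, here is a minimal Python sketch that decodes one hand-assembled instruction. The sample bytes are an assumption built from the format above (flag byte taken as 0x00); they are not copied from the real bytecode, and `decode` is an illustrative helper rather than the author's disassembler.

```python
import struct

def decode(bc: bytes, pc: int = 0):
    """Decode one instruction; returns (length, op, sub, operands)."""
    op, sub, _flag, n = bc[pc:pc + 4]   # flag (0x00 / 0x20) is read but ignored
    cur = pc + 4
    operands = []
    for _ in range(n):
        kind = bc[cur]; cur += 1
        if kind == 0x01:                # register reference: 1 byte
            operands.append(("r", bc[cur])); cur += 1
        elif kind == 0x04:              # immediate: 4 bytes, little-endian
            operands.append(("imm", struct.unpack_from("<I", bc, cur)[0])); cur += 4
        else:
            raise ValueError(f"unknown operand type {kind:#x}")
    return cur - pc, op, sub, operands

# Assumed encoding of OP_06_03(r0f, #0x14088): header + reg operand + imm operand
sample = bytes([0x06, 0x03, 0x00, 0x02, 0x01, 0x0F, 0x04]) + struct.pack("<I", 0x14088)
length, op, sub, operands = decode(sample)
print(length, f"OP_{op:02x}_{sub:02x}", operands)
```

The decoded length is 4 header bytes + 2 bytes for the register operand + 5 bytes for the immediate, i.e. 11 bytes.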

In total, 27 (op, sub) pairs were identified and named OP_xx_yy. The main ones:

Mnemonic | Semantics
OP_00_01(d,a,b) | d = a + b (64-bit, no truncation)
OP_02_02(d,a,b) | d = a XOR b
OP_04_02(d,a,b) | d = a << b (64-bit, no truncation)
OP_09_02(d,a,b) | d = (a >> b) & 0xFFFFFFFF (the only explicit 32-bit truncation)
OP_06_03(d,#imm) | d = imm32
OP_00_03(d,r) | d = r
OP_03_03(r) | push r
OP_04_03(r) | pop r
OP_01_03(d,ptr) | d = u32[ptr] (LOAD)
OP_02_03(src,ptr) | u32[ptr] = src (STORE)
OP_00_04(#tgt) | JMP
OP_07_04(a,b) | cmp_eq = (a == b); cmp_lt = (a < b)
OP_01_04(#tgt) | if cmp_eq: jmp tgt
OP_04_04(#tgt) | if cmp_lt: jmp tgt
OP_00_09(d,#n) | ALLOCA: d = cursor; cursor += (n+2)*8 (16 extra bytes of header)
OP_01_09(d,p,i) | d = u64[p + 16 + i*8] (indexed read past the 16-byte header)
OP_02_00(#fn,#n) | "syscall": fn=0x65 injects the token, fn=0x66 emits the output and halts
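2.2 Writing the disassembler

Starting from VM PC=0x10000, sweep the whole bytecode linearly and print each instruction in the format above.

The code below generates emulator/vm_disasm_v2.txt: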

vm_disasm_v2.py

"""
Proper VM disassembler based on the discovered instruction encoding.

Instruction format:
  [op:1] [sub:1] [flag=00|20:1] [n_operands:1] [operand]*
  operand = [type:1] [value: 1B if type=01, 4B if type=04]

We walk the whole 5414-byte bytecode starting from pc=0 and list every
instruction. We also build an instruction-set summary by (op, sub) pair.
"""
import os, sys, struct, collections

HERE = os.path.dirname(os.path.abspath(__file__))
BC_PATH = os.path.join(HERE, "..", "unk_63DA0_bytecode.bin")


def disasm_one(bc: bytes, pc: int):
    """Return (length, structured_instr) or None if malformed."""
    if pc + 4 > len(bc):
        return None
    op, sub, pad, nops = bc[pc], bc[pc + 1], bc[pc + 2], bc[pc + 3]
    # pad can be 0x00 or 0x20 (flag byte); don't gate on it
    # parse operands
    operands = []
    cur = pc + 4
    for _ in range(nops):
        if cur >= len(bc):
            return None
        t = bc[cur]; cur += 1
        if t == 0x01:
            if cur >= len(bc): return None
            v = bc[cur]; cur += 1
            operands.append(('r', v))
        elif t == 0x04:
            if cur + 4 > len(bc): return None
            v = struct.unpack_from("<I", bc, cur)[0]; cur += 4
            operands.append(('imm', v))
        elif t == 0x08:
            # not yet seen; guess 8-byte value
            if cur + 8 > len(bc): return None
            v = struct.unpack_from("<Q", bc, cur)[0]; cur += 8
            operands.append(('imm64', v))
        else:
            # unknown operand type
            return None
    return (cur - pc, {"pc": pc, "op": op, "sub": sub, "nops": nops, "operands": operands})


def fmt_instr(ins):
    ops = []
    for t, v in ins["operands"]:
        if t == "r":
            ops.append(f"r{v:02x}")
        elif t == "imm":
            ops.append(f"#0x{v:x}")
        else:
            ops.append(f"{t}:{v}")
    return f"OP_{ins['op']:02x}_{ins['sub']:02x}({', '.join(ops)})"


def main():
    bc = open(BC_PATH, "rb").read()
    print(f"[+] bytecode: {len(bc)} bytes (expect 5414)")

    # An earlier static listing suggested pc=0 might begin with header-like
    # data rather than code, so dump the first 64 bytes for inspection and
    # rely on the resync below if decoding fails.
    print(f" first64: {bc[:64].hex()}")

    # Disassemble from pc=0 with auto-resync.
    all_ins = []
    pc = 0
    bad = 0
    while pc < len(bc):
        r = disasm_one(bc, pc)
        if r is None:
            # skip a byte and resync
            bad += 1
            pc += 1
            continue
        length, ins = r
        all_ins.append(ins)
        pc += length

    print(f"[+] decoded {len(all_ins)} instructions (bad sync bytes skipped: {bad})")

    # Count unique (op, sub) pairs
    freq = collections.Counter()
    sizes_per_op = collections.defaultdict(list)
    for ins in all_ins:
        freq[(ins["op"], ins["sub"])] += 1
        sizes_per_op[(ins["op"], ins["sub"])].append(ins["nops"])

    print("\n[+] instruction-set summary ((op, sub) : count, typical_nops)")
    for (o, s), cnt in sorted(freq.items(), key=lambda x: -x[1]):
        nops_set = sorted(set(sizes_per_op[(o, s)]))
        print(f" 0x{o:02x} 0x{s:02x} x{cnt:5d} nops={nops_set}")

    # Dump full listing
    out = os.path.join(HERE, "vm_disasm_v2.txt")
    with open(out, "w") as f:
        for ins in all_ins:
            f.write(f"pc=0x{ins['pc']:04x} {fmt_instr(ins)}\n")
    print(f"[+] full disasm written: {out} ({len(all_ins)} instructions)")


if __name__ == "__main__":
    main()

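3. Stage 3: Rewriting the VM in pure Python

3.1 Why rewrite?

Two reasons:

  1. Unicorn is too slow: at tens of milliseconds per run, large-scale experiments are impractical.
  2. A Python VM can be hooked before every VM instruction, which makes the later differential debugging easy.

The rewrite yields the following code: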

vm_interpret.py

"""
Pure-Python VM interpreter for the bytecode in unk_63DA0_bytecode.bin.
Validated by diffing against the native Unicorn emulator.

Opcode semantics (derived empirically, then corrected via lockstep diffing;
this summary mirrors the implementation in step() below):

  ARITHMETIC (sub=01, sub=02)
    00_01(d,a,b)     d = a + b (64-bit, no truncation)          ADD
    02_01(d,a,b)     cmp_eq = (a == b); cmp_lt = (a < b)        CMP (old variant)
    01_02(d,a,b)     d = a | b                                  OR
    02_02(d,a,b)     d = a ^ b                                  XOR
    04_02(d,a,b)     d = a << (b & 0x3f) (64-bit)               SHL
    09_02(d,a,b)     d = (a >> (b & 0x3f)) & 0xFFFFFFFF         SHR32

  MOV / IMM / CTRL (sub=03, sub=04)
    00_03(d,s)       d = s                                      MOV
    06_03(d,imm32)   d = imm32                                  LOAD_IMM
    03_03(r)         SP -= 8; mem[SP] = r                       PUSH
    04_03(r)         r = mem[SP]; SP += 8                       POP
    01_03(d,ptr)     d = *(u32*)ptr                             LOAD_U32
    02_03(src,ptr)   *(u32*)ptr = src                           STORE_U32
    00_04(imm)       PC = imm                                   JMP
    01_04(imm)       if cmp_eq: PC = imm                        BRANCH_IF_EQ
    04_04(imm)       if cmp_lt: PC = imm                        BRANCH_IF_LT
    07_04(a,b)       cmp_eq = (a == b); cmp_lt = (a < b)        CMP

  SPECIAL (sub=05, sub=06, sub=07, sub=08, sub=09, sub=00)
    00_05() / 01_05() / 05_05(imm)   no-op markers
    06_05(r)         flag = (r != 0)                            TESTZ
    05_06            no-op (paired with 05_05)
    00_07() / 01_07()                no-op
    00_09(d,imm)     d = cursor; cursor += (imm + 2) * 8        ALLOCA (cursor starts at 0x17008)
    01_09(d,ptr,idx) d = *(u64*)(ptr + 16 + idx*8)              LOAD slot
    04_09(ptr,val)   treated as a no-op here (slot init marker)
    03_08(r,imm)     treated as a no-op here (post-loop-iter marker)
    02_00(fn,n)      native syscall (0x65: token in, 0x66: result out + halt)
"""
import os, sys, struct
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from vm_disasm_v2 import disasm_one

HERE = os.path.dirname(os.path.abspath(__file__))
BC = open(os.path.join(HERE, "..", "unk_63DA0_bytecode.bin"), "rb").read()
VM_BASE = 0x10000
VM_SIZE = 0x10000


class VM:
    def __init__(self, token: bytes):
        assert len(token) == 8
        self.mem = bytearray(VM_SIZE)
        # load bytecode at VM 0x10000
        self.mem[0x0:len(BC)] = BC
        # initial state
        self.reg = [0] * 32
        self.reg[0x10] = 0x1d000  # SP
        self.reg[0x11] = 0x1d000  # FP / stack base
        self.reg[0x12] = 0x10000  # PC
        self.alloca_cursor = 0x17008
        self.token = token
        self.halted = False
        self.output = None  # (x3, x4)
        self.flag = False   # NE-flag: true if last compare "not-equal"
        self.cmp_eq = False
        self.cmp_lt = False

    # -- memory helpers (VM offset, always within [0, 0x10000) ) --
    def _off(self, addr):
        return addr - VM_BASE  # we keep mem indexed by VM-offset

    def rd_u8(self, a): return self.mem[self._off(a)]
    def rd_u32(self, a):
        o = self._off(a)
        return struct.unpack_from("<I", self.mem, o)[0]
    def rd_u64(self, a):
        o = self._off(a)
        return struct.unpack_from("<Q", self.mem, o)[0]
    def wr_u8(self, a, v): self.mem[self._off(a)] = v & 0xff
    def wr_u32(self, a, v):
        struct.pack_into("<I", self.mem, self._off(a), v & 0xFFFFFFFF)
    def wr_u64(self, a, v):
        struct.pack_into("<Q", self.mem, self._off(a), v & 0xFFFFFFFFFFFFFFFF)

    # operand resolution
    def val(self, operand, regs=None):
        t, v = operand
        if t == 'r':
            return (regs or self.reg)[v]
        return v  # imm

    # NATIVE syscall emulation
    def syscall(self, fn_id, nargs):
        if fn_id == 0x65:
            # inject token bytes to VM 0x17018, 0x17020, ..., 0x17050 (stride 8, only LSB)
            for i, b in enumerate(self.token):
                self.wr_u64(0x17018 + 8*i, b)
        elif fn_id == 0x66:
            # read output from registers: X3 = r0c (u32), X4 = r0e (u32)
            self.output = (self.reg[0x0c] & 0xFFFFFFFF, self.reg[0x0e] & 0xFFFFFFFF)
            self.halted = True  # stop after this
        else:
            raise NotImplementedError(f"syscall fn={fn_id:#x}")

    def step(self, trace=False):
        pc = self.reg[0x12]
        bc_off = pc - VM_BASE
        d = disasm_one(bytes(self.mem[:len(BC)]), bc_off)
        if d is None:
            raise RuntimeError(f"bad ins at pc=0x{pc:x}")
        ilen, ins = d
        op, sub = ins["op"], ins["sub"]
        ops = ins["operands"]
        next_pc = pc + ilen  # default: advance PC
        key = (op, sub)
        if key == (0x00, 0x01):  # ADD (64-bit, no truncate)
            d_, a, b = ops
            self.reg[d_[1]] = (self.val(a) + self.val(b)) & 0xFFFFFFFFFFFFFFFF
        elif key == (0x02, 0x01):  # CMP (old variant, also sets eq/lt)
            d_, a, b = ops
            av, bv = self.val(a), self.val(b)
            self.cmp_eq = (av == bv)
            self.cmp_lt = (av < bv)
        elif key == (0x01, 0x02):  # OR
            d_, a, b = ops
            self.reg[d_[1]] = (self.val(a) | self.val(b))
        elif key == (0x02, 0x02):  # XOR
            d_, a, b = ops
            self.reg[d_[1]] = (self.val(a) ^ self.val(b))
        elif key == (0x04, 0x02):  # SHL (64-bit, no truncate)
            d_, a, b = ops
            self.reg[d_[1]] = (self.val(a) << (self.val(b) & 0x3f)) & 0xFFFFFFFFFFFFFFFF
        elif key == (0x09, 0x02):  # SHR32: d = (a >> b) & 0xFFFFFFFF
            d_, a, b = ops
            self.reg[d_[1]] = (self.val(a) >> (self.val(b) & 0x3f)) & 0xFFFFFFFF
        elif key == (0x00, 0x03):  # MOV
            d_, s = ops
            self.reg[d_[1]] = self.val(s)
        elif key == (0x06, 0x03):  # LOAD_IMM
            d_, imm = ops
            self.reg[d_[1]] = self.val(imm)
        elif key == (0x03, 0x03):  # PUSH
            r, = ops
            self.reg[0x10] -= 8
            self.wr_u64(self.reg[0x10], self.reg[r[1]])
        elif key == (0x04, 0x03):  # POP
            r, = ops
            self.reg[r[1]] = self.rd_u64(self.reg[0x10])
            self.reg[0x10] += 8
        elif key == (0x01, 0x03):  # LOAD_U32
            d_, ptr = ops
            self.reg[d_[1]] = self.rd_u32(self.val(ptr))
        elif key == (0x02, 0x03):  # STORE_U32
            src, ptr = ops
            self.wr_u32(self.val(ptr), self.val(src))
        elif key == (0x00, 0x04):  # JMP
            tgt, = ops
            next_pc = self.val(tgt)
        elif key == (0x01, 0x04):  # BRANCH_IF_EQ
            tgt, = ops
            if self.cmp_eq:
                next_pc = self.val(tgt)
        elif key == (0x04, 0x04):  # BRANCH_IF_LT (unsigned)
            tgt, = ops
            if self.cmp_lt:
                next_pc = self.val(tgt)
        elif key == (0x07, 0x04):  # CMP: eq = (a==b); lt = (a<b)
            a, b = ops
            av, bv = self.val(a), self.val(b)
            self.cmp_eq = (av == bv)
            self.cmp_lt = (av < bv)
        elif key == (0x00, 0x05):  # NOP
            pass
        elif key == (0x01, 0x05):  # NOP (loop-exit marker)
            pass
        elif key == (0x05, 0x05):  # NOP (imm)
            pass
        elif key == (0x06, 0x05):  # TESTZ (sets NE-flag = r != 0)
            r, = ops
            self.flag = (self.val(r) != 0)
        elif key == (0x05, 0x06):  # NOP (paired with 05_05)
            pass
        elif key == (0x00, 0x07):  # NOP
            pass
        elif key == (0x01, 0x07):  # NOP
            pass
        elif key == (0x00, 0x09):  # ALLOCA
            d_, sz = ops
            self.reg[d_[1]] = self.alloca_cursor
            self.alloca_cursor += (self.val(sz) + 2) * 8  # imm+2 slots
        elif key == (0x01, 0x09):  # LOAD slot: addr = ptr + 16 + idx*8
            d_, ptr, idx = ops
            addr = self.val(ptr) + 16 + self.val(idx) * 8
            self.reg[d_[1]] = self.rd_u64(addr)
        elif key == (0x04, 0x09):  # NOP (slot init marker)
            pass
        elif key == (0x03, 0x08):  # NOP (post-loop-iter marker)
            pass
        elif key == (0x02, 0x00):  # SYSCALL
            fn, n = ops
            self.syscall(self.val(fn), self.val(n))
        else:
            raise NotImplementedError(f"unimplemented op {key} at pc=0x{pc:x}")

        self.reg[0x12] = next_pc
        if trace:
            return ins, ilen

    def run(self, max_steps=100000):
        for _ in range(max_steps):
            if self.halted: break
            self.step()
        return self.output


def format_suffix(out):
    x3, x4 = out
    return f"{x3:08x}{x4:08x}"


if __name__ == "__main__":
    vectors = [
        (b"12345678", "7b84d2118e34500f"),
        (b"abcdef01", "b9360ac9ff33cdc7"),
        (b"4aca4699", "224bbbfc6eadbf92"),
    ]
    for tok, exp in vectors:
        vm = VM(tok)
        try:
            out = vm.run()
            got = format_suffix(out) if out else "(no output)"
            ok = (got == exp)
            print(f"{'OK ' if ok else 'FAIL'} token={tok!r} got={got} exp={exp}")
        except Exception as e:
            print(f"ERROR token={tok!r} {e}")
            import traceback; traceback.print_exc()

3.2 Initial state

  • Matched against the initial memory observed in the native run:

    class VM:
        def __init__(self, token: bytes):
            assert len(token) == 8
            self.mem = bytearray(VM_SIZE)
            self.mem[0:len(BC)] = BC          # bytecode is loaded at VM 0x10000
            self.reg = [0] * 32
            self.reg[0x10] = 0x1d000          # SP
            self.reg[0x11] = 0x1d000          # FP
            self.reg[0x12] = 0x10000          # PC
            self.alloca_cursor = 0x17008      # used by OP_00_09
            self.token = token
            self.halted = False
            self.output = None
            self.cmp_eq = self.cmp_lt = False

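3.3 Lockstep verification

Run native Unicorn and the Python VM in lockstep over the same bytecode, one instruction at a time, comparing all registers at every instruction boundary.

Run with token=12345678; the first divergence that shows up is a bug.

The implementation is as follows: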

vm_lockstep.py

"""Lockstep-compare pure-Python VM against native emulator.
Report first divergence of regfile state."""
import os, sys, struct
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from emulate_sub_A9A7C import Emulator, SO_PATH, LIB_BASE, HEAP_BASE
from unicorn.arm64_const import *
from vm_disasm_v2 import disasm_one
from vm_interpret import VM

BC_FILE_OFF = 0x63DA0
BC_LEN = 0x1526
RF_BASE = HEAP_BASE + 0x329da8


def run_lockstep(token: bytes, max_steps=50):
    vm = VM(token)
    emu = Emulator(SO_PATH, verbose=False)
    h0 = [None]
    step_idx = [0]
    diverged = [False]

    orig = emu.libc.on_hit
    def patched(uc, addr):
        name = emu.libc.name_of[addr]
        if name in ("memcpy", "memmove", "__memcpy_chk", "__memmove_chk") and not diverged[0]:
            dst = uc.reg_read(UC_ARM64_REG_X0)
            src = uc.reg_read(UC_ARM64_REG_X1)
            n = uc.reg_read(UC_ARM64_REG_X2)
            if n == BC_LEN and src == LIB_BASE + BC_FILE_OFF and h0[0] is None:
                h0[0] = dst
            if h0[0] is not None and h0[0] <= src < h0[0] + BC_LEN:
                pc_native = src - h0[0]
                # fetch the NATIVE register state BEFORE this instruction executes
                rf_native = bytes(uc.mem_read(RF_BASE, 32*8))
                reg_native = [struct.unpack_from("<Q", rf_native, i*8)[0] for i in range(32)]
                # step python VM to match
                pc_py = vm.reg[0x12] - 0x10000
                if pc_py != pc_native:
                    print(f"[step {step_idx[0]}] PC mismatch: native=0x{pc_native:x} python=0x{pc_py:x}")
                    diverged[0] = True
                    uc.emu_stop()
                    return
                # check regfile
                for i in range(32):
                    if i == 0x12: continue  # PC
                    if vm.reg[i] != reg_native[i]:
                        d = disasm_one(bytes(vm.mem[:BC_LEN]), pc_py)
                        print(f"[step {step_idx[0]}] reg r{i:02x} mismatch BEFORE pc=0x{pc_native:x}: py=0x{vm.reg[i]:x} native=0x{reg_native[i]:x} next_ins={d[1] if d else '?'}")
                        diverged[0] = True
                        uc.emu_stop()
                        return
                # step python VM
                try:
                    ins_copy = disasm_one(bytes(vm.mem[:BC_LEN]), pc_py)[1]
                    pre_r00 = vm.reg[0]
                    vm.step()
                    post_r00 = vm.reg[0]
                    if ins_copy["op"] == 0x01 and ins_copy["sub"] == 0x09 and pre_r00 != post_r00:
                        ops = ins_copy["operands"]
                        # print operand values for debugging
                        op_vals = []
                        for t, v in ops:
                            if t == 'r': op_vals.append(f"r{v:02x}=0x{vm.reg[v]:x}/nat=0x{reg_native[v]:x}")
                            else: op_vals.append(f"#0x{v:x}")
                        print(f" [trace] step {step_idx[0]} pc=0x{pc_py:x} OP_01_09 {op_vals} -> r00 py:0x{pre_r00:x}->0x{post_r00:x}")
                except Exception as e:
                    d = disasm_one(bytes(vm.mem[:BC_LEN]), pc_py)
                    print(f"[step {step_idx[0]}] python exception at pc=0x{pc_native:x}: {e} ins={d[1] if d else '?'}")
                    diverged[0] = True
                    uc.emu_stop()
                    return
                step_idx[0] += 1
                if step_idx[0] >= max_steps:
                    uc.emu_stop()
                    return
        orig(uc, addr)
    emu.libc.on_hit = patched

    try:
        emu.call_sub_A9A7C(token)
    except Exception as e:
        pass
    print(f"Completed {step_idx[0]} steps without divergence" if not diverged[0] else "diverged above")


if __name__ == "__main__":
    run_lockstep(b"12345678", max_steps=int(sys.argv[1]) if len(sys.argv) > 1 else 50)

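In practice there were 7 divergences, each with its own lesson:

# | Symptom (Python was wrong) | Actual semantics | Lesson
1 | OP_01_09 read at ptr+i*8; the value was off from native by a 16-byte offset | OP_00_09 (ALLOCA) actually reserves a 16-byte header, so OP_01_09 must read at ptr+16+i*8 | Infer the shape of an ALLOCA from its memory-access pattern
2 | ADD and SHL were casually masked with &0xFFFFFFFF; diverged after 329 steps | Both ops are 64-bit; only OP_09_02 truncates to 32 bits explicitly | Don't assume bit widths; pin them down by diffing
3 | OP_01_03 was written as STORE | OP_01_03 is LOAD, OP_02_03 is STORE | The numbering order is misleading
4 | OP_09_02 was written as a "pure truncation" | d = (a >> b) & 0xFFFFFFFF: logical right shift + 32-bit truncation | Where SHR truncates matters
5 | The disassembler panicked at pad=0x20 | The flag byte may be 0x00 or 0x20 | Decoders must be lenient
6 | OP_07_04 / 01_04 / 04_04 were all guessed wrong | Three-way compare: 07_04 sets two flags, 01_04 = branch_if_eq, 04_04 = branch_if_lt | Only became clear after forcing each branch and tracing it separately
7 | OP_00_09 allocation size was written as imm*8 | Actually (imm+2)*8; the extra 16 bytes are the header from row 1 | The spacing between consecutive ALLOCAs reveals the truth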
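Rows 2 and 4 are easy to underestimate, so here is a tiny Python illustration of why a premature 32-bit mask only shows up later. The helper names are made up for this sketch; the shift and mask widths follow the opcode semantics described above, and the sample value is the little-endian word for "5678".

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def shl64(a, b):            # OP_04_02 as finally implemented (64-bit result)
    return (a << (b & 0x3F)) & MASK64

def shl32_wrong(a, b):      # the mistaken version: truncates too early
    return (a << (b & 0x3F)) & MASK32

def shr32(a, b):            # OP_09_02: logical shift right, then 32-bit mask
    return (a >> (b & 0x3F)) & MASK32

y = 0x38373635
wide = shl64(y, 4)          # keeps the bits shifted above bit 31
narrow = shl32_wrong(y, 4)  # drops them

# The low 32 bits agree, so the bug is invisible until the high bits are consumed.
print(hex(wide), hex(narrow), (wide & MASK32) == narrow)
print(hex(shr32(wide, 4)), hex(shr32(narrow, 4)))
```

The two registers already differ right after the shift, which is exactly what a per-instruction register comparison catches, even though the final 32-bit results may coincide for many inputs.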

Next, validate with a larger set of tokens.

The implementation is as follows:

bulk_validate.py

"""Validate vm_interpret.py against native emulator with many random tokens."""
import os, sys, random, string
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from vm_interpret import VM, format_suffix
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from compute_suffix import compute_suffix

random.seed(2026)
ok = 0; bad = 0
chars = string.hexdigits.lower()[:16]
for i in range(30):
    tok = ''.join(random.choices(chars, k=8)).encode()
    vm = VM(tok)
    got = format_suffix(vm.run())
    exp = compute_suffix(tok.decode())
    if got == exp:
        ok += 1
    else:
        bad += 1
        print(f"DIFF token={tok} got={got} exp={exp}")
print(f"\n{ok}/{ok+bad} passed")

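All 30 random tokens match native. The Python VM is validated.

At this point it can run completely independently of Unicorn. But it is still an interpreter, not an algorithm.

4. Stage 4: Identifying the loop structure

4.1 Finding the loop head

The implementation is as follows: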

pc_hist.py

from collections import Counter
from vm_interpret import VM
vm = VM(b"12345678")
counts = Counter()
while not vm.halted:
    counts[vm.reg[0x12]] += 1
    vm.step()
for pc, n in counts.most_common(20):
    print(f" pc=0x{pc:x} count={n}")

The first line of output is pc=0x107bf count=29. Reading vm_disasm_v2.txt around pc=0x07bf:

pc=0x07bf   OP_06_03(r0f, #0x14088)   ; r0f = &counter
pc=0x07ca   OP_01_03(r00, r0f)        ; r00 = counter
pc=0x07d2   OP_06_03(r0e, #28)        ; r0e = 28
pc=0x07dd   OP_07_04(r00, r0e)        ; cmp counter, 28
pc=0x07e5   OP_01_04(#0x114b7)        ; if eq -> exit
pc=0x07ee   OP_04_04(#0x10892)        ; if lt -> enter body
...
pc=0x10889  OP_00_04(#0x107bf)        ; jmp back to the loop head

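That is, for counter in 0..27: body() = 28 iterations, with the loop exit at pc=0x114b7. The loop head is hit 29 times because the final visit (counter == 28) takes the exit branch.

4.2 Watching the registers evolve each round

The implementation is as follows: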

loop_trace.py

"""
Trace the high-level algorithm: dump relevant registers at each iteration of
the 28-round loop (bytecode offset 0x07bf is the loop head).
"""
import os, sys
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from vm_interpret import VM, format_suffix


def trace_loop(token: bytes):
    vm = VM(token)
    iter_log = []
    while not vm.halted:
        pc = vm.reg[0x12]
        # log every arrival at the loop head, VM address 0x107bf (bytecode offset 0x07bf)
        if pc == 0x107bf:
            snap = {i: vm.reg[i] & 0xFFFFFFFF for i in range(0x20)}
            iter_log.append(snap)
        vm.step()
    return iter_log, format_suffix(vm.output)


for tok in [b"12345678", b"4aca4699"]:
    log, suf = trace_loop(tok)
    print(f"\n===== token={tok} suffix={suf} rounds={len(log)} =====")
    print(f"{'round':>5} {'r00':>9} {'r08':>9} {'r09':>9} {'r0a':>9} {'r0b':>9} {'r0c':>9} {'r0d':>9} {'r0e':>9}")
    for i, s in enumerate(log):
        print(f"{i:>5} {s[0x00]:08x} {s[0x08]:08x} {s[0x09]:08x} {s[0x0a]:08x} {s[0x0b]:08x} {s[0x0c]:08x} {s[0x0d]:08x} {s[0x0e]:08x}")

Output (excerpt):

token=b'12345678'  suffix=7b84d2118e34500f  rounds=29
0 r0b=00000000 r0c=34333231 r0d=00000000 r0e=38373635
1 r0b=29e59c9f r0c=a9418a3b r0d=29e59c9f r0e=a9418a3b
2 r0b=53cb393e r0c=... r0d=53cb393e r0e=...
...

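Three things stand out immediately:

  • r0b == r0d, and it grows by 0x29E59C9F every round. This is a TEA-style sum / round key.
  • After round 0, r0c == r0e: the two input halves are "mixed" by the first round and move together from then on.
  • Initially r0c = u32_le(token[0..3]) ("1234" → 0x34333231) and r0e = u32_le(token[4..7]) ("5678" → 0x38373635).

Rename the variables: X = r0b/r0d (round-key accumulator), Y = r0e (high half), Z = r0c (low half).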
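The initial load of the two halves can be reproduced in a couple of lines of Python. This is only a sketch of the byte order observed in the trace; `load_halves` is a name invented here, not something from the binary.

```python
import struct

def load_halves(token: bytes):
    """Split an 8-byte token into (Z, Y) as two little-endian u32 words."""
    if len(token) != 8:
        raise ValueError("token must be exactly 8 bytes")
    z, y = struct.unpack("<II", token)   # Z = bytes 0..3, Y = bytes 4..7
    return z, y

z, y = load_halves(b"12345678")
print(f"Z={z:08x} Y={y:08x}")
```

For the sample token this gives the same starting values as r0c and r0e in round 0 of the trace.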

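5. Stage 5: Decompiling the loop body into math

Extract the pc=0x0892..0x14ae range of the disassembly (one full round body) into round_body.txt.

5.1 Spotting the "macro-instruction" pattern

The body turns out to be the same seven-step combo repeated over and over: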

OP_03_03(r14)..r0d        ; push all registers
OP_00_03(r00, r0X)        ; put the value to operate on into r00
... one or two concrete operations ...
OP_06_03(r14, #ret_addr)  ; set the return address
OP_00_04(#0x10009)        ; call subroutine 0x10009 <- this is trunc32(x)
ret_addr:
OP_00_03(r0e, r00)        ; receive the return value
OP_04_03(r14)..r0d        ; pop all registers
OP_00_03(r00, r0e)
OP_06_03(r0f, #scratch)   ; scratch address
OP_02_03(r00, r0f)        ; *scratch = result

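Subroutine 0x10009 is really just r00 &= 0xFFFFFFFF (32-bit truncation). In other words, every "macro" is a single 32-bit operation whose result is written into the scratch slots at vm[0x14000 .. 0x14080].

5.2 Translating each macro

Read round_body.txt in address order and fold every push…call trunc…pop…store sequence into one line of Python. The result is below (v[k] denotes the u32 slot at vm[0x14000+k*8]):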

Micro-step | Equivalent Python | Note
1 | X = trunc32(X + 0x29E59C9F) | update the round key
2 | v[0] = trunc32(Y << 4) | F, first term
3 | v[1] = trunc32(v[0] + 0xF95D664A) |
4 | v[2] = trunc32(Y + X) | F, second term
5 | v[3] = v[1] XOR v[2] |
6 | v[4] = (Y >> 7) & 0xFFFFFFFF | F, third term
7 | v[5] = trunc32(v[4] + 0x12AA364C) |
8 | v[6] = v[3] XOR v[5] | = F(Y, X)
9 | v[7] = trunc32(Z + v[6]) | Z' = Z + F
10 | v[8] = trunc32(Z' << 6) | G, first term
11 | v[9] = trunc32(v[8] + 0x33AD3CEE) |
12 | v[10] = trunc32(Z' + X) | G, second term
13 | v[11] = v[9] XOR v[10] |
14 | v[12] = (Z' >> 5) & 0xFFFFFFFF | G, third term
15 | v[13] = trunc32(v[12] + 0xAABBCCDD) |
16 | v[14] = v[11] XOR v[13] | = G(Z', X)
17 | Y' = trunc32(Y + v[14]) | Y update
End | r0b ← X, r0c ← Z', r0e ← Y', counter += 1 | write back the loop variables

Tidied up into two round functions:

MASK = 0xFFFFFFFF

F(Y, X) = ((Y<<4)+0xF95D664A) ^ (Y+X) ^ ((Y>>7)+0x12AA364C)
G(Z, X) = ((Z<<6)+0x33AD3CEE) ^ (Z+X) ^ ((Z>>5)+0xAABBCCDD)

def round(Y, Z, X):
    X = (X + 0x29E59C9F) & MASK
    Z = (Z + F(Y, X)) & MASK
    Y = (Y + G(Z, X)) & MASK
    return Y, Z, X

This is a two-step Feistel round (first update Z using Y and X, then update Y using the new Z and X), cut from the same cloth as classic TEA.
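Because each half is updated by adding a function of the other half, the round can be undone by subtracting in reverse order. The following Python sketch implements the round functions above together with an inverse, and checks only that the two are mutual inverses; it makes no claim about matching the native output for any particular token. The function names `encrypt` and `decrypt` are chosen for this example.

```python
MASK = 0xFFFFFFFF
DELTA = 0x29E59C9F
ROUNDS = 28

def F(y, x):
    return (((y << 4) + 0xF95D664A) ^ (y + x) ^ ((y >> 7) + 0x12AA364C)) & MASK

def G(z, x):
    return (((z << 6) + 0x33AD3CEE) ^ (z + x) ^ ((z >> 5) + 0xAABBCCDD)) & MASK

def encrypt(y, z):
    x = 0
    for _ in range(ROUNDS):
        x = (x + DELTA) & MASK
        z = (z + F(y, x)) & MASK      # Z is updated from (Y, X) first
        y = (y + G(z, x)) & MASK      # then Y from the new Z
    return y, z

def decrypt(y, z):
    x = (DELTA * ROUNDS) & MASK       # round key after the last round
    for _ in range(ROUNDS):
        y = (y - G(z, x)) & MASK      # undo the Y update first
        z = (z - F(y, x)) & MASK      # then the Z update
        x = (x - DELTA) & MASK
    return y, z

y0, z0 = 0x38373635, 0x34333231       # halves of the sample token "12345678"
y1, z1 = encrypt(y0, z0)
print(decrypt(y1, z1) == (y0, z0))
```

Keeping Y and Z masked to 32 bits after every update is what lets the plain Python `>>` act as a 32-bit logical shift here.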

5.3 After the exit at pc=0x114b7

Reading the handful of instructions in the exit block:

pc=0x14b7 OP_01_05()              ; nop
pc=0x14bb OP_00_09(r00, #0x10)    ; alloca header
pc=0x14cc OP_06_03(r0f, #0x14078) ; address of Y's final slot
pc=0x14d7 OP_01_03(r0e, r0f)
pc=0x14df OP_04_03(r00)           ; pop
pc=0x14e5 OP_04_09(r00, r0e)      ; write r0e into alloca[0] as X3
pc=0x14ed OP_03_03(r00)           ; push
pc=0x14f3 OP_06_03(r0f, #0x14038) ; address of Z's final slot
pc=0x14fe OP_01_03(r0e, r0f)
pc=0x1506 OP_04_03(r00)           ; pop
pc=0x150c OP_04_09(r00, r0e)      ; write r0e into alloca[1] as X4
pc=0x1514 OP_02_00(#0x66, #0x1)   ; syscall: output (X3, X4)
pc=0x1522 OP_01_00()              ; halt

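In other words, the output is simply (Y, Z) as they stand when the loop ends, with no post-processing.

At this point the mathematical form of the whole algorithm is fully closed.

Algorithm implementation

With that, the algorithm behind the PART3 flag is laid bare. The code is as follows: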

token2flag3.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define DELTA 0x29E59C9F
#define C_F1 0xF95D664A
#define C_F2 0x12AA364C
#define C_G1 0x33AD3CEE
#define C_G2 0xAABBCCDD
#define ROUNDS 28

uint32_t F(uint32_t Y, uint32_t X) {
    return ((Y << 4) + C_F1) ^ (Y + X) ^ ((Y >> 7) + C_F2);
}

uint32_t G(uint32_t Z, uint32_t X) {
    return ((Z << 6) + C_G1) ^ (Z + X) ^ ((Z >> 5) + C_G2);
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <8-char-token>\n", argv[0]);
        return 1;
    }

    char *token = argv[1];
    if (strlen(token) != 8) {
        printf("Error: Token must be exactly 8 characters.\n");
        return 1;
    }

    // Convert the 8-byte token into two 32-bit integers
    // (little-endian on a little-endian host, which this code assumes)
    uint32_t Z = 0, Y = 0;
    memcpy(&Z, token, 4);
    memcpy(&Y, token + 4, 4);

    uint32_t X = 0;

    // 28 forward (encryption) rounds
    for (int i = 0; i < ROUNDS; i++) {
        X = X + DELTA;
        Z = Z + F(Y, X);
        Y = Y + G(Z, X);
    }

    // Print the result: Y corresponds to X3 (first 8 hex digits), Z to X4 (last 8)
    printf("flag{sec2026_PART3_%08x%08x}\n", Y, Z);

    return 0;
}

flag2token3.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

// The U suffix forces unsigned arithmetic, so 32-bit wraparound is well defined
#define DELTA 0x29E59C9FU
#define C_F1 0xF95D664AU
#define C_F2 0x12AA364CU
#define C_G1 0x33AD3CEEU
#define C_G2 0xAABBCCDDU
#define ROUNDS 28

uint32_t F(uint32_t Y, uint32_t X) {
    return ((Y << 4) + C_F1) ^ (Y + X) ^ ((Y >> 7) + C_F2);
}

uint32_t G(uint32_t Z, uint32_t X) {
    return ((Z << 6) + C_G1) ^ (Z + X) ^ ((Z >> 5) + C_G2);
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <16-char-hex-suffix>\n", argv[0]);
        return 1;
    }

    char *hex_input = argv[1];

    // Also accept the full flag format: flag{sec2026_PART3_...}
    if (strlen(hex_input) > 16) {
        char *p = strstr(hex_input, "PART3_");
        if (p) {
            hex_input = p + 6;
        }
    }

    // Check the length and cut at a '}' if one is present
    static char clean_hex[17];
    strncpy(clean_hex, hex_input, 16);
    clean_hex[16] = '\0';
    for (int i = 0; i < 16; i++) {
        if (clean_hex[i] == '}') {
            clean_hex[i] = '\0';
            break;
        }
    }

    if (strlen(clean_hex) < 16) {
        printf("Error: Invalid hex suffix length.\n");
        return 1;
    }

    uint32_t Y, Z;
    char hex_part[9] = {0};

    // Parse Y (first 8 hex digits) and Z (last 8 hex digits)
    memcpy(hex_part, clean_hex, 8);
    Y = (uint32_t)strtoul(hex_part, NULL, 16);
    memcpy(hex_part, clean_hex + 8, 8);
    Z = (uint32_t)strtoul(hex_part, NULL, 16);

    // Do the 32-bit wrapping multiplication explicitly so the compiler no longer warns
    uint32_t X = (uint32_t)(DELTA * (uint32_t)ROUNDS);

    // 28 rounds of the inverse transform
    for (int i = 0; i < ROUNDS; i++) {
        Y = Y - G(Z, X);
        Z = Z - F(Y, X);
        X = X - DELTA;
    }

    // Convert the uint32 values back to bytes (little-endian storage)
    char token[9] = {0};
    memcpy(token, &Z, 4);
    memcpy(token + 4, &Y, 4);

    printf("Token: %s\n", token);

    return 0;
}
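
As a quick sanity check on the two programs above, the following sketch runs the forward loop and then the inverse loop on the same pair of words and confirms that the original values come back. The constants and round functions are copied from the listings; the helper names `part3_forward`, `part3_inverse` and `part3_roundtrip_ok` are my own and exist only for illustration. It assumes `uint32_t` is `unsigned int`, as on the usual desktop and Android targets.

```c
#include <stdint.h>

/* Constants copied from token2flag3.c / flag2token3.c. */
#define DELTA  0x29E59C9FU
#define C_F1   0xF95D664AU
#define C_F2   0x12AA364CU
#define C_G1   0x33AD3CEEU
#define C_G2   0xAABBCCDDU
#define ROUNDS 28

static uint32_t F(uint32_t y, uint32_t x) {
    return ((y << 4) + C_F1) ^ (y + x) ^ ((y >> 7) + C_F2);
}

static uint32_t G(uint32_t z, uint32_t x) {
    return ((z << 6) + C_G1) ^ (z + x) ^ ((z >> 5) + C_G2);
}

/* Forward direction: the same loop as token2flag3.c. */
void part3_forward(uint32_t *z, uint32_t *y) {
    uint32_t x = 0;
    for (int i = 0; i < ROUNDS; i++) {
        x += DELTA;
        *z += F(*y, x);
        *y += G(*z, x);
    }
}

/* Inverse direction: the same loop as flag2token3.c. */
void part3_inverse(uint32_t *z, uint32_t *y) {
    uint32_t x = DELTA * (uint32_t)ROUNDS;  /* wraps modulo 2^32 */
    for (int i = 0; i < ROUNDS; i++) {
        *y -= G(*z, x);
        *z -= F(*y, x);
        x -= DELTA;
    }
}

/* Hypothetical helper: returns 1 if inverse(forward(z, y)) == (z, y). */
int part3_roundtrip_ok(uint32_t z, uint32_t y) {
    uint32_t z2 = z, y2 = y;
    part3_forward(&z2, &y2);
    part3_inverse(&z2, &y2);
    return z2 == z && y2 == y;
}
```

Each inverse round subtracts exactly what the matching forward round added, in the opposite order, which is why the round trip holds for any input and not only for printable tokens.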

In-game flag verification

PART1 flag

image-20260418103711960

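I again followed the preliminary-round idea for verifying the flags in game: use the trigger1 trigger point to fire the other triggers.

The implementation, however, differs from the preliminary round.

The preliminary-round approach was:

In the constant pool of compiled GDScript, string literals exist as String objects, so the char32_t data on the heap can be modified directly.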
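
To make the heap-patching idea concrete, here is a minimal sketch in C of what such a patch amounts to: find an ASCII name stored as one 32-bit code unit per character and overwrite a single unit. The helper names `find_utf32` and `patch_utf32_char` and the `res://` example path in the usage note are assumptions for illustration, not taken from the game.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: search buf[0..len) for the ASCII text `needle`
 * stored as one 32-bit code unit per character.
 * Returns the index of the first match, or -1 if there is none.
 */
long find_utf32(const uint32_t *buf, size_t len, const char *needle) {
    size_t n = 0;
    while (needle[n] != '\0') n++;
    if (n == 0 || n > len) return -1;

    for (size_t i = 0; i + n <= len; i++) {
        size_t j = 0;
        while (j < n && buf[i + j] == (uint32_t)(unsigned char)needle[j]) j++;
        if (j == n) return (long)i;
    }
    return -1;
}

/*
 * Hypothetical helper: replace the code unit at `char_index` inside the
 * first occurrence of `needle`. Returns 1 if a unit was patched, else 0.
 */
int patch_utf32_char(uint32_t *buf, size_t len, const char *needle,
                     size_t char_index, char replacement) {
    size_t n = 0;
    while (needle[n] != '\0') n++;
    if (char_index >= n) return 0;

    long at = find_utf32(buf, len, needle);
    if (at < 0) return 0;

    buf[(size_t)at + char_index] = (uint32_t)(unsigned char)replacement;
    return 1;
}
```

For example, calling `patch_utf32_char(buf, len, "trigger1.gdc", 7, '2')` on a buffer holding `res://trigger1.gdc` would turn the name into `trigger2.gdc`, since index 7 is the digit.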

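This time, because of the code-integrity protection, I went one step further: while the Godot engine is loading, trigger1.gd/gdc is redirected to trigger2.gd/gdc, so the flag can be obtained without tripping the anti-debugging checks.

The Frida code is as follows: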

"use strict";

var LIB_NAME = "libsec2026.so";
var ANTI_DEBUG_THREAD_OFFSETS = [0x9C654, 0x9CDC4, 0x9B7D8];
var TIMESTAMP_OFFSET = 0x1834B8;

var libBase = null;
var blockedCount = 0;
var patchCount = 0;
var scanCount = 0;

var libc = {
pthread_create: Module.findExportByName("libc.so", "pthread_create"),
clock_gettime: Module.findExportByName("libc.so", "clock_gettime"),
read: Module.findExportByName("libc.so", "read"),
pread64: Module.findExportByName("libc.so", "pread64"),
open: Module.findExportByName("libc.so", "open"),
openat: Module.findExportByName("libc.so", "openat"),
fopen: Module.findExportByName("libc.so", "fopen"),
access: Module.findExportByName("libc.so", "access"),
faccessat: Module.findExportByName("libc.so", "faccessat"),
stat: Module.findExportByName("libc.so", "stat")
};

var libandroid = {
AAssetManager_open: Module.findExportByName("libandroid.so", "AAssetManager_open"),
AAssetManager_openDir: Module.findExportByName("libandroid.so", "AAssetManager_openDir")
};

var clock_gettime_fn = new NativeFunction(libc.clock_gettime, "int", ["int", "pointer"]);
var tsBuf = Memory.alloc(16);

var dummyThread = new NativeCallback(function (_arg) {
return ptr(0);
}, "pointer", ["pointer"]);

var REDIRECT_TARGET = "trigger2";
var redirectedCount = 0;
var redirectedStrings = [];

function nowMonotonicUs() {
clock_gettime_fn(1, tsBuf);
var sec = tsBuf.readU64().toNumber();
var nsec = tsBuf.add(8).readU64().toNumber();
return sec * 1000000 + Math.floor(nsec / 1000);
}

function updateTimestamp() {
if (libBase === null) return;
try {
libBase.add(TIMESTAMP_OFFSET).writeU64(nowMonotonicUs());
} catch (_e) {
}
}

function setLibBase(base) {
if (libBase !== null) return;
libBase = base;
console.log("[*] " + LIB_NAME + " base: " + libBase);
updateTimestamp();
}

function hookAntiDebug() {
setInterval(updateTimestamp, 2000);

Interceptor.attach(libc.pthread_create, {
onEnter: function (args) {
var startRoutine = args[2];

if (libBase !== null) {
var offFast = startRoutine.sub(libBase).toInt32();
if (offFast > 0 && offFast < 0x200000 &&
ANTI_DEBUG_THREAD_OFFSETS.indexOf(offFast) !== -1) {
blockedCount++;
console.log("[pthread_create] BLOCKED #" + blockedCount + " @ 0x" + offFast.toString(16));
args[2] = dummyThread;
return;
}
}

try {
var mod = Process.findModuleByAddress(startRoutine);
if (mod !== null && mod.name === LIB_NAME) {
setLibBase(mod.base);
var off = startRoutine.sub(mod.base).toInt32();
if (ANTI_DEBUG_THREAD_OFFSETS.indexOf(off) !== -1) {
blockedCount++;
console.log("[pthread_create] BLOCKED #" + blockedCount + " @ 0x" + off.toString(16));
args[2] = dummyThread;
}
}
} catch (_e) {
}
}
});

var libPoll = setInterval(function () {
var mod = Process.findModuleByName(LIB_NAME);
if (mod !== null) {
setLibBase(mod.base);
clearInterval(libPoll);
}
}, 1000);
}

function redirectPath(path) {
if (path === null) return null;
if (path.indexOf("trigger1.gd") === -1 && path.indexOf("trigger1.gdc") === -1) return null;
return path.replace(/trigger1/g, REDIRECT_TARGET);
}

function redirectCStringArg(args, index, apiName) {
try {
var oldPath = args[index].readCString();
var newPath = redirectPath(oldPath);
if (newPath === null || newPath === oldPath) return;

var p = Memory.allocUtf8String(newPath);
redirectedStrings.push(p);
args[index] = p;
redirectedCount++;
console.log("[redirect][" + apiName + "] " + oldPath + " -> " + newPath);
} catch (_e) {
}
}

function attachPathRedirect(name, address, pathArgIndex) {
if (address === null) return;
Interceptor.attach(address, {
onEnter: function (args) {
redirectCStringArg(args, pathArgIndex, name);
}
});
}

function hookPathRedirects() {
attachPathRedirect("open", libc.open, 0);
attachPathRedirect("openat", libc.openat, 1);
attachPathRedirect("fopen", libc.fopen, 0);
attachPathRedirect("access", libc.access, 0);
attachPathRedirect("faccessat", libc.faccessat, 1);
attachPathRedirect("stat", libc.stat, 0);
attachPathRedirect("AAssetManager_open", libandroid.AAssetManager_open, 1);
attachPathRedirect("AAssetManager_openDir", libandroid.AAssetManager_openDir, 1);
console.log("[+] lightweight path redirects installed: trigger1 -> " + REDIRECT_TARGET);
}

function hexPattern(s) {
var out = [];
for (var i = 0; i < s.length; i++) {
out.push(("0" + s.charCodeAt(i).toString(16)).slice(-2));
}
return out.join(" ");
}

function hexPatternUtf32(s) {
var out = [];
for (var i = 0; i < s.length; i++) {
out.push(("0" + s.charCodeAt(i).toString(16)).slice(-2));
out.push("00");
out.push("00");
out.push("00");
}
return out.join(" ");
}

var RESOURCE_PATCHES = [
{
name: "ascii trigger1.gdc",
pattern: hexPattern("trigger1.gdc"),
oneOffset: 7,
write: function (addr) { addr.add(7).writeU8(0x32); }
},
{
name: "ascii trigger1.gd",
pattern: hexPattern("trigger1.gd"),
oneOffset: 7,
write: function (addr) { addr.add(7).writeU8(0x32); }
},
{
name: "utf32 trigger1.gdc",
pattern: hexPatternUtf32("trigger1.gdc"),
oneOffset: 7 * 4,
write: function (addr) { addr.add(7 * 4).writeU32(0x32); }
},
{
name: "utf32 trigger1.gd",
pattern: hexPatternUtf32("trigger1.gd"),
oneOffset: 7 * 4,
write: function (addr) { addr.add(7 * 4).writeU32(0x32); }
}
];

function patchRange(base, size, source) {
if (size <= 0 || size > 0x2000000) return 0;
var local = 0;
for (var i = 0; i < RESOURCE_PATCHES.length; i++) {
var p = RESOURCE_PATCHES[i];
try {
var hits = Memory.scanSync(base, size, p.pattern);
for (var j = 0; j < hits.length; j++) {
try {
p.write(hits[j].address);
local++;
patchCount++;
console.log("[patch] " + p.name + " -> trigger2 @ " + hits[j].address + " (" + source + ")");
} catch (_e) {
}
}
} catch (_e) {
}
}
return local;
}

function patchReadableMemory(source) {
var total = 0;
scanCount++;
var ranges = Process.enumerateRanges({ protection: "rw-", coalesce: true });
for (var i = 0; i < ranges.length; i++) {
var r = ranges[i];
if (r.size < 16 || r.size > 0x2000000) continue;
total += patchRange(r.base, r.size, source);
}
if (total > 0) {
console.log("[+] patched resource bytes: " + total + " total=" + patchCount);
}
return total;
}

function hookResourceReads() {
function afterRead(buf, n, api) {
if (n <= 0 || n > 0x2000000) return;
patchRange(buf, n, api);
}

if (libc.read !== null) {
Interceptor.attach(libc.read, {
onEnter: function (args) {
this.buf = args[1];
},
onLeave: function (retval) {
afterRead(this.buf, retval.toInt32(), "read");
}
});
}

if (libc.pread64 !== null) {
Interceptor.attach(libc.pread64, {
onEnter: function (args) {
this.buf = args[1];
},
onLeave: function (retval) {
afterRead(this.buf, retval.toInt32(), "pread64");
}
});
}

if (libc.openat !== null) {
Interceptor.attach(libc.openat, {
onEnter: function (args) {
try {
var path = args[1].readCString();
if (path.indexOf("assets.sparsepck") !== -1 ||
path.indexOf(".gdc") !== -1 ||
path.indexOf(".scn") !== -1) {
console.log("[openat] " + path);
}
} catch (_e) {
}
}
});
}
}

function startPatchLoop() {
// Try immediately; then keep watching early scene/resource load.
patchReadableMemory("initial-scan");

var attempts = 0;
var timer = setInterval(function () {
attempts++;
patchReadableMemory("scan-" + attempts);
if (attempts >= 40) {
clearInterval(timer);
console.log("[*] resource patch loop stopped; total patches=" + patchCount);
console.log("[*] Now collide with the original Trigger1 block. If patchCount > 0, it should display PART1 via game logic.");
}
}, 500);
}

rpc.exports = {
patch: function () {
return patchReadableMemory("rpc");
},
stats: function () {
return {
libBase: libBase === null ? null : libBase.toString(),
blockedThreads: blockedCount,
patches: patchCount,
scans: scanCount,
redirects: redirectedCount
};
}
};

function main() {
console.log("[*] PART1 real-trigger script: no libsec2026.so inline hooks");
console.log("[*] Redirecting trigger1.gd/gdc resource path to trigger2.gd/gdc during load");
hookAntiDebug();
hookPathRedirects();
console.log("[*] Heavy memory/read-buffer scanning is disabled during startup.");
console.log("[*] If no redirect appears after the scene loads, run: await rpc.exports.patch()");
}

main();

PART2 flag

image-20260418103722535

The PART2 flag is obtained the same way as PART1; the only change is that the redirect now targets trigger3.gd/gdc.
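
Since the redirect target is the only thing that differs between the two parts, the path rewrite can be pictured as one function with the target as a parameter. The sketch below is my own C rendering of what `redirectPath()` in the Frida scripts does. The name `redirect_path`, its signature, and the `res://` paths in the usage note are assumptions for illustration.

```c
#include <string.h>

/*
 * Hypothetical helper mirroring redirectPath() in the Frida scripts:
 * if `path` contains "trigger1.gd" (which also covers "trigger1.gdc"),
 * write a copy with every "trigger1" replaced by `target` into `out`.
 * Returns 1 if a rewrite was produced, 0 if the path is left alone
 * or `out` is too small.
 */
int redirect_path(const char *path, const char *target,
                  char *out, size_t out_size) {
    static const char needle[] = "trigger1";
    const size_t nlen = sizeof(needle) - 1;
    const size_t tlen = strlen(target);
    size_t used = 0;

    if (strstr(path, "trigger1.gd") == NULL) return 0;

    while (*path != '\0') {
        const char *src;
        size_t chunk;

        if (strncmp(path, needle, nlen) == 0) {
            src = target;   /* substitute the whole name */
            chunk = tlen;
            path += nlen;
        } else {
            src = path;     /* copy one ordinary character */
            chunk = 1;
            path += 1;
        }

        /* Always keep room for the terminating NUL. */
        if (used + chunk + 1 > out_size) return 0;
        memcpy(out + used, src, chunk);
        used += chunk;
    }

    out[used] = '\0';
    return 1;
}
```

With this shape, PART1 corresponds to `redirect_path(p, "trigger2", out, sizeof out)` and PART2 to the same call with `"trigger3"`; a path such as `res://player.gdc` is left untouched.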

The Frida code is as follows:

"use strict";


var LIB_NAME = "libsec2026.so";
var ANTI_DEBUG_THREAD_OFFSETS = [0x9C654, 0x9CDC4, 0x9B7D8];
var TIMESTAMP_OFFSET = 0x1834B8;

var libBase = null;
var blockedCount = 0;
var patchCount = 0;
var scanCount = 0;

var libc = {
pthread_create: Module.findExportByName("libc.so", "pthread_create"),
clock_gettime: Module.findExportByName("libc.so", "clock_gettime"),
read: Module.findExportByName("libc.so", "read"),
pread64: Module.findExportByName("libc.so", "pread64"),
open: Module.findExportByName("libc.so", "open"),
openat: Module.findExportByName("libc.so", "openat"),
fopen: Module.findExportByName("libc.so", "fopen"),
access: Module.findExportByName("libc.so", "access"),
faccessat: Module.findExportByName("libc.so", "faccessat"),
stat: Module.findExportByName("libc.so", "stat")
};

var libandroid = {
AAssetManager_open: Module.findExportByName("libandroid.so", "AAssetManager_open"),
AAssetManager_openDir: Module.findExportByName("libandroid.so", "AAssetManager_openDir")
};

var clock_gettime_fn = new NativeFunction(libc.clock_gettime, "int", ["int", "pointer"]);
var tsBuf = Memory.alloc(16);

var dummyThread = new NativeCallback(function (_arg) {
return ptr(0);
}, "pointer", ["pointer"]);

var REDIRECT_TARGET = "trigger3";
var redirectedCount = 0;
var redirectedStrings = [];

function nowMonotonicUs() {
clock_gettime_fn(1, tsBuf); // CLOCK_MONOTONIC
var sec = tsBuf.readU64().toNumber();
var nsec = tsBuf.add(8).readU64().toNumber();
return sec * 1000000 + Math.floor(nsec / 1000);
}

function updateTimestamp() {
if (libBase === null) return;
try {
libBase.add(TIMESTAMP_OFFSET).writeU64(nowMonotonicUs());
} catch (_e) {
}
}

function setLibBase(base) {
if (libBase !== null) return;
libBase = base;
console.log("[*] " + LIB_NAME + " base: " + libBase);
updateTimestamp();
}

function hookAntiDebug() {
setInterval(updateTimestamp, 2000);

Interceptor.attach(libc.pthread_create, {
onEnter: function (args) {
var startRoutine = args[2];

if (libBase !== null) {
var offFast = startRoutine.sub(libBase).toInt32();
if (offFast > 0 && offFast < 0x200000 &&
ANTI_DEBUG_THREAD_OFFSETS.indexOf(offFast) !== -1) {
blockedCount++;
console.log("[pthread_create] BLOCKED #" + blockedCount + " @ 0x" + offFast.toString(16));
args[2] = dummyThread;
return;
}
}

try {
var mod = Process.findModuleByAddress(startRoutine);
if (mod !== null && mod.name === LIB_NAME) {
setLibBase(mod.base);
var off = startRoutine.sub(mod.base).toInt32();
if (ANTI_DEBUG_THREAD_OFFSETS.indexOf(off) !== -1) {
blockedCount++;
console.log("[pthread_create] BLOCKED #" + blockedCount + " @ 0x" + off.toString(16));
args[2] = dummyThread;
}
}
} catch (_e) {
}
}
});

var libPoll = setInterval(function () {
var mod = Process.findModuleByName(LIB_NAME);
if (mod !== null) {
setLibBase(mod.base);
clearInterval(libPoll);
}
}, 1000);
}

function redirectPath(path) {
if (path === null) return null;
if (path.indexOf("trigger1.gd") === -1 && path.indexOf("trigger1.gdc") === -1) return null;
return path.replace(/trigger1/g, REDIRECT_TARGET);
}

function redirectCStringArg(args, index, apiName) {
try {
var oldPath = args[index].readCString();
var newPath = redirectPath(oldPath);
if (newPath === null || newPath === oldPath) return;

var p = Memory.allocUtf8String(newPath);
redirectedStrings.push(p);
args[index] = p;
redirectedCount++;
console.log("[redirect][" + apiName + "] " + oldPath + " -> " + newPath);
} catch (_e) {
}
}

function attachPathRedirect(name, address, pathArgIndex) {
if (address === null) return;
Interceptor.attach(address, {
onEnter: function (args) {
redirectCStringArg(args, pathArgIndex, name);
}
});
}

function hookPathRedirects() {
attachPathRedirect("open", libc.open, 0);
attachPathRedirect("openat", libc.openat, 1);
attachPathRedirect("fopen", libc.fopen, 0);
attachPathRedirect("access", libc.access, 0);
attachPathRedirect("faccessat", libc.faccessat, 1);
attachPathRedirect("stat", libc.stat, 0);
attachPathRedirect("AAssetManager_open", libandroid.AAssetManager_open, 1);
attachPathRedirect("AAssetManager_openDir", libandroid.AAssetManager_openDir, 1);
console.log("[+] lightweight path redirects installed: trigger1 -> " + REDIRECT_TARGET);
}

function hexPattern(s) {
var out = [];
for (var i = 0; i < s.length; i++) {
out.push(("0" + s.charCodeAt(i).toString(16)).slice(-2));
}
return out.join(" ");
}

function hexPatternUtf32(s) {
var out = [];
for (var i = 0; i < s.length; i++) {
out.push(("0" + s.charCodeAt(i).toString(16)).slice(-2));
out.push("00");
out.push("00");
out.push("00");
}
return out.join(" ");
}

var RESOURCE_PATCHES = [
{
name: "ascii trigger1.gdc",
pattern: hexPattern("trigger1.gdc"),
write: function (addr) { addr.add(7).writeU8(0x33); } // '1' -> '3'
},
{
name: "ascii trigger1.gd",
pattern: hexPattern("trigger1.gd"),
write: function (addr) { addr.add(7).writeU8(0x33); }
},
{
name: "utf32 trigger1.gdc",
pattern: hexPatternUtf32("trigger1.gdc"),
write: function (addr) { addr.add(7 * 4).writeU32(0x33); }
},
{
name: "utf32 trigger1.gd",
pattern: hexPatternUtf32("trigger1.gd"),
write: function (addr) { addr.add(7 * 4).writeU32(0x33); }
}
];

function patchRange(base, size, source) {
if (size <= 0 || size > 0x2000000) return 0;
var local = 0;

for (var i = 0; i < RESOURCE_PATCHES.length; i++) {
var p = RESOURCE_PATCHES[i];
try {
var hits = Memory.scanSync(base, size, p.pattern);
for (var j = 0; j < hits.length; j++) {
try {
p.write(hits[j].address);
local++;
patchCount++;
console.log("[patch] " + p.name + " -> trigger3 @ " + hits[j].address + " (" + source + ")");
} catch (_e) {
}
}
} catch (_e) {
}
}

return local;
}

function patchReadableMemory(source) {
var total = 0;
scanCount++;

var ranges = Process.enumerateRanges({ protection: "rw-", coalesce: true });
for (var i = 0; i < ranges.length; i++) {
var r = ranges[i];
if (r.size < 16 || r.size > 0x2000000) continue;
total += patchRange(r.base, r.size, source);
}

if (total > 0) {
console.log("[+] patched resource bytes: " + total + " total=" + patchCount);
}
return total;
}

function hookResourceReads() {
function afterRead(buf, n, api) {
if (n <= 0 || n > 0x2000000) return;
patchRange(buf, n, api);
}

if (libc.read !== null) {
Interceptor.attach(libc.read, {
onEnter: function (args) {
this.buf = args[1];
},
onLeave: function (retval) {
afterRead(this.buf, retval.toInt32(), "read");
}
});
}

if (libc.pread64 !== null) {
Interceptor.attach(libc.pread64, {
onEnter: function (args) {
this.buf = args[1];
},
onLeave: function (retval) {
afterRead(this.buf, retval.toInt32(), "pread64");
}
});
}

if (libc.openat !== null) {
Interceptor.attach(libc.openat, {
onEnter: function (args) {
try {
var path = args[1].readCString();
if (path.indexOf("assets.sparsepck") !== -1 ||
path.indexOf(".gdc") !== -1 ||
path.indexOf(".gd") !== -1 ||
path.indexOf(".scn") !== -1) {
console.log("[openat] " + path);
}
} catch (_e) {
}
}
});
}
}

function startPatchLoop() {
patchReadableMemory("initial-scan");

var attempts = 0;
var timer = setInterval(function () {
attempts++;
patchReadableMemory("scan-" + attempts);
if (attempts >= 50) {
clearInterval(timer);
console.log("[*] resource patch loop stopped; total patches=" + patchCount);
console.log("[*] Collide with the original Trigger1 block. If patchCount > 0, it should display PART2 through game logic.");
}
}, 500);
}

rpc.exports = {
patch: function () {
return patchReadableMemory("rpc");
},
stats: function () {
return {
libBase: libBase === null ? null : libBase.toString(),
blockedThreads: blockedCount,
patches: patchCount,
scans: scanCount,
redirects: redirectedCount
};
}
};

function main() {
console.log("[*] PART2 real-trigger script: no libsec2026.so inline hooks");
console.log("[*] Redirecting trigger1.gd/gdc resource path to trigger3.gd/gdc during load");
hookAntiDebug();
hookPathRedirects();
console.log("[*] Heavy memory/read-buffer scanning is disabled during startup.");
console.log("[*] If no redirect appears after the scene loads, run: await rpc.exports.patch()");
}

main();

PART3 flag

image-20260418231255344

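The gd source shows that the visible-flag logic for Trigger4 does not live in the GDScript layer. Even if Trigger1 is redirected to trigger4.gd, all that happens is that Trigger1 calls _gx.Tick() and the animation function every frame; no flag is displayed.

Moreover, this script only passes the token in and actively invokes the flag-generation function sub_A9A7C to obtain the flag. I did not fully understand the complete token-to-flag generation logic in part3, so I could not get the flag to display inside the game.

The Frida code is as follows: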

"use strict";

/*
* part3_verify_no_text.js
*
* Fourth trigger / source-named PART3 verifier.
*
* Strategy:
* - Reuse the bypass_v9 data-only anti-debug bypass.
* - Do not Interceptor.attach(), replace(), or patch anything inside libsec2026.so.
* - Call the hidden native VM generator directly:
* sub_A9A7C(token_ascii_8) -> char *suffix
*
* Usage:
* frida -U -f com.tencent.ACE.gamesec2026.final -l part3_verify_no_text.js --no-pause
*
* Manual REPL:
* await rpc.exports.part3("12345678")
* await rpc.exports.scan()
*/

var LIB_NAME = "libsec2026.so";

var OFFSETS = {
antiThreads: [0x9C654, 0x9CDC4, 0x9B7D8],
timestamp: 0x1834B8,
part3VmGenerator: 0xA9A7C
};

var libBase = null;
var blockedCount = 0;
var part3Core = null;
var shown = {};

var libc = {
pthread_create: Module.findExportByName("libc.so", "pthread_create"),
clock_gettime: Module.findExportByName("libc.so", "clock_gettime")
};

var clock_gettime_fn = new NativeFunction(libc.clock_gettime, "int", ["int", "pointer"]);
var tsBuf = Memory.alloc(16);

var dummyThread = new NativeCallback(function (_arg) {
return ptr(0);
}, "pointer", ["pointer"]);

function nowMonotonicUs() {
clock_gettime_fn(1, tsBuf); // CLOCK_MONOTONIC
var sec = tsBuf.readU64().toNumber();
var nsec = tsBuf.add(8).readU64().toNumber();
return sec * 1000000 + Math.floor(nsec / 1000);
}

function updateTimestamp() {
if (libBase === null) return;
try {
libBase.add(OFFSETS.timestamp).writeU64(nowMonotonicUs());
} catch (_e) {
}
}

function setLibBase(base) {
if (libBase !== null) return;
libBase = base;
part3Core = new NativeFunction(libBase.add(OFFSETS.part3VmGenerator), "pointer", ["pointer"]);
updateTimestamp();
console.log("[*] " + LIB_NAME + " base: " + libBase);
console.log("[+] part3/PART3 VM generator ready");
console.log(" sub_A9A7C @ " + libBase.add(OFFSETS.part3VmGenerator));
}

function waitForLibrary() {
var mod = Process.findModuleByName(LIB_NAME);
if (mod !== null) setLibBase(mod.base);
}

function computepart3Suffix(token) {
if (libBase === null || part3Core === null) {
throw new Error("libsec2026.so is not ready yet");
}
if (!/^[0-9a-fA-F]{8}$/.test(token)) {
throw new Error("token must be exactly 8 hex chars");
}

/*
* sub_A9A7C copies exactly 8 bytes from this pointer into its native
* scratch buffer. NUL termination is harmless and convenient here.
*/
var inBuf = Memory.allocUtf8String(token);
var ret = part3Core(inBuf);
if (ret.isNull()) throw new Error("sub_A9A7C returned NULL");
return ret.readCString();
}

function buildFlag(token) {
/*
* Source uses Trigger1..4 mapped to PART0..PART3. The fourth trigger is
* therefore PART3 even if we call it "part3" in notes.
*/
return "flag{sec2026_PART3_" + computepart3Suffix(token) + "}";
}

function showToast(text) {
Java.perform(function () {
var Toast = Java.use("android.widget.Toast");
var ActivityThread = Java.use("android.app.ActivityThread");
var app = ActivityThread.currentApplication();
if (app === null) return;
Java.scheduleOnMainThread(function () {
Toast.makeText(app.getApplicationContext(), Java.use("java.lang.String").$new(text), 1).show();
});
});
}

function emitFlag(token, source) {
token = token.toLowerCase();
if (shown[token]) return;
shown[token] = true;

try {
var suffix = computepart3Suffix(token);
var flag = "flag{sec2026_PART3_" + suffix + "}";
console.log("[part3/PART3][" + source + "] token=" + token);
console.log("[part3/PART3] suffix=" + suffix);
console.log("[part3/PART3] " + flag);
showToast(flag);
} catch (e) {
console.log("[part3/PART3] compute failed for token=" + token + ": " + e);
}
}

function hookPthreadCreate() {
Interceptor.attach(libc.pthread_create, {
onEnter: function (args) {
var startRoutine = args[2];

if (libBase !== null) {
var offFast = startRoutine.sub(libBase).toInt32();
if (offFast > 0 && offFast < 0x200000 && OFFSETS.antiThreads.indexOf(offFast) !== -1) {
blockedCount++;
console.log("[pthread_create] BLOCKED #" + blockedCount + " @ 0x" + offFast.toString(16));
args[2] = dummyThread;
return;
}
}

try {
var mod = Process.findModuleByAddress(startRoutine);
if (mod !== null && mod.name === LIB_NAME) {
setLibBase(mod.base);
var off = startRoutine.sub(mod.base).toInt32();
if (OFFSETS.antiThreads.indexOf(off) !== -1) {
blockedCount++;
console.log("[pthread_create] BLOCKED #" + blockedCount + " @ 0x" + off.toString(16));
args[2] = dummyThread;
}
}
} catch (_e) {
}
}
});
}

function scanTokenInReadableMemory() {
waitForLibrary();
var found = [];
var ranges = Process.enumerateRanges({ protection: "r--", coalesce: true })
.concat(Process.enumerateRanges({ protection: "rw-", coalesce: true }));
var pattern = "54 6f 6b 65 6e 3a 20 ?? ?? ?? ?? ?? ?? ?? ??"; // "Token: " + 8 bytes

for (var i = 0; i < ranges.length; i++) {
if (found.length >= 8) break;
try {
var hits = Memory.scanSync(ranges[i].base, ranges[i].size, pattern);
for (var j = 0; j < hits.length && found.length < 8; j++) {
var s = hits[j].address.readCString(32);
var m = /Token:\s*([0-9a-fA-F]{8})/.exec(s);
if (m) {
found.push(m[1].toLowerCase());
emitFlag(m[1], "memory-scan");
}
}
} catch (_e) {
}
}
return found;
}

rpc.exports = {
part3: function (token) {
waitForLibrary();
var flag = buildFlag(token);
showToast(flag);
return flag;
},
scan: function () {
return scanTokenInReadableMemory();
}
};

function main() {
console.log("[*] part3 verifier: no libsec2026.so .text patching");
hookPthreadCreate();
setInterval(updateTimestamp, 2000);
setInterval(waitForLibrary, 500);

// Give Godot time to create the label, then scan once.
setTimeout(function () {
if (Object.keys(shown).length === 0) scanTokenInReadableMemory();
}, 10000);

console.log("[+] Hooks installed");
}

main();

Note that this script requires the token to be passed in manually.

image-20260419213833125

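What was not achieved

  1. The anti-debugging analysis has omissions or mistakes.
  2. I am not very familiar with Godot, so I did not manage to make the invisible PART3 block appear.

Summary

I won a prize on my very first attempt. Thanks, Tencent, and see you at the game security industry summit!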