The core message: before anything else

RULE
No.2

Don't save tokens. Protect context.
If you lose context while trying to save tokens, you end up paying more.

Save 100K input tokens

+$0.50 saved

Opus 4.8 ($5.00 / MTok input) · 2026-06

Save 100K output tokens

+$2.50 saved

Same effort, 5x impact

The output/input price ratio is the same 5x across all current models. Measured over 32 days, the output/input volume ratio was 6.64x, and output accounted for 86.9% of cost.

A token's life: a 5-stage pipeline

Each stage has a different cost and a different lever.

01

Session start

Tool schemas
~8–10K fixed
CLAUDE.md always

Chapter 01

02

Cache / input

Hit 0.1x
Write 1.25x
Hit rate 98.7%

Chapter 02

03

Tool results

Permanent
accumulation
-48% possible

Chapter 03

04

Output generation

Price 5x
Volume 6.64x
No cache

Chapter 04

05

Next turn

Context rot
Attention dilution
Quality drop

Chapter 05

Once you know where the cost comes from, you can see where to push.

Chapter 01: The moment a session opens

~10K

Tokens occupied by tool schemas alone, before your first message (as of 2026-06)

Total session-start cost ~24K ÷ Sonnet 4.6 context 200K = ~12% consumed before turn 1

Component	Tokens	Type
tool_choice overhead	497–589	fixed
bash tool definition	245	fixed
text_editor tool definition	700	fixed
~17 other tools	~6,800	fixed
CLAUDE.md	variable	always loaded
MCP server schemas	variable	conditional
Skill descriptions	0 (before call)	on-demand

First turn: cache write at 1.25x. Every later turn: cache read at 0.1x. Don't cut tools; watch what you add on top of them. This is the progressive disclosure principle.
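The session-start budget above can be checked with a few lines of arithmetic. This is a minimal sketch: the token counts are copied from the component table, the ~24K total and 200K window from the text, and the names `FIXED_TOOL_TOKENS`, `fixed_schema_range`, and `share_of_context` are illustrative rather than part of any tool.

```python
# Token counts copied from the component table; (low, high) pairs so the
# tool_choice range can be carried through. Names here are hypothetical.
FIXED_TOOL_TOKENS = {
    "tool_choice overhead": (497, 589),
    "bash tool definition": (245, 245),
    "text_editor tool definition": (700, 700),
    "~17 other tools": (6_800, 6_800),  # approximate figure from the table
}


def fixed_schema_range(components):
    """Return the (low, high) total of the fixed tool-schema tokens."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


def share_of_context(tokens, window):
    """Fraction of the context window already occupied."""
    return tokens / window


low, high = fixed_schema_range(FIXED_TOOL_TOKENS)
print(f"fixed tool schemas: {low:,}-{high:,} tokens")
print(f"session start share: {share_of_context(24_000, 200_000):.0%}")
```

The fixed rows sum to roughly 8.2–8.3K, which sits inside the ~8–10K band quoted for tool schemas; the rest of the ~24K comes from the variable rows (CLAUDE.md, MCP schemas).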

Chapter 02: Cache economics

$758 saved in 32 days, with a measured 98.7% cache hit rate

Type	Multiplier	Opus 4.8 price	Break-even
Standard input	1.0x	$5.00 / MTok	-
5-min cache write	1.25x	$6.25 / MTok	1 hit
1-hr cache write	2.0x	$10.00 / MTok	2 hits
Cache read (hit)	0.1x	$0.50 / MTok	90% saved
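The break-even column follows from the multipliers alone. A minimal sketch, assuming the same prefix is either sent uncached every time (1.0x per request) or written to cache once and then read on each later request (0.1x per hit); the function names are illustrative.

```python
import math

READ_MULT = 0.1  # cache read (hit) multiplier from the table


def cached_cost(write_mult, hits):
    """Cost of one cache write plus `hits` cache reads, in units of one uncached read."""
    return write_mult + READ_MULT * hits


def uncached_cost(hits):
    """Cost of sending the same prefix uncached: the first request plus `hits` repeats."""
    return 1.0 + hits


def break_even_hits(write_mult):
    """Smallest whole number of hits at which caching is strictly cheaper."""
    # write_mult + 0.1 * n < 1 + n  <=>  n > (write_mult - 1) / 0.9
    n = (write_mult - 1.0) / (1.0 - READ_MULT)
    return math.floor(n) + 1


for label, mult in [("5-min write", 1.25), ("1-hr write", 2.0)]:
    print(label, "breaks even after", break_even_hits(mult), "hit(s)")
```

Under these assumptions the 5-minute write pays for itself on the first hit and the 1-hour write on the second, matching the table.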

Cache invalidation triggers

Model switch · effort change · MCP connect/disconnect
bare tool deny change · /compact · CC upgrade

Measured savings (32 days)

$758.63 saved

Without caching: $2,174 → actual: $1,415 → 35% saved

[measured: Exp-E] ccusage 32-day analysis. Source: ccusage v20.0.6 · 2026-06-01. Minimum cacheable length: Opus 4.x 4,096 tokens, Sonnet 4.6 1,024 tokens.

Chapter 03: Tool result accumulation

Results never go away on their own, but a 48% reduction is possible.

Baseline (no clearing)

335,279 tokens

Tool results accumulate without limit · context rot accelerates

Tool result clearing (beta)

173,137 tokens

-48.4% · server-side · client history preserved

Don't dump whole logs (e.g. cat bigfile.log)
Limit big outputs with head / grep / tail
If a single output is 5,000+ tokens, split it across tools
Don't drag a session past 20 turns

[Source: Anthropic Cookbook] Clearing invalidates the cache prefix, which then has to be rewritten at 1.25x. Accumulated cost = result tokens × remaining turns. Always calculate the break-even first.
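The accumulation formula and the measured reduction are easy to reproduce. The break-even check is only one possible cost model, assumed here for illustration and not taken from the source: cleared tokens would otherwise be re-read at 0.1x on every remaining turn, while the surviving prefix is rewritten once at 1.25x instead of being read at 0.1x. All function names are hypothetical.

```python
def carried_tokens(result_tokens, remaining_turns):
    """Accumulation formula from the text: result tokens x remaining turns."""
    return result_tokens * remaining_turns


def reduction(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


def clearing_pays_off(cleared_tokens, kept_prefix_tokens, remaining_turns,
                      read_mult=0.1, write_mult=1.25):
    """Assumed model: savings from not re-reading cleared tokens vs. a one-time rewrite."""
    saved = cleared_tokens * remaining_turns * read_mult
    rewrite_penalty = kept_prefix_tokens * (write_mult - read_mult)
    return saved > rewrite_penalty


print(carried_tokens(5_000, 20))              # one 5,000-token result over 20 turns
print(f"{reduction(335_279, 173_137):.1%}")   # baseline vs. clearing
print(clearing_pays_off(50_000, 100_000, remaining_turns=30))
print(clearing_pays_off(50_000, 100_000, remaining_turns=10))
```

With these illustrative inputs, clearing is worth it when many turns remain and not worth it near the end of a session, which is why the break-even has to be checked case by case.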

Chapter 04: Output token economics

6.64x measured

Measured output/input volume ratio over 32 days · the price ratio is 5x · the real impact is much bigger

Comparison	Input	Output	Ratio
Price (all models)	1x	5x	5x
Measured volume ratio	1x	6.64x	6.64x
Measured cost share	14%	86%	-
Cache applicable	Yes (0.1x)	No	-
Edit vs Write (3-line fix in a 500-line file)	-	-	~15x

An output diet is 13x more effective than an input diet. Input optimization floor: 0.1x (cache hit). Output optimization floor: 0 (don't generate it).
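The Edit vs Write row can be made concrete. A minimal sketch, using the 200-token Edit and 3,000-token Write figures quoted in the anti-patterns chapter and expressing cost in "input-token equivalents" via the 5x output price ratio; the constant and function names are illustrative.

```python
OUTPUT_PRICE_RATIO = 5  # output tokens cost 5x input tokens (price row of the table)

EDIT_TOKENS = 200       # 3-line fix emitted as a diff (figure from the anti-patterns chapter)
WRITE_TOKENS = 3_000    # same fix emitted by rewriting the 500-line file


def input_equivalents(output_tokens):
    """Express output tokens as the number of input tokens with the same price."""
    return output_tokens * OUTPUT_PRICE_RATIO


waste_ratio = WRITE_TOKENS / EDIT_TOKENS
print(f"Write emits {waste_ratio:.0f}x the output of Edit")
print(f"Edit:  {input_equivalents(EDIT_TOKENS):,} input-token equivalents")
print(f"Write: {input_equivalents(WRITE_TOKENS):,} input-token equivalents")
```

Because output has no cache discount, the whole difference is paid at full output price every time.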

Chapter 05: Context rot

Why long sessions get dumber.

Attention dilution

1/10

Per-token attention when going from N=10k to N=100k
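The 1/10 figure is what falls out if attention is assumed to be spread evenly over all N tokens in context. That is a deliberate simplification (real attention is not uniform), and the function name is illustrative.

```python
def uniform_attention_share(n_tokens):
    """Per-token share of attention if it were spread evenly over n_tokens."""
    return 1.0 / n_tokens


ratio = uniform_attention_share(100_000) / uniform_attention_share(10_000)
print(f"per-token share at 100k relative to 10k: {ratio:.2f}")
```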

Lost-in-the-middle

-30%+

Drop in recall accuracy for mid-context information [Chroma, 18 models]

Marathon session: 100+ turns → 50% attention efficiency
1 task = 1 session. When done, /clear
Big-picture switch → /compact (lossy is OK)
Discard an earlier decision → /rewind

Auto-compaction triggers at ~95% of the window, i.e. around 950k on a 1M-token context, so clean up on your own before reaching it. Always save key numbers to files.
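The compaction threshold scales with the context window. A small sketch, assuming the ~95% trigger applies uniformly to whatever window is in use; the function names are illustrative.

```python
def auto_compact_threshold(window_tokens, trigger=0.95):
    """Token count at which auto-compaction would trigger, under the ~95% assumption."""
    return round(window_tokens * trigger)


def headroom(used_tokens, window_tokens, trigger=0.95):
    """Tokens left before the assumed auto-compaction point (never negative)."""
    return max(0, auto_compact_threshold(window_tokens, trigger) - used_tokens)


print(auto_compact_threshold(1_000_000))   # 1M-token window
print(auto_compact_threshold(200_000))     # 200K-token window
print(headroom(150_000, 200_000))
```

On a 200K window the same rule leaves far less room, so the advice to clear voluntarily matters even more there.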

Chapter 06 · Layer 1: Request level

Routing decides 75% of your bill.

Opus 4.x

74.8% measured

Sonnet 4.6

16.2%

Haiku 4.5

9.0%

Task type	Recommended model	Savings vs Opus
Complex reasoning · deep analysis · 1M+ input	Opus 4.8	-
Code generation · refactoring · general tasks	Sonnet 4.6	input 40% · output 40%
Classification · format conversion · simple filtering	Haiku 4.5	input 80% · output 80%

Routing Opus down from 74.8% to 50% can save 43%. Opus break-even: it needs to deliver at least 1.67x the quality of Sonnet. Opus 4.7+ uses a new tokenizer, so the same text can take up to ~35% more tokens.
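The 1.67x break-even is the price multiple implied by the 40% saving in the table: if Sonnet costs 60% of Opus, Opus costs 1 / 0.6 as much. A minimal sketch, with the savings copied from the table and an illustrative function name:

```python
SAVINGS_VS_OPUS = {"Sonnet 4.6": 0.40, "Haiku 4.5": 0.80}  # from the routing table


def opus_price_multiple(savings):
    """How many times more Opus costs than a model that saves `savings` vs Opus."""
    return 1.0 / (1.0 - savings)


for model, savings in SAVINGS_VS_OPUS.items():
    print(f"Opus costs {opus_price_multiple(savings):.2f}x {model}")
```

Read this way, Opus has to be worth about 1.67x a Sonnet result, or about 5x a Haiku result, to justify routing a task to it.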

Chapter 07 · Layer 2: Session level

/clear vs /compact vs /rewind: when to use which.

/clear

When: task done, switching to a new topic

Full context reset, with no lossy summarization step.

/compact

When: big-picture switch, preventing context rot

Summary compression. Lossy: details are lost.

/rewind

When: discarding a decision, restoring to a specific point

Drops everything after that point. Restores the code snapshot.

Marathon session = 50% attention efficiency. Work done late in the session is effectively 2x as expensive.

Switching models or changing MCP mid-session invalidates the cache. Change settings only at session boundaries.
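The three commands map cleanly onto three situations, which can be written down as a lookup. This is only an illustrative encoding of the guidance above; the `Situation` enum and `choose_command` helper are hypothetical and not part of Claude Code.

```python
from enum import Enum


class Situation(Enum):
    TASK_DONE = "task done or switching topics"
    BIG_PICTURE_SWITCH = "big-picture switch within the same work"
    DISCARD_DECISION = "discard a decision and restore an earlier point"


# Mapping taken from the three cards above.
COMMAND_FOR = {
    Situation.TASK_DONE: "/clear",             # full reset
    Situation.BIG_PICTURE_SWITCH: "/compact",  # lossy summary
    Situation.DISCARD_DECISION: "/rewind",     # drop turns after a point
}


def choose_command(situation):
    """Return the session command suggested for the given situation."""
    return COMMAND_FOR[situation]


print(choose_command(Situation.TASK_DONE))
```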

Chapter 08 · Layer 3: System level

Subagent = context firewall.

Direct execution (main context)

~3,000 tokens

Full results are loaded into the main context

Agent delegation (main context isolated)

~250 tokens

Main context -92%. Results are digested inside the subagent.

35%

Cache design

A session architecture that doesn't break the cache. $758 saved over 32 days (measured). No mid-session config changes.

92%

Subagent isolation

Prevents main-context pollution. Delegate when result × remaining turns > 10K. Not a tool for reducing total tokens.

100%

Hook automation

Run deterministic tasks (lint · stats · formatting) as PostToolUse hooks. No model call = 0 tokens.

Progressive disclosure: CLAUDE.md (always) → skills (on-demand) → direct file reads (when needed). Break-even: if result × remaining turns > 5,000–10,000 tokens, delegating wins.
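The delegation rule is a one-line comparison. A minimal sketch, using the 5,000–10,000 token band as a tunable threshold and the ~3,000 vs ~250 token figures from the comparison above; the function names are illustrative.

```python
def should_delegate(result_tokens, remaining_turns, threshold=10_000):
    """Delegate when the result would cost more than `threshold` carried tokens."""
    return result_tokens * remaining_turns > threshold


def main_context_reduction(direct_tokens, delegated_tokens):
    """Fraction of main-context tokens avoided by delegating."""
    return 1 - delegated_tokens / direct_tokens


print(should_delegate(3_000, remaining_turns=5))                   # 15,000 carried tokens
print(should_delegate(3_000, remaining_turns=2))                   # 6,000, under the 10K bar
print(should_delegate(3_000, remaining_turns=2, threshold=5_000))  # stricter bar
print(f"{main_context_reduction(3_000, 250):.0%}")
```

The threshold is a judgment call inside the quoted band; a lower value delegates more aggressively.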

Chapter 09: Top 5 anti-patterns

The same mistakes keep repeating. Naming them makes them avoidable.

01

Always Opus

Opus holds a 74.8% share. Lowering it to 50% saves 43%.

→ Sonnet 4.6 first; Opus only for 1M+ analysis

02

Marathon sessions

100+ turns → 50% attention efficiency → late-session work costs 2x.

→ 1 task = 1 session + /clear

03

Full log dumps

5,000 tokens × 20 turns = 100K attention burden.

→ Filter with head / grep / tail first

04

Overusing Write

Edit = 200 tokens vs Write = 3,000 tokens. A 3-line fix in a 500-line file = 15x waste.

→ Edit for changes, Write for new files only

05

Bloated CLAUDE.md

5k tokens = 1/5 of attention every turn. 1,000 rules ≈ 0 rules.

→ CLAUDE.md ≤ 1k, the rest goes into skills

[measured: Exp-E] Against a $1,415/month baseline, removing all 5 anti-patterns could save $373–$1,015/month (26–72%). Source: ccusage 32-day analysis.
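The percentage range follows directly from the dollar figures. A quick check, using only the numbers quoted above; the variable and function names are illustrative.

```python
BASELINE_PER_MONTH = 1_415          # measured baseline in dollars
SAVINGS_RANGE = (373, 1_015)        # estimated low and high savings in dollars


def savings_share(saved, baseline):
    """Savings as a fraction of the baseline bill."""
    return saved / baseline


low_share, high_share = (savings_share(s, BASELINE_PER_MONTH) for s in SAVINGS_RANGE)
print(f"{low_share:.0%} to {high_share:.0%} of the monthly bill")
```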

Chapter 10: Decision cheatsheet

Situation → lever, in 30 seconds.

$758 measured

98.7% cache hit
one-month savings

Priority 2: Preserve cache

13x

Output diet vs
input diet impact

Priority 1: Output diet

-92%

Subagent isolation
main context savings

Priority 3: Context management

Situation	Top lever	Impact	Source
Partial file edit	Edit (diff only)	~15x less output	output priced at 5x
Context overload	tool result clearing	peak -48%	Anthropic Cookbook
Session switch (continuing)	/compact (lossy)	-50%	measured
Session switch (new task)	/clear	full reset	-
Large exploration, isolated	Subagent delegation	main context -92%	measured
Simple classification · formatting	Haiku 4.5	-80% vs Opus	pricing

Rule No.2: Don't save tokens. Protect context. · Claude Code 102 · 2026-06 · All figures: reference/, experiments/