Claude Code Conversation Features
This document describes the structure of Claude Code's conversation logs stored in ~/.claude/ and how to extract them for training data generation.
Directory Structure
~/.claude/
├── projects/ # Primary conversation logs (JSONL per project)
│ ├── -home-user-project-a/
│ │ ├── <session-uuid>.jsonl # Main conversation threads
│ │ ├── agent-<id>.jsonl # Subagent/Task logs
│ │ └── <uuid>/ # Session subdirectories (plans, etc.)
│ └── -home-user-project-b/
├── history.jsonl # User input history (prompts only)
├── debug/ # Debug logs (timestamps, errors, LSP events)
├── todos/ # Session todo lists
├── plans/ # Planning mode artifacts
├── session-env/ # Session environment snapshots
├── file-history/ # File edit history
├── __store.db # SQLite database (additional metadata)
└── settings.json # User settings
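As a rough illustration of the layout above, the following sketch groups each project's JSONL files into main session logs and subagent logs. The helper name `list_project_logs` is hypothetical and not part of any shipped script, and the `agent-` prefix rule is an assumption taken from the tree shown here.

```python
from pathlib import Path


def list_project_logs(claude_dir: Path) -> dict[str, dict[str, list[str]]]:
    """Group each project's JSONL files into session logs and subagent logs."""
    result: dict[str, dict[str, list[str]]] = {}
    projects_dir = claude_dir / "projects"
    if not projects_dir.is_dir():
        return result
    for project in sorted(p for p in projects_dir.iterdir() if p.is_dir()):
        sessions: list[str] = []
        agents: list[str] = []
        for log in sorted(project.glob("*.jsonl")):
            # Assumption: subagent logs are the files named agent-<id>.jsonl.
            if log.name.startswith("agent-"):
                agents.append(log.name)
            else:
                sessions.append(log.name)
        result[project.name] = {"sessions": sessions, "agents": agents}
    return result
```

Calling `list_project_logs(Path.home() / ".claude")` would return one entry per project directory, keyed by the directory name.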
JSONL Message Types
Each line in a project's JSONL file is a JSON object with a type field:
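A minimal sketch of reading such a file, assuming only that every non-empty line is a standalone JSON object. The function names `iter_records` and `count_types` are illustrative, and skipping unparsable lines is a design choice of this sketch rather than documented behavior.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid JSON line, skipping blanks and bad lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
        if isinstance(record, dict):
            yield record


def count_types(lines: Iterable[str]) -> Counter:
    """Count records by their top-level "type" field."""
    return Counter(r.get("type", "unknown") for r in iter_records(lines))
```

Because an open file object is an iterable of lines, `count_types(open(path, encoding="utf-8"))` works directly on a log file.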
type: "user" - User Messages
{
"type": "user",
"parentUuid": null,
"uuid": "59aa974c-9d31-4f6e-a557-28cec9d76aae",
"sessionId": "81adc7f6-f9aa-4c3d-8b25-bfe26e9dfd02",
"timestamp": "2026-01-08T20:16:04.050Z",
"cwd": "/home/user/projects/myapp",
"userType": "external",
"isSidechain": false,
"message": {
"role": "user",
"content": "Can you help me fix this bug?"
},
"thinkingMetadata": {
"level": "high",
"disabled": false,
"triggers": [{"start": 8, "end": 18, "text": "ultrathink"}]
},
"todos": []
}
| Field | Description |
|---|---|
| uuid | Unique message identifier |
| parentUuid | Links to previous message (conversation threading) |
| sessionId | Session identifier (groups messages) |
| cwd | Working directory when message was sent |
| isSidechain | true if message is in a branched conversation |
| thinkingMetadata | Extended thinking triggers (ultrathink, megathink, etc.) |
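Since parentUuid links each message to the one before it, a thread can be rebuilt by walking those links backwards from any message. The sketch below assumes records are already parsed into dicts. `thread_to_root` is a hypothetical helper name, and the cycle guard is a defensive addition rather than something the log format is known to need.

```python
def thread_to_root(records: list[dict], leaf_uuid: str) -> list[dict]:
    """Return the chain of records from the root message down to leaf_uuid."""
    by_uuid = {r["uuid"]: r for r in records if "uuid" in r}
    chain: list[dict] = []
    seen: set[str] = set()
    current = leaf_uuid
    # Follow parentUuid until a root (null parent) or an unknown id is reached.
    while current is not None and current in by_uuid and current not in seen:
        seen.add(current)
        record = by_uuid[current]
        chain.append(record)
        current = record.get("parentUuid")
    chain.reverse()  # oldest message first
    return chain
```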
type: "assistant" - Assistant Responses
{
"type": "assistant",
"parentUuid": "59aa974c-9d31-4f6e-a557-28cec9d76aae",
"uuid": "e6f27af8-21bd-467f-8af1-d0d1a67cb5a9",
"sessionId": "81adc7f6-f9aa-4c3d-8b25-bfe26e9dfd02",
"timestamp": "2026-01-08T20:16:09.634Z",
"requestId": "req_011CWvXT4MU98Vms9qEQtimT",
"message": {
"model": "claude-opus-4-5-20251101",
"id": "msg_01Miof2hsFjpDwyyDSJaoEVf",
"role": "assistant",
"content": [
{"type": "thinking", "thinking": "Let me analyze this bug..."},
{"type": "text", "text": "I'll investigate the issue."},
{"type": "tool_use", "id": "toolu_013...", "name": "Read", "input": {"file_path": "/path/to/file"}}
],
"usage": {
"input_tokens": 10,
"output_tokens": 6,
"cache_read_input_tokens": 12942
}
}
}
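The usage object makes it possible to total token counts across a session. This is a small sketch assuming the field layout shown in the example above. `total_usage` is an illustrative name, and any other integer fields present under usage would simply be summed as well.

```python
from collections import Counter


def total_usage(records: list[dict]) -> dict[str, int]:
    """Sum every integer field under message.usage across assistant records."""
    totals: Counter = Counter()
    for record in records:
        if record.get("type") != "assistant":
            continue
        usage = record.get("message", {}).get("usage", {})
        for key, value in usage.items():
            if isinstance(value, int):
                totals[key] += value
    return dict(totals)
```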
Content Block Types
| Block Type | Description |
|---|---|
| thinking | Claude's extended thinking/reasoning (when triggered) |
| text | Regular text response |
| tool_use | Tool invocation (Bash, Read, Edit, Write, Grep, etc.) |
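One way to separate these block types, sketched under the assumption that content is either a plain string (as in the user example) or a list of typed blocks (as in the assistant example). `split_content` and its return shape are hypothetical, chosen for illustration only.

```python
def split_content(message: dict) -> dict:
    """Split a message's content into text, thinking, and tool-use parts."""
    content = message.get("content", [])
    if isinstance(content, str):
        # Plain-string content, as in the user message example.
        return {"text": content, "thinking": "", "tool_uses": []}
    text_parts: list[str] = []
    thinking_parts: list[str] = []
    tool_uses: list[dict] = []
    for block in content:
        if not isinstance(block, dict):
            continue
        kind = block.get("type")
        if kind == "text":
            text_parts.append(block.get("text", ""))
        elif kind == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif kind == "tool_use":
            tool_uses.append({"name": block.get("name"), "input": block.get("input", {})})
        # Any other block type is ignored in this sketch.
    return {
        "text": "\n".join(text_parts),
        "thinking": "\n".join(thinking_parts),
        "tool_uses": tool_uses,
    }
```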
type: "summary" - Session Summaries
{
"type": "summary",
"summary": "Fix authentication bug in login flow",
"leafUuid": "c232ccac-d870-4f2b-896d-85b67105f2ac"
}
Auto-generated summaries that describe the conversation topic.
type: "file-history-snapshot" - File State
{
"type": "file-history-snapshot",
"messageId": "59aa974c-9d31-4f6e-a557-28cec9d76aae",
"snapshot": {
"trackedFileBackups": {},
"timestamp": "2026-01-08T20:16:04.054Z"
}
}
Tracks file states for undo/restore functionality.
Tool Use Reference
Common tools found in tool_use blocks:
| Tool | Description |
|---|---|
| Read | Read file contents |
| Write | Create/overwrite files |
| Edit | Edit existing files (find/replace) |
| Bash | Execute shell commands |
| Grep | Search file contents |
| Glob | Find files by pattern |
| Task | Launch subagent for complex tasks |
| WebFetch | Fetch and process web content |
| WebSearch | Search the web |
| TodoWrite | Manage task lists |
| AskUserQuestion | Ask user for clarification |
Extraction Script
Use scripts/extract_conversations.py to extract training data:
Basic Usage
# View statistics about your conversation data
python scripts/extract_conversations.py --stats
# Extract all messages to JSONL
python scripts/extract_conversations.py -o training_data.jsonl
# Extract conversation pairs (user prompt + assistant response)
python scripts/extract_conversations.py -f pairs -o pairs.jsonl
# Export in ShareGPT format for fine-tuning
python scripts/extract_conversations.py -f sharegpt -o sharegpt.jsonl
Filtering Options
# Filter by project name
python scripts/extract_conversations.py -p cbos -o cbos_only.jsonl
# Filter by date range
python scripts/extract_conversations.py --after 2025-01-01 --before 2025-02-01
# Include Claude's thinking blocks (extended reasoning)
python scripts/extract_conversations.py --include-thinking -o with_thinking.jsonl
# Include branched/sidechain conversations
python scripts/extract_conversations.py --include-sidechains
Output Formats
| Format | Description | Use Case |
|---|---|---|
| jsonl | One message per line | Raw data analysis |
| pairs | User/assistant pairs | Supervised fine-tuning |
| conversations | Full threaded conversations | Context-aware training |
| sharegpt | ShareGPT format | Compatible with training frameworks |
Example Output
Pairs Format:
{
"user_message": "How do I implement authentication?",
"assistant_response": "I'll help you implement authentication...",
"thinking": "Let me analyze the current auth setup...",
"tool_uses": [{"tool_name": "Grep", "input_data": {"pattern": "auth"}}],
"project": "home/user/myapp",
"session_id": "abc123",
"timestamp": "2025-01-08T20:16:04.050Z"
}
ShareGPT Format:
{
"conversations": [
{"from": "human", "value": "How do I implement authentication?"},
{"from": "gpt", "value": "I'll help you implement authentication..."}
],
"source": "claude-code:home/user/myapp",
"metadata": {
"session_id": "abc123",
"has_thinking": true,
"tool_count": 3
}
}
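The two examples above share most of their information, so a pairs record can be mapped onto the ShareGPT shape fairly directly. The sketch below infers that mapping from the field names in the examples. `pair_to_sharegpt` is a hypothetical helper, and the extraction script itself may build its output differently.

```python
def pair_to_sharegpt(pair: dict) -> dict:
    """Convert one pairs-format record into a ShareGPT-style record."""
    return {
        "conversations": [
            {"from": "human", "value": pair["user_message"]},
            {"from": "gpt", "value": pair["assistant_response"]},
        ],
        # Assumption: source is "claude-code:" followed by the project name.
        "source": f"claude-code:{pair.get('project', '')}",
        "metadata": {
            "session_id": pair.get("session_id"),
            "has_thinking": bool(pair.get("thinking")),
            "tool_count": len(pair.get("tool_uses", [])),
        },
    }
```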
Training Data Considerations
Quality Signals
- Thinking blocks: Expose intermediate reasoning, good for chain-of-thought training
- Tool uses: Demonstrate agentic behavior patterns
- Session summaries: Useful for task classification
- Project context: Group by domain for specialized models
Filtering Recommendations
- Exclude sidechains - These are abandoned conversation branches
- Include thinking - Valuable for reasoning capabilities
- Filter by project - Create domain-specific datasets
- Date filtering - Use recent data for current patterns
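The sidechain and date recommendations above can be expressed as a single predicate over raw records. This sketch assumes timestamps are ISO 8601 strings with a trailing "Z", as in the examples, and treats the lower bound as inclusive and the upper bound as exclusive. That boundary choice and the name `keep_record` are assumptions, not documented behavior of the `--after` and `--before` flags.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse a log timestamp such as 2026-01-08T20:16:04.050Z."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def keep_record(
    record: dict,
    after: Optional[datetime] = None,
    before: Optional[datetime] = None,
    include_sidechains: bool = False,
) -> bool:
    """Return True if the record passes the sidechain and date filters."""
    if record.get("isSidechain") and not include_sidechains:
        return False
    timestamp = record.get("timestamp")
    if timestamp is None:
        # Records without a timestamp pass only when no date filter is set.
        return after is None and before is None
    when = parse_timestamp(timestamp)
    if after is not None and when < after:
        return False
    if before is not None and when >= before:
        return False
    return True
```

The bounds should be timezone-aware, for example `datetime(2026, 1, 1, tzinfo=timezone.utc)`, since comparing them with the parsed timestamps would otherwise raise a `TypeError`.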
Privacy Notes
- Conversation logs may contain sensitive code and paths
- Review extracted data before sharing
- Consider anonymizing project paths and personal identifiers
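As one possible starting point for anonymizing paths, the sketch below replaces the user name in home-directory paths with a placeholder. The regular expression only covers `/home/<name>` and `/Users/<name>` prefixes, which is an assumption about where personal identifiers appear. It does not catch other identifiers such as emails, hostnames, or secrets, so manual review is still needed.

```python
import re

# Matches /home/<name> or /Users/<name>, stopping at the next path separator,
# whitespace, or quote character.
HOME_PATTERN = re.compile(r"(/home/|/Users/)[^/\s\"']+")


def anonymize_paths(text: str, placeholder: str = "user") -> str:
    """Replace the user name in home-directory paths with a placeholder."""
    return HOME_PATTERN.sub(lambda m: m.group(1) + placeholder, text)
```

Applying `anonymize_paths` to each serialized JSON line before writing it out keeps the rest of the record unchanged.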
Related Files
- history.jsonl - Simple input history (prompts only, no responses)
- debug/*.txt - Debug logs with timestamps, errors, performance data
- __store.db - SQLite database (can query with standard tools)
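Because the schema of __store.db is not described here, a reasonable first step is to list its tables before querying anything. The sketch below uses Python's standard sqlite3 module and opens the file read-only. `list_tables` is an illustrative name, and no table names are assumed.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    # A file: URI with mode=ro prevents accidental writes to the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```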
Statistics Example
$ python scripts/extract_conversations.py --stats -v
=== Claude Code Conversation Statistics ===
Total Messages: 45,231
User: 12,847
Assistant: 32,384
With Thinking: 2,156
Projects: 134
home/user/projects/myapp: 8,432
home/user/projects/api: 5,221
...
Tool Uses: 89,432
Read: 23,456
Bash: 18,234
Edit: 15,678
Grep: 12,345
...
Date Range: 2024-12-01 to 2025-01-08