SafeDisk AI

AI Agent Transcript JSONL Retention Policy

Long-running agent sessions, group chats, and topic-bound workers can append every message, tool call, and tool result to active `.jsonl` transcript files. When those files are uncapped, startup, maintenance, and gateway reads can burn CPU scanning an ever-growing tail.

Do not share private transcript contents, tokens, prompts, or chat logs. File paths, sizes, config keys, and failure symptoms are enough to scope the policy.

$99 AI tool storage policy

Choose the raw transcript cap, compaction, and disk-budget behavior before the next outage.

Use this when `.jsonl` session transcripts, tool results, replay state, or topic-bound group sessions grow past the point where ordinary session pruning can protect the gateway.

cap transcript bytes + truncate tool output + test oversized replay
Read-only evidence

Measure transcript size, line count, tool-result growth, and replay cost first.

These checks keep the discussion public-safe. They do not require transcript contents, secrets, prompts, or user messages.

find sessions -name '*.jsonl' -size +10M -print
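The `find` command above only lists large files. A slightly fuller read-only report is sketched below. It assumes the transcripts sit under a directory such as `sessions`, and the function name `transcript_report` is hypothetical. For each transcript it prints the byte count, line count, and length of the longest line, which is a rough proxy for the largest single tool result. It never prints record contents.

```shell
#!/usr/bin/env bash
# Read-only sizing report for .jsonl transcripts (hypothetical helper).
# Prints: bytes <TAB> lines <TAB> longest-line length <TAB> path
# Record contents are never printed, only paths and counts.

transcript_report() {
  local root="${1:-sessions}"   # directory to scan (assumed layout)
  local min_bytes="${2:-0}"     # skip files smaller than this
  local f bytes lines longest
  find "$root" -type f -name '*.jsonl' -print | sort | while IFS= read -r f; do
    bytes=$(wc -c < "$f" | tr -d ' ')
    [ "$bytes" -ge "$min_bytes" ] || continue
    lines=$(wc -l < "$f" | tr -d ' ')
    # Longest line approximates the largest single message or tool result.
    longest=$(awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }' "$f")
    printf '%s\t%s\t%s\t%s\n' "$bytes" "$lines" "$longest" "$f"
  done
}
```

For example, `transcript_report sessions 10485760 | sort -n -r | head` would list the largest transcripts over 10 MiB first. Replay cost still has to be measured separately against the gateway itself.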

Runbook: Cap The Transcript, Not Just The Session Index

  1. Separate index retention from transcript retention. A `sessions.json` rotation or entry cap does not prove active `.jsonl` transcripts are bounded.
  2. Choose a product policy: raw transcript rotation, compaction-only with successor transcript, or both.
  3. Truncate or summarize large tool results before writing them to transcript files.
  4. Account for active, preserved sessions in the disk budget. Long-lived human topic sessions are often exactly the files that ordinary pruning skips, so they keep growing.
  5. Add a migration path for existing oversized transcripts: archive, summarize, split, or mark as cold before gateway startup scans them repeatedly.
  6. Test with a deterministic oversized `.jsonl` fixture and assert maintenance/startup work is bounded.
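Raw transcript rotation (step 2) combined with the archive-first migration in step 5 could look like the sketch below. It assumes a hypothetical per-transcript byte cap, and it assumes the writer is paused or reopens the path while rotation runs: renaming a file that another process still holds open leaves that process appending to the archive. The `rotate_transcript` name and the `.archived-<epoch>` suffix are illustrative only. The function never deletes anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rotate an oversized raw transcript into an archive
# beside it and leave an empty successor at the original path.
# Assumes the writer is paused or reopens the path after rotation.

rotate_transcript() {
  local path="$1" max_bytes="$2" bytes base archive n=0
  [ -f "$path" ] || return 0
  bytes=$(wc -c < "$path" | tr -d ' ')
  # At or under the cap: leave the active transcript alone.
  [ "$bytes" -gt "$max_bytes" ] || return 0
  base="$path.archived-$(date +%s)"
  archive="$base"
  # Never overwrite an earlier archive created in the same second.
  while [ -e "$archive" ]; do
    n=$((n + 1))
    archive="$base-$n"
  done
  mv -- "$path" "$archive"
  : > "$path"              # empty successor transcript
  printf '%s\n' "$archive" # report where the old data went
}
```

Because the archive name no longer ends in `.jsonl`, a startup scan that matches only `*.jsonl` would skip it. That is one way to mark old data as cold without deleting it.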
Copy-ready issue reply

Use this when agent transcripts grow without bound.

This keeps the maintainer conversation on product decisions and acceptance tests rather than local cleanup alone.

I would split the fix into a product decision and a bounded-maintenance regression.

Acceptance checks I would want before closing this:

- Make the config surface explicit: sessions.json retention is separate from active transcript .jsonl retention.
- Pick one supported policy: raw transcript rotation, compaction-only successor transcripts, or both.
- Add an oversized .jsonl fixture and prove startup/maintenance does not repeatedly scan the full unbounded tail.
- Truncate or summarize large tool results before append, so one noisy tool call cannot create a 100 MB active transcript.
- Make the disk-budget path account for protected long-lived topic/group sessions, not only evictable inactive sessions.
- Document migration behavior for existing oversized transcripts: archive, summarize, split, or leave intentionally uncapped.
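Separately from the reply, the third and fourth acceptance checks can be exercised with a small harness. The helpers below are hypothetical:

- `append_record` replaces any record larger than a per-record cap with a placeholder instead of cutting it, because a JSON line cut mid-string is no longer valid JSON.
- `read_tail_records` returns only the records that fit entirely in the last N bytes.
- `make_fixture` writes a deterministic oversized `.jsonl` file.

The placeholder shape `{"type":"truncated","original_bytes":N}` is an assumption, not an existing format.

```shell
#!/usr/bin/env bash
# Hypothetical regression helpers; not an existing product API.

# Append one single-line JSON record. If it exceeds max_bytes, write an
# assumed placeholder record instead so the line stays valid JSON.
append_record() {
  local path="$1" record="$2" max_bytes="$3" size
  size=$(printf '%s' "$record" | wc -c | tr -d ' ')
  if [ "$size" -gt "$max_bytes" ]; then
    printf '{"type":"truncated","original_bytes":%d}\n' "$size" >> "$path"
  else
    printf '%s\n' "$record" >> "$path"
  fi
}

# Print only the records that lie entirely within the last n bytes.
read_tail_records() {
  local path="$1" n="$2" size
  size=$(wc -c < "$path" | tr -d ' ')
  if [ "$size" -le "$n" ]; then
    cat "$path"
  else
    # Read one extra byte: if it is a newline, sed drops only that empty
    # line; otherwise the first line is a partial record and is dropped.
    tail -c "$((n + 1))" "$path" | sed '1d'
  fi
}

# Deterministic fixture: k records of the same shape with increasing seq.
make_fixture() {
  awk -v k="$2" 'BEGIN { for (i = 1; i <= k; i++) printf "{\"seq\":%d,\"pad\":\"xxxxxxxxxx\"}\n", i }' > "$1"
}
```

A regression test would build the fixture once, push one very large tool result through `append_record`, and assert that the transcript stays small and that reads go through a bounded path like `read_tail_records`. On regular files, GNU `tail -c` typically seeks near the end instead of reading from the start.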

Do Not Delete First