Ingestion
Platform adapters transform scattered digital footprints into a unified timeline. Each adapter knows how to authenticate, paginate, and deduplicate its source — you just point Syke at your data and it handles the rest.
Privacy by design. The content filter runs before events enter SQLite. Credentials and private messages never touch the database. Content that never enters the timeline can never leak to an LLM.
Adapter Pattern
All adapters inherit from BaseAdapter:
class BaseAdapter(ABC):
    source: str  # Override in subclass: "chatgpt", "github", etc.

    def __init__(self, db: SykeDB, user_id: str):
        ...

    @abstractmethod
    def ingest(self, **kwargs) -> IngestionResult:
        ...

Events are stored in SQLite with deduplication by external_id. Re-ingesting the same source is safe: duplicates are skipped.
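To make the pattern concrete, here is a minimal sketch of a subclass and of deduplication by external_id. The SykeDB stand-in, the table layout, the IngestionResult fields, and the NotesAdapter class are all assumptions made for illustration, not Syke's actual implementation; the sketch relies on a SQLite UNIQUE constraint with INSERT OR IGNORE, which is one common way to skip duplicates.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class IngestionResult:
    # Hypothetical shape; the real IngestionResult may carry other fields.
    inserted: int = 0
    skipped: int = 0


class SykeDB:
    """Stand-in for the real SykeDB: one table with a uniqueness constraint."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            " user_id TEXT, source TEXT, external_id TEXT, content TEXT,"
            " UNIQUE (user_id, source, external_id))"
        )

    def insert_event(self, user_id, source, external_id, content) -> bool:
        # INSERT OR IGNORE silently skips rows that violate the UNIQUE constraint.
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)",
            (user_id, source, external_id, content),
        )
        self.conn.commit()
        return cur.rowcount == 1  # 1 if stored, 0 if it was a duplicate


class BaseAdapter(ABC):
    source: str = ""  # Override in subclass

    def __init__(self, db: SykeDB, user_id: str):
        self.db = db
        self.user_id = user_id

    @abstractmethod
    def ingest(self, **kwargs) -> IngestionResult:
        ...


class NotesAdapter(BaseAdapter):
    """Hypothetical adapter that ingests an in-memory list of notes."""

    source = "notes"

    def ingest(self, **kwargs) -> IngestionResult:
        result = IngestionResult()
        for note in kwargs.get("notes", []):
            stored = self.db.insert_event(
                self.user_id, self.source, note["id"], note["text"]
            )
            if stored:
                result.inserted += 1
            else:
                result.skipped += 1
        return result
```

Calling ingest twice with the same notes stores each note once; the second run reports every note as skipped.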
Content Filter
Runs before events enter the database. Two checks:
- Credential patterns: API keys, tokens, passwords, and SSH keys are stripped via regex
- Private messaging: WhatsApp logs and DMs are detected and skipped entirely
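The two checks could be sketched roughly as follows. The regexes, the source names in PRIVATE_SOURCES, the WhatsApp-style line prefix, and the filter_content function are illustrative assumptions; the patterns Syke actually uses are not listed here and are likely broader.

```python
import re
from typing import Optional

# Illustrative credential patterns only (assumed, not Syke's real list).
CREDENTIAL_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                        # API-key-like strings
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),                 # GitHub-style tokens
    re.compile(r"(?i)(password|passwd|secret)\s*[:=]\s*\S+"),  # key=value secrets
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),                                                         # PEM/SSH private keys
]

# Assumed shape of a chat-log line: "[1/2/24, 10:15:03] Name: text".
PRIVATE_MESSAGE_PATTERNS = [
    re.compile(
        r"^\[\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?::\d{2})?\] [^:]+: ",
        re.MULTILINE,
    ),
]

# Hypothetical source names that are always treated as private.
PRIVATE_SOURCES = {"whatsapp", "dm"}


def filter_content(content: str, source: str = "") -> Optional[str]:
    """Return sanitized content, or None if the event should be skipped."""
    # Check 2: private messaging is dropped entirely, never redacted.
    if source in PRIVATE_SOURCES:
        return None
    if any(p.search(content) for p in PRIVATE_MESSAGE_PATTERNS):
        return None
    # Check 1: credentials are stripped and the rest of the text is kept.
    for pattern in CREDENTIAL_PATTERNS:
        content = pattern.sub("[REDACTED]", content)
    return content
```

An adapter would call a function like this on each event and only write to SQLite when the result is not None, so skipped content never reaches the database.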
Adapters
Claude Code
Dual-store adapter. Reads both:
- Project-level transcripts (.claude/ in each project)
- Global transcript store (~/.claude/)
Uses DFS path resolution to map transcript IDs to project directories. Produces one event per session — not per message.
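The "one event per session" rule can be illustrated with a small aggregation sketch. The record fields (session_id, timestamp) and the event shape are assumptions chosen for the example, not the adapter's actual transcript schema, and the DFS path resolution step is not covered here.

```python
from collections import defaultdict


def sessions_to_events(records):
    """Collapse per-message records into one event per session."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["session_id"]].append(record)

    events = []
    for session_id, messages in grouped.items():
        # ISO 8601 timestamps in the same timezone sort correctly as strings.
        messages.sort(key=lambda m: m["timestamp"])
        events.append({
            "external_id": session_id,  # one row per session, so re-ingestion dedupes cleanly
            "started_at": messages[0]["timestamp"],
            "ended_at": messages[-1]["timestamp"],
            "message_count": len(messages),
        })
    return events
```

Keying the event on the session ID keeps the timeline compact and gives deduplication a stable external_id even when a session grows between ingestion runs.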
ChatGPT
Parses the ZIP export from ChatGPT’s data export feature. Reads conversations.json from the ZIP, extracts conversations, timestamps, and topics.
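Reading the export might look roughly like this. The sketch assumes conversations.json holds a JSON list whose entries carry id, title, and create_time (a Unix timestamp); those field names and the read_conversations helper are assumptions for illustration, and topic extraction is left out.

```python
import json
import zipfile
from datetime import datetime, timezone


def read_conversations(zip_path):
    """Yield one summary dict per conversation in a ChatGPT export ZIP.

    zip_path may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open("conversations.json") as fh:
            conversations = json.load(fh)

    for conv in conversations:
        created = conv.get("create_time")  # assumed: seconds since the epoch
        yield {
            "external_id": conv.get("id"),
            "title": conv.get("title") or "(untitled)",
            "timestamp": (
                datetime.fromtimestamp(created, tz=timezone.utc)
                if created is not None
                else None
            ),
        }
```

Reading directly from the archive avoids unpacking the export to disk, which matters when the ZIP also contains files the adapter has no use for.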
GitHub
REST API with pagination. Reads repos, issues, PRs, stars, README content, and activity events (which include push events with commit data). There is no dedicated commit API call — commit information comes from PushEvent payloads in the events feed. Works with public data by default; add GITHUB_TOKEN for private repos.
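As a sketch of how commit data can be recovered from the events feed, the function below flattens PushEvent payloads into one record per commit. It assumes each event is a dict with type, repo.name, created_at, and payload.commits entries carrying sha and message; the commits_from_events helper and the output shape are hypothetical, and fetching or paginating the feed is not shown.

```python
def commits_from_events(events):
    """Flatten PushEvent payloads into one record per commit."""
    commits = []
    for event in events:
        if event.get("type") != "PushEvent":
            continue  # stars, issues, PRs, etc. are handled elsewhere
        repo = event.get("repo", {}).get("name")
        # A push with no commit list in its payload yields no records.
        for commit in event.get("payload", {}).get("commits", []):
            commits.append({
                "external_id": commit["sha"],
                "repo": repo,
                "message": commit["message"],
                # The push time stands in for the commit time in this sketch.
                "timestamp": event.get("created_at"),
            })
    return commits
```

Using the commit SHA as the external ID means a commit that shows up in overlapping pages of the feed is stored only once.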
Gmail
OAuth adapter with two auth strategies:
- gog CLI tool (if installed)
- Python google-auth-oauthlib fallback
Reads subjects, snippets, and labels. Message bodies are not stored — only metadata.
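The metadata-only rule can be sketched as a projection over a fetched message. The input is assumed to resemble a Gmail API message resource (id, snippet, labelIds, and payload.headers as a list of name/value pairs); the to_metadata helper and its output keys are hypothetical.

```python
def to_metadata(message: dict) -> dict:
    """Keep only subject, snippet, and labels; drop everything else."""
    headers = message.get("payload", {}).get("headers", [])
    subject = next(
        (h.get("value", "") for h in headers if h.get("name", "").lower() == "subject"),
        "",
    )
    # Only whitelisted fields are copied, so body parts and attachments
    # cannot slip through even if the message contains them.
    return {
        "external_id": message.get("id"),
        "subject": subject,
        "snippet": message.get("snippet", ""),
        "labels": list(message.get("labelIds", [])),
    }
```

Building the event from an explicit whitelist, rather than deleting known body fields, keeps unexpected fields out of the timeline by default.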