Skills · Workflow
researcher
/researcherDeep-research workflow for UGC reference material — turns one or more URLs / handles / trend queries into a single cited research report (report.md + sources.json) that a scenar…
SKILL.mdTrigger refinements
ALSO FIRE if the message contains a URL on tiktok / instagram / youtube / youtu.be / x / twitter / reddit / facebook AND asks anything analytical about it (in any language).
DO NOT FIRE for rendered-mp4 quality checks (that is /evaluator), for raw downloads-only (use ralphy ref pull directly), or once a scenario is already locked and the user wants prompts / assets (handback to scenarist / art-director).
Hard invariants
- Never ask the user to "send the file" if the URL is on a social platform — WebFetch returns a JS shell on those, but
ralphy ref pull(yt-dlp) gets the mp4. - All LLM / vision calls route through
cli/lib/providers/llm.ts → callLLM()via the CLI. Don't paste raw OpenRouter / yt-dlp code into ad-hoc scripts.
researcher
You take open-ended reference material — URLs, handles, trend queries — and produce a single deep-research document with cited sources. The contract is: the report is the handoff. A scenarist reading report.md should not need to open the source URLs again to write the scenario.
What this skill is not
- Not a quality checker for rendered mp4s. For "is this video good / find issues" use
/evaluator. - Not a one-off downloader. For "just give me the mp4 from this URL" use
ralphy ref pull <url>directly. - Not a scenario writer. The report ends in handoff — the scenarist consumes
sources.jsonand writes from there.
The workflow
Four CLI verbs cover the loop. Don't skip steps — the synthesis step depends on the per-source ref chain having run.
# 1. Start a topic (creates .ralphy/research/<slug>/state.json)
ralphy research start <topic-slug> --question "<the research question>"
# 2. Add each source (full ref chain: pull → frames → transcribe → analyze → audio-describe → blueprint)
ralphy research add-source <url> --topic <slug>
# 3. Cross-source LLM synthesis → report.md + sources.json
ralphy research synthesize <slug>
# 4. Inspect at any time
ralphy research show <slug>
ralphy research list
add-source is idempotent: if a URL was already pulled into .ralphy/references/<refSlug>/, the chain detects the existing artifacts and skips them. Re-running synthesize after adding more sources updates report.md in place.
Useful flags:
--meta-onlyonadd-source— only record the URL + yt-dlp metadata, skip the heavy chain (use when the source is text-only or you only need title / author).--frames <n>onadd-source— cap sampled frames (default 12). Bump higher for long-form video.--model <id>onsynthesize— override the synthesis model (defaultgoogle/gemini-2.5-flash).
How to read the report
Two files written to .ralphy/research/<slug>/:
report.md— narrative deep-research doc. Sections: Executive Summary, Key Findings (with[^N]footnotes), Patterns Across Sources, Actionable Recommendations (split for scenarist / art-director), Open Questions, Sources. Show this one to the user.sources.json— machine contract for downstream skills. Each source hasid(footnote),url,title,refSlug(pointing at raw artifacts in.ralphy/references/<refSlug>/),blueprintPath, andkeyFindings[](3–8 short bullets distilled by the synthesis step). Schema:references/report-schema.md.
The report cites every claim. If the model produces an uncited bullet, the synthesis is suspect — re-run with a different model or check that all sources reached status: "analyzed" via ralphy research show.
When to keep the playbook in mind
docs/playbooks/researcher.md is the tool-deep-dive companion. Read it when:
- The source URL needs a non-standard pull (Playwright for JS-heavy landing pages, manual yt-dlp flags for region-locked content) — see
docs/playbooks/researcher/yt-dlp.mdandsite-extract.md. - The user asks for trend discovery, not a fixed source list —
discover-trendssub-task usesralphy ref scrape-trendsto find hashtag candidates, then loops them throughadd-source. - The user wants viral-moment extraction from a long-form video — that is a different sub-task with its own sub-doc (
viral-moments.md); the topic-level synthesis here is not the right fit.
This skill is the workflow contract. The playbook is the tool encyclopedia. Don't duplicate; cross-reference.
Workflow
- Confirm the question. A topic without a research question produces a vague report. If the user dropped a URL with no framing, ask one short clarifying question ("what are you trying to learn from this?") before starting.
research startwith a slug derived from the question and a--questionflag.- For each URL / handle:
research add-source. If the URL is a profile/handle (no direct video), first useralphy ref scrape-trendsor list the top videos and add them one by one. Stop adding sources at 6–8 — diminishing returns and the synthesis context fills up. research synthesizeonce all sources reachanalyzed. If any source failed (status: "failed"), surface the error to the user and decide whether to retry or skip.- Show the markdown report to the user. Surface the question, the Executive Summary, and 3–5 top findings inline. Point at the file path for the full read.
- Handoff: the scenarist reads
sources.json(specificallykeyFindingsper source) when writing the scenario. The art-director reads thePatterns Across Sourcessection when picking prompt seeds and model choices.
When to fall back to the per-source ref chain
If the user only cares about one source (single TikTok, single Reel) and doesn't need a topic-level narrative, the per-source ref chain alone is enough — ralphy ref pull <url> then ralphy ref blueprint <slug> gives a single-source blueprint.md. Use this skill only when there are ≥2 sources or the user explicitly wants a cited report.
Handoff to a fixer / scenarist agent
When the user says "now write the scenario" or similar, a downstream agent reads sources.json and report.md. The minimum it needs from you:
- Path to
report.md - Path to
sources.json - Topic slug (the dir name)
- Optional: which sources to weight more heavily (when one is a much stronger reference than the others)
Do not paraphrase the findings in chat — the report exists so chat doesn't have to.
References
references/report-schema.md— full schema ofsources.json+ section spec forreport.mddocs/playbooks/researcher.md— tool-deep-dive: yt-dlp flags, Playwright for JS-rendered sites, scrape-trends, viral-moments sub-taskcli/lib/research-topic.ts— source-of-truth for the topic state machineMODELS.md— synthesis model defaults (google/gemini-2.5-flashvia OpenRouter)
