F2S Workshop @ ICML 2026 — From Frames to Stories

Overview

Video generation has advanced rapidly for short clips, yet generating long, multi-shot videos that remain coherent, controllable, and reliable is still an open challenge. Across minutes of generation, current systems often suffer from identity drift, scene inconsistency, narrative breakdown, and weak responsiveness to user intent. These challenges make long-horizon video generation a compelling testbed for long-context multimodal modeling, structured generation, interactive systems, and evaluation.

F2S brings together researchers working on the core scientific and practical questions behind this transition from frames to stories. We are broadly interested in methods that maintain consistency over time, support richer forms of control and revision, and enable rigorous evaluation of long-form generation. The workshop welcomes work spanning model design, memory and state tracking, planning, editing, multimodal interaction, datasets, benchmarks, and real-world systems for long-horizon video creation.

Our goal is to foster a shared research agenda around reliable, controllable, and trustworthy long-horizon video generation, while creating space for perspectives from generative modeling, multimodal learning, interactive machine learning, and creative applications.

Key Questions

Q1 (Persistent State): What minimal, compact state representation (entities, relations, events) must be carried and updated across minutes of generation, and how can models maintain it under tight compute and memory budgets?
Q2 (Interactive Control): How can we enable multimodal, creator-facing interaction with rich, compositional control signals (shot plans, localized edits, multimodal constraints, actions) over minutes-long generation?
Q3 (Evaluation): What benchmarks and protocols can separate long-horizon state and narrative consistency from short-term visual quality, and measure drift and constraint/control satisfaction robustly and reproducibly?

Call for Papers

We invite submissions on all aspects of long-horizon video generation, with a focus on reliability, controllability, and evaluation. Topics include but are not limited to:

Models, Memory & Long-Context Generation

Long-horizon and multi-shot video generation models
Long-context architectures, memory mechanisms, retrieval, and state tracking
Consistency maintenance across identities, scenes, events, and narrative structure
Hierarchical generation, planning, and inference-time refinement for long videos

Control, Editing & Interactive Systems

Controllable video generation with text, keyframes, layouts, trajectories, sketches, audio, or other multimodal signals
Storyboarding, shot planning, structured prompting, and script-guided generation
Video editing, localized revision, iterative refinement, and human-in-the-loop creation tools
Interactive, agentic, or creator-facing systems for long-form video generation

Benchmarks, Data & Trustworthy Evaluation

Datasets and annotations for long-form, story-level, or multi-shot video generation
Benchmarks and protocols for coherence, consistency, controllability, and narrative fidelity
Automatic metrics, VLM-based evaluation, and scalable human evaluation for long-horizon video
Reliability, robustness, safety, provenance, and reproducibility in generative video systems

Submission URL: OpenReview

Format: All submissions must be in PDF format and anonymized. Submissions are limited to four content pages, including all figures and tables; unlimited additional pages containing references and supplementary materials are allowed. Reviewers may choose to read the supplementary materials but will not be required to. Camera-ready versions may go up to five content pages.

Style file: You must format your submission using the ICML 2026 LaTeX style file. Please include the references and supplementary materials in the same PDF as the main paper. The maximum file size for submissions is 50MB. Submissions that violate the ICML style (e.g., by decreasing margins or font sizes) or page limits may be rejected without further review.

Dual-submission policy: We welcome ongoing and unpublished work. We will also accept papers that are under review at the time of submission, or that have been recently accepted without published proceedings.

Non-archival: The workshop is a non-archival venue and will not have official proceedings. Workshop submissions can be subsequently or concurrently submitted to other venues.

Visibility: Submissions and reviews will not be public. Only accepted papers will be made public.

Schedule

All times are in Korea Standard Time (KST, GMT+9). The schedule below is tentative.

Time	Session
08:00 – 08:10	Opening Remarks
08:10 – 08:55	Keynote / Invited Talk 1
08:55 – 09:25	Invited Talk 2
09:25 – 09:45	Coffee Break
09:45 – 10:15	Invited Talk 3
10:15 – 10:45	Invited Talk 4
10:45 – 11:25	Oral Presentations (Selected Papers)
11:25 – 12:10	Poster Session 1
12:10 – 13:40	Lunch Break
13:40 – 14:10	Invited Talk 5
14:10 – 14:50	Panel Discussion + Audience Q&A
14:50 – 15:50	Poster Session 2
15:50 – 16:05	Coffee Break
16:05 – 16:35	Invited Talk 6
16:35 – 16:55	Breakout Session + Report-back
16:55 – 17:00	Closing Remarks

Important Dates

Event	Date
Submission deadline	April 30, 2026 (AoE)
Notification to authors	May 15, 2026 (AoE)
Camera-ready deadline	June 1, 2026
Workshop date	July 10, 2026 (Friday; Seoul, South Korea)
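
For authors working out what the AoE dates mean locally, the following is a minimal sketch that converts the two AoE-marked dates above into KST. It assumes that "AoE" (Anywhere on Earth) means UTC-12 and that each deadline falls at the end of the stated day (23:59:59); the page does not state a cutoff time, so treat that as an assumption and check OpenReview for the authoritative deadline. The helper name `aoe_end_of_day` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: AoE is UTC-12; KST is UTC+9 (as stated in the schedule) with no DST.
AOE = timezone(timedelta(hours=-12), name="AoE")
KST = timezone(timedelta(hours=9), name="KST")


def aoe_end_of_day(year: int, month: int, day: int) -> datetime:
    """Return 23:59:59 on the given date in AoE (assumed cutoff, not stated on the page)."""
    return datetime(year, month, day, 23, 59, 59, tzinfo=AOE)


# Only the dates explicitly marked "(AoE)" on the page are included.
deadlines = {
    "Submission deadline": aoe_end_of_day(2026, 4, 30),
    "Notification to authors": aoe_end_of_day(2026, 5, 15),
}

for label, moment in deadlines.items():
    local = moment.astimezone(KST)
    print(f"{label}: {local:%Y-%m-%d %H:%M:%S} {local.tzname()}")
```

Under these assumptions, the end of April 30 AoE corresponds to the evening of May 1 in KST. To see the dates in another time zone, replace `KST` with the relevant `timezone` or `zoneinfo.ZoneInfo` object.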

Invited Speakers

Vincent Sitzmann

Xihui Liu

Daquan Zhou

Pinar Yanardag

Bohyung Han

Alexandre Alahi

Panelists

Accepted Papers

Organizers

Yu Lu

Junhao Dong

Enis Simsar

Hila Chefer

Ismini Lourentzou

Piotr Koniusz

Yi Yang

Program Committee

Contact