摘要
arXiv:2604.23703v2 Announce Type: replace-cross Abstract: Slide-based teaching is widely used in higher education, yet in online, hybrid, and asynchronous contexts, slides often lose instructor presence, narrative continuity, and expressive framing that help learners connect with course content. Full lecture video can partly restore these qualities, but it is time-consuming to record, revise, and reuse. This study presents a practice-based implementation and analytic reflection of an open-source workflow for creating talking slide avatars. The workflow integrates OpenVoice for text-to-speech and authorized voice-style conversion with Ditto-TalkingHead for audio-driven talking-image synthesis, enabling instructors to transform a short script and an authorized or synthetic portrait image into a narrated video for slide decks or HTML-based lecture materials.