
🎬 VideoSync - AI Video Localization Tool

VideoSync Logo

One-Click Local AI Video Dubbing & Translation Tool

中文文档 | English

VideoSync is a fully automated AI video dubbing tool designed for Windows and Linux. It orchestrates state-of-the-art open-source models into a seamless workflow for "one-click" video localization.

No cloud APIs, no subscription fees. Use your local GPU to perform ASR (Speech Recognition) -> Text Translation -> Voice Cloning -> Audio-Video Alignment.
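The four stages above can be sketched as a simple sequential pipeline. The stage functions below are hypothetical placeholders to show the data flow, not VideoSync's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    """Carries intermediate artifacts through the pipeline (illustrative only)."""
    video_path: str
    segments: list = field(default_factory=list)      # timed source-language text
    translations: list = field(default_factory=list)  # target-language text
    dubbed_audio: list = field(default_factory=list)  # synthesized clips

def run_asr(job: Job) -> Job:
    # WhisperX stage: speech -> timed segments (stand-in output)
    job.segments = [{"start": 0.0, "end": 2.5, "text": "hello world"}]
    return job

def run_translation(job: Job) -> Job:
    # Qwen stage: source text -> target text (stand-in transform)
    job.translations = [s["text"].upper() for s in job.segments]
    return job

def run_voice_clone(job: Job) -> Job:
    # IndexTTS stage: target text + reference audio -> dubbed clips
    job.dubbed_audio = [f"clip_{i}.wav" for i, _ in enumerate(job.translations)]
    return job

def run_alignment(job: Job) -> Job:
    # Final stage: fit dubbed clips back onto the original video timeline
    return job

def localize(video_path: str) -> Job:
    job = Job(video_path)
    for stage in (run_asr, run_translation, run_voice_clone, run_alignment):
        job = stage(job)
    return job
```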


Features

  • 🎯 Accurate Recognition (ASR)

    • Powered by WhisperX, featuring accurate VAD (Voice Activity Detection) and Forced Alignment.
    • Greatly reduces the hallucinations and dropped words common with vanilla Whisper, with millisecond-level timestamp precision.
  • 🗣️ One-Shot Voice Cloning

    • Integrated with MaskGCT (IndexTTS), requiring no fine-tuning.
    • Clones source voices instantly using reference audio.
    • Perfectly preserves tone, emotion, and speech rhythm.
  • 🌏 Powerful LLM (Translation)

    • Built-in Qwen 2.5-7B-Instruct large language model.
    • Currently supports high-quality English <-> Chinese translation.
    • Produces natural, subtitle-group quality translations.
  • ⚡ Extreme Optimization

    • Unique sequential VRAM management: unloads the LLM while TTS runs, and vice versa.
    • Runs smoothly on consumer-grade GPUs.
  • 🖥️ Modern UI

    • Beautiful interface built with Electron + React.
    • Real-time log monitoring, visual subtitle editing, and instant video preview.
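The sequential VRAM strategy amounts to never holding two large models in GPU memory at once. Here is a minimal sketch of that pattern; the `DummyModel` class is a stand-in, and real code would move weights with `model.to(...)` and call `torch.cuda.empty_cache()` instead:

```python
import gc
from contextlib import contextmanager

class DummyModel:
    """Stand-in for a large model; real code would wrap a torch module."""
    def __init__(self, name: str):
        self.name = name
        self.on_gpu = False

    def load(self):
        self.on_gpu = True      # real code: self.module.to("cuda")

    def unload(self):
        self.on_gpu = False     # real code: self.module.to("cpu"); torch.cuda.empty_cache()
        gc.collect()

@contextmanager
def gpu_resident(model: DummyModel):
    """Keep a model on the GPU only for the duration of its stage."""
    model.load()
    try:
        yield model
    finally:
        model.unload()

llm, tts = DummyModel("qwen2.5-7b"), DummyModel("index-tts")

with gpu_resident(llm) as m:
    translations = f"translated with {m.name}"   # translation stage
# the LLM is off the GPU here, freeing VRAM for TTS
with gpu_resident(tts) as m:
    audio = f"synthesized with {m.name}"         # voice-cloning stage
```

The `try/finally` inside the context manager guarantees the unload happens even if a stage raises, which is what keeps a 7B LLM and a TTS model coexisting on a 6 GB card.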

📸 Screenshots

Main Interface
Main UI
Subtitle Editor
Subtitle Edit

🛠️ Requirements

For optimal performance, we recommend the following environment:

  • OS: Windows 10/11 (x64) or Linux (Preview)
  • GPU: NVIDIA GeForce RTX 3060 or better (VRAM ≥ 6GB)
  • Driver: NVIDIA Studio/Game Ready Driver (CUDA 11.8+)
  • Runtime: Python 3.10+, Node.js 16+ (Required for source execution)
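A quick preflight check for the runtimes listed above can catch setup problems before the first run. This is a hypothetical helper, not shipped with VideoSync (the `ffmpeg` check is an assumption about video muxing):

```python
import shutil
import sys

def preflight() -> list[str]:
    """Return a list of problems with the local runtime; empty means OK."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    if shutil.which("node") is None:
        problems.append("Node.js 16+ not found on PATH")
    if shutil.which("ffmpeg") is None:  # assumption: ffmpeg needed for muxing
        problems.append("ffmpeg not found on PATH")
    return problems
```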

🚀 Quick Start

1. Clone Repository

git clone https://github.com/TianDongL/VideoSyncMaster.git
cd VideoSyncMaster

2. Backend Setup

We strongly recommend using Conda to manage environments.

# Create and activate environment
conda create -n videosync python=3.11
conda activate videosync

# Install core dependencies
pip install -r requirements.txt

# Install PyTorch (Check pytorch.org for your specific CUDA version)
# Example (CUDA 12.1):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

3. Frontend Setup

cd ui
npm install

4. Download Models

Due to their large size, models are not included in the repo. Please download them and place them in the models/ directory:

Download Sources: HuggingFace or ModelScope.

VideoSync/
  ├── models/
  │   ├── faster-whisper-large-v3-turbo-ct2/  # ASR Model
  │   ├── index-tts/                          # MaskGCT / TTS Model files
  │   │   ├── config.yaml
  │   │   ├── gpt.pth ...
  │   └── Qwen2.5-7B-Instruct/                # LLM Translation Model
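Since a missing model folder tends to fail late (mid-pipeline), a small startup check against the layout above can save a debugging round trip. This is an illustrative helper; only the directory names are taken from the tree above:

```python
from pathlib import Path

# Required model folders under models/, per the layout above
REQUIRED_MODELS = {
    "faster-whisper-large-v3-turbo-ct2": "ASR (WhisperX backend)",
    "index-tts": "TTS / voice cloning",
    "Qwen2.5-7B-Instruct": "LLM translation",
}

def missing_models(models_dir: str = "models") -> list[str]:
    """Return the required model folders that are absent under models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED_MODELS if not (root / name).is_dir()]
```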

📖 Usage

  1. Start Application: Run the following command in the project root. The UI will launch and automatically spawn the Python backend:

    npm run dev
    

    Note: Ensure your Python environment is correctly set up in backend/ or available on the system PATH.

  2. (Optional) Test Backend Manually: To debug the backend independently:

    python backend/main.py --help
    

Build Installer (Windows)

To generate an .exe installer:

cd ui
npm run build

The installer will be generated in ui/release/.

🤝 Acknowledgements

This project stands on the shoulders of giants. Special thanks to:

  • IndexTTS: For the voice cloning support.
  • WhisperX: For precise alignment and VAD.
  • Qwen: For powerful multilingual capabilities.

If you like this project, please give us a Star 🌟! PRs and Issues are welcome.


📜 License

  • Non-Commercial: You are free to copy, modify, and distribute the code for non-commercial purposes only.
  • No Commercial Use: Use of this project or its derivatives for commercial gain is strictly prohibited without prior authorization.
  • ShareAlike: If you modify the code, you must distribute your contributions under the same license.

© 2024 VideoSync Team