
Open-source 32B MoE foundation model that generates synchronized video and audio in one inference step with precise dual-tower lip-sync.
moss-video-and-audio

POST /v1/videos/generations

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.
MOSS Video and Audio runs through POST /v1/videos/generations. The request returns a job_id right away; poll GET /v1/jobs/{job_id} until the job completes and read the output URLs from the result. Get an API key from the EmpirioLabs dashboard.
```bash
curl https://api.empiriolabs.ai/v1/videos/generations \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moss-video-and-audio",
    "prompt": "Describe what you want MOSS Video and Audio to generate."
  }'
```

```bash
curl https://api.empiriolabs.ai/v1/jobs/JOB_ID \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY"
```

```python
import time

import requests

headers = {"Authorization": "Bearer YOUR_EMPIRIOLABS_API_KEY"}

# Submit the generation request; the response carries the job_id.
response = requests.post(
    "https://api.empiriolabs.ai/v1/videos/generations",
    headers=headers,
    json={
        "model": "moss-video-and-audio",
        "prompt": "Describe what you want MOSS Video and Audio to generate.",
    },
)
response.raise_for_status()
job = response.json()

# Generation runs as an async job. Poll until it completes or fails.
while True:
    status = requests.get(
        f"https://api.empiriolabs.ai/v1/jobs/{job['job_id']}",
        headers=headers,
    ).json()
    if status.get("status") in ("completed", "failed"):
        print(status)
        break
    time.sleep(5)
```

Request parameters supported by the MOSS Video and Audio API on EmpirioLabs. Defaults apply when a field is omitted.
| Parameter | Type | Default | Range / values | Description |
|---|---|---|---|---|
| prompt | string | - | - | Scene description. With image attached, becomes an image-to-video prompt. |
| mode | enum | t2v | t2v, i2v | t2v: pure text-to-video. i2v: animate the attached image. |
| resolution | enum | 720p | 360p, 720p | 720p uses a separate higher-VRAM endpoint. |
| aspect_ratio | enum | landscape | landscape, portrait | MOSS only supports landscape (16:9) and portrait (9:16). |
| duration | number | 8 | 2 to 8 | Clip length in seconds. The upstream model is hard-capped at 8s. |
| t2v_quality | enum | quality | fast, quality | Text-to-video only. fast trades fidelity for ~2× speed. |
| num_inference_steps | number | 25 | 10 to 50 | Diffusion steps. More = higher fidelity, slower. |
| cfg_scale | number | 5 | 1 to 10 | Classifier-free guidance. Higher = follows prompt more strictly. |
| sigma_shift | number | 5 | 1 to 10 | Schedule shift. Only valid when resolution=360p. |
| image | string | - | - | Reference image URL for i2v mode. |
| negative_prompt | string | - | - | What to avoid. |
| seed | number | - | - | Reproducibility seed. |
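
The constraints in the table can be checked on the client before a request is sent, which avoids spending a round trip on an invalid combination. The following is a minimal sketch, not part of any EmpirioLabs SDK: `build_payload` is a hypothetical helper, and the cross-field rules it enforces (an image is required for `i2v`, `t2v_quality` is rejected outside `t2v`, `sigma_shift` is rejected unless the resolution is `360p`) are a reading of the table descriptions. How the server itself handles these cases is an assumption.

```python
from typing import Any, Dict

# Allowed values and ranges copied from the parameter table.
ENUMS = {
    "mode": {"t2v", "i2v"},
    "resolution": {"360p", "720p"},
    "aspect_ratio": {"landscape", "portrait"},
    "t2v_quality": {"fast", "quality"},
}
RANGES = {
    "duration": (2, 8),
    "num_inference_steps": (10, 50),
    "cfg_scale": (1, 10),
    "sigma_shift": (1, 10),
}
FREE_FORM = {"image", "negative_prompt", "seed"}


def build_payload(prompt: str, **options: Any) -> Dict[str, Any]:
    """Hypothetical helper: validate options locally and return a request body."""
    payload: Dict[str, Any] = {"model": "moss-video-and-audio", "prompt": prompt}
    for name, value in options.items():
        if name in ENUMS:
            if value not in ENUMS[name]:
                raise ValueError(f"{name} must be one of {sorted(ENUMS[name])}")
        elif name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name not in FREE_FORM:
            raise ValueError(f"unknown parameter: {name}")
        payload[name] = value

    # Fall back to the documented defaults when checking cross-field rules.
    mode = payload.get("mode", "t2v")
    resolution = payload.get("resolution", "720p")
    if mode == "i2v" and "image" not in payload:
        raise ValueError("i2v mode needs an image URL")
    if "t2v_quality" in payload and mode != "t2v":
        raise ValueError("t2v_quality applies to text-to-video only")
    if "sigma_shift" in payload and resolution != "360p":
        raise ValueError("sigma_shift is only valid when resolution=360p")
    return payload
```

The returned dictionary can be passed as the `json=` argument of the `POST /v1/videos/generations` request. Omitted fields are left out of the body so that the documented server-side defaults apply.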
MOSS Video and Audio is a 32B-parameter MoE model that generates lip-synced video and audio together in a single inference pass.
On EmpirioLabs, MOSS Video and Audio is billed pay as you go: 360p Video $0.17 per video; 720p Video $2.82 per video; T2V Fast $0.065 additional fee. The live rate card on this page always matches what the API charges.
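
For budgeting, the quoted rates can be turned into a quick estimate. This is a rough sketch only: `estimate_cost` is a hypothetical helper, and it assumes the T2V Fast fee is added once per video on top of the base price for the chosen resolution, which the rate text does not state explicitly. The live rate card remains the authority on what is charged.

```python
from decimal import Decimal

# Per-video rates quoted in the pricing text, in USD.
BASE_PRICE = {"360p": Decimal("0.17"), "720p": Decimal("2.82")}
T2V_FAST_FEE = Decimal("0.065")


def estimate_cost(resolution: str = "720p", t2v_fast: bool = False, videos: int = 1) -> Decimal:
    """Hypothetical estimate; assumes the T2V Fast fee is added once per video."""
    if resolution not in BASE_PRICE:
        raise ValueError(f"resolution must be one of {sorted(BASE_PRICE)}")
    if videos < 0:
        raise ValueError("videos must not be negative")
    per_video = BASE_PRICE[resolution]
    if t2v_fast:
        per_video += T2V_FAST_FEE
    return per_video * videos
```

Under that assumption, `estimate_cost("360p", t2v_fast=True, videos=10)` comes to 2.35 USD. `Decimal` is used instead of `float` so that small per-video amounts add up without rounding drift.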
MOSS Video and Audio is served through POST /v1/videos/generations on api.empiriolabs.ai with standard bearer-token authentication.
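
Because generation is asynchronous, a client usually wants to wait for a bounded time rather than poll forever. The sketch below shows one way to do that with only the standard library. The `status` field and its `completed` and `failed` values come from the polling example on this page. The names `fetch_job` and `wait_for_job`, the 600-second default timeout, and the 5-second interval are assumptions chosen for illustration.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

API_BASE = "https://api.empiriolabs.ai/v1"


def fetch_job(job_id: str, api_key: str) -> Dict[str, Any]:
    """Read the current job record from GET /v1/jobs/{job_id}."""
    request = urllib.request.Request(
        f"{API_BASE}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(
    fetch: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch() until the job is completed or failed, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        job = fetch()
        if job.get("status") in ("completed", "failed"):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        sleep(interval)


# Usage, given a job_id returned by POST /v1/videos/generations:
# job = wait_for_job(lambda: fetch_job(job_id, api_key))
```

Passing the fetch function in as an argument keeps the waiting logic independent of the HTTP client, so it can be exercised in tests without network access.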
The EmpirioLabs playground runs MOSS Video and Audio in the browser with the same parameters the API exposes, so you can test prompts before writing code.
Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.