Renting a GPU has always involved more steps than it should. You pick a provider, wait for capacity, configure an image, expose a port, and hope the box is reachable. GPU Cloud removes that friction. You deploy a managed GPU instance in one click, and EmpirioLabs handles provisioning, networking, and the secure connection back to you.
GPU Cloud is now generally available to every EmpirioLabs account. Here is what you can do with it.
Deploy a model in one click
The fastest way to use GPU Cloud is to serve a model. Paste any Hugging Face repository id, choose a GPU, and deploy. EmpirioLabs loads the weights and serves the model from an OpenAI-compatible endpoint, so your existing client code works without changes.
curl https://api.empiriolabs.ai/v1/gpu/instances \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "gpu_slug": "rtx-4090",
    "mode": "model",
    "hf_id": "Qwen/Qwen2.5-7B-Instruct"
  }'
The call returns an instance id and a status of provisioning. Poll the instance until its status is running, then send requests to its connect path. For a model, the connect path followed by /v1 works as a standard OpenAI base URL:
curl https://api.empiriolabs.ai/v1/gpu/connect/$INSTANCE_ID/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'
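The polling step is described above but not shown. Here is a minimal Python sketch of it, under two assumptions this post does not confirm: that an instance can be fetched with a GET on /v1/gpu/instances/{id}, and that the response is JSON with a status field. The helper names are hypothetical.

```python
import json
import os
import time
import urllib.request

API_ROOT = "https://api.empiriolabs.ai/v1/gpu"


def connect_base_url(instance_id: str) -> str:
    """OpenAI-style base URL for a model instance, built from the connect path."""
    return f"{API_ROOT}/connect/{instance_id}/v1"


def fetch_status(instance_id: str) -> str:
    """Fetch the instance status.

    ASSUMPTION: GET /instances/{id} exists and returns JSON with a "status" key.
    """
    req = urllib.request.Request(
        f"{API_ROOT}/instances/{instance_id}",
        headers={"Authorization": f"Bearer {os.environ['EMPIRIOLABS_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["status"]


def wait_until_running(instance_id, get_status=fetch_status,
                       interval=5.0, timeout=900.0, sleep=time.sleep):
    """Poll until the status is "running", then return the base URL.

    Raises TimeoutError if the instance is not running before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(instance_id)
        if status == "running":
            return connect_base_url(instance_id)
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"instance {instance_id} still {status!r} after {timeout}s"
            )
        sleep(interval)
```

The returned URL could then be passed as the base URL of any OpenAI-compatible client, together with the same API key. The status lookup and the sleep function are injectable so the loop can be exercised without network access.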
Chat with your model right in the dashboard
You do not have to write any code to try a model you just deployed. Every instance that serves an OpenAI-compatible API gets a built-in chat page in the dashboard. Open the instance from the GPU Cloud page, click Chat with this model, and start typing. The chat page streams responses, supports a system prompt and the usual sampling controls, and lets you attach images or audio for multimodal models. It runs over the same secure connection as the API, so there is nothing extra to set up. There is no separate billing either, because the instance is already metered by the second.
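For comparison, the same options might look like this in code. The Python sketch below builds a chat completion body with a system prompt and sampling parameters, and pulls text out of a streamed response. It assumes the instance honors the standard OpenAI stream, temperature, and top_p fields and emits OpenAI-style server-sent events, which this post does not confirm for the API. The helper names and default values are hypothetical.

```python
import json


def build_chat_request(model, user_text, system_prompt=None,
                       temperature=0.7, top_p=1.0, stream=True):
    """Build an OpenAI-style chat completion body (defaults are illustrative)."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }


def iter_stream_text(lines):
    """Yield text deltas from OpenAI-style SSE lines.

    Each event is a line of the form "data: {json}"; the stream ends
    with "data: [DONE]". Lines may be bytes or str.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:
                yield text
```

The body from build_chat_request would be sent to the instance's chat completions path, and the response lines fed to iter_stream_text as they arrive.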
Templates and custom containers
Not every workload is a chat model. GPU Cloud ships one-click templates for the most common environments, including PyTorch with JupyterLab, ComfyUI, a web terminal, and Ollama. Pick a template, choose a GPU, and the environment opens in your browser through the dashboard. Templates can also be deployed through the API. This request launches the pytorch-jupyter template with a 150 GB disk:
curl https://api.empiriolabs.ai/v1/gpu/instances \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "gpu_slug": "rtx-4090",
    "mode": "template",
    "template_slug": "pytorch-jupyter",
    "disk_gb": 150
  }'
If you need full control, deploy your own CUDA Docker image. Point GPU Cloud at the image, list the ports it serves, and reach it through the same connect path a model would use.
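The two request bodies in this post share gpu_slug and mode and differ only in their mode-specific fields. A small Python helper can build either one, as sketched below. It assumes disk_gb is optional and accepted in both modes, which the post does not state. Custom images are left out because their request fields are not shown here. The function name is hypothetical.

```python
def build_instance_request(gpu_slug, *, hf_id=None, template_slug=None, disk_gb=None):
    """Return a JSON-serializable body for POST /v1/gpu/instances.

    Exactly one of hf_id (mode "model") or template_slug (mode "template")
    must be given. ASSUMPTION: disk_gb is optional in both modes.
    """
    if (hf_id is None) == (template_slug is None):
        raise ValueError("pass exactly one of hf_id or template_slug")

    body = {"gpu_slug": gpu_slug}
    if hf_id is not None:
        body.update(mode="model", hf_id=hf_id)
    else:
        body.update(mode="template", template_slug=template_slug)

    if disk_gb is not None:
        if disk_gb <= 0:
            raise ValueError("disk_gb must be positive")
        body["disk_gb"] = int(disk_gb)
    return body
```

Serializing the result with json.dumps gives a payload matching the curl examples, and the validation catches a mixed or empty request before it reaches the API.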
Pricing
GPU Cloud is billed per second of running time. The hourly rate is shown before you deploy and is locked for the life of that instance, so a price change never affects a rental that is already running. Stopping or destroying an instance releases the GPU and stops billing immediately. Ephemeral storage is deleted when the instance stops, so treat it as scratch space. Lifetime and per-instance spend are visible on the GPU Cloud page and through /v1/account/usage.
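Per-second billing at a locked hourly rate comes down to simple arithmetic: cost = hourly rate / 3600 × seconds running. The Python sketch below works through it. The rate in the usage note is a made-up figure, not a published price, and rounding to four decimal places is an assumption, since the post does not say how amounts are rounded.

```python
from decimal import Decimal, ROUND_HALF_UP


def rental_cost(hourly_rate, seconds_running):
    """Cost of a rental billed per second at a fixed hourly rate.

    ASSUMPTION: the result is rounded half-up to four decimal places.
    """
    if seconds_running < 0:
        raise ValueError("seconds_running must be non-negative")
    per_second = Decimal(str(hourly_rate)) / Decimal(3600)
    total = per_second * Decimal(seconds_running)
    return total.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
```

With a hypothetical rate of 0.72 per hour, an instance that runs for 25 minutes (1500 seconds) costs 0.72 / 3600 × 1500 = 0.30, and one that is stopped after 90 seconds costs 0.018.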
Get started
Open GPU Cloud in the dashboard to deploy your first instance, or read the GPU Cloud documentation for the full API.



