Backed by Y Combinator

Run inference, agents, and task queues on our cloud, or bring your own AWS, GCP, or bare metal. Sub-second cold starts, no rate limits. 4090s from $0.69/hr.

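from beam import Image, endpoint

# Serve a model on an H100, autoscaled for you
@endpoint(gpu="H100", image=Image().add_python_packages(["vllm"]))
def generate(prompt: str):
    import vllm  # imported in the handler so it resolves inside the container image

    llm = vllm.LLM(model="openai/gpt-oss-20b")
    # generate() returns one RequestOutput per prompt; its text is in .outputs[0]
    return {"output": llm.generate(prompt)[0].outputs[0].text}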

$ beam deploy
Running on H100 in US East

TRUSTED BY THE BEST AI COMPANIES

Performance

The runtime for AI workloads at scale

Engineered from the ground up for heavy AI workloads. Sub-second cold starts, massive parallelism, and full observability.

Live Usage

Containers: 179 · Utilization: 98%

AWS Account: 74 containers
GCP Account: 39 containers
Beam (bursting): 14 containers

Use your credits

Run AI workloads across clouds

Connect all your cloud accounts to Beam, and run workloads across all of them. Achieve the highest cloud utilization and maximum scale.
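
As a rough illustration of what running across accounts could look like, the sketch below fills connected accounts first and sends only the overflow to Beam's own cloud. The `Pool` type, the capacities, and the `place` function are hypothetical and are not part of Beam's SDK; this page does not describe how the real scheduler decides.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A hypothetical pool of container capacity, e.g. one cloud account."""
    name: str
    capacity: int


def place(requested: int, pools: list[Pool]) -> dict[str, int]:
    """Assign containers to pools in priority order, spilling over when one fills up."""
    placement: dict[str, int] = {}
    remaining = requested
    for pool in pools:
        if remaining == 0:
            break
        take = min(remaining, pool.capacity)
        if take > 0:
            placement[pool.name] = take
            remaining -= take
    if remaining > 0:
        raise RuntimeError(f"{remaining} containers could not be placed")
    return placement


# Assumed capacities, listed in priority order: your own accounts first, Beam last.
pools = [Pool("aws", 80), Pool("gcp", 40), Pool("beam", 1000)]
print(place(130, pools))
```

Ordering the list by priority is what makes credits in your own accounts get used before any bursting happens.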

Boot time (s)

[Chart: boot time for Beam (memory snapshots) and Beam compared with other runtimes: Provider A, Provider B, and Kubernetes + EC2. Beam is labeled 35× faster.]

AI-native runtime

Sub-second cold starts

Memory snapshots restore GPU containers in seconds — up to 35× faster than a traditional cold boot.

30+ Regions

us-west, eu-west, ap-south, ca-east, eu-central, us-east

Globally distributed

Run near your agents

Workloads route across clouds and regions in real time, for low-latency execution wherever your users are.

sandbox.snapshot_memory()

PROMPT

BUILD

EVAL

SAVED

Massive parallelization

Snapshot, branch, restore

Snapshot a running sandbox, then restore it into thousands of concurrent isolated runs, each with realtime streaming output.
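
The snapshot, branch, restore pattern can be pictured with a small in-process analogy. This is only a conceptual sketch in plain Python: the `snapshot` and `run_branch` helpers are hypothetical, a dictionary stands in for sandbox state, and none of this calls Beam's sandbox API.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def snapshot(state: dict) -> dict:
    """Capture an independent copy of the current state."""
    return copy.deepcopy(state)


def run_branch(snap: dict, branch_id: int) -> dict:
    """Restore a private copy of the snapshot and mutate it in isolation."""
    state = copy.deepcopy(snap)
    state["history"].append(f"branch-{branch_id}")
    return state


# Do the expensive setup once, snapshot it, then fan out.
base = {"history": ["setup"]}
snap = snapshot(base)

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda i: run_branch(snap, i), range(4)))

# Every branch sees the shared setup step plus its own change only.
for state in results:
    print(state["history"])
```

The point of the pattern is that setup cost is paid once, while each branch starts from identical state and cannot affect its siblings.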

The SDK

From serverless inference to sandboxes

Logic and hardware in one place — no YAML, no Dockerfiles, no infra to manage.

// GPU Infrastructure

GPU Inference

Sub-second cold starts

We provide a distributed storage layer, memory snapshotting, and GPU checkpoint restore, resulting in lightning-fast container boot times.

Scale down to zero, burst to thousands

Only pay for what you use

app.py

from beam import QueueDepthAutoscaler

# Scale out when queue size > 30 tasks
autoscaling_config = QueueDepthAutoscaler(
    tasks_per_container=30,
    max_containers=300,
)
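
One way to read these two settings: the target container count is the queue depth divided by `tasks_per_container`, rounded up and capped at `max_containers`. The function below sketches that arithmetic under this assumption. It is not Beam's actual scaling logic, and `desired_containers` is a hypothetical name.

```python
import math


def desired_containers(queue_depth: int, tasks_per_container: int = 30,
                       max_containers: int = 300) -> int:
    """Containers needed so that no container holds more than its task share."""
    if queue_depth <= 0:
        return 0  # scale to zero when the queue is empty
    return min(max_containers, math.ceil(queue_depth / tasks_per_container))


for depth in (0, 30, 31, 20_000):
    print(depth, desired_containers(depth))
```

Under this reading, a 31st queued task is what triggers a second container, and very deep queues stop growing the fleet once it reaches 300.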

// Composable Primitives

Durable Task Queues

Retries, Callbacks, and Scheduled Jobs

Control the full lifecycle of a task with automated retries and event-based callbacks to your application.

Logging & Monitoring

Secrets Management

app.py

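from beam import task_queue

@task_queue(gpu="A10G", callback_url="ngrok.io")
def transcribe():
    import whisper  # imported in the handler so it resolves inside the container

    model = whisper.load_model("small")
    result = model.transcribe("./meeting-notes")
    # Return the transcript so it is available as the task result
    return {"text": result["text"]}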
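
To make the retry-and-callback lifecycle concrete, here is a minimal plain-Python sketch of the general pattern: retry a failing task with exponential backoff, then report the final outcome to a callback. The `run_with_retries` helper, its parameters, and the payload shape are assumptions for illustration, not Beam's implementation. In a hosted setup the callback would typically be an HTTP request to your `callback_url` rather than a local function.

```python
import time
from typing import Any, Callable


def run_with_retries(task: Callable[[], Any], on_done: Callable[[dict], None],
                     max_retries: int = 3, base_delay: float = 0.01) -> None:
    """Run task, retrying with exponential backoff, then report the outcome."""
    for attempt in range(max_retries + 1):
        try:
            result = task()
        except Exception as exc:
            if attempt == max_retries:
                on_done({"status": "failed", "error": str(exc), "attempts": attempt + 1})
                return
            time.sleep(base_delay * 2 ** attempt)  # back off before the next try
        else:
            on_done({"status": "complete", "result": result, "attempts": attempt + 1})
            return


calls = {"n": 0}


def flaky() -> str:
    """Fail twice, then succeed, to exercise the retry path."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"


events: list[dict] = []
run_with_retries(flaky, events.append)
print(events)
```

The callback fires exactly once per task, whether it eventually succeeds or exhausts its retries, so the receiving application never has to poll.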

// Long-Running Environments

Sandboxes for AI Agents

Stateful, Persistent Runtimes

Sandboxes are stateful. You can connect to a running process, attach persistent storage volumes, and snapshot the file system to create reusable templates.

File System Operations

Run Docker-in-Docker

app.py

from beam import Image, Sandbox

sb = Sandbox().create()
image_id = sb.create_image_from_filesystem()
sb.terminate()

sb = Sandbox(image=Image.from_id(image_id)).create()

Run Anywhere

Connect any VM, anywhere

Bring your own cloud. No lock-in

Switch Hardware in Seconds

Run your code on any hardware in seconds — just change one line of Python to switch hardware.

Easy Local Debugging

Test your code before deploying it, using the exact configuration you'll run in production.

Multiple Workers Per Container

Scale vertically by running multiple workers on the same container.

Run Docker-in-Docker

Run the full Docker daemon in your containers.

Deploy from GitHub Actions

Deploy your APIs automatically by adding Beam to your existing CI/CD pipeline.

Use Cases

One platform for inference, sandboxes, and training

Custom Model Inference

Host any custom model on GPU or CPU. Bring your own image.

Sandboxed Code Execution

Run LLM-generated code in secure execution environments.

RL Environments

Fork sandboxes from snapshots to run RL rollouts in parallel.

Training & Fine-Tuning

Train and fine-tune, from SLMs and LLMs to diffusion models.

Audio Processing Pipelines

Deploy task queues to process large amounts of data.

Streamlit and Gradio UIs

Run frontend apps, from Streamlit and Gradio apps to Notebooks.

Examples

What will you build?

All examples

Inference

Serve LLMs at high throughput

Run vLLM for fast, batched inference with maximum tokens per second.

Audio

Transcribe audio at scale

Deploy Whisper to turn speech into text across thousands of files.

Image

Run ComfyUI image pipelines

Spin up node-based image generation workflows on serverless GPUs.

Speech

Generate natural speech

Deploy Parler-TTS to turn text into expressive, lifelike audio.

Training

Fine-tune Gemma models

Train and customize Gemma on your own data with GPU-backed jobs.

Community

Join our community

Beam is powering hands-down the best developer experience to run models on GPUs easily at scale. Best decision on the infra side for us this year so far.

Louis Morgner, Co-founder, AI lead @ Jamie

@beam_cloud is 🔥. Such a huge workflow improvement over AWS Sagemaker / Google vertex ai

Eric Meier (@bitphinix)

One of the better developer experiences I've had in a while was with @beam_cloud - a serverless GPU and API infra platform. Check them out 👇

Deploy an open source model on hugging face running on GPUs in a few minutes with 6 lines of code.

Keep your eyes on these guys 👀

Brandon Garcia (@__BCG__)

I can't recommend Beam highly enough. Their developer experience is top notch.

We never could have shipped Happy Accidents as quickly as we did without them. We were able to build the GPU portion of our app in hours instead of weeks.

Not only is the platform great, we loved working with the Beam team. They're extremely responsive, so we had a high level of confidence in the reliability of the platform.

James Bonner, Founder at Happy Accidents

Beam has been a huge time-saver by eliminating the need to monitor and manage my own VM infrastructure.

I no longer worry about unexpected bugs or outages which means less downtime and fewer headaches.

This lets me provide a significantly more reliable service to my users, and it's been surprisingly more cost-efficient than my prior solution.

Liam Eloie, Machine Learning Engineer

Time is the biggest thing Beam has helped us with. I went from spending 6 hours developing an API to pressing a button and deploying instantly

Benjamin Smith, MLE at Shippabo

Spun up a new app today and realized just how easy it was. Took me only 15 mins to organize and deploy on Beam.

Realizing that quick python apps on Beam is a cheat code

Brandon Brisbon, CTO at Shop Galaxy

Beam has been a revelation in terms of making it simple to build an ML application on GPU

Devon Peroutky, Software Engineer

Frase is running language models exclusively on Beam and it was surprisingly easy to migrate, less maintenance, and is saving us money because unlike Google and other cloud providers, Beam is able to provide us with an on-demand solution that scales immediately with our traffic, and we don’t need to worry about any of the clunky tooling around GPUs.

Frankie L., CTO and AI Researcher @ Frase

Beam is amazing. I tested the CLI and in 5 minutes had something running on the cloud.

And the Slack community is a game changer because when we get stuck we get responses quickly

Leonardo Cuco, CTO at Ween.ai

If you're looking to dip your toes into building something with AI, definitely take a look at http://beam.cloud.

Serverless functions with access to GPUs so you can run jobs on-demand and pay only for what you use.

And it's *much* easier than setting up a VM somewhere!

Joshua Clanton (@joshuacc)

$30 free credit, refreshed monthly

Start shipping on infra
you won’t outgrow.

Run sandboxes and GPU workloads on your cloud, and scale out to ours when you need to. No infra to manage.

Start Building Read the docs

Serverless GPUs and Sandboxes