The situation
A marketing-analytics team runs in AWS on SageMaker Notebook Instances. Six months ago there were three; today there are fourteen, and the estate looks like this:
- Fourteen Notebook Instances, ml.t3.medium through ml.g5.xlarge, named things like alex-dev, priya-prototype-2, shared-etl-legacy, gpu-thursday.
- Each instance has its own EBS volume holding the user’s notebooks, datasets, conda environments, and a not-small amount of “I’ll delete that later” files.
- Each instance has its own IAM role, pieced together with whatever permissions were needed at the time.
- Starting and stopping is manual. The monthly bill contains a long tail of instances left running over weekends or by people on leave.
- Code sharing happens via git, but lifecycle scripts and kernel environments aren’t shared: every instance has its own conda environments, and reproducing “the environment where that model trained” means asking the person who did it.
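The long tail is easier to argue about once it is listed. A minimal sketch, assuming the AWS CLI is configured for the team's account and Region; the function names are made up for illustration.

```bash
# List Notebook Instances that are currently billing (status InService),
# one per line: name, instance type, last-modified time.
list_running_notebooks() {
  aws sagemaker list-notebook-instances \
    --status-equals InService \
    --query 'NotebookInstances[].[NotebookInstanceName,InstanceType,LastModifiedTime]' \
    --output text
}

# Count them: each non-empty output line is one running instance.
count_running_notebooks() {
  list_running_notebooks | grep -c . || true
}
```

Running `list_running_notebooks` on a Sunday evening is a quick way to see which boxes nobody remembered to stop.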
The team lead wants a cleaner model before the team doubles. SageMaker Studio is the obvious candidate but nobody is quite sure what it is relative to what they already have, beyond “it’s the newer one.”
What actually matters
Two different development shapes live under the SageMaker umbrella, and they trade against each other along a few axes worth naming before comparing features.
The first axis is who owns the compute. A Notebook Instance is an EC2 instance the account owns: it has an instance ID, an EBS volume, a lifecycle, a security group. Stop it and it keeps its disk; delete it and it loses everything not backed up to S3 or git. The team member who started it is usually the only one who uses it, but nothing in the model enforces that: it’s a shared compute resource that happens to be used by one person. In Studio, the compute is provisioned on demand per user and per kernel: the user opens a notebook, Studio launches a container on the chosen instance type, the notebook runs, and the container stops when the user shuts the kernel down or an idle-shutdown policy does it for them. The compute’s identity is “a kernel running for this user right now”, not “a persistent box someone started last month.”
The second axis is where the files live. On a Notebook Instance, files live on the instance’s EBS volume, mounted at /home/ec2-user/SageMaker. Stopping the instance preserves whatever is under that directory (notebooks, downloaded datasets, conda environments created there) until the instance is deleted; anything installed outside it, such as system packages, has to be reinstalled by a lifecycle script after a restart. In Studio, each user has an EFS-backed home directory that persists independently of any kernel. Kernels come and go; the home directory stays. Files are the user’s; compute is ephemeral.
The third axis is the multi-user story. Notebook Instances have no native multi-user model: each instance is a single-user box, and “the team” is a convention enforced by tagging and naming. Studio has an explicit multi-user model: a Studio domain contains user profiles, each profile has its own EFS home and its own IAM role (or inherits a shared one), and admins manage the domain centrally (authentication via IAM or IAM Identity Center, lifecycle configs, default kernel images).
The fourth axis is what you can do without leaving the UI. Notebook Instances are a Jupyter server and not much else: data exploration, model training via the SDK, deployment via the SDK. Studio bundles the Jupyter experience with a tab-based IDE, Data Wrangler for visual feature engineering, Experiments for tracking training runs, Pipelines for authoring pipelines, Debugger and Profiler for inspecting training jobs, and Canvas if you want no-code. Whether you use any of those depends on the team; they’re there regardless.
The fifth is cost. Notebook Instances bill while running, full stop; an ml.g5.xlarge left on for a long weekend is on the next bill. Studio bills per running kernel: compute is charged from launch until it is shut down, and Studio can shut idle kernels down automatically after a configurable timeout. That auto-shutdown is the real cost lever; it’s much easier to enforce in Studio because the unit being stopped is a kernel, not a whole instance.
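The gap between the two billing shapes is easy to put in numbers. The arithmetic below is a sketch only: the hourly rate is a made-up placeholder, not a real AWS price, and the Friday 18:00 to Monday 09:00 window and the 120-minute timeout are assumptions chosen for illustration.

```bash
# Hypothetical hourly rate in cents; NOT a real AWS price. Look up the
# current rate for your instance type and Region before trusting any total.
RATE_CENTS_PER_HOUR=150

# Friday 18:00 to Monday 09:00 = 6 + 24 + 24 + 9 hours.
WEEKEND_HOURS=$((6 + 24 + 24 + 9))

# With idle shutdown, the most an abandoned kernel bills is the timeout.
IDLE_TIMEOUT_MINUTES=120

left_running_cents=$((RATE_CENTS_PER_HOUR * WEEKEND_HOURS))
idle_shutdown_cents=$((RATE_CENTS_PER_HOUR * IDLE_TIMEOUT_MINUTES / 60))

echo "left running all weekend: ${left_running_cents} cents"   # prints 9450 cents
echo "stopped after idle timeout: ${idle_shutdown_cents} cents" # prints 300 cents
```

Whatever the real rate is, the ratio is what matters: 63 billed hours against 2.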
What we’ll filter on
Five filters that separate the two shapes:
- Who provisions compute: the user explicitly, or the platform on demand?
- File persistence: EBS on an instance, or EFS per user profile?
- Multi-user story: one box per person, or one domain for the team?
- Bundled tooling: plain Jupyter, or an IDE with Data Wrangler, Experiments, Pipelines, Debugger?
- Idle-cost story: manual stop, or auto-shutdown per kernel?
The SageMaker development landscape
1. SageMaker Notebook Instance. An EC2 instance pre-configured with Jupyter and the SageMaker SDK. Single-user in practice; billed while running. Lifecycle configuration scripts run when the instance is created and each time it starts; the user’s notebooks and files live on the instance’s EBS volume. Integrates with the SageMaker API so training jobs, processing jobs, and endpoints can be created from it. Mature, well-understood, and the default shape before Studio launched.
2. SageMaker Studio. An integrated development environment for ML, running in a managed VPC, accessible via a web URL. Multi-user by design: a Studio domain contains one or more user profiles; each profile has its own EFS home and IAM role. Kernels run in containers on on-demand compute (CPU or GPU), billed per second while running, with a configurable idle timeout. The IDE bundles JupyterLab plus Data Wrangler, Experiments, Pipelines, Debugger, Model Registry, Feature Store browser, Canvas, and, as of the Studio V2 refresh, Code Editor (built on the open-source VS Code codebase) alongside JupyterLab.
3. SageMaker Studio Classic. The earlier version of Studio, still available in existing domains. UI differences from V2; underlying model (domain + user profiles + EFS + on-demand kernels) is the same. New work should target the current Studio, not Classic.
4. SageMaker Studio Lab (not part of commercial SageMaker pricing). A free, AWS-account-independent Studio-flavoured environment aimed at education and experimentation. Mentioned for completeness; not a fit for a corporate team.
5. SageMaker Canvas. A no-code UI for building models on top of Studio infrastructure. Used by business analysts; not a replacement for notebook-driven development by a data-science team. Covered separately in its own post.
6. Plain EC2 with Jupyter. Roll-your-own. Maximum flexibility, zero managed anything. Included here because it’s the thing Notebook Instances replaced; rarely the correct answer once Notebook Instances exist.
Side by side
| Shape | Compute ownership | File persistence | Multi-user | Bundled tooling | Idle cost |
|---|---|---|---|---|---|
| Notebook Instance | User starts a box | EBS on instance | ✗ (one per person) | JupyterLab + SDK | ✗ manual stop |
| Studio | Platform launches kernel | EFS per user profile | ✓ domain + profiles | JupyterLab + Data Wrangler + Experiments + Pipelines + Debugger + Registry + Canvas + Code Editor | ✓ auto-shutdown idle kernels |
| Studio Classic | Platform launches kernel | EFS per user profile | ✓ domain + profiles | Older UI; same underlying model | ✓ auto-shutdown idle kernels |
| Plain EC2 | User provisions | Whatever they mount | ✗ | None (self-installed) | ✗ manual stop |
Reading the table against the 14-instance sprawl:
- The team is hitting the single-user assumption of Notebook Instances hard; “ask Priya which notebook” is the smell. Studio’s domain + user-profile model is the direct answer.
- The mixed instance types (CPU for exploration, GPU for training) are a better fit for Studio’s per-kernel compute selection: pick the instance type when opening a notebook, not when provisioning a box.
- The idle-cost tail is the cost angle: with Studio’s per-kernel auto-shutdown enabled, leaving a notebook open on Friday costs one idle-timeout’s worth of compute, not a whole weekend.
- The onboarding story is Studio’s strongest pitch: a new hire gets a user profile in the existing domain, inherits the domain’s default image and lifecycle config, has an EFS home ready, and is productive in minutes, not “ask someone to give them SSH.”
[Figure: Two architectures, one team]
The pick in depth
Studio is the default for a team of more than one or two. The migration is real work (it’s not just “start using Studio”), but the end state solves the problems that the fourteen-instance sprawl has accumulated.
The migration plays out in roughly this order.
Create the Studio domain. One domain per account and Region is the usual arrangement (support for multiple domains per Region was added a while back, but most teams use one). The domain’s VPC is the team’s VPC; the domain’s default IAM execution role is the team’s ML role; the default image is whichever AWS-published image matches the team’s stack (SageMaker Distribution image 1.x is the current default, carrying Python, common ML libraries, and the SageMaker SDK).
Add user profiles. One per team member. Each profile inherits the domain’s execution role unless overridden; each gets its own EFS home, 5 GB by default and adjustable. Authentication: IAM Identity Center (formerly AWS SSO) if the team is already federated through it, otherwise IAM users or an external SAML provider.
Port notebooks. This is the part that takes real time. Notebooks come across as .ipynb files, usually fine. What doesn’t come across cleanly is the conda environments baked into the Notebook Instance’s EBS volume: the team must either rebuild them in Studio (via a lifecycle configuration or a custom image) or standardise on the SageMaker Distribution image and install per-notebook deps in the first cell. The latter is the recommended pattern; the former is what happens when someone has a twelve-package conda env they genuinely can’t live without.
Set lifecycle configs. JupyterServer LCCs run at Studio login; KernelGateway LCCs run at kernel launch. Typical contents: pre-install team libraries, set git config, clone a shared utils repo. One lifecycle config defined at the domain level runs for everybody, so the 14 idiosyncratic on-start.sh scripts collapse to one.
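The “unless overridden” case can be sketched as a small wrapper around create-user-profile. This assumes the AWS CLI is configured; the function name and the example role ARN in the usage note are hypothetical.

```bash
# Create a profile; pass a role ARN as the third argument to override the
# domain's default execution role for this user only.
create_profile() {
  domain_id=$1
  profile=$2
  role_arn=${3:-}
  if [ -n "$role_arn" ]; then
    aws sagemaker create-user-profile \
      --domain-id "$domain_id" \
      --user-profile-name "$profile" \
      --user-settings "ExecutionRole=$role_arn"
  else
    aws sagemaker create-user-profile \
      --domain-id "$domain_id" \
      --user-profile-name "$profile"
  fi
}
```

For example, `create_profile d-abcd1234 alice` inherits the domain role, while `create_profile d-abcd1234 bob arn:aws:iam::111122223333:role/bob-restricted-role` pins a narrower one.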
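For the environments that do have to be rebuilt, it helps to capture a spec on the Notebook Instance before it goes away. A minimal sketch, assuming conda is on the PATH of the instance's terminal; the function name, output directory, and environment names are illustrative.

```bash
# Export each named conda environment to <outdir>/<name>.yml so it can be
# rebuilt later. --no-builds drops build strings, which rarely match
# across machines.
export_conda_envs() {
  outdir=$1
  shift
  mkdir -p "$outdir"
  for env_name in "$@"; do
    conda env export --name "$env_name" --no-builds > "$outdir/$env_name.yml"
    echo "exported $env_name"
  done
}
```

Something like `export_conda_envs ~/SageMaker/env-specs my-model-env` followed by a git commit turns “ask the person who did it” into a file in the repo.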
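As a rough illustration of what that one shared script might contain, the sketch below writes a team-startup.sh and prepares its base64 form. The git settings, repo URL, and directory name are placeholders, not the team's actual configuration.

```bash
# Write the shared startup script. The repo URL and directory name are
# placeholders for illustration.
cat > team-startup.sh <<'EOF'
#!/bin/bash
set -eu

# Team-wide git defaults.
git config --global pull.rebase true
git config --global init.defaultBranch main

# Clone the shared utilities repo once; later logins find it in place.
UTILS_DIR="$HOME/team-utils"
if [ ! -d "$UTILS_DIR/.git" ]; then
  git clone https://git.example.com/marketing-analytics/team-utils.git "$UTILS_DIR"
fi
EOF

# Lifecycle config content is passed base64-encoded; strip newlines so the
# value is a single line regardless of how the local base64 wraps output.
base64 < team-startup.sh | tr -d '\n' > team-startup.b64
```

Keeping the script in git and generating the encoded form at deploy time means the lifecycle config is reviewed like any other code.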
Turn on idle shutdown. The Studio idle shutdown extension (or the built-in Studio auto-shutdown, depending on how the domain is provisioned) stops kernels that have been idle beyond a threshold, usually 60 or 120 minutes. This is the single biggest cost-saver in the migration.
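For the built-in route, a sketch of what the domain-level setting could look like. It assumes the current Studio's JupyterLab apps and the `AppLifecycleManagement` / `IdleSettings` fields as I understand them; check the field names against the update-domain reference before relying on them. The function names are made up.

```bash
# Print default-user-settings JSON that turns on idle shutdown for
# JupyterLab apps with the given timeout in minutes.
idle_settings_json() {
  timeout_minutes=$1
  cat <<EOF
{
  "JupyterLabAppSettings": {
    "AppLifecycleManagement": {
      "IdleSettings": {
        "LifecycleManagement": "ENABLED",
        "IdleTimeoutInMinutes": ${timeout_minutes}
      }
    }
  }
}
EOF
}

# Apply it to every profile that inherits the domain defaults.
enable_idle_shutdown() {
  domain_id=$1
  timeout_minutes=$2
  aws sagemaker update-domain \
    --domain-id "$domain_id" \
    --default-user-settings "$(idle_settings_json "$timeout_minutes")"
}
```

`enable_idle_shutdown d-abcd1234 120` would set a two-hour threshold for the whole domain; Studio Classic domains rely on the extension-based approach instead.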
Decommission the Notebook Instances. One at a time, after the corresponding user has moved. Back up any irreplaceable on-instance state to S3; confirm git has everything else; stop the instance; delete it. Fourteen bill lines vanish.
When Notebook Instances are still the correct answer. Not often, but sometimes: a long-running process that needs a persistent shell session rather than a notebook (Studio kernels are notebooks, not long-lived VMs); a specific OS-level configuration that conflicts with Studio’s container model; a specific networking configuration where Studio’s VPC mode has friction and a single EC2 instance doesn’t. These are edge cases, worth knowing about but not worth defaulting to.
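The stop-then-delete sequence can be sketched as one function. It assumes the AWS CLI is configured and that the backup has already been verified; the function name is hypothetical.

```bash
# Stop, wait for the stop to finish, then delete one Notebook Instance.
# Deleting removes the instance and its storage volume, so run this only
# after the backup to S3 and git has been checked.
decommission_notebook() {
  name=$1
  aws sagemaker stop-notebook-instance --notebook-instance-name "$name" &&
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$name" &&
  aws sagemaker delete-notebook-instance --notebook-instance-name "$name"
}
```

The backup itself has to run on the instance, for example `aws s3 sync /home/ec2-user/SageMaker s3://<your-bucket>/<instance-name>/` from its terminal. The stop call fails if the instance is already stopped, in which case only the delete step is needed.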
A worked domain setup
```bash
# Create the domain (once, per account + Region)
aws sagemaker create-domain \
  --domain-name marketing-analytics \
  --auth-mode IAM \
  --vpc-id vpc-0abc123 \
  --subnet-ids subnet-0aaa subnet-0bbb subnet-0ccc \
  --default-user-settings file://default-user-settings.json

# default-user-settings.json:
# {
#   "ExecutionRole": "arn:aws:iam::111122223333:role/team-ds-role",
#   "KernelGatewayAppSettings": {
#     "DefaultResourceSpec": {
#       "InstanceType": "ml.t3.medium",
#       "SageMakerImageArn": "arn:aws:sagemaker:eu-west-1:...:image/sagemaker-distribution"
#     }
#   }
# }

# Add a user profile per team member (extend the list with the rest of the team)
for user in alice bob carol dave eve; do
  aws sagemaker create-user-profile \
    --domain-id d-abcd1234 \
    --user-profile-name "$user"
done

# Register the team's shared lifecycle config. Newlines are stripped from the
# base64 output so the content arrives as a single argument. The config still
# has to be referenced from the domain's or a profile's settings before it runs.
aws sagemaker create-studio-lifecycle-config \
  --studio-lifecycle-config-name team-startup \
  --studio-lifecycle-config-app-type JupyterServer \
  --studio-lifecycle-config-content "$(base64 < team-startup.sh | tr -d '\n')"
```
Day-one onboarding shrinks from “here’s how you SSH into your instance” to “your Studio URL is X, you have a user profile, here’s the team’s notebook repo, clone it and go.” The 14-instance sprawl becomes one domain, one default image, one set of IAM conventions, one idle-shutdown policy.
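With IAM auth mode, “your Studio URL is X” can be produced by a single call. A minimal sketch, assuming the caller's IAM identity is allowed to create presigned URLs for the profile; the function name is made up.

```bash
# Print a short-lived sign-in URL for one user profile (IAM auth mode).
studio_url_for() {
  domain_id=$1
  profile=$2
  aws sagemaker create-presigned-domain-url \
    --domain-id "$domain_id" \
    --user-profile-name "$profile" \
    --query AuthorizedUrl \
    --output text
}
```

`studio_url_for d-abcd1234 alice` prints a URL that opens Studio as that profile; domains using IAM Identity Center sign in through the access portal instead.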
What’s worth remembering
- Notebook Instance = EC2 + Jupyter. Per-user EBS, manual stop, no multi-user model, billed while running.
- Studio = managed IDE with per-user profiles. EFS per user, on-demand kernels, bundled tooling (Data Wrangler, Experiments, Pipelines, Debugger, Registry, Canvas, Code Editor), auto-shutdown on idle.
- Studio’s multi-user story is the differentiator. A domain holds the team’s conventions once (default image, default IAM role, lifecycle configs, networking, KMS), and each user profile inherits them. Onboarding is a single API call.
- Compute is ephemeral in Studio; files are persistent. Kernels come and go; EFS home stays. Pick instance type per-kernel, not per-user.
- Idle auto-shutdown is the cost lever. Enable it. Weekend-leftover bills were most of what made the 14-instance estate expensive.
- The SageMaker Distribution image is the default. Standardise on it; install per-notebook deps in the first cell. Custom images are possible but only necessary when the standard one genuinely can’t carry the workload.
- Studio Classic still exists, in legacy domains. The current Studio is what new work should target; Classic is a known-older UI with the same underlying domain/profile/EFS model.
- Migration is files + environments + mindset. Notebooks copy across; conda envs usually rebuild; the team has to let go of “I start my box and it’s mine.” The payoff is central governance and a much smaller monthly bill.
The prompt “notebook or Studio?” is really the prompt “are you one person with one workload, or a team that needs a shared development environment?” Once that’s clear, the shape picks itself, and the fourteen-instance sprawl stops being a rite of passage.