Building a Churn Model in SageMaker Canvas Without Python

April 12, 2028 · 14 min read

The situation

A mid-size SaaS company’s customer-success team includes an analyst, Morgan, who has been chasing a churn question for a year. Morgan can build complex SQL, understands the business, and has a clear hypothesis: subscribers who reduce their active-seat count two months in a row are more likely to cancel. They want a model that scores each subscriber weekly with a probability of cancellation in the next 60 days so the retention team can prioritise.

The constraints:

Morgan doesn’t write Python. They write SQL and use Tableau. A Jupyter notebook with pandas.read_csv is already out of their comfort zone.
The data team is booked until Q4. A request to “build me a churn model” would queue behind a pile of other asks.
The training data is in Redshift: 200k subscribers, 36 months of monthly activity snapshots, labelled with whether each subscriber cancelled in the following 60 days.
The predictions need to flow back to Salesforce nightly so the retention team sees them in their existing dashboards.
The company has Bedrock and SageMaker enabled in its AWS account; IAM and VPC already exist.

Morgan has three options: queue with the data team, pay a consultant, or learn enough AWS ML to do it themselves. Canvas is what makes the third option practical.

What actually matters

Before reaching for Canvas, it helps to be honest about what “no-code ML” means. It doesn’t mean “zero ML knowledge”. Morgan still needs to understand train/test splitting, basic metrics (AUC, precision, recall), and why a model trained on one time window might behave badly on another. It means the mechanics of training, tuning, and deploying a model don’t require Python code.
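The time-window point is easier to see with a concrete split. The sketch below is not something Canvas requires Morgan to write; it only illustrates the idea of holding out the most recent months so the model is judged on data from after its training window. The function name and the sample rows are hypothetical; the column names follow the dataset described later in the post.

```python
from datetime import date

def split_by_time(rows, cutoff):
    """Split snapshot rows into (train, test) by month_end.

    Rows on or before `cutoff` go to train; later rows go to test, so the
    model is always evaluated on months it has not seen.
    """
    train = [r for r in rows if r["month_end"] <= cutoff]
    test = [r for r in rows if r["month_end"] > cutoff]
    return train, test

# Hypothetical monthly snapshots for a single subscriber.
rows = [
    {"subscriber_id": 1, "month_end": date(2027, 5, 31), "churned_next_60d": 0},
    {"subscriber_id": 1, "month_end": date(2027, 6, 30), "churned_next_60d": 0},
    {"subscriber_id": 1, "month_end": date(2027, 7, 31), "churned_next_60d": 1},
]
train, test = split_by_time(rows, cutoff=date(2027, 6, 30))
print(len(train), len(test))  # 2 1
```

A random split would mix months on both sides of the cutoff and can make a model look better than it will be on next month's subscribers.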

Canvas is an interface on top of SageMaker Autopilot (AWS’s AutoML engine), plus a data-import layer on top of Data Wrangler, plus a deployment layer on top of SageMaker endpoints, plus (in recent versions) Bedrock integration for foundation-model use cases. Under the covers, Canvas launches the same Processing jobs, training jobs, and endpoints that a Python-using ML engineer would launch directly via the SageMaker SDK.

What that means in practice:

What Morgan does: picks a dataset, picks a target column, picks a model type (standard tabular, time-series forecasting, image classification, document classification, or a foundation-model option), clicks “build,” reviews the result, clicks “deploy” or “export.”

What AWS does: picks a model family (XGBoost, linear learner, deep ensembles, depending on the data), trains a handful of candidates with automated hyperparameter tuning, cross-validates, picks the best, emits feature-importance reports and explainability metrics, and either hosts the winner as an endpoint or exports the winning model and its preprocessing pipeline as a set of artefacts.

What Morgan still owns: deciding the target is meaningful, deciding the features are complete, interpreting the feature-importance report, deciding whether the model’s accuracy is good enough for production, and monitoring for drift.

The pitch isn’t “no ML knowledge required”; it’s “the ML mechanics are taken care of, so the domain expert can focus on the domain.”

There’s a second framing worth naming: Canvas is also AWS’s answer to “citizen data scientist” workflows. These are teams where business analysts want to do ML, and where giving them a tool that emits models an ML engineer can later productionise (or rebuild) is better than either blocking on the data team or letting the analysts build shadow-IT stacks in Excel + Jupyter. Canvas produces artefacts (model, recipe, training dataset definition) that can be picked up by an ML engineer in Studio for further work.

What we’ll filter on

Five filters for “is Canvas the correct tool here”:

User technical level. SQL-literate analyst, Python-literate data scientist, neither?
Problem type. Tabular classification/regression, time series, text, image, multi-modal?
Data location. Redshift, Snowflake, S3, Salesforce, other SaaS connectors?
Where the predictions need to go. Batch file, Salesforce, another SaaS, a SageMaker endpoint?
Handoff story. Does an ML engineer need to take over later?

The no-code-ML landscape

1. SageMaker Canvas. The AWS-native no-code ML tool. Lives in SageMaker Studio (the analyst launches it from the SageMaker console); supports tabular classification/regression, time-series forecasting, image classification, document classification, and, via Bedrock integration, natural-language tasks (generation, extraction, Q&A) powered by foundation models. Data import from S3, Redshift, Snowflake, Salesforce, Databricks, SageMaker Data Wrangler, and local upload. Output: a model, hosted on a Canvas-managed SageMaker endpoint, with optional export to Studio for handoff.

2. SageMaker Autopilot (direct). The AutoML engine Canvas sits on top of. Can be invoked from the SageMaker SDK or the console with fewer guardrails. Better for data scientists who want AutoML’s results with more control; overkill for Morgan’s case.

3. Amazon QuickSight Q. Natural-language analytics tool; good at “why did revenue drop last quarter” kinds of questions on prepared datasets. Adjacent to the space Canvas occupies, but it is a querying and visualisation tool, not a model-building tool.

4. Bedrock with a prompt. For problems that are actually foundation-model problems (classification of free-text, extraction from documents, summarisation), calling a Bedrock model via the console or a Lambda can short-circuit the “do I need a custom model at all” question. Morgan’s churn case isn’t this; foundation models aren’t the correct tool for tabular 200k-row classification.

5. Third-party AutoML (DataRobot, H2O.ai, Dataiku, etc.). Competing in the same space. Available on AWS Marketplace; outside AWS’s native integration story. Mature tools, often more polished UIs, but another vendor relationship.

6. “Just learn Python.” Always on the list. Morgan’s time is the constraint; a six-week Python bootcamp is not cheaper than using Canvas.

Side by side

Option	User level	Problem types	Data sources	Deployment	Handoff
Canvas	SQL-literate analyst	Tabular, time series, image, text, FM tasks	Redshift, Snowflake, Salesforce, S3, local	Canvas endpoint, export to Studio, Salesforce, batch	Exportable to Studio
Autopilot direct	Data scientist	Tabular primarily	S3	SageMaker endpoint	Studio-native
QuickSight Q	Analyst	Queries on prepared data	QuickSight data sets	Dashboards	n/a
Bedrock direct	Any	FM-shaped (text gen/classification/extraction)	n/a	Bedrock API	n/a
Third-party AutoML	Analyst	Broader	Varies	Varies	Third-party tooling

Reading the table against Morgan’s case: Canvas fits cleanly. Tabular classification, Redshift data source, Salesforce deployment target, handoff to Studio if the data team wants to take the model over later. None of the others fit as well: Autopilot direct is the same engine with less support; QuickSight Q doesn’t build predictive models; Bedrock isn’t the correct tool for structured tabular classification; third-party AutoML introduces a vendor.

From SQL to scored subscribers

Canvas is a workflow, not a magic button. Morgan does the domain work (data, target, feature review, result interpretation); Canvas does the ML mechanics; the handoff path keeps the work portable if the data team takes over.

The pick in depth

Morgan’s Canvas workflow, end to end.

Open Canvas. From the SageMaker console, Studio launches with the Canvas app. Morgan’s IAM role needs Canvas access, typically via the AmazonSageMakerCanvasFullAccess managed policy, plus the Redshift connector permissions (redshift-data:ExecuteStatement, redshift-data:GetStatementResult, access to a Secrets Manager secret with the Redshift credentials).
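For whoever administers the account, the Redshift part of those permissions might look roughly like the policy document below. This is a sketch under assumptions, not a complete or audited policy. The two redshift-data actions are the ones named above; secretsmanager:GetSecretValue is my assumption for what “access to a secret” means here, and the secret ARN, account ID, and Sid values are placeholders. A real Canvas setup is likely to need more than this.

```python
import json

# Hypothetical ARN: replace with the secret that holds the Redshift credentials.
REDSHIFT_SECRET_ARN = (
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-redshift-creds"
)

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # The two Redshift Data API actions named in the text.
            "Sid": "RedshiftDataApi",
            "Effect": "Allow",
            "Action": [
                "redshift-data:ExecuteStatement",
                "redshift-data:GetStatementResult",
            ],
            "Resource": "*",
        },
        {
            # Assumption: reading the secret is done via GetSecretValue.
            "Sid": "ReadRedshiftSecret",
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": REDSHIFT_SECRET_ARN,
        },
    ],
}

print(json.dumps(policy, indent=2))
```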

Import the dataset. Canvas’s data-import wizard connects to Redshift via the connector and lets Morgan write a SQL query:

SELECT subscriber_id, tenure_months, seat_count, seat_count_lag_1, seat_count_lag_2,
       active_days_last_30, support_tickets_90d, mrr_usd, plan_tier,
       churned_next_60d
FROM analytics.subscriber_monthly_features
WHERE month_end BETWEEN '2024-09-01' AND '2027-08-31';

Canvas runs the query, imports the result as a named dataset, shows row count, column types, distribution summaries, null counts.
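Morgan’s hypothesis (seats reduced two months in a row) can be read straight off three of the imported columns. The sketch below shows that logic in Python purely as an illustration; the same comparison could be a computed column in the SQL. It assumes seat_count_lag_1 is last month’s seat count and seat_count_lag_2 the month before, which the post does not spell out, and the function name is made up.

```python
def declined_two_months_running(seat_count, seat_count_lag_1, seat_count_lag_2):
    """True when the seat count fell in each of the last two months.

    Assumes seat_count_lag_1 is last month's count and seat_count_lag_2
    the count from the month before that.
    """
    if None in (seat_count, seat_count_lag_1, seat_count_lag_2):
        return False  # not enough history to decide
    return seat_count < seat_count_lag_1 < seat_count_lag_2

print(declined_two_months_running(8, 10, 12))    # True: 12 -> 10 -> 8
print(declined_two_months_running(10, 10, 12))   # False: flat last month
print(declined_two_months_running(8, None, 12))  # False: missing history
```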

Create a model. Morgan clicks “New model,” picks “Predictive analysis” (as opposed to “Image,” “Text,” or “Forecasting”). Canvas reads the dataset, detects column types, and asks for the target column. Morgan picks churned_next_60d. Canvas auto-detects it as binary classification.
Review the feature set. Canvas shows each column’s impact-on-target summary. Morgan removes subscriber_id (not a feature), confirms plan_tier should be treated as categorical, confirms others.
Build. Canvas offers “Quick build” (2-15 min, smaller search) or “Standard build” (2-4 hours, larger search). For a 200k-row dataset, Quick build is usually good enough for the first iteration. Canvas kicks off the Autopilot job.
Review the model. Canvas reports accuracy, F1, precision, recall, AUC. Shows feature importance (SHAP-based): seat_count_lag_1 and seat_count_lag_2 at the top, which matches Morgan’s hypothesis. Confusion matrix shows the model catches 68% of churners at a 30% false-positive rate. Morgan iterates once, rebuilds after dropping three irrelevant features, and is satisfied.
Generate predictions. Morgan picks “Predict” and selects a weekly snapshot from Redshift via the same connector. Canvas batch-scores and writes to an S3 output, which a pre-configured Salesforce sync pulls nightly.
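One piece of interpretation from the review step is worth doing by hand: “68% of churners at a 30% false-positive rate” says nothing about how many flagged subscribers actually churn until it is combined with the churn base rate. The post doesn’t give that rate, so the 5% below is purely an assumed figure for illustration, and the function name is hypothetical.

```python
def precision_at_operating_point(recall, false_positive_rate, base_rate):
    """Share of flagged subscribers who are true churners.

    recall              -- fraction of churners the model flags
    false_positive_rate -- fraction of non-churners the model flags
    base_rate           -- fraction of all subscribers who churn
    """
    true_positives = recall * base_rate
    false_positives = false_positive_rate * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)

ASSUMED_BASE_RATE = 0.05  # assumption for illustration; not from the post

p = precision_at_operating_point(0.68, 0.30, ASSUMED_BASE_RATE)
print(f"{p:.3f}")  # 0.107
```

Under that assumption only about one flagged subscriber in ten is a real churner at this threshold, which is one reason to rank by score and work the top of the list rather than act on every flag.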

Timeline: a focused week, not a quarter.

When Morgan should hand off to the data team. When the model needs to scale to dozens of variants, when it needs to be retrained on a strict schedule with validation gates, when the deployment is mission-critical, when feature engineering requires code. Canvas is for the first working version; production-scale ownership should belong to the team with the tooling to own it.

When not to use Canvas. When the problem isn’t tabular at Canvas’s supported scale. When the analyst is comfortable in Python and would prefer Studio. When the organisation has policies against business-unit teams deploying production models independently. When the model needs capabilities Canvas doesn’t offer (custom loss functions, specific architectures, particular feature-engineering techniques).

A worked example

Morgan’s model, six weeks after go-live:

Scored subscribers weekly: 205,000, including new sign-ups.
Retention team actioned: the top-scoring 500 per week (outreach, check-in calls, discount offers).
Measured retention improvement: +3.4 percentage points in the “likely to churn” cohort versus an unreached control group.
Morgan’s new role: owns the model; reviews monthly performance; iterates with new features quarterly.
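Two pieces of arithmetic sit behind those bullets: picking the highest-scoring subscribers each week, and expressing the retention difference in percentage points. A minimal sketch of both follows. The subscriber IDs, scores, and retention counts are hypothetical and are not Morgan’s actual figures.

```python
import heapq

def top_k_by_score(scores, k):
    """Return the k (subscriber_id, score) pairs with the highest scores."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

def lift_in_points(retained_treated, n_treated, retained_control, n_control):
    """Retention-rate difference, treated minus control, in percentage points."""
    return 100.0 * (retained_treated / n_treated - retained_control / n_control)

# Hypothetical weekly scores keyed by subscriber ID.
scores = {"sub_a": 0.91, "sub_b": 0.12, "sub_c": 0.77, "sub_d": 0.40}
print(top_k_by_score(scores, k=2))  # [('sub_a', 0.91), ('sub_c', 0.77)]

# Hypothetical counts: 420 of 500 contacted subscribers retained,
# versus 400 of 500 in an uncontacted control group.
print(round(lift_in_points(420, 500, 400, 500), 1))  # 4.0
```

The control group matters: without it, the difference cannot be separated from whatever retention would have been anyway.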

When the data team unblocks, Morgan hands off the Canvas export. The ML engineer opens the exported notebook in Studio, inspects the model, notes that Canvas chose an XGBoost candidate, and rebuilds with additional feature engineering (lag features computed in Redshift rather than relying on the SQL query’s lags) and a monthly retraining pipeline. The domain work Morgan did carries forward.

What’s worth remembering

Canvas sits on top of Autopilot. No-code UI on top of AWS’s AutoML engine. Same model family choices, same cross-validation, same metrics, different interface.
Canvas is for citizen data scientists. SQL-literate analysts, domain experts, teams without ML capacity. Not a replacement for data scientists; an enabler for the work that happens before one gets involved.
Five problem types supported. Tabular classification/regression, time-series forecasting, image classification, document classification, and Bedrock-powered natural-language tasks.
Data-source connectors are the value. Redshift, Snowflake, Salesforce, Databricks, and S3 are first-class; data imports happen in the UI without Python.
Exports keep work portable. A Canvas model exports to Studio as a notebook with the training dataset, recipe, and artefact; an ML engineer can take it over without starting from scratch.
Deployment options are tiered. Canvas endpoint for quick use, batch predictions for scheduled scoring, Salesforce sync for CRM integration, export to Studio for MLOps-grade production.
Still need to interpret results. Canvas automates the training; it doesn’t automate the judgement about whether the model is good enough, the features are complete, or the target is meaningful. That’s still the analyst’s job.
Use responsibly. No-code ML can deploy bad models as easily as good ones. Monitoring, drift detection, and clear sunset criteria belong in the workflow from day one.
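To make “drift detection” less abstract, here is one common check, the Population Stability Index (PSI), sketched in plain Python. The post doesn’t prescribe a drift metric, so PSI is my choice of illustration, and the bin edges and sample values are hypothetical. It compares how a feature was distributed in the training data with how it is distributed in a recent scoring batch.

```python
import math

def psi(expected, actual, edges):
    """Population Stability Index of `actual` against `expected`.

    `edges` are ascending bin boundaries. Values below the first edge fall
    into the first bin; values at or above the last edge fall into the last.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Number of edges at or below v is the bin index.
            counts[sum(v >= e for e in edges)] += 1
        # Small floor so an empty bin does not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical seat counts: training-time sample vs. a recent weekly batch.
training_seats = [5, 8, 12, 20, 25, 30, 40, 55]
recent_seats = [3, 4, 5, 6, 8, 10, 12, 20]
edges = [10, 25, 50]

print(psi(training_seats, training_seats, edges))  # 0.0 (identical samples)
print(psi(training_seats, recent_seats, edges) > 0)  # True (distribution moved)
```

A value near zero means the feature looks like it did at training time; larger values are a prompt to look at the feature, not an automatic verdict on the model.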

Canvas isn’t magic. It’s a pragmatic bridge for analysts who know their domain better than anyone else on the team, have a real question, and have been blocked on engineering capacity. Get the model live; prove the value; hand it to an ML engineer if it’s mission-critical. The analyst’s domain knowledge is the scarce resource; Canvas makes it productive without a bootcamp.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.