Building a Churn Model in SageMaker Canvas Without Python

April 12, 2028 · 14 min read

ML Engineer · MLA-C01 · part of The Exam Room

The situation

A mid-size SaaS company’s customer-success team includes an analyst, Morgan, who has been chasing a churn question for a year. Morgan can build complex SQL, understands the business, and has a clear hypothesis: subscribers who reduce their active-seat count two months in a row are more likely to cancel. They want a model that scores each subscriber weekly with a probability of cancellation in the next 60 days so the retention team can prioritise.

The constraints:

  • Morgan doesn’t write Python. They write SQL and use Tableau. A Jupyter notebook with pandas.read_csv is already out of their comfort zone.
  • The data team is booked until Q4. A request to “build me a churn model” would queue behind a pile of other asks.
  • The training data is in Redshift: 200k subscribers, 36 months of monthly activity snapshots, labelled with whether each subscriber cancelled in the following 60 days.
  • The predictions need to flow back to Salesforce nightly so the retention team sees them in their existing dashboards.
  • The company has Bedrock and SageMaker enabled in its AWS account; IAM and VPC already exist.

Morgan has three options: queue with the data team, pay a consultant, or learn enough AWS ML to do it themselves. Canvas is the third option’s enabler.

What actually matters

Before reaching for Canvas, it helps to be honest about what “no-code ML” means. It doesn’t mean “zero ML knowledge”. Morgan still needs to understand train/test splitting, basic metrics (AUC, precision, recall), and why a model trained on one time window might behave badly on another. It means the mechanics of training, tuning, and deploying a model don’t require Python code.
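The time-window caveat is the one analysts most often trip over, so here is a minimal sketch of an out-of-time split: train on older snapshots, evaluate on newer ones. It is illustrative Python, not something Canvas asks Morgan to write. The cutoff date and the toy rows are assumptions, and the column names are borrowed from the import query later in this post.

```python
from datetime import date


def out_of_time_split(rows, cutoff):
    """Train on snapshots before the cutoff, test on snapshots on or after it."""
    train = [r for r in rows if r["month_end"] < cutoff]
    test = [r for r in rows if r["month_end"] >= cutoff]
    return train, test


# Toy rows, invented for illustration only.
rows = [
    {"subscriber_id": 1, "month_end": date(2026, 11, 30), "churned_next_60d": 0},
    {"subscriber_id": 2, "month_end": date(2027, 1, 31), "churned_next_60d": 1},
    {"subscriber_id": 1, "month_end": date(2027, 4, 30), "churned_next_60d": 0},
    {"subscriber_id": 3, "month_end": date(2027, 6, 30), "churned_next_60d": 1},
]

# Hypothetical cutoff: everything from March 2027 onward is held out.
train, test = out_of_time_split(rows, date(2027, 3, 1))
```

A purely random split would mix later months into the training set, which tends to flatter the metrics compared with how the model behaves on genuinely new weeks.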

Canvas is an interface on top of SageMaker Autopilot (AWS’s AutoML engine), plus a data-import layer on top of Data Wrangler, plus a deployment layer on top of SageMaker endpoints, plus (in recent versions) Bedrock integration for foundation-model use cases. Under the covers, Canvas launches the same Processing jobs, training jobs, and endpoints that a Python-using ML engineer would launch directly via the SageMaker SDK.

What that means in practice:

What Morgan does: picks a dataset, picks a target column, picks a model type (standard tabular, time-series forecasting, image classification, document classification, or a foundation-model option), clicks “build,” reviews the result, clicks “deploy” or “export.”

What AWS does: picks a model family (XGBoost, linear learner, deep ensembles, depending on the data), trains a handful of candidates with automated hyperparameter tuning, cross-validates, picks the best, emits feature-importance reports and explainability metrics, and either hosts the winner as an endpoint or exports the winning model and its preprocessing pipeline as a set of artefacts.

What Morgan still owns: deciding the target is meaningful, deciding the features are complete, interpreting the feature-importance report, deciding whether the model’s accuracy is good enough for production, and monitoring for drift.

The pitch isn’t “no ML knowledge required”; it’s “the ML mechanics are taken care of, so the domain expert can focus on the domain.”

There’s a second framing worth naming: Canvas is also AWS’s answer to “citizen data scientist” workflows. These are teams where business analysts want to do ML, and where giving them a tool that emits models an ML engineer can later productionise (or rebuild) is better than either blocking on the data team or letting the analysts build shadow-IT stacks in Excel + Jupyter. Canvas produces artefacts (model, recipe, training dataset definition) that can be picked up by an ML engineer in Studio for further work.

What we’ll filter on

Five filters for “is Canvas the correct tool here”:

  1. User technical level. SQL-literate analyst, Python-literate data scientist, neither?
  2. Problem type. Tabular classification/regression, time series, text, image, multi-modal?
  3. Data location. Redshift, Snowflake, S3, Salesforce, other SaaS connectors?
  4. Where the predictions need to go. Batch file, Salesforce, another SaaS, a SageMaker endpoint?
  5. Handoff story. Does an ML engineer need to take over later?

The no-code-ML landscape

1. SageMaker Canvas. The AWS-native no-code ML tool. Lives in SageMaker Studio (the analyst launches it from the SageMaker console); supports tabular classification/regression, time-series forecasting, image classification, document classification, and, via Bedrock integration, natural-language tasks (generation, extraction, Q&A) powered by foundation models. Data import from S3, Redshift, Snowflake, Salesforce, Databricks, SageMaker Data Wrangler, and local upload. Output: a model, hosted on a Canvas-managed SageMaker endpoint, with optional export to Studio for handoff.

2. SageMaker Autopilot (direct). The AutoML engine Canvas sits on top of. Can be invoked from the SageMaker SDK or the console with fewer guardrails. Better for data scientists who want AutoML’s results with more control; overkill for Morgan’s case.

3. Amazon QuickSight Q. Natural-language analytics tool; good at “why did revenue drop last quarter” kinds of questions on prepared datasets. Adjacent to the space Canvas occupies, but it is a querying and visualisation tool, not a model-building tool.

4. Bedrock with a prompt. For problems that are actually foundation-model problems (classification of free-text, extraction from documents, summarisation), calling a Bedrock model via the console or a Lambda can short-circuit the “do I need a custom model at all” question. Morgan’s churn case isn’t this; foundation models aren’t the correct tool for tabular 200k-row classification.

5. Third-party AutoML (DataRobot, H2O.ai, Dataiku, etc.). Competing in the same space. Available on AWS Marketplace; outside AWS’s native integration story. Mature tools, often more polished UIs, but another vendor relationship.

6. “Just learn Python.” Always on the list. Morgan’s time is the constraint; a six-week Python bootcamp is not cheaper than using Canvas.
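For contrast with Canvas’s point-and-click flow, the sketch below assembles the kind of request a data scientist might send for option 2. It only builds a dictionary and calls nothing. The field names are meant to mirror boto3’s create_auto_ml_job_v2 request and should be checked against the current SDK reference; the job name, bucket paths, role ARN, and the cap of 10 candidates are all hypothetical.

```python
def build_automl_request(job_name, train_s3_uri, output_s3_uri, role_arn):
    """Assemble keyword arguments shaped like a create_auto_ml_job_v2 call.

    Nothing is sent to AWS here; the structure is an assumption to verify
    against the boto3 documentation before use.
    """
    return {
        "AutoMLJobName": job_name,
        "AutoMLJobInputDataConfig": [
            {
                "ChannelType": "training",
                "ContentType": "text/csv;header=present",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "AutoMLProblemTypeConfig": {
            "TabularJobConfig": {
                # Target column name taken from the import query in this post.
                "TargetAttributeName": "churned_next_60d",
                "ProblemType": "BinaryClassification",
                # Hypothetical budget for the candidate search.
                "CompletionCriteria": {"MaxCandidates": 10},
            }
        },
        "AutoMLJobObjective": {"MetricName": "AUC"},
        "RoleArn": role_arn,
    }


# All identifiers below are placeholders, not real resources.
request = build_automl_request(
    job_name="churn-automl-example",
    train_s3_uri="s3://example-bucket/churn/train/",
    output_s3_uri="s3://example-bucket/churn/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
```

Every key in that dictionary is a decision Canvas makes or hides behind a form field, which is the practical difference between options 1 and 2 for someone in Morgan’s position.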

Side by side

| Option | User level | Problem types | Data sources | Deployment | Handoff |
| --- | --- | --- | --- | --- | --- |
| Canvas | SQL-literate analyst | Tabular, time series, image, text, FM tasks | Redshift, Snowflake, Salesforce, S3, local | Canvas endpoint, export to Studio, Salesforce, batch | Exportable to Studio |
| Autopilot direct | Data scientist | Tabular primarily | S3 | SageMaker endpoint | Studio-native |
| QuickSight Q | Analyst | Queries on prepared data | QuickSight data sets | Dashboards | n/a |
| Bedrock direct | Any | FM-shaped (text gen/classification/extraction) | n/a | Bedrock API | n/a |
| Third-party AutoML | Analyst | Broader | Varies | Varies | Third-party tooling |

Reading the table against Morgan’s case: Canvas fits cleanly. Tabular classification, Redshift data source, Salesforce deployment target, handoff to Studio if the data team wants to take the model over later. None of the others fit as well: Autopilot direct is the same engine with less support; QuickSight Q doesn’t build predictive models; Bedrock isn’t the correct tool for structured tabular classification; third-party AutoML introduces a vendor.

From SQL to scored subscribers

Figure: Morgan’s path, from SQL to scored subscribers.

Source: Redshift table subscriber_activity (200k rows × 36 months; target: churned_next_60d; features: seats, usage, tenure).

  1. Import in Canvas (Morgan: minutes). SQL query in the UI, row preview, null and distribution summary.
  2. Review features (Morgan: minutes). Spot data issues, drop ID columns, confirm the target.
  3. Quick build (Canvas: hands off). 2-15 min on ~200k rows; Autopilot trains candidates and picks the best.
  4. Review results (Morgan: interprets). Model F1, accuracy, AUC; feature-importance chart; sanity-check of top features; confusion matrix.
  5. Deploy or export (Morgan: picks). One click to a Canvas endpoint, to SageMaker Studio, or a model.tar.gz download.
  6. Batch predictions (Canvas: automated). Weekly schedule; reads the weekly snapshot and writes scored output.

Destination: Salesforce, via the Salesforce connector. The churn_score field is populated on Account and appears in the retention team dashboard.

Handoff path (if the data team takes over): Canvas exports the trained model artefact, the training dataset definition, and the preprocessing recipe. An ML engineer opens the export in SageMaker Studio, inspects the model, rebuilds it with more control (feature engineering, model family choice, hyperparameter tuning), and deploys via the team’s MLOps pipeline. The domain logic Morgan encoded in the SQL query and the feature choices becomes the starting point, not throwaway work.

Canvas is a workflow, not a magic button. Morgan does the domain work (data, target, feature review, result interpretation); Canvas does the ML mechanics; the handoff path keeps the work portable if the data team takes over.

The pick in depth

Morgan’s Canvas workflow, end to end.

  1. Open Canvas. From the SageMaker console, Morgan opens Studio and launches the Canvas app. Morgan’s IAM role needs Canvas access (for example, the AmazonSageMakerCanvasFullAccess managed policy) plus the Redshift connector permissions (redshift-data:ExecuteStatement, redshift-data:GetStatementResult, access to a Secrets Manager secret with the Redshift credentials).

  2. Import the dataset. Canvas’s data-import wizard connects to Redshift via the connector, lets Morgan write a SQL query:
    SELECT subscriber_id, tenure_months, seat_count, seat_count_lag_1, seat_count_lag_2,
           active_days_last_30, support_tickets_90d, mrr_usd, plan_tier,
           churned_next_60d
    FROM analytics.subscriber_monthly_features
    WHERE month_end BETWEEN '2024-09-01' AND '2027-08-31';
    

    Canvas runs the query, imports the result as a named dataset, shows row count, column types, distribution summaries, null counts.

  3. Create a model. Morgan clicks “New model,” picks “Predictive analysis” (as opposed to “Image,” “Text,” or “Forecasting”). Canvas reads the dataset, detects column types, and asks for the target column. Morgan picks churned_next_60d. Canvas auto-detects it as binary classification.

  4. Review the feature set. Canvas shows each column’s impact-on-target summary. Morgan removes subscriber_id (not a feature), confirms plan_tier should be treated as categorical, confirms others.

  5. Build. Canvas offers “Quick build” (2-15 min, smaller search) or “Standard build” (2-4 hours, larger search). For a 200k-row dataset, Quick build is usually good enough for the first iteration. Canvas kicks off the Autopilot job.

  6. Review the model. Canvas reports accuracy, F1, precision, recall, AUC. Shows feature importance (SHAP-based): seat_count_lag_1 and seat_count_lag_2 at the top, which matches Morgan’s hypothesis. Confusion matrix shows the model catches 68% of churners at a 30% false-positive rate. Morgan iterates once, rebuilds after dropping three irrelevant features, and is satisfied.

  7. Generate predictions. Morgan picks “Predict” and selects a weekly snapshot from Redshift via the same connector. Canvas batch-scores and writes to an S3 output, which a pre-configured Salesforce sync pulls nightly.
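Step 6’s headline numbers are easier to judge as counts. The sketch below turns a recall of 68% and a false-positive rate of 30% into a confusion matrix and a precision figure. The 5% churn base rate is an assumption made purely for illustration (this post doesn’t state the real one), as is treating one scoring run as 200,000 rows.

```python
def confusion_counts(n_rows, churn_rate, recall, false_positive_rate):
    """Derive confusion-matrix counts from rates; all inputs are fractions."""
    positives = round(n_rows * churn_rate)   # subscribers who really churn
    negatives = n_rows - positives           # subscribers who stay
    tp = round(positives * recall)           # churners the model flags
    fn = positives - tp                      # churners the model misses
    fp = round(negatives * false_positive_rate)  # stayers flagged by mistake
    tn = negatives - fp                      # stayers correctly left alone
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


def precision(counts):
    """Share of flagged subscribers who actually churn."""
    return counts["tp"] / (counts["tp"] + counts["fp"])


# ASSUMPTION: 5% base rate and 200,000 rows are illustrative, not from the post.
counts = confusion_counts(200_000, 0.05, 0.68, 0.30)
flagged_precision = precision(counts)  # about 0.107 under these assumptions
```

Under those assumptions only about one flagged subscriber in ten actually churns, so ranking by score and working the top of the list is likely a better use of the retention team’s time than contacting everyone the model flags.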

Timeline: a focused week, not a quarter.

When Morgan should hand off to the data team. When the model needs to scale to dozens of variants, when it needs to be retrained on a strict schedule with validation gates, when the deployment is mission-critical, when feature engineering requires code. Canvas is for the first working version; production-scale ownership should belong to the team with the tooling to own it.

When not to use Canvas. When the problem isn’t tabular at Canvas’s supported scale. When the analyst is comfortable in Python and would prefer Studio. When the organisation has policies against business-unit teams deploying production models independently. When the model needs capabilities Canvas doesn’t offer (custom loss functions, specific architectures, particular feature-engineering techniques).

A worked example

Morgan’s model, six weeks after go-live:

  • Scored subscribers weekly: 205,000, including new sign-ups.
  • Retention team actioned: the top-scoring 500 per week, via outreach, check-in calls, and discount offers.
  • Measured retention improvement: +3.4 percentage points in the “likely to churn” cohort versus a control group that received no outreach.
  • Morgan’s new role: owns the model; reviews monthly performance; iterates with new features quarterly.

When the data team unblocks, Morgan hands off the Canvas export. The ML engineer opens the exported notebook in Studio, inspects the model, notes that Canvas chose an XGBoost candidate, and rebuilds with additional feature engineering (lag features computed in Redshift rather than relying on the SQL query’s lags) and a monthly retraining pipeline. The domain work Morgan did carries forward.
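The lag-based features at the centre of Morgan’s hypothesis are simple to express in code, which is part of what makes the handoff cheap. A minimal sketch, assuming a per-subscriber list of monthly seat counts in chronological order; the declined_two_months flag is a hypothetical feature name, not something the Canvas export is known to contain.

```python
def lag_features(seat_history):
    """Build seat-count lag features from a chronological list of monthly counts.

    The most recent month is last. At least three months are required so that
    both lags exist.
    """
    if len(seat_history) < 3:
        raise ValueError("need at least three months of seat counts")
    current, lag_1, lag_2 = seat_history[-1], seat_history[-2], seat_history[-3]
    return {
        "seat_count": current,
        "seat_count_lag_1": lag_1,
        "seat_count_lag_2": lag_2,
        # Hypothetical flag: seats fell in each of the last two months.
        "declined_two_months": current < lag_1 < lag_2,
    }


# Toy history, invented for illustration: 12 seats, then 10, then 8.
shrinking = lag_features([12, 10, 8])
growing = lag_features([8, 10, 12])
```

Here shrinking carries the flag and growing does not. Whether an explicit flag adds anything beyond the raw lags is an empirical question for the rebuild, since tree-based models can often recover the same pattern on their own.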

What’s worth remembering

  1. Canvas sits on top of Autopilot. No-code UI on top of AWS’s AutoML engine. Same model family choices, same cross-validation, same metrics, different interface.
  2. Canvas is for citizen data scientists. SQL-literate analysts, domain experts, teams without ML capacity. Not a replacement for data scientists; an enabler for the step before one.
  3. Five problem types supported. Tabular classification/regression, time-series forecasting, image classification, document classification, and Bedrock-powered natural-language tasks.
  4. Data-source connectors are the value. Redshift, Snowflake, Salesforce, Databricks, and S3 are first-class; data imports happen in the UI without Python.
  5. Exports keep work portable. A Canvas model exports to Studio as a notebook with the training dataset, recipe, and artefact; an ML engineer can take it over without starting from scratch.
  6. Deployment options are tiered. Canvas endpoint for quick use, batch predictions for scheduled scoring, Salesforce sync for CRM integration, export to Studio for MLOps-grade production.
  7. Still need to interpret results. Canvas automates the training; it doesn’t automate the judgement about whether the model is good enough, the features are complete, or the target is meaningful. That’s still the analyst’s job.
  8. Use responsibly. No-code ML can deploy bad models as easily as good ones. Monitoring, drift detection, and clear sunset criteria belong in the workflow from day one.
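Point 8 mentions drift detection without naming a method. One common option is the population stability index (PSI), which compares the distribution of a feature or score at training time with the distribution seen today. The sketch below is a plain-Python version under simplifying assumptions (equal-width bins taken from the baseline, a small floor to avoid taking the log of zero); it is not something Canvas is known to compute for you.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between a baseline sample and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy data, invented for illustration: a uniform baseline and a shifted copy.
baseline = [i / 100 for i in range(100)]
shifted = [v + 0.3 for v in baseline]

psi_same = psi(baseline, baseline)
psi_shifted = psi(baseline, shifted)
```

A frequently quoted rule of thumb treats a PSI under 0.1 as stable and above 0.25 as a shift worth investigating, though those thresholds are conventions rather than guarantees.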

Canvas isn’t magic. It’s a pragmatic bridge for analysts who know their domain better than anyone else on the team, have a real question, and have been blocked on engineering capacity. Get the model live; prove the value; hand it to an ML engineer if it’s mission-critical. The analyst’s domain knowledge is the scarce resource; Canvas makes it productive without a bootcamp.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.