Can You Build an ETL System Without Code?

Testing AWS Glue Studio + Amazon AppFlow for Real Data Pipelines


 
Introduction: Is No-Code Data Engineering Really Possible?

What if you could build an entire data pipeline — extracting data from a source system, transforming it, and loading it into cloud storage — without writing a single line of code?

For data engineers, that sounds like science fiction. We’re used to living in PySpark scripts, SQL notebooks, and shell commands. But with the rise of no-code data engineering, this idea is quickly becoming reality — especially with tools like Amazon AppFlow and AWS Glue Studio.

These AWS-native services claim to let you move and transform data visually, through drag-and-drop interfaces and point-and-click options.

In this blog post, we’ll test whether it’s truly possible to build a working ETL pipeline — entirely without code. We’ll use Salesforce as our data source, Amazon AppFlow to extract and load the data into S3, and AWS Glue Studio to transform and prepare it for analytics.


🎯 Real-World Use Case: Customer Data from Salesforce

Let’s say you work for a fast-growing e-commerce business.

Your sales and marketing teams use Salesforce CRM to manage all customer records — including names, emails, signup dates, and region information. They want this data to be available in S3, cleaned, transformed, and partitioned by region — so analysts can run queries using Athena and build dashboards in QuickSight.

But here’s the catch: the dataset includes PII (Personally Identifiable Information), and you need to mask sensitive fields like email and phone number. Plus, the marketing team doesn’t want to include inactive users in the final output.

Normally, this would require days of development work. But today, we’ll build this entire ETL flow without touching any code — just using AppFlow + Glue Studio.


Step 1: Extracting Data from Salesforce with Amazon AppFlow

Our first step is to bring customer data from Salesforce into AWS. Instead of writing a connector or using a third-party ETL tool, we use Amazon AppFlow — a no-code service that lets you securely transfer data between SaaS applications (like Salesforce) and AWS services (like S3, Redshift, EventBridge).

Connecting to Salesforce

Inside the AppFlow console:

  • Create a new flow.

  • Select Salesforce as the source system.

  • Authenticate using secure OAuth-based login.

  • Choose the object we want to pull — in this case, let’s say it’s the “Contact” or “Lead” object, which holds customer information.
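Although the whole point is to stay in the console, it can be handy to sanity-check the connection programmatically. Here is a minimal boto3 sketch that lists the Salesforce objects the connection exposes; the profile name my-salesforce-connection is a placeholder for whatever you named the connection:

```python
import boto3

appflow = boto3.client("appflow")

# List the Salesforce objects available through the connector profile.
# "my-salesforce-connection" is a placeholder name from the console setup.
response = appflow.list_connector_entities(
    connectorProfileName="my-salesforce-connection",
    connectorType="Salesforce",
)

# Entities come back grouped by category; print every object name.
for group in response["connectorEntityMap"].values():
    for entity in group:
        print(entity["name"])  # e.g. Contact, Lead, Account
```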

Here’s a sample of the kind of data Salesforce stores:

full_name    email            phone       status    signup_date  region
Priya Das    priya@email.com  8882221111  active    2022-08-09   India
Mark Brown   mark@email.com   9876543210  inactive  2023-06-18   EMEA
Sarah Paul   sarah@email.com  1234567890  active    2024-12-01   APAC

No-Code Transformations in AppFlow

Without writing any code, we apply these built-in transformations:

  • Field mapping: Rename full_name to customer_name

  • Filtering: Exclude users where status = inactive

  • Data masking: Automatically mask email and phone fields for privacy

Then, we choose Amazon S3 as the destination and point the flow at a path like s3://nocode-etl-demo/raw-zone. The data can be delivered in CSV, JSON, or Parquet format.

Finally, we schedule the flow to run daily, ensuring fresh data lands in S3 every 24 hours — completely without writing code or setting up cron jobs.
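For the record, everything we just clicked through can also be expressed via the AppFlow API. The boto3 sketch below mirrors the console setup in broad strokes; the profile name, field names, and abbreviated tasks list are illustrative assumptions, not a copy of what the console generates:

```python
import boto3

appflow = boto3.client("appflow")

appflow.create_flow(
    flowName="salesforce-contacts-daily",
    triggerConfig={
        "triggerType": "Scheduled",
        "triggerProperties": {
            # Check the AppFlow docs for the exact rate-expression syntax.
            "Scheduled": {"scheduleExpression": "rate(1days)"},
        },
    },
    sourceFlowConfig={
        "connectorType": "Salesforce",
        "connectorProfileName": "my-salesforce-connection",  # placeholder
        "sourceConnectorProperties": {"Salesforce": {"object": "Contact"}},
    },
    destinationFlowConfigList=[{
        "connectorType": "S3",
        "destinationConnectorProperties": {"S3": {
            "bucketName": "nocode-etl-demo",
            "bucketPrefix": "raw-zone",
        }},
    }],
    tasks=[
        # Map task: rename full_name to customer_name.
        {
            "taskType": "Map",
            "sourceFields": ["full_name"],
            "destinationField": "customer_name",
            "connectorOperator": {"Salesforce": "NO_OP"},
        },
        # The status filter and the email/phone masking are additional tasks
        # of taskType "Filter" and "Mask", omitted here for brevity.
    ],
)
```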

Result: We now have clean, filtered Salesforce data in S3.


Step 2: Transforming the Data with AWS Glue Studio

With raw customer data now available in S3, the next step is transformation — preparing it for analytics. For this, we use AWS Glue Studio, a visual interface that allows you to create ETL jobs without coding.

Inside Glue Studio, we:

  • Create a new visual job using the “source and target” template

  • Choose the S3 bucket where AppFlow saved our data

  • Let Glue infer the schema automatically (e.g., customer_name, email, region)

Visual ETL Transformations (No Code!)

Now comes the transformation logic:

  • DropFields Node: We remove any remaining sensitive fields entirely, such as the masked phone column or an ssn column if the schema includes one.

  • ApplyMapping Node: Rename and reformat fields as needed. For example, ensure signup_date is ISO-formatted.

  • Partitioning (set on the target node): Organize data by region for faster querying in Athena.

No Python, no PySpark — just click-and-configure.

Finally, we define the target S3 path as the “processed zone,” and select Parquet as the output format, which is optimized for columnar storage and query performance.

We hit “Run,” and Glue handles the rest — spinning up Spark jobs, transforming the data, and storing the clean result in a new S3 folder.
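One honest caveat: “no code” here really means “no code written by us.” Under the hood, Glue Studio generates a PySpark script from the visual nodes. A simplified sketch of what that generated script roughly looks like for our three transformations (paths and field names follow the example above and should be treated as placeholders):

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping, DropFields
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source node: raw AppFlow output (assuming we chose Parquet as the format).
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://nocode-etl-demo/raw-zone/"]},
    format="parquet",
)

# DropFields node: remove sensitive columns entirely.
no_pii = DropFields.apply(frame=raw, paths=["ssn", "phone"])

# ApplyMapping node: rename/retype fields, e.g. parse signup_date as a date.
mapped = ApplyMapping.apply(
    frame=no_pii,
    mappings=[
        ("customer_name", "string", "customer_name", "string"),
        ("status", "string", "status", "string"),
        ("signup_date", "string", "signup_date", "date"),
        ("region", "string", "region", "string"),
    ],
)

# Target node: partitioned Parquet in the processed zone.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={
        "path": "s3://nocode-etl-demo/processed-zone/",
        "partitionKeys": ["region"],
    },
    format="parquet",
)

job.commit()
```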


📊 Architecture Overview

Here’s what we’ve built — visually and completely without code:


[Architecture diagram] Salesforce CRM → Amazon AppFlow (no-code ingestion) → S3 raw data zone → AWS Glue Studio (visual transformations) → S3 processed zone (partitioned Parquet files).

🔍 Observations: Is This Really Production-Ready?

Surprisingly — yes, at least for a large class of use cases.

If your project involves simple extraction, filtering, renaming, and delivering to S3 or Redshift — this no-code setup is fast, secure, and repeatable.

You can schedule it, monitor it using CloudWatch, and even integrate notifications if anything fails.
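One common pattern for those failure notifications: an EventBridge rule that matches failed Glue job runs and forwards them to an SNS topic. A minimal boto3 sketch, assuming the SNS topic already exists (the ARN below is a placeholder):

```python
import json

import boto3

events = boto3.client("events")

# Match Glue job runs that fail or time out.
events.put_rule(
    Name="glue-job-failed",
    EventPattern=json.dumps({
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {"state": ["FAILED", "TIMEOUT"]},
    }),
    State="ENABLED",
)

# Forward matching events to a pre-existing SNS topic (placeholder ARN).
events.put_targets(
    Rule="glue-job-failed",
    Targets=[{
        "Id": "notify-data-team",
        "Arn": "arn:aws:sns:us-east-1:123456789012:etl-alerts",
    }],
)
```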

✅ Where It Shines:
  • Daily batch pipelines

  • SaaS to S3 integrations (Salesforce, Zendesk, Google Analytics, Slack, etc.)

  • Light transformations (filtering, mapping, masking)

  • Easy onboarding for non-engineers

⚠️ Where It Falls Short:
  • Advanced joins across datasets

  • Dynamic logic or custom business rules

  • Real-time streaming or event processing

  • Machine learning-based transformations

For these complex cases, you’ll still need PySpark or code-based Glue jobs.

But for day-to-day marketing, sales, finance, or analytics pipelines, no-code ETL can save time, reduce errors, and make data engineering more collaborative.


Final Thoughts: No-Code ETL Is Not a Dream — It’s Already Here

With tools like Amazon AppFlow and AWS Glue Studio, data engineers can now build production-grade pipelines faster than ever — and in some cases, without writing a single line of code.

That doesn’t mean code is going away. But it means that not every data pipeline has to start with a script. You can focus your coding efforts where they matter most — and let visual tools handle the rest.

So next time you get a request like “Can we get cleaned customer data from Salesforce into S3 daily?” — consider answering it without opening your IDE.


Bonus: What Else Can You Try?
  • Connect AppFlow + Slack to track live campaign feedback

  • Use Glue Studio + Athena to power no-code dashboards (see the query sketch below)

  • Combine with Redshift for near real-time reporting
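As a starting point for the Glue Studio + Athena idea, here is a hypothetical query against the processed zone. It assumes a Glue Data Catalog database nocode_etl_demo with a customers table over s3://nocode-etl-demo/processed-zone/; thanks to the region partitioning, Athena only scans the partitions a query actually touches:

```python
import boto3

athena = boto3.client("athena")

# Count customers per region. The database, table, and results location are
# assumptions based on the setup described above; inactive users were already
# filtered out by AppFlow, so no status filter is needed here.
athena.start_query_execution(
    QueryString="""
        SELECT region, COUNT(*) AS customer_count
        FROM customers
        GROUP BY region
        ORDER BY customer_count DESC
    """,
    QueryExecutionContext={"Database": "nocode_etl_demo"},
    ResultConfiguration={"OutputLocation": "s3://nocode-etl-demo/athena-results/"},
)
```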