Working with APIs in Python: A Data Engineer's Complete Guide

APIs are at the heart of modern data engineering. Whether you’re pulling real-time stock data, querying a weather service, syncing with a CRM like Salesforce, or accessing cloud storage logs, APIs make it all possible. As a data engineer, learning how to work with APIs in Python is essential for automating workflows and building scalable data pipelines.

In this guide, we’ll take a deep dive into:

  • What APIs are and how they work

  • Understanding RESTful APIs

  • Using Python’s requests library

  • Making GET and POST requests

  • Working with API authentication (API keys, tokens, etc.)

  • Handling JSON responses

  • Paginated APIs and rate limits

  • Real-world use cases (e.g., public APIs, cloud services, internal APIs)

  • Error handling and retries

  • Best practices for data engineers

Let’s begin.


What is an API?

An API (Application Programming Interface) is a set of rules that lets one software application talk to another. APIs are commonly used to expose data or services to other systems.

Most web APIs communicate over HTTP and follow REST (Representational State Transfer), which is an architectural style rather than a formal standard. In practice, you interact with them by sending an HTTP method to an endpoint URL, like:

GET https://api.example.com/users
POST https://api.example.com/login

Common HTTP Methods
  • GET – retrieve data

  • POST – send data

  • PUT – update (replace) existing data

  • DELETE – remove data
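To make the idea of an endpoint concrete, here is a small sketch that splits a request line like the ones above into its parts, using only the standard library. The `?page=2` query string is an assumption added for illustration; it is not part of the original example.

```python
from urllib.parse import urlsplit, parse_qs

def describe_request(request_line):
    """Split 'METHOD URL' into the pieces an HTTP client works with."""
    method, url = request_line.split(" ", 1)
    parts = urlsplit(url)
    return {
        "method": method,                # what to do (GET, POST, ...)
        "host": parts.netloc,            # which server to talk to
        "path": parts.path,              # which resource on that server
        "query": parse_qs(parts.query),  # optional filters or paging options
    }

info = describe_request("GET https://api.example.com/users?page=2")
print(info)
```

The method says what to do, the path says which resource to do it to, and the query string carries optional settings such as paging.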


Python Requests: The Essential Tool

The requests library is the most commonly used Python package for interacting with APIs.

Install it:

pip install requests

Import it in your script:

import requests

Making Your First API Call

Here’s an example using the JSONPlaceholder test API:

import requests

url = 'https://jsonplaceholder.typicode.com/posts/1'
response = requests.get(url, timeout=10)  # timeout avoids waiting forever

print(response.status_code)  # 200 means the request succeeded
print(response.json())

JSON Responses

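Most APIs return data in JSON (JavaScript Object Notation) format. Python ships with a json module in its standard library, and requests decodes a JSON body into Python dictionaries and lists for you via response.json().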

Example:

data = response.json()
print(data['title'])
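Indexing straight into a response raises an exception if the body is not JSON or the key is missing. A minimal sketch of a more defensive read, using a hard-coded sample string shaped like a JSONPlaceholder post (the sample values and the `extract_title` helper are illustrative, not real API output):

```python
import json

# Hard-coded sample shaped like a JSONPlaceholder post (illustrative only).
raw = '{"userId": 1, "id": 1, "title": "hello", "body": "world"}'

def extract_title(text):
    """Return the 'title' field, or None if the payload is not usable."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None           # body was not valid JSON
    if not isinstance(data, dict):
        return None           # valid JSON, but not an object
    return data.get("title")  # None if the key is missing

print(extract_title(raw))
print(extract_title("<html>Bad Gateway</html>"))
```

Returning None keeps the example short; in a pipeline you might prefer to raise or log so that bad payloads are not silently dropped.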

✉️ POST Requests – Sending Data

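Use POST to send data to the server, for example to register a user or upload a record. Passing the payload through the json= argument serializes it to JSON and sets the Content-Type header to application/json for you.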

payload = {'name': 'Alice', 'email': 'alice@example.com'}
response = requests.post('https://api.example.com/users', json=payload)
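In a pipeline, it often helps to wrap a POST call in a small function that checks the status and returns the parsed reply. The sketch below assumes the caller passes in a requests.Session (or any object with a compatible `.post` method); the `create_user` name and the `/users` path are hypothetical and follow the example URL above.

```python
def create_user(session, base_url, name, email, timeout=10):
    """POST a new user and return the parsed JSON reply.

    `session` is expected to be a requests.Session (or anything with a
    compatible .post method). The /users path is an assumption.
    """
    payload = {"name": name, "email": email}
    response = session.post(f"{base_url}/users", json=payload, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of continuing
    return response.json()

# Usage (not run here because it needs network access):
#   import requests
#   user = create_user(requests.Session(), "https://api.example.com",
#                      "Alice", "alice@example.com")
```

Taking the session as a parameter also makes the function easy to test with a stand-in object.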

API Authentication

APIs often require some form of authentication:

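1. API Key

A static key is usually sent in a request header or a query parameter. The exact name varies by provider, so check the API's documentation.

headers = {'X-API-Key': 'YOUR_API_KEY'}  # header name varies by provider
requests.get(url, headers=headers)

2. Token-based Auth (OAuth2, JWT)

Tokens are usually passed in the Authorization header as a bearer token. Some APIs also accept them in the query string, but URLs tend to end up in logs, so prefer the header.

headers = {'Authorization': 'Bearer YOUR_ACCESS_TOKEN'}
requests.get(url, headers=headers)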
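Hard-coding a key or token in a script makes it easy to leak. A minimal sketch, assuming the secret is stored in an environment variable (the name MY_API_TOKEN and the `auth_headers` helper are hypothetical), might build the header like this:

```python
import os

def auth_headers(env_var="MY_API_TOKEN"):
    """Build an Authorization header from an environment variable.

    MY_API_TOKEN is a hypothetical variable name; use whatever your
    deployment defines.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {"Authorization": f"Bearer {token}"}
```

Failing early with a clear message is usually easier to debug than a 401 response several steps later.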


Handling Pagination

APIs may limit the number of records per call. You’ll need to paginate through results.

Example (GitHub API):

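import requests

url = 'https://api.github.com/users/octocat/repos?page=1&per_page=100'
while url:
    res = requests.get(url, timeout=10)
    res.raise_for_status()  # stop on 4xx/5xx instead of parsing an error body
    data = res.json()
    process(data)  # Your logic here
    # requests parses the Link header; no 'next' entry means this was the last page
    url = res.links.get('next', {}).get('url')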
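Not every API advertises the next page in a Link header. A common alternative is a page number in the query string. The sketch below assumes an endpoint that accepts `page` and `per_page` parameters and returns a JSON list per page; both parameter names and the `fetch_all_pages` helper are assumptions, so adapt them to the API you are calling.

```python
def fetch_all_pages(session, url, per_page=100, max_pages=1000):
    """Collect every record from a page-numbered endpoint.

    Assumes the API takes `page` and `per_page` query parameters and
    returns a JSON list per page (names vary between providers).
    `session` is expected to be a requests.Session or compatible object.
    """
    records = []
    for page in range(1, max_pages + 1):  # max_pages guards against endless loops
        response = session.get(
            url, params={"page": page, "per_page": per_page}, timeout=10
        )
        response.raise_for_status()
        batch = response.json()
        records.extend(batch)
        if len(batch) < per_page:  # a short (or empty) page is the last one
            break
    return records
```

Stopping on a short page costs one extra request when the total is an exact multiple of `per_page`, which is usually an acceptable trade for not needing a total count from the API.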

Rate Limiting

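Some APIs limit how many calls you can make in a period and answer with HTTP 429 once you exceed the limit. When that happens, wait and retry the same request rather than moving on to the next one:

import time
import requests

for i in range(10):
    for attempt in range(5):  # retry the same call up to 5 times
        response = requests.get('https://api.example.com/data', timeout=10)
        if response.status_code != 429:  # not rate limited
            break
        # Honor Retry-After when it is given in seconds; otherwise wait 60 s
        delay = response.headers.get('Retry-After', '60')
        time.sleep(int(delay) if delay.isdigit() else 60)
    response.raise_for_status()  # still 429 (or another error): stop here
    process(response.json())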
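A fixed pause is simple, but it can be either too long or too short. One common refinement is to use the server's Retry-After hint when present and fall back to exponential backoff otherwise. The helper below is a sketch under those assumptions: `wait_seconds` is a hypothetical name, and it only understands Retry-After given as a number of seconds (the HTTP-date form is ignored and treated as missing).

```python
def wait_seconds(headers, attempt, base=1.0, cap=60.0):
    """Decide how long to pause after a 429 response.

    Prefers the server's Retry-After header when it holds a number of
    seconds; otherwise falls back to capped exponential backoff.
    """
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        return float(retry_after)           # trust the server's hint
    return min(base * (2 ** attempt), cap)  # 1s, 2s, 4s, ... up to cap

print(wait_seconds({"Retry-After": "5"}, attempt=0))
print(wait_seconds({}, attempt=3))
```

Pass `response.headers` and the current attempt number, then hand the result to `time.sleep`.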

🛑 Error Handling

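Always include error handling in production code. A timeout keeps a stalled connection from hanging the pipeline, and raise_for_status() turns 4xx and 5xx responses into exceptions:

try:
    res = requests.get(url, timeout=10)
    res.raise_for_status()
    data = res.json()
except requests.exceptions.RequestException as e:
    print("API error:", e)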
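Not every failure deserves the same reaction: some are worth retrying, others mean the request itself is wrong. A rough sketch of that decision, keyed on the status code (the `classify_status` helper and its category names are illustrative choices, not a standard):

```python
RETRYABLE = {429, 500, 502, 503, 504}

def classify_status(status_code):
    """Map an HTTP status code to a coarse action for a pipeline."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code in RETRYABLE:
        return "retry"   # transient: rate limit or server-side problem
    if status_code in (401, 403):
        return "reauth"  # credentials missing, expired, or not permitted
    return "fail"        # anything else: retrying is unlikely to help

for code in (200, 503, 401, 404):
    print(code, classify_status(code))
```

Which codes count as retryable depends on the API, so treat the set above as a starting point.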

Working with Public APIs

There are many free APIs available:

  • OpenWeatherMap (weather data)

  • CoinGecko (cryptocurrency prices)

  • COVID19 API

  • NewsAPI

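Example using CoinDesk's Bitcoin price endpoint (free endpoints change or get retired over time, so check that it still responds before building on it):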

import requests

url = 'https://api.coindesk.com/v1/bpi/currentprice.json'
res = requests.get(url)
data = res.json()
print("Bitcoin Price:", data['bpi']['USD']['rate'])

Real-World API Use Cases for Data Engineers
  1. Data Ingestion Pipelines – Pull stock prices, weather, e-commerce transactions.

  2. Data Enrichment – Append IP geolocation, user metadata.

  3. Monitoring – Get logs or metrics from AWS CloudWatch, Datadog, etc.

  4. Cloud Operations – Manage AWS EC2 instances or S3 buckets via the API.


Advanced: Using Session and Retry

To avoid repeating headers, use requests.Session():

session = requests.Session()
session.headers.update({"Authorization": "Bearer MY_API_KEY"})
res = session.get("https://api.example.com/data")

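Add retry logic. Without status_forcelist, Retry mainly handles connection and read errors; listing status codes makes it retry those HTTP responses as well:

from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(total=5, backoff_factor=0.3, status_forcelist=[429, 500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)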

Saving API Data to File or Database

Write the parsed response to a JSON file:

import json
import requests

res = requests.get("https://api.example.com/data", timeout=10)
res.raise_for_status()
data = res.json()

with open("data.json", "w") as f:
    json.dump(data, f)

Or load into a Pandas DataFrame:

import pandas as pd

df = pd.json_normalize(data)
df.to_csv("api_data.csv", index=False)
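For the database side, here is a minimal sketch using SQLite from the standard library. It assumes each record is a dictionary with an integer `id` field; the table name `api_records`, the `save_records` helper, and the sample rows are all made up for illustration.

```python
import json
import sqlite3

def save_records(conn, records):
    """Store each record as a JSON string keyed by its 'id' field."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS api_records "
        "(id INTEGER PRIMARY KEY, payload TEXT)"
    )
    rows = [(rec["id"], json.dumps(rec)) for rec in records]
    # INSERT OR REPLACE keeps a re-run from creating duplicate ids.
    conn.executemany(
        "INSERT OR REPLACE INTO api_records (id, payload) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # swap for a file path to persist data
sample = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]
print(save_records(conn, sample))
```

Storing the raw payload as text keeps the load step simple; you can parse it into proper columns later once the schema settles.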

Best Practices
  • Keep API keys secure using environment variables or AWS Secrets Manager

  • Respect rate limits

  • Log your requests and responses, but never log secrets such as API keys or tokens

  • Use retry logic for unstable networks

  • Validate JSON responses before accessing fields

  • Modularize API logic using functions or classes
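As an illustration of the last point, a small client class can keep URL building, timeouts, and status checks in one place. This is a sketch, not a prescribed design: `ApiClient` and `get_json` are hypothetical names, and the session passed in is assumed to be a requests.Session or something with a compatible `.get` method.

```python
class ApiClient:
    """Thin wrapper that keeps URL building and error checks in one place."""

    def __init__(self, session, base_url, timeout=10):
        self.session = session  # e.g. a requests.Session with auth headers set
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def get_json(self, path, **params):
        """GET base_url + path and return the decoded JSON body."""
        url = f"{self.base_url}/{path.lstrip('/')}"
        response = self.session.get(url, params=params, timeout=self.timeout)
        response.raise_for_status()
        return response.json()

# Usage (not run here because it needs network access):
#   import requests
#   client = ApiClient(requests.Session(), "https://api.example.com")
#   data = client.get_json("/data", limit=50)
```

Because the session is injected, the same class works with a retry-enabled session in production and a stand-in object in tests.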


Conclusion

APIs open up endless possibilities in data engineering. From collecting data in real time to automating backend operations, knowing how to work with APIs in Python gives you a serious edge.

You now know how to use the requests library, handle JSON, authenticate securely, manage pagination and errors, and build real-world API workflows.