Working with APIs in Python

Working with APIs in Python: A Data Engineer's Complete Guide

APIs are at the heart of modern data engineering. Whether you’re pulling real-time stock data, querying a weather service, syncing with a CRM like Salesforce, or accessing cloud storage logs, APIs make it all possible. As a data engineer, learning how to work with APIs in Python is essential for automating workflows and building scalable data pipelines.

In this guide, we’ll take a deep dive into:

What APIs are and how they work
Understanding RESTful APIs
Using Python’s requests library
Making GET and POST requests
Working with API authentication (API keys, tokens, etc.)
Handling JSON responses
Paginated APIs and rate limits
Real-world use cases (e.g., public APIs, cloud services, internal APIs)
Error handling and retries
Best practices for data engineers

Let’s begin.

What is an API?

An API (Application Programming Interface) is a set of rules that lets one software application talk to another. APIs are commonly used to expose data or services to other systems.

Most web APIs communicate over HTTP, and many follow REST (Representational State Transfer), an architectural style rather than a formal standard. You interact with a REST API by sending HTTP requests to endpoints like:

GET https://api.example.com/users
POST https://api.example.com/login
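Endpoints often take query parameters after a `?`, such as the page and per_page parameters in the GitHub pagination example. requests builds that query string for you when you pass a dict as params=, as in requests.get(base, params={'page': 1, 'per_page': 100}). The sketch below shows the same encoding with the standard library's urllib.parse, so it runs without network access; the GitHub URL is only an example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base = "https://api.github.com/users/octocat/repos"
params = {"page": 1, "per_page": 100}

# Encode the dict as a query string and append it to the endpoint
url = f"{base}?{urlencode(params)}"
print(url)  # https://api.github.com/users/octocat/repos?page=1&per_page=100

# Going the other way: split the URL and decode its query parameters
print(parse_qs(urlsplit(url).query))  # {'page': ['1'], 'per_page': ['100']}
```

Note that decoded query values always come back as strings, so convert them yourself if you need numbers.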

Common HTTP Methods

GET – retrieve data
POST – create a resource or submit data
PUT – update (replace) an existing resource
DELETE – remove a resource

Python Requests: The Essential Tool

The requests library is the most widely used third-party Python package for calling HTTP APIs. Because it is not part of the standard library, install it first:

pip install requests

Import it in your script:

import requests

Making Your First API Call

Here’s an example using the JSONPlaceholder test API:

import requests

url = 'https://jsonplaceholder.typicode.com/posts/1'
response = requests.get(url)

print(response.status_code)  # 200 OK
print(response.json())

JSON Responses

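Most APIs return data in JSON (JavaScript Object Notation) format. Python handles JSON natively through the built-in json module, and response.json() parses the response body into Python dicts and lists.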

Example:

data = response.json()
print(data['title'])
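response.json() is roughly json.loads() applied to the response text. The sketch below parses an illustrative body shaped like the JSONPlaceholder post above (the values are made up) and uses .get() for a field that might be missing; "author" is a hypothetical field name that the sample deliberately lacks.

```python
import json

# Illustrative body shaped like a JSONPlaceholder post (values are made up)
body = '{"userId": 1, "id": 1, "title": "sample title", "body": "sample text"}'

data = json.loads(body)  # JSON object -> dict, JSON array -> list
print(data["title"])     # sample title

# .get() returns a default instead of raising KeyError when a field is absent
author = data.get("author", "unknown")  # "author" is a hypothetical field
print(author)  # unknown
```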

POST Requests – Sending Data

Use POST to send data to an API, for example to create a user record or submit a form:

payload = {'name': 'Alice', 'email': 'alice@example.com'}
response = requests.post('https://api.example.com/users', json=payload)
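Passing json=payload tells requests to serialize the dict to JSON and set the Content-Type: application/json header. To make that visible without a live server, the sketch below builds (but does not send) an equivalent request with the standard library's urllib.request; https://api.example.com/users is the same placeholder endpoint as above.

```python
import json
from urllib.request import Request

payload = {"name": "Alice", "email": "alice@example.com"}

# Roughly what requests does for json=payload: serialize to JSON, encode as UTF-8,
# and set the Content-Type header. The request is only built here, never sent.
req = Request(
    "https://api.example.com/users",  # placeholder endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(req.get_method())                # POST
# urllib stores header names with str.capitalize(), hence "Content-type"
print(req.get_header("Content-type"))  # application/json
```

By contrast, passing a dict as data= makes requests send it form-encoded (application/x-www-form-urlencoded), which a JSON API may not accept.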

API Authentication

APIs often require some form of authentication:

1. API Key

headers = {'Authorization': 'Bearer YOUR_API_KEY'}  # header name and format vary by provider (e.g. X-API-Key)
response = requests.get(url, headers=headers)

2. Token-based Auth (OAuth2, JWT)

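Tokens are usually sent in the Authorization header, typically as Bearer <token>. With OAuth2, you first exchange your credentials for a short-lived access token and then send that token with each request. Some APIs also accept tokens in the query string, but URLs often end up in server and proxy logs, so prefer the header when you have the choice.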
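A common pattern is to keep the secret out of your code and read it from an environment variable at runtime. The sketch below assumes a variable named API_TOKEN and the Bearer scheme; both are hypothetical defaults, since providers differ in header names and formats.

```python
import os


def auth_headers(env_var: str = "API_TOKEN", scheme: str = "Bearer") -> dict:
    """Build an Authorization header from a secret stored in an environment variable.

    API_TOKEN and the Bearer scheme are assumed defaults; check your provider's docs.
    """
    token = os.environ.get(env_var)
    if not token:
        # Fail fast with a clear message instead of sending an unauthenticated request
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {"Authorization": f"{scheme} {token}"}
```

To use it, export API_TOKEN in your shell (or load it from a secrets manager) and call requests.get(url, headers=auth_headers(), timeout=10).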

Handling Pagination

Many APIs cap the number of records returned per call, so you need to request the results page by page.

Example (GitHub API):

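def process(records):
    # Placeholder: replace with your own logic (e.g. write to a file or table)
    print(f"Fetched {len(records)} repos")

url = 'https://api.github.com/users/octocat/repos?page=1&per_page=100'
while url:
    res = requests.get(url, timeout=10)
    res.raise_for_status()
    process(res.json())
    # GitHub sends the next page's URL in the Link header; requests parses it into res.links
    url = res.links.get('next', {}).get('url')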
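Not every API returns a Link header. Many simply accept page and per_page query parameters and return a JSON list for each page. The sketch below handles that style, assuming the API signals the last page by returning fewer than per_page records; it takes the page-fetching function as an argument (fetch_page is a hypothetical name) so the loop can be tested without a network.

```python
from typing import Callable, Iterator, List


def iter_records(fetch_page: Callable[[int, int], List[dict]],
                 per_page: int = 100) -> Iterator[dict]:
    """Yield records page by page until a short (or empty) page is returned."""
    page = 1
    while True:
        records = fetch_page(page, per_page)
        yield from records
        if len(records) < per_page:  # fewer than a full page: this was the last one
            return
        page += 1


# Demo with a fake in-memory "API" holding 250 records (no network needed)
FAKE_DB = [{"id": i} for i in range(250)]


def fake_fetch(page: int, per_page: int) -> List[dict]:
    start = (page - 1) * per_page
    return FAKE_DB[start:start + per_page]


all_records = list(iter_records(fake_fetch))
print(len(all_records))  # 250
```

Against a real API, fetch_page would wrap something like requests.get(url, params={'page': page, 'per_page': per_page}, timeout=10), followed by raise_for_status() and .json(). If the total happens to be an exact multiple of per_page, this approach costs one extra request that returns an empty page.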

Rate Limiting

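Some APIs cap how many calls you can make in a time window and respond with HTTP 429 (Too Many Requests) once you exceed it. When that happens, back off and retry the same request instead of skipping it: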

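import time
import requests

for i in range(10):
    while True:
        response = requests.get('https://api.example.com/data', timeout=10)
        if response.status_code != 429:  # 429 = Too Many Requests
            break
        time.sleep(60)  # wait, then retry the same request instead of skipping it
    response.raise_for_status()
    process(response.json())  # process() stands in for your own logic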
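Retrying after a 429 reacts to the limit; you can also stay under it proactively by spacing out your calls. The sketch below is a minimal client-side throttle, assuming the limit is expressed as calls per second; Throttle is a hypothetical helper, and its clock and sleep functions are injectable so it can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space out calls so that at most `calls_per_second` happen per second."""

    def __init__(self, calls_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / calls_per_second
        self.clock = clock
        self.sleep = sleep
        self.last_call: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval seconds have passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()  # re-read the clock after sleeping
        self.last_call = now
```

In a pipeline you would create throttle = Throttle(calls_per_second=5) once and call throttle.wait() immediately before each requests.get(...). The throttle only coordinates calls within one process; parallel workers sharing a quota need a shared limiter.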

Error Handling

Always include error handling in production code:

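try:
    res = requests.get(url, timeout=10)  # without a timeout, a hung server can block forever
    res.raise_for_status()  # raises HTTPError for 4xx and 5xx responses
    data = res.json()
except requests.exceptions.RequestException as e:
    # Catches connection errors, timeouts, and the HTTPError raised above
    print("API error:", e)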
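Transient failures such as timeouts, dropped connections, or brief outages often succeed on a second try. A minimal retry helper with exponential backoff and a little random jitter might look like the sketch below; with_retries is a hypothetical name, and the default retry_on tuple uses Python's built-in exceptions. With requests you would pass its own classes instead, e.g. retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout), since those do not inherit from the built-in ConnectionError and TimeoutError.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(func: Callable[[], T],
                 attempts: int = 4,
                 base_delay: float = 0.5,
                 retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            # Add up to 10% random jitter so many clients don't retry in lockstep
            sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError("attempts must be at least 1")  # only reached if attempts < 1
```

Exceptions that are not listed in retry_on, such as a ValueError from bad input, propagate immediately, which is usually what you want for errors that retrying cannot fix.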

Working with Public APIs

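There are many public APIs you can practice with. Some are open without signup, while others (including OpenWeatherMap and NewsAPI) require you to register for a free API key: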

OpenWeatherMap (weather data)
CoinGecko (cryptocurrency prices)
COVID19 API
NewsAPI

Example:

import requests

url = 'https://api.coindesk.com/v1/bpi/currentprice.json'
res = requests.get(url, timeout=10)
res.raise_for_status()
data = res.json()
print("Bitcoin Price:", data['bpi']['USD']['rate'])

Real-World API Use Cases for Data Engineers

Data Ingestion Pipelines – Pull stock prices, weather, e-commerce transactions.
Data Enrichment – Append IP geolocation, user metadata.
Monitoring – Get logs or metrics from AWS CloudWatch, Datadog, etc.
Cloud Operations – Manage AWS EC2 instances or S3 buckets via the API.

Advanced: Using `Session` and Retry

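A requests.Session keeps settings such as headers across calls and reuses the underlying connection to the same host, so you don't repeat headers on every request:

import requests

session = requests.Session()
session.headers.update({"Authorization": "Bearer MY_API_KEY"})
res = session.get("https://api.example.com/data", timeout=10)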

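Add automatic retries with exponential backoff by mounting an HTTPAdapter configured with urllib3's Retry class:

from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 5 times, with waits that grow exponentially (scaled by backoff_factor).
# status_forcelist also retries these HTTP status codes, not just connection errors.
# By default urllib3 only retries idempotent methods such as GET, PUT, and DELETE, not POST.
retry = Retry(total=5, backoff_factor=0.3, status_forcelist=[429, 500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)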

Saving API Data to File or Database

import json

res = requests.get("https://api.example.com/data")
data = res.json()

with open("data.json", "w") as f:
    json.dump(data, f)

Or load into a Pandas DataFrame:

import pandas as pd

df = pd.json_normalize(data)  # flattens nested objects into dotted column names, e.g. "a.b"
df.to_csv("api_data.csv", index=False)
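To load the records into a database instead of a file, the standard library's sqlite3 module works well for small to medium volumes. The sketch below assumes the API returns a list of objects with id and title fields; the posts table and the records are hypothetical. It uses INSERT OR REPLACE keyed on id, so re-running the same load does not create duplicate rows.

```python
import sqlite3
from typing import Iterable, Mapping


def load_posts(conn: sqlite3.Connection, records: Iterable[Mapping]) -> None:
    """Upsert API records into a posts table (hypothetical schema: id, title)."""
    conn.execute("CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT)")
    with conn:  # commits on success, rolls back on error
        # Named placeholders (:id, :title) are filled from each dict's keys
        conn.executemany(
            "INSERT OR REPLACE INTO posts (id, title) VALUES (:id, :title)",
            records,
        )


# Hypothetical records, shaped like a JSON list returned by an API
records = [{"id": 1, "title": "first post"}, {"id": 2, "title": "second post"}]

conn = sqlite3.connect(":memory:")  # use a file path such as "api_data.db" to persist
load_posts(conn, records)
load_posts(conn, records)  # re-running the same load does not duplicate rows
print(conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])  # 2
```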

Best Practices

Keep API keys secure using environment variables or AWS Secrets Manager
Respect rate limits
Log your requests and responses
Use retry logic for unstable networks
Validate JSON responses before accessing fields
Modularize API logic using functions or classes
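To validate JSON responses before accessing fields, one lightweight approach is to check that each record is a JSON object, that the required keys exist, and that their values have the expected types. The sketch below does that with plain Python; validate_record and the POST_SCHEMA fields are hypothetical, and dedicated schema-validation libraries handle nested structures more thoroughly.

```python
from typing import Any, Dict, Mapping


def validate_record(record: Any, schema: Mapping[str, type]) -> Dict[str, Any]:
    """Check that record is a JSON object with the required fields and types.

    schema maps field name -> expected Python type (as produced by json.loads).
    Raises ValueError with a descriptive message; returns the record otherwise.
    """
    if not isinstance(record, dict):
        raise ValueError(f"expected a JSON object, got {type(record).__name__}")
    missing = [name for name in schema if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    wrong = [name for name, expected in schema.items()
             if not isinstance(record[name], expected)]
    if wrong:
        raise ValueError(f"fields with unexpected types: {wrong}")
    return record


# Hypothetical schema for a post-like record
POST_SCHEMA = {"id": int, "title": str}

print(validate_record({"id": 1, "title": "hello"}, POST_SCHEMA))  # {'id': 1, 'title': 'hello'}
```

Validating at the edge of the pipeline means a malformed response fails loudly with a clear message, instead of surfacing later as a KeyError deep in your transformation code.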

Conclusion

APIs open up endless possibilities in data engineering. From collecting data in real-time to automating backend operations, knowing how to work with APIs in Python gives you a serious edge.

You now know how to use the requests library, handle JSON, authenticate securely, manage pagination and errors, and build real-world API workflows.