Welcome back to Day 10 of the 12 Days of DigitalOcean! Yesterday, we taught your app to extract information from email content using DigitalOcean’s GenAI agent. That was a huge step, but let’s face it: receipts and invoices don’t always live in the email body. More often than not, they’re attachments.
Today, we’re going to handle that. We’ll teach your app how to extract these attachments, save them securely to DigitalOcean Spaces, and generate public URLs for each file. These URLs will eventually be stored in our database, allowing you to preview the attached files when reviewing expenses.
Let’s dive in.
🚀 What You’ll Learn
By the end of today’s session, you’ll know how to:
- Create a DigitalOcean Space to store attachments.
- Extract and decode Base64-encoded attachments from Postmark emails.
- Upload attachments to DigitalOcean Spaces using boto3.
- Generate unique file names with uuid to prevent overwrites.
- Orchestrate the full workflow to handle multiple attachments seamlessly.
🛠 What You’ll Need
To get the most out of this tutorial, we assume the following:
- A Flask App Already Deployed on DigitalOcean: If you haven’t deployed a Flask app yet, you can follow the instructions in Day 7 - Building and Deploying the Email-Based Receipt Processor.
- Postmark Configured for Email Testing: To test the email-to-receipt processing pipeline, you’ll need Postmark set up to forward emails to your Flask app. See Day 8 - Connecting Postmark to Your Flask App for a step-by-step guide.
- DigitalOcean Spaces Setup: We’ll store processed attachments in a DigitalOcean Space. If you don’t have a Space yet, we’ll guide you through creating one in this tutorial.
<$> [info] Note: Even if you don’t have everything set up, you’ll still learn how to:
- Create a DigitalOcean Space to store attachments.
- Decode Base64-encoded attachments programmatically.
- Upload files to DigitalOcean Spaces using boto3.
- Seamlessly integrate attachment handling into your Flask app.
<$>
Step 1: Create a DigitalOcean Space
First, we need a place to store our attachments. DigitalOcean Spaces is an object storage service, perfect for securely handling files like receipts and invoices. It’s scalable, secure, and integrates seamlessly with our app.
Create the Space
- Log in to the DigitalOcean dashboard and click Spaces Object Storage.
- Click Create Bucket.
- Choose a Region close to your users (e.g., nyc3 for New York).
- Name your Space (e.g., email-receipts).
This creates a bucket named email-receipts, available at a URL like https://email-receipts.nyc3.digitaloceanspaces.com
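As a quick reference, here is a minimal sketch of how the bucket name and region combine into the Space’s URLs. The helper name `space_urls` is hypothetical and not part of the app. The CDN hostname is included because the upload code later in this tutorial builds its public URLs from it; this sketch assumes the CDN has been enabled for the Space.

```python
def space_urls(bucket: str, region: str) -> dict:
    """Build the origin and CDN base URLs for a Space (hypothetical helper)."""
    origin = f"https://{bucket}.{region}.digitaloceanspaces.com"
    # Assumption: the CDN hostname only serves files if the CDN is enabled on the Space.
    cdn = f"https://{bucket}.{region}.cdn.digitaloceanspaces.com"
    return {"origin": origin, "cdn": cdn}


urls = space_urls("email-receipts", "nyc3")
print(urls["origin"])
print(urls["cdn"])
```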
Generate Access Keys
To interact with your Space programmatically (e.g., via boto3), you’ll need an Access Key and Secret Key.
- Open your Space, click Settings, and scroll to Access Keys.
- Click Create Access Key.
- Set Permissions to All Permissions, so our app can read, write, and delete files.
- Name the key (or use the default) and click Create Access Key.
- Save the Access Key and Secret Key. This is the only time you’ll see the Secret Key!
Update Environment Variables
In the DigitalOcean App Platform dashboard:
- Go to Settings > Environment Variables.
- Add the following:
  - SPACES_ACCESS_KEY: Your Spaces Access Key ID.
  - SPACES_SECRET_KEY: Your Spaces Secret Key.
  - SPACES_BUCKET_NAME: The name of your Space (e.g., email-receipts).
  - SPACES_REGION: The region of your Space (e.g., nyc3).
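A missing variable tends to surface later as a confusing upload error, so it can help to check for all four at startup. The following is a minimal sketch of such a check; the helper name `missing_spaces_vars` is hypothetical and not part of the tutorial’s app.

```python
import os

# The four variables configured in the step above.
REQUIRED_VARS = (
    "SPACES_ACCESS_KEY",
    "SPACES_SECRET_KEY",
    "SPACES_BUCKET_NAME",
    "SPACES_REGION",
)


def missing_spaces_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_spaces_vars({"SPACES_REGION": "nyc3"}))
```

In the app, you might call `missing_spaces_vars()` once before creating the boto3 client and log a warning or stop if the list is not empty.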
Step 2: Process and Upload Attachments to DigitalOcean Spaces
To handle attachments in your app, we’ll update our app.py and write a few new functions. Each function serves a specific purpose, from decoding attachments to uploading them to DigitalOcean Spaces. Let’s walk through these one by one.
Decode and Save Attachments
Postmark sends attachments as Base64-encoded data inside the JSON payload. The first step is decoding this data and saving it locally using Python’s base64 library. This function ensures each file gets a unique name with the help of the uuid library.
What is Base64? It’s like a translator for binary files (like PDFs). It converts them into a plain text format that’s safe to send over the web. Once we decode it back into binary, we can handle it just like any regular file.
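To make the round trip concrete, here is a small standalone sketch using Python’s base64 module. The sample bytes are made up for illustration; they stand in for the contents of a real attachment.

```python
import base64

# Arbitrary binary data, including bytes that are not valid text.
original = b"%PDF-1.4 sample bytes \x00\xff"

# Encoding produces plain ASCII text that can travel inside a JSON payload.
encoded = base64.b64encode(original).decode("ascii")

# Decoding restores exactly the same bytes.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == original)
```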
Where Do Files Get Saved?: We’ll temporarily save the decoded files in /tmp. It’s a temporary storage directory available on most systems. Think of it like a scratchpad: it’s perfect for short-term use, and everything gets cleared when the app stops running.
Here’s the function to decode the attachment, ensure the filename is unique (thanks to uuid), and save it in /tmp.
    import os
    import base64
    import logging
    import uuid

    def decode_and_save_attachment(attachment):
        """Decode base64-encoded attachment and save it locally with a unique name."""
        file_name = attachment.get("Name")
        encoded_content = attachment.get("Content")

        # Skip attachments that are missing a name or content.
        if not file_name or not encoded_content:
            logging.warning("Invalid attachment, skipping.")
            return None

        # Prefix the name with a UUID so files with the same name never collide.
        unique_file_name = f"{uuid.uuid4()}_{file_name}"
        file_path = os.path.join("/tmp", unique_file_name)

        try:
            with open(file_path, "wb") as file:
                file.write(base64.b64decode(encoded_content))
            logging.info(f"Attachment saved locally: {file_path}")
            return file_path
        except Exception as e:
            logging.error(f"Failed to decode and save attachment {file_name}: {e}")
            return None
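Because the attachment name comes from an incoming email, you may want to treat it as untrusted input before using it in a file path. The sketch below shows one possible way to do that; `safe_unique_name` is a hypothetical helper, not part of the tutorial’s code, and the set of allowed characters is an assumption you can adjust.

```python
import os
import re
import uuid


def safe_unique_name(file_name: str) -> str:
    """Return a unique, path-free name for an untrusted attachment name."""
    # Normalize Windows separators, then drop any directory components.
    base = os.path.basename(file_name.replace("\\", "/"))
    # Keep letters, digits, dot, dash, and underscore; replace everything else.
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base) or "attachment"
    # Same uniqueness scheme as the tutorial: a UUID prefix.
    return f"{uuid.uuid4()}_{base}"


print(safe_unique_name("../../etc/receipt 2024.pdf"))
```

The result could replace the `unique_file_name` value before it is joined with /tmp.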
Upload Attachments to DigitalOcean Spaces
Now that we’ve decoded and saved the files, the next step is uploading them to DigitalOcean Spaces. We’ll use boto3, a powerful Python SDK for working with AWS-compatible APIs, to handle the upload. Spaces works just like an S3 bucket, so it’s a perfect fit.
This function uploads the file to your Space and returns a public URL.
    import boto3

    # s3_client, SPACES_BUCKET, and SPACES_REGION are defined at module level
    # in app.py (see the complete code below).

    def upload_attachment_to_spaces(file_path):
        """Upload a file to DigitalOcean Spaces and return its public URL."""
        file_name = os.path.basename(file_path)
        object_name = f"email-receipt-processor/{file_name}"

        try:
            # The public-read ACL makes the uploaded file reachable by URL.
            s3_client.upload_file(file_path, SPACES_BUCKET, object_name, ExtraArgs={"ACL": "public-read"})
            file_url = f"https://{SPACES_BUCKET}.{SPACES_REGION}.cdn.digitaloceanspaces.com/{object_name}"
            logging.info(f"Attachment uploaded to Spaces: {file_url}")
            return file_url
        except Exception as e:
            logging.error(f"Failed to upload attachment {file_name} to Spaces: {e}")
            return None
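If you want uploaded receipts to preview in the browser rather than download, it may help to send a content type along with the ACL. The sketch below builds the `ExtraArgs` dictionary with a type guessed from the file extension; `build_extra_args` is a hypothetical helper, and `ContentType` is one of the keys boto3’s `upload_file` accepts in `ExtraArgs`.

```python
import mimetypes


def build_extra_args(file_path: str) -> dict:
    """Build upload_file ExtraArgs with a guessed Content-Type (hypothetical helper)."""
    extra_args = {"ACL": "public-read"}
    # guess_type returns (type, encoding); type is None for unknown extensions.
    content_type, _ = mimetypes.guess_type(file_path)
    if content_type:
        extra_args["ContentType"] = content_type
    return extra_args


print(build_extra_args("/tmp/1234_receipt.pdf"))
```

The returned dictionary could be passed as `ExtraArgs=build_extra_args(file_path)` in the upload call.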
Process Multiple Attachments
Let’s bring it all together. This function orchestrates everything:
- Decodes each attachment.
- Uploads it to Spaces.
- Collects the URLs for the uploaded files.
    def process_attachments(attachments):
        """Process all attachments and return their URLs."""
        attachment_urls = []
        for attachment in attachments:
            file_path = decode_and_save_attachment(attachment)
            if file_path:
                file_url = upload_attachment_to_spaces(file_path)
                if file_url:
                    attachment_urls.append({"file_name": os.path.basename(file_path), "url": file_url})
                # Remove the temporary file whether or not the upload succeeded.
                os.remove(file_path)
        return attachment_urls
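To try this loop without touching Spaces, one option is to pass the decode and upload steps in as parameters and substitute stand-ins. The sketch below does that and also wraps the upload in try/finally so the temporary file is removed even if the upload step raises. The names `process_with`, `fake_decode`, and `fake_upload` are hypothetical, and the example.invalid URL is a placeholder.

```python
import os
import tempfile


def process_with(attachments, decode, upload):
    """Same loop as process_attachments, with the two steps passed in."""
    results = []
    for attachment in attachments:
        file_path = decode(attachment)
        if not file_path:
            continue  # invalid attachment: nothing to upload or clean up
        try:
            file_url = upload(file_path)
            if file_url:
                results.append({"file_name": os.path.basename(file_path), "url": file_url})
        finally:
            # Remove the temporary file even if the upload step raises.
            if os.path.exists(file_path):
                os.remove(file_path)
    return results


# Stand-ins for the real decode and upload functions (hypothetical).
def fake_decode(attachment):
    if not attachment.get("Name"):
        return None
    fd, path = tempfile.mkstemp(suffix="_" + attachment["Name"])
    os.close(fd)
    return path


def fake_upload(file_path):
    return "https://example.invalid/" + os.path.basename(file_path)


out = process_with([{"Name": "a.pdf"}, {}], fake_decode, fake_upload)
print(len(out))
```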
Update the /inbound Route
Finally, update the /inbound route to include attachment handling. This route will now handle email content processing, attachment decoding and uploading, and returning the final response.
    @app.route('/inbound', methods=['POST'])
    def handle_inbound_email():
        """Process inbound emails and return extracted JSON."""
        logging.info("Received inbound email request.")
        data = request.json
        email_content = data.get("TextBody", "")
        attachments = data.get("Attachments", [])

        if not email_content:
            logging.error("No email content provided.")
            return jsonify({"error": "No email content provided"}), 400

        # Extract details from the body, then decode and upload any attachments.
        extracted_data = extract_text_from_email(email_content)
        attachment_urls = process_attachments(attachments)

        response_data = {
            "extracted_data": extracted_data,
            "attachments": attachment_urls
        }
        logging.info("Final Response Data: %s", response_data)
        return jsonify(response_data)
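For local testing, it can be handy to build a payload that contains only the fields this route reads: `TextBody`, and `Name` and `Content` for each entry in `Attachments`. The sketch below does that; `build_test_payload` is a hypothetical helper, the sample text is made up, and a real Postmark payload carries many more fields than shown here.

```python
import base64
import json


def build_test_payload(text_body: str, file_name: str, file_bytes: bytes) -> dict:
    """Build a minimal inbound-style payload for local testing (hypothetical helper)."""
    return {
        "TextBody": text_body,
        "Attachments": [
            {
                "Name": file_name,
                # Attachments travel as Base64 text, as described earlier.
                "Content": base64.b64encode(file_bytes).decode("ascii"),
            }
        ],
    }


payload = build_test_payload("Paid $12.50 to Acme on 2024-12-10", "receipt.txt", b"hello")
print(json.dumps(payload, indent=2))
```

You could then POST the printed JSON to the /inbound route of a locally running app to exercise the attachment path before sending a real email.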
Final Complete Code
Here’s the full app.py file with all the updates:
    from flask import Flask, request, jsonify
    import os
    import base64
    import uuid
    import boto3
    from dotenv import load_dotenv
    from openai import OpenAI
    import logging

    load_dotenv()

    app = Flask(__name__)
    logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

    # GenAI agent configuration
    SECURE_AGENT_KEY = os.getenv("SECURE_AGENT_KEY")
    AGENT_BASE_URL = os.getenv("AGENT_BASE_URL")
    AGENT_ENDPOINT = f"{AGENT_BASE_URL}/api/v1/"
    client = OpenAI(base_url=AGENT_ENDPOINT, api_key=SECURE_AGENT_KEY)

    # DigitalOcean Spaces configuration
    SPACES_ACCESS_KEY = os.getenv("SPACES_ACCESS_KEY")
    SPACES_SECRET_KEY = os.getenv("SPACES_SECRET_KEY")
    SPACES_BUCKET = os.getenv("SPACES_BUCKET_NAME")
    SPACES_REGION = os.getenv("SPACES_REGION")
    SPACES_ENDPOINT = f"https://{SPACES_BUCKET}.{SPACES_REGION}.digitaloceanspaces.com"

    session = boto3.session.Session()
    s3_client = session.client(
        's3',
        region_name=SPACES_REGION,
        endpoint_url=SPACES_ENDPOINT,
        aws_access_key_id=SPACES_ACCESS_KEY,
        aws_secret_access_key=SPACES_SECRET_KEY
    )

    def extract_text_from_email(email_content):
        """Extract relevant details from the email content using DigitalOcean GenAI."""
        logging.debug("Extracting details from email content.")
        prompt = (
            "Extract the following details from the email:\n"
            "- Date of transaction\n"
            "- Amount\n"
            "- Currency\n"
            "- Vendor name\n\n"
            f"Email content:\n{email_content}\n\n"
            "Ensure the output is in JSON format with keys: date, amount, currency, vendor."
        )

        response = client.chat.completions.create(
            model="your-model-id",
            messages=[{"role": "user", "content": prompt}]
        )
        logging.debug("GenAI processing completed.")
        return response.choices[0].message.content

    def decode_and_save_attachment(attachment):
        """Decode base64-encoded attachment and save it locally with a unique name."""
        file_name = attachment.get("Name")
        encoded_content = attachment.get("Content")

        if not file_name or not encoded_content:
            logging.warning("Invalid attachment, skipping.")
            return None

        unique_file_name = f"{uuid.uuid4()}_{file_name}"
        file_path = os.path.join("/tmp", unique_file_name)

        try:
            with open(file_path, "wb") as file:
                file.write(base64.b64decode(encoded_content))
            logging.info(f"Attachment saved locally: {file_path}")
            return file_path
        except Exception as e:
            logging.error(f"Failed to decode and save attachment {file_name}: {e}")
            return None

    def upload_attachment_to_spaces(file_path):
        """Upload a file to DigitalOcean Spaces and return its public URL."""
        file_name = os.path.basename(file_path)
        object_name = f"email-receipt-processor/{file_name}"

        try:
            s3_client.upload_file(file_path, SPACES_BUCKET, object_name, ExtraArgs={"ACL": "public-read"})
            file_url = f"https://{SPACES_BUCKET}.{SPACES_REGION}.cdn.digitaloceanspaces.com/{object_name}"
            logging.info(f"Attachment uploaded to Spaces: {file_url}")
            return file_url
        except Exception as e:
            logging.error(f"Failed to upload attachment {file_name} to Spaces: {e}")
            return None

    def process_attachments(attachments):
        """Process all attachments and return their URLs."""
        attachment_urls = []
        for attachment in attachments:
            file_path = decode_and_save_attachment(attachment)
            if file_path:
                file_url = upload_attachment_to_spaces(file_path)
                if file_url:
                    attachment_urls.append({"file_name": os.path.basename(file_path), "url": file_url})
                os.remove(file_path)
        return attachment_urls

    @app.route('/inbound', methods=['POST'])
    def handle_inbound_email():
        """Process inbound emails and return extracted JSON."""
        logging.info("Received inbound email request.")
        data = request.json
        email_content = data.get("TextBody", "")
        attachments = data.get("Attachments", [])

        if not email_content:
            logging.error("No email content provided.")
            return jsonify({"error": "No email content provided"}), 400

        extracted_data = extract_text_from_email(email_content)
        attachment_urls = process_attachments(attachments)

        response_data = {
            "extracted_data": extracted_data,
            "attachments": attachment_urls
        }
        logging.info("Final Response Data: %s", response_data)
        return jsonify(response_data)

    if __name__ == "__main__":
        logging.info("Starting Flask application.")
        app.run(port=5000)
Step 3: Deploy to DigitalOcean
To deploy the updated Flask app, follow the steps from Day 7. Here’s a quick summary:
- Push Your Updated Code to GitHub: After making the necessary changes to your Flask app, commit and push the updated code to GitHub. This will trigger an automatic deployment in DigitalOcean’s App Platform.

      git add .
      git commit -m "Add attachment processing with DigitalOcean Spaces"
      git push origin main

- Monitor Deployment: You can track the progress in the Deployments section of your app’s dashboard.
- Verify Your Deployment: After the deployment completes, navigate to your app’s public URL and test its functionality. You can also check the runtime logs in the dashboard to confirm that the app started successfully.
Step 4: Test the Entire Workflow
Now that your app is fully configured and ready, it’s time to test the entire workflow. We’ll ensure that the email body is processed, attachments are decoded and uploaded to DigitalOcean Spaces, and the final output includes everything we need.
Here’s how you can test step by step:
- Send a Test Email: Send an email to Postmark with a text body and an attachment. If you’re unsure how to configure Postmark, check Day 8: Connecting Postmark to Your Flask App, where we walked through setting up Postmark to forward emails to your app.
- Check Postmark Activity JSON: In the Postmark dashboard, navigate to the Activity tab. Locate the email you sent, and ensure that the JSON payload includes the text body and Base64-encoded attachment data. This confirms Postmark is correctly forwarding the email data to your app.
- Monitor the Logs: Check the runtime logs in your DigitalOcean App Platform dashboard to ensure the app processes the JSON payload. We covered how to access runtime logs in Day 9.
- Verify Spaces Upload: Visit your DigitalOcean Space to confirm that the files were uploaded successfully. You should see the attachments in your bucket.
- Check the Final Output: The app should log the extracted data and the attachment URLs. These logs will include:
  - Details extracted from the email body.
  - Public URLs for the uploaded attachments.
Refer to Day 9 for tips on inspecting runtime logs.
By the end of these steps, your workflow will be ready to save data to a database, which we’ll tackle next.
🎁 Wrap-Up
Today, we taught your app to handle attachments like a pro. Here’s what we did:
- Created a DigitalOcean Space for secure, scalable storage.
- Decoded Base64-encoded attachments from Postmark JSON.
- Ensured unique filenames with uuid.
- Uploaded attachments to DigitalOcean Spaces using boto3.
- Generated public URLs for each file, ready to be used in your receipt processor.
Up next, we’ll integrate this data into a database. This will allow you to store extracted email details and attachment URLs for long-term use, making your receipt processor even more powerful. Stay tuned!