Sending Emails with CSV Attachments from Databricks

Dilorom
3 min readJan 14, 2023
Photo by Serhat Beyazkaya on Unsplash

This demo post illustrates the process of sending an email from Databricks with a CSV attachment from DBFS (Databricks File System).

If you work as a data analyst or data engineer, you may have been asked how to send automated emails in CSV format from Databricks on a daily or weekly schedule. While there is no built-in option for this in Databricks, it can be achieved through external libraries such as

- the Python library SMTP,

- Azure PowerAutomate Flow for lightweight tasks, and

- Azure LogicApp for more complex tasks.

This notebook will demonstrate how to send automated emails in CSV format from Databricks on a daily schedule.

# Email parameters
sender_email = "sender_email@outlook.com"
receiver_email = "reciever_email@gmail.com"
password = ""

After testing, you can securely store and retrieve your passwords in Azure KeyVault or any other key vault services, where access can be restricted to only authorized individuals or groups.

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders

# Attachment parameters
attachment_path = "/dbfs/FileStore/shared_uploads/your_account/Daily_Vehicles_Test.csv"

# Create a multipart message and set headers
message = MIMEMultipart()
message["From"] = sender_email
message["To"] = receiver_email
message["Subject"] = "CSV Attachment Test"
# Customize the body message
body = 'Hello, please find the daily view of the vehicles in the attachment.'

# convert the body to a MIME compatible string
body = MIMEText(body)
# attach it to your main message
message.attach(body)

# Open the attachment file in bynary
with open(attachment_path, "rb") as attachment:
# Add file as application/octet-stream
part = MIMEBase("application", "octet-stream")
part.set_payload((attachment).read())

# Encode file in ASCII characters to send by email
encoders.encode_base64(part)

# Add header with pdf name
part.add_header(
"Content-Disposition",
f"attachment; filename={attachment_path}",
)

# Add attachment to message and convert message to string
message.attach(part)
text = message.as_string()

To customize the body message of the email, you can use string formatting techniques or a template engine to insert dynamic content into the message. You can also use variables or placeholders to represent the dynamic content and replace them at runtime. It is important to ensure that the message is well-formatted, clear and professional.

The same customization techniques can also be applied to the email subject and attachment file name, allowing for dynamic and personalized messaging.

  1. Using Outlook — Make sure that the sender_email is from Outlook
# Connect to the Outlook SMTP server and send the email
with smtplib.SMTP("smtp-mail.outlook.com", 587) as server:
server.starttls()
server.login(sender_email, password)
server.sendmail(sender_email, receiver_email, text)
From Outlook to Gmail

2. Using Gmail — Make sure that the sender_email is from Gmail

# Log in to Gmail SMTP server using secure context and send email
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
server.login(sender_email, password)
server.sendmail(sender_email, receiver_email, text)

Note: To use Gmail in this process, you will need to create an App password, which is different from your regular login password. See references here and here.

From Gmail to Outlook

To schedule this email on a daily or weekly basis, you can either click on the “Schedule” button in the top right corner of the Databricks Notebook or go to the Jobs section in Databricks and set up a job with a task.

In conclusion, this demo illustrates the process of sending automated emails with a CSV attachment from Outlook and Gmail using the Python SMTP library in Databricks. The guide includes information on how to customize the email subject, body, and attachment name, as well as how to securely retrieve the password using Azure KeyVault. This is a useful tool for data analysts and engineers who need to send automated emails with CSV attachments from Databricks on a daily or weekly schedule.

--

--