
Pipelines

Pipelines consist of:

  • Trigger: an email sent to a Dropmail address that contains a CSV or Excel file attachment
  • File extraction options: which file name and file type to extract from the email
  • File opening options: how to open the file, including file delimiter, file headers, and rows to skip
  • Column mapping: map the incoming columns from the file to the destination columns in the database
  • Cleaning step: column mapping and any additional custom transformation code using Python and Pandas
  • Load step: how to import the data to your database, including append, overwrite, and custom code options

Trigger

Pipelines are triggered automatically when you send an email to your Dropmail address with a file attachment that's either a CSV or an Excel file.

File extraction options

File extraction options specify the type of file (CSV or Excel) and which file to extract from the email (if there is more than one). For emails that contain only one attachment, you can simply select to match by single file attachment. If your email contains more than one file, or if you only want to trigger a pipeline for files with specific file names, use the match file using regex option.

Regex

When using regex, you must use raw string notation. In the backend, we use the Python 'r' prefix to match patterns. This means that backslashes are treated as literal characters and don't require additional escaping, which makes regex patterns easier to read and write and avoids confusion.

For example, to match a filename that contains a date formatted as YYYY-MM-DD, you can enter the following regex: \d{4}-\d{2}-\d{2}. Note that you do not have to escape the backslashes, and you don't need quotes or any other characters. A file named sales_reports_2023-01-31.csv would result in a match, and therefore the file will be extracted and processed via your Dropmail pipeline.
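For reference, the matching behaves roughly like the following Python sketch. The filename and the re.search call are illustrative; the point is that the pattern is treated as a raw string, exactly as you enter it.

import re

# Pattern entered in the pipeline settings -- no quotes or extra escaping needed.
pattern = r"\d{4}-\d{2}-\d{2}"

# Filename of an incoming attachment (illustrative).
filename = "sales_reports_2023-01-31.csv"

# The date appears in the filename, so this attachment matches and
# would be extracted and processed by the pipeline.
if re.search(pattern, filename):
    print("match: file will be processed")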

File opening options

Specify what data to extract from the file and how to extract it. By default, Dropmail opens CSV/Excel files into a Pandas dataframe using the user-supplied options. After reading the file, we try to infer data types using the dtype parameter of the read_csv() or read_excel() functions. You can specify custom delimiters, different header rows, or select rows to skip. Custom options are particularly helpful when the data in your file doesn't start in the first row, or when the header row and data rows are in different locations (not adjacent to each other). Custom options must be specified in JSON format, following the read_csv() or read_excel() function parameters.

For example, if you want to extract data with the header row and data row as indicated in the screenshot, you could write a custom option as: {"skiprows":[0,1,3]}
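Under the hood, the custom options are passed to the Pandas reader, so the example above behaves roughly like the sketch below. The filename report.csv is illustrative.

import json
import pandas as pd

# Custom options exactly as entered in the pipeline settings (JSON).
custom_options = json.loads('{"skiprows": [0, 1, 3]}')

# The options become keyword arguments to read_csv()/read_excel():
# rows 0, 1, and 3 are skipped, and the first remaining row is used as the header.
df = pd.read_csv("report.csv", **custom_options)
print(df.head())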


Column mapping

Specify the mapping between the file columns and the columns in your destination database. By default, we provide a 1-1 mapping, assuming the column names in the incoming file match the destination columns 1-1. You can edit the mapping here to map the columns you need. Note that the file columns need to be spelled correctly for mapping to occur; typos in the file column names result in the pipeline not running when triggered.
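Conceptually, the mapping works like a Pandas column rename from file column names to destination column names, as in this sketch. The column names are illustrative.

import pandas as pd

# Incoming file columns (illustrative).
df = pd.DataFrame({"Cust Name": ["Ada"], "Plan Type": ["basic"]})

# Mapping from file columns to destination database columns.
column_mapping = {"Cust Name": "customer_name", "Plan Type": "plan"}

# Rename the mapped columns. A misspelled file column would not be found,
# which is why typos prevent the pipeline from running.
df = df.rename(columns=column_mapping)
print(df.columns.tolist())  # ['customer_name', 'plan']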

Cleaning step

This step allows you to perform further data transformations by writing custom code. You can use any Pandas dataframe API, and you can access the current dataframe context as df. You can also use Python code to perform data wrangling.

For example, to find any value of "basic" and replace it with "standard" in the "plan" column, you can write the following cleaning code: df['plan'] = df['plan'].replace('basic', 'standard'). Note that you have to refer to columns in the dataframe by their post-mapping names, since that is how the columns are stored in the dataframe.
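As a slightly fuller sketch, the cleaning code below combines that replacement with a couple of other common Pandas operations. In the cleaning step, df is provided for you; it is constructed here only so the example runs on its own, and the customer_name column is illustrative.

import pandas as pd

# df is provided by the cleaning step; built here only so the sketch is self-contained.
df = pd.DataFrame({"plan": ["basic", "premium"], "customer_name": [" Ada ", "Grace"]})

# Replace "basic" with "standard" in the mapped "plan" column.
df['plan'] = df['plan'].replace('basic', 'standard')

# Other typical wrangling: trim whitespace and drop fully empty rows.
df['customer_name'] = df['customer_name'].str.strip()
df = df.dropna(how='all')

print(df)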

Load step

Specify how you want to import data to your database. By default, the pipeline load step is set to "Append".

Append

Append (insert) rows to your database.

Overwrite

Overwrites your existing rows, except for your database column names.
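Conceptually, the two built-in modes behave like the following Pandas/SQLAlchemy sketch. This is illustrative rather than the actual implementation; the connection string, schema, and table name are assumptions.

import pandas as pd
from sqlalchemy import create_engine, text

# Illustrative connection and data; the real pipeline manages these for you.
engine = create_engine("postgresql://user:password@host/dbname")
df = pd.DataFrame({"plan": ["standard"], "customer_name": ["Ada"]})

# Append: insert the new rows alongside the existing ones.
df.to_sql("customers", engine, schema="public", if_exists="append", index=False)

# Overwrite: clear the existing rows first, then insert, keeping the table definition.
with engine.begin() as conn:
    conn.execute(text("DELETE FROM public.customers"))
df.to_sql("customers", engine, schema="public", if_exists="append", index=False)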

Custom SQL

Write your own import code. This step is useful if you need to upsert, merge, or specify a custom set of conditions for importing data to your database. You can access values by using the :column_name notation. This is a convenient way to refer to the data context without having to write each row value into this step. You must also specify the full path to your database tables. For example, to upsert (insert and update), you can use this code:

INSERT INTO schema.table (film_id, title, description, release_year, language_id, rental_duration)
VALUES(:film_id, :title, :description, :release_year, :language_id, :rental_duration)
ON CONFLICT (film_id)
DO UPDATE
SET title = EXCLUDED.title,
description = EXCLUDED.description,
release_year = EXCLUDED.release_year,
language_id = EXCLUDED.language_id;
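The :column_name placeholders are named bind parameters, and conceptually the statement runs once per row of the cleaned dataframe with that row's values bound. The sketch below illustrates the idea with SQLAlchemy; the connection details and sample data are assumptions, not part of the product.

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@host/dbname")  # illustrative

upsert = text("""
    INSERT INTO schema.table (film_id, title, description, release_year, language_id, rental_duration)
    VALUES (:film_id, :title, :description, :release_year, :language_id, :rental_duration)
    ON CONFLICT (film_id)
    DO UPDATE
    SET title = EXCLUDED.title,
        description = EXCLUDED.description,
        release_year = EXCLUDED.release_year,
        language_id = EXCLUDED.language_id
""")

# Each dataframe row becomes a dict of bind-parameter values (sample data).
df = pd.DataFrame([{"film_id": 1, "title": "Example", "description": "...",
                    "release_year": 2023, "language_id": 1, "rental_duration": 3}])
rows = df.to_dict(orient="records")

with engine.begin() as conn:
    conn.execute(upsert, rows)  # runs the statement once per row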