Glue ETL Script Example for Loading Tulip Table Data
Query Tulip Tables with a Glue ETL script to simplify moving data from Tulip to Redshift (or other data warehouses)
Purpose
This script provides a simple starting point for querying data in Tulip Tables and moving it to Redshift or other data warehouses.
High-level Architecture
This high-level architecture can be used to query data from Tulip's Tables API and then save it to Redshift for further analytics and processing.
Example Script
The example script below shows how to query a single Tulip Table with a Glue ETL job (Python shell) and then write the results to Redshift. NOTE: For scaled production use cases, writing to a temporary S3 bucket and then copying the bucket contents into Redshift is recommended instead. Additionally, the credentials are stored in AWS Secrets Manager rather than hard-coded in the script.
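A minimal sketch of such a script is shown below. The instance URL, table ID, secret name, secret field names, and column list are all illustrative placeholders, and the `/tables/{id}/records` endpoint with `limit`/`offset` pagination is an assumption about the Tables API — verify both against your Tulip instance's API documentation before use.

```python
import json

# Placeholder values — replace with your own instance, table, and secret.
TULIP_BASE_URL = "https://your-instance.tulip.co/api/v3"
TABLE_ID = "YOUR_TABLE_ID"
SECRET_NAME = "tulip/api-credentials"


def get_tulip_auth(secret_name: str):
    """Read the Tulip API key/secret pair from AWS Secrets Manager."""
    import boto3  # imported lazily so the pure helpers below carry no AWS dependency

    secret = boto3.client("secretsmanager").get_secret_value(SecretId=secret_name)
    creds = json.loads(secret["SecretString"])
    # Assumed secret field names; used as HTTP Basic auth against the Tables API.
    return (creds["api_key"], creds["api_secret"])


def fetch_all_records(base_url: str, table_id: str, auth, page_size: int = 100):
    """Page through a Tulip Table's records endpoint until a page comes back empty."""
    import requests  # add via --additional-python-modules if not present in the job

    records, offset = [], 0
    while True:
        resp = requests.get(
            f"{base_url}/tables/{table_id}/records",
            params={"limit": page_size, "offset": offset},
            auth=auth,
        )
        resp.raise_for_status()
        page = resp.json()
        if not page:
            return records
        records.extend(page)
        offset += page_size


def records_to_rows(records, columns):
    """Flatten API records into tuples ordered by `columns`, ready for executemany()."""
    return [tuple(rec.get(col) for col in columns) for rec in records]


if __name__ == "__main__":
    auth = get_tulip_auth(SECRET_NAME)
    rows = records_to_rows(
        fetch_all_records(TULIP_BASE_URL, TABLE_ID, auth),
        columns=["id", "_createdAt", "_updatedAt"],  # adjust to your table's schema
    )
    # Writing the rows is driver-specific; with a bundled PostgreSQL driver such
    # as pg8000 it would look roughly like:
    #   conn = pg8000.connect(host=..., database=..., user=..., password=...)
    #   cur = conn.cursor()
    #   cur.execute("TRUNCATE my_schema.tulip_table")
    #   cur.executemany("INSERT INTO my_schema.tulip_table VALUES (%s, %s, %s)", rows)
    #   conn.commit()
```

The fetch and flatten steps are kept as pure functions so they can be reused unchanged when switching the write path from direct INSERTs to an S3 staging bucket.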
Scale Considerations
Consider using S3 as intermediary temporary storage and then copying the data from S3 into Redshift with a COPY command, instead of writing it directly to Redshift. This can be more computationally efficient, since COPY performs a single bulk load rather than many row-level inserts.
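A sketch of that staging pattern, assuming the records are serialized as newline-delimited JSON; the bucket, key, table, and IAM role ARN below are placeholders. `FORMAT AS JSON 'auto'` is standard Redshift COPY syntax for JSON objects whose keys match the target columns.

```python
import json


def stage_to_s3(records, bucket: str, key: str) -> str:
    """Write records to S3 as newline-delimited JSON and return the object URI."""
    import boto3  # lazy import keeps build_copy_statement usable without AWS

    body = "\n".join(json.dumps(rec) for rec in records)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return f"s3://{bucket}/{key}"


def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Build the Redshift COPY command that bulk-loads the staged JSON object."""
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS JSON 'auto';"
    )
```

The generated statement would then be executed over the same Redshift connection used elsewhere in the job; the IAM role must grant Redshift read access to the staging bucket.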
Additionally, you can use table metadata to load all Tulip Tables into a data warehouse instead of handling them one at a time.
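One way to sketch that metadata-driven approach: list the tables once, derive a warehouse table name from each label, and reuse a single load routine. The `GET {base_url}/tables` endpoint and its `id`/`label` fields are assumptions about the Tables API; confirm them against the Tulip API documentation.

```python
import re


def to_redshift_table_name(tulip_label: str, prefix: str = "tulip_") -> str:
    """Derive a Redshift-safe identifier from a Tulip table's display label."""
    name = re.sub(r"[^a-z0-9_]+", "_", tulip_label.lower()).strip("_")
    return prefix + name


def sync_all_tables(base_url: str, auth, load_table) -> None:
    """List every table and hand each (table id, target name) pair to `load_table`.

    Assumes a metadata endpoint at GET {base_url}/tables returning objects with
    `id` and `label` fields.
    """
    import requests

    resp = requests.get(f"{base_url}/tables", auth=auth)
    resp.raise_for_status()
    for table in resp.json():
        load_table(table["id"], to_redshift_table_name(table["label"]))
```

Here `load_table` would be the single-table fetch-and-write routine from the example script, so adding a new Tulip Table requires no code change.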
Finally, this example script overwrites the entire table on each run. A more efficient method is to update only the rows modified since the last run.
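An incremental pull could be sketched as below by filtering the records query on a stored high-water mark. The `filters`/`filterAggregator` parameters and the `_updatedAt` field name are assumptions about the Tables API's filtering support; verify the exact parameter shape against the API documentation.

```python
import json


def incremental_query_params(last_sync_iso: str, limit: int = 100, offset: int = 0) -> dict:
    """Build query params that request only records changed since the last run.

    Assumes the records endpoint accepts a JSON-encoded `filters` parameter and
    exposes an `_updatedAt` timestamp field on each record.
    """
    filters = [{"field": "_updatedAt", "functionType": "greaterThan", "arg": last_sync_iso}]
    return {
        "limit": limit,
        "offset": offset,
        "filters": json.dumps(filters),
        "filterAggregator": "all",
    }
```

The high-water mark itself would be persisted between runs (for example in SSM Parameter Store or a control table in Redshift), and the changed rows upserted via a staging table rather than truncating the target.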
Next Steps
For further reading, please check out the AWS Well-Architected Framework. It is a great resource for understanding optimal methods for data flows and integrations.