Building an Event-driven Architecture using AWS Lambda & DataChannel
In March this year, after introducing data orchestration with DataChannel, we added two new data analytics nodes: ‘Tableau’ and ‘Power BI’. Now, we’re excited to announce another powerful addition: the ‘Lambda Function’ node. This new feature allows you to trigger any Lambda function in your AWS account directly from the DataChannel console, giving you control to automate your data flows and manage events within your Amazon S3 buckets.
If you're new to data orchestration and want to learn more about Lambda functions and how they work with DataChannel, this blog is for you. By the end of it, you'll have a clear understanding of both, and who knows, you might even start using DataChannel for your own data orchestration needs. Let's begin with a quick introduction to AWS Lambda, serverless computing, why serverless matters, and some common use cases.
What is an AWS Lambda Function?
A Lambda function is code that runs automatically in response to an event, such as a new object being added to an S3 bucket. Lambda functions are serverless, meaning you don't have to worry about computing power, automatic scaling, server maintenance and infrastructure needs, detailed logging, or capacity provisioning; AWS handles it all through Lambda.
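To make this concrete, here is a minimal sketch of what a Lambda function can look like, assuming a Python runtime; AWS calls the handler with the triggering event payload and a runtime context object.

```python
# Minimal Lambda handler sketch (Python runtime assumed).
# AWS invokes this function with the triggering event payload
# (e.g. an S3 notification) and a runtime context object.
def lambda_handler(event, context):
    print(f"Received event: {event}")  # written to CloudWatch Logs
    return {"statusCode": 200, "body": "event processed"}
```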
Why Serverless Computing?
Serverless computing is an auto-scaling technology that enables developers to use backend services without managing on-premises servers. With serverless computing, you pay only for the services you actually use, rather than for pre-provisioned capacity that may sit idle. A helpful analogy: it is like switching from a cell phone data plan with a fixed monthly limit to one that charges only for the data you actually consume.
Advantages of Serverless Computing
Lower costs: Serverless computing is generally very cost-effective, since traditional backend hosting (pre-allocated servers) often leaves the user paying for unused capacity or idle CPU time.
Scalable: Developers using serverless architecture don’t have to worry about scaling their infrastructure. The serverless vendor handles all of the scaling on demand.
Quicker turnaround: Serverless architecture can significantly cut time to market. Instead of going through a complex deployment process to release bug fixes and new features, developers can update and change code on the fly.
Serverless & FaaS (Function-as-a-Service)
FaaS and serverless are closely related terms: serverless refers to the broader model in which a cloud provider maintains all the backend services and infrastructure required to run your application code.
FaaS (Function-as-a-Service) refers to a platform that provides serverless deployment, orchestration, and management of individual functions, such as AWS Lambda, Google Cloud Functions, or Azure Functions. In simpler terms, a serverless function is a piece of your application code that executes a specific action.
Serverless Platform: AWS Lambda
AWS Lambda is an event-driven, serverless computing platform provided by Amazon as part of Amazon Web Services (AWS). It is a computing service that runs code in response to events and automatically manages the computing resources required by that code. Because it is serverless, there is no need to worry about any server-related requirements; you simply put the code on Lambda and run it.
How does AWS Lambda work?
The code that you want Lambda to run is known as a Lambda function. The function responds to “events” such as object uploads to an Amazon S3 bucket, updates to a DynamoDB table, in-app purchases, and so on. For example, these events can be ‘Create Events’ such as PUT, POST, or COPY operations on objects in an S3 bucket.
You can create a Lambda function by uploading your code as a deployment package or by writing one on your own within the AWS Console.
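As an illustration, here is a small sketch (Python runtime assumed) of how a handler might read an S3 ‘Create Event’; the field names follow the standard S3 event notification structure.

```python
def lambda_handler(event, context):
    # An S3 notification delivers one or more records describing the objects.
    for record in event["Records"]:
        event_name = record["eventName"]          # e.g. "ObjectCreated:Put"
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"{event_name}: s3://{bucket}/{key}")
```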
AWS Lambda Components
AWS Lambda’s main purpose is to let you build event-driven applications, i.e. applications that are activated by events from various AWS services.
If multiple events occur concurrently, AWS Lambda runs multiple copies of the function in parallel, which is characteristic of a Function-as-a-Service (FaaS) platform.
Its main components are:
Function: This is where the actual code that performs the task lives. Lambda currently supports more than 15 runtimes, with new additions planned for later in 2024.
Configuration: This component specifies how the function is to be executed, for example its memory allocation, timeout, and execution role.
Event Source: This is the event that triggers the function; it can come from any of several AWS services or from a third-party service. It is an optional component and does not have to be added in all cases.
Log streams: Lambda monitors your function automatically, and you can view its metrics directly in CloudWatch. You can also write custom logging statements in your function to analyze its flow of execution and performance and confirm it is working properly, as sketched below.
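For example, here is a minimal sketch of custom logging in a Python handler; anything written through the standard logging module (or plain print) ends up in the function's CloudWatch log stream.

```python
import json
import logging

# Messages logged here are written to the function's CloudWatch log stream.
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info("Invocation started, request id: %s", context.aws_request_id)
    logger.info("Event payload: %s", json.dumps(event))

    # ... business logic would go here ...

    logger.info("Invocation finished successfully")
    return {"statusCode": 200}
```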
AWS Lambda Use Cases
Let’s take a simple use case: an image is uploaded to Amazon S3 and resized (using a Lambda function) to support mobile, tablet, and desktop devices.
The ‘file upload to S3’ event triggers the Lambda function, which then executes the code to resize the uploaded image.
Here are the steps:
✓ A user uploads an image using the web or mobile app. The image is saved in the Amazon S3 bucket.
✓ ‘Create Event’ is triggered once the image upload to S3 is successful.
✓ The event calls/invokes the configured Lambda function.
✓ The code inside the Lambda function is executed to resize the images.
✓ The resized images will be stored in S3.
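Below is a minimal sketch of what such a resizing function might look like, assuming a Python runtime with the Pillow imaging library packaged alongside the function; the destination bucket and target sizes are placeholders.

```python
import io

import boto3
from PIL import Image  # Pillow must be bundled with the function (e.g. as a layer)

s3 = boto3.client("s3")

# Placeholder target sizes for each device class
SIZES = {"mobile": (320, 320), "tablet": (768, 768), "desktop": (1280, 1280)}
DEST_BUCKET = "my-resized-images"  # placeholder destination bucket

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Download the original image uploaded to S3
        original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        for label, size in SIZES.items():
            img = Image.open(io.BytesIO(original))
            img.thumbnail(size)  # resize in place, preserving aspect ratio

            buffer = io.BytesIO()
            img.save(buffer, format=img.format or "JPEG")
            buffer.seek(0)

            # Store each resized variant back in S3
            s3.put_object(Bucket=DEST_BUCKET, Key=f"{label}/{key}", Body=buffer)

    return {"statusCode": 200}
```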
Task automation
With its event-driven model and flexibility, AWS Lambda is a great fit for automating various business tasks that don’t always require an entire server. This might include running scheduled jobs that perform cleanup in your infrastructure, processing data from forms submitted on your website, or moving data around between different data stores on demand.
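As one example, a Lambda function scheduled through an EventBridge rule could clean up stale objects from a staging bucket; this is only a sketch, and the bucket name and retention window below are placeholders.

```python
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

BUCKET = "my-temp-exports"   # placeholder staging bucket
RETENTION_DAYS = 30          # placeholder retention window

def lambda_handler(event, context):
    """Runs on a schedule (e.g. an EventBridge rule) and deletes stale objects."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    deleted = 0

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        stale = [
            {"Key": obj["Key"]}
            for obj in page.get("Contents", [])
            if obj["LastModified"] < cutoff
        ]
        if stale:
            s3.delete_objects(Bucket=BUCKET, Delete={"Objects": stale})
            deleted += len(stale)

    print(f"Deleted {deleted} objects older than {RETENTION_DAYS} days")
```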
Event-driven ETL Architecture
Let’s say a website’s order data is stored directly in an operational database, Amazon DynamoDB. Using Lambda functions (which run in response to each new order-related transactional entry), this data can be transformed into a structured format suitable for the data warehouse, Amazon Redshift, from which it can be moved further into any BI tool in the form of tables or views.
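One way this could look in practice is a Lambda function subscribed to the table's DynamoDB stream that writes each newly inserted order into Redshift via the Redshift Data API; this is a sketch only, and the cluster, database, secret, and table names below are placeholders.

```python
import boto3

# The Redshift Data API lets Lambda run SQL without managing connections.
redshift = boto3.client("redshift-data")

CLUSTER_ID = "analytics-cluster"  # placeholder cluster identifier
DATABASE = "warehouse"            # placeholder database name
SECRET_ARN = "arn:aws:secretsmanager:region:123456789012:secret:redshift-creds"  # placeholder

def lambda_handler(event, context):
    """Triggered by a DynamoDB stream; loads newly inserted orders into Redshift."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue  # only transform newly created orders

        item = record["dynamodb"]["NewImage"]
        order_id = item["order_id"]["S"]   # placeholder attribute names
        amount = item["amount"]["N"]
        placed_at = item["placed_at"]["S"]

        # Parameterised INSERT into a staging table in the warehouse
        redshift.execute_statement(
            ClusterIdentifier=CLUSTER_ID,
            Database=DATABASE,
            SecretArn=SECRET_ARN,
            Sql=(
                "INSERT INTO staging.orders (order_id, amount, placed_at) "
                "VALUES (:order_id, :amount, :placed_at)"
            ),
            Parameters=[
                {"name": "order_id", "value": order_id},
                {"name": "amount", "value": amount},
                {"name": "placed_at", "value": placed_at},
            ],
        )
```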
What’s Next?
In our next blog, we’ll talk about how we built our integration with Lambda functions and how it enables our users to automate event-driven tasks via DataChannel. We’ll take a deep dive into the use cases this integration can cover while offering the same functionality as Lambda’s management console.