Logging Step Functions to CloudWatch
Many AWS services log to CloudWatch. Some do it out of the box; others need to be configured to log properly. When Amazon released Step Functions, they didn’t include support for logging to CloudWatch. In February 2020, Amazon announced that Step Functions could now log to CloudWatch. Step Functions still supports CloudTrail logs, but CloudWatch logging is more useful for many teams.
Users need to configure Step Functions to log to CloudWatch. This is done on a per state machine basis. Of course you could click around the console to enable it, but that doesn’t scale. If you use CloudFormation to manage your Step Functions, it is only a few extra lines of configuration to add the logging support.
In my example I will assume you are using YAML for your CloudFormation templates. I’ll save my “if you’re using JSON for CloudFormation you’re doing it wrong” rant for another day. This is a cut-down example from one of my services:
---
AWSTemplateFormatVersion: '2010-09-09'
Description: StepFunction with Logging Example.
Parameters:
Resources:
  StepFunctionExecRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: !Sub "states.${AWS::Region}.amazonaws.com"
            Action:
              - sts:AssumeRole
      Path: "/"
      Policies:
        - PolicyName: StepFunctionExecRole
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - lambda:InvokeFunction
                  - lambda:ListFunctions
                Resource: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:my-lambdas-namespace-*"
              - Effect: Allow
                Action:
                  - logs:CreateLogDelivery
                  - logs:GetLogDelivery
                  - logs:UpdateLogDelivery
                  - logs:DeleteLogDelivery
                  - logs:ListLogDeliveries
                  - logs:PutResourcePolicy
                  - logs:DescribeResourcePolicies
                  - logs:DescribeLogGroups
                Resource: "*"
  MyStateMachineLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: /aws/stepfunction/my-step-function
      RetentionInDays: 14
  DashboardImportStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: my-step-function
      StateMachineType: STANDARD
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt MyStateMachineLogGroup.Arn
        IncludeExecutionData: True
        Level: ALL
      DefinitionString:
        !Sub |
          {
            ... JSON Step Function definition goes here
          }
      RoleArn: !GetAtt StepFunctionExecRole.Arn
The key pieces in this example are the second statement in the IAM role, with all the logging permissions, the log group defined by MyStateMachineLogGroup, and the LoggingConfiguration section of the Step Function definition.
The IAM role permissions are copied from the example policy in the AWS documentation for using CloudWatch Logging with Step Functions. The CloudWatch IAM permissions model is pretty weak, so we need to grant these broad permissions.
The LogGroup definition creates the log group in CloudWatch. You can use whatever value you want for the LogGroupName. I followed the Amazon convention of prefixing everything with /aws/[service-name]/ and then appended the Step Function name. I recommend using the RetentionInDays configuration. It stops old logs sticking around forever. In my case I send all my logs to ELK, so I don’t need to retain them in CloudWatch long term.
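If you want the log group name to follow the state machine name automatically, one option is to derive both from a single parameter. This is only a sketch: the StateMachineName parameter is a hypothetical addition that is not in my template, and the 14 day retention is just the value used above.

```yaml
# Sketch only: StateMachineName is a hypothetical parameter, not part of the original template.
Parameters:
  StateMachineName:
    Type: String
    Default: my-step-function

Resources:
  MyStateMachineLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      # Builds /aws/stepfunction/<state machine name> from the parameter.
      LogGroupName: !Sub "/aws/stepfunction/${StateMachineName}"
      # Expire old log events so they don't accumulate indefinitely.
      RetentionInDays: 14
```

The state machine resource could then use the same parameter for its StateMachineName property, so the two names cannot drift apart.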
Finally we use the LoggingConfiguration to tell AWS where we want to send our logs. You can only specify a single entry in Destinations. IncludeExecutionData determines whether the inputs and outputs of each function call are logged. You should not enable this if you are passing sensitive information between your steps. The verbosity of logging is controlled by Level. Amazon has a page on Step Function log levels. For dev you probably want to use ALL to help with debugging, but in production you probably only need ERROR level logging.
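One way to switch the level per environment is to pass it in as a stack parameter. The fragment below is a sketch under that assumption: LogLevel is a hypothetical parameter name that is not in my template, and the rest of the state machine properties are left out. It also turns off IncludeExecutionData, which is the safer setting when sensitive data flows between steps.

```yaml
# Sketch only: LogLevel is a hypothetical parameter, not part of the original template.
Parameters:
  LogLevel:
    Type: String
    Default: "ERROR"
    AllowedValues:
      - "ALL"
      - "ERROR"
      - "FATAL"
      - "OFF"   # quoted so YAML does not read it as a boolean

Resources:
  DashboardImportStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      # ... other properties as in the template above
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt MyStateMachineLogGroup.Arn
        # Keep step inputs and outputs out of the logs.
        IncludeExecutionData: false
        Level: !Ref LogLevel
```

A dev stack could then be deployed with LogLevel set to ALL, while production keeps the ERROR default.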
I removed the Parameters and Outputs sections from the template. Use them as you need to.
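As a rough illustration of what an Outputs section could look like for this template, the fragment below exposes the state machine ARN and the log group name. The output names are made up for the example; the resource names match the template above.

```yaml
# Sketch only: the output names are illustrative, not taken from the original template.
Outputs:
  StateMachineArn:
    Description: ARN of the state machine.
    # Ref on a state machine resource returns its ARN.
    Value: !Ref DashboardImportStateMachine
  StateMachineLogGroupName:
    Description: Name of the CloudWatch log group receiving the execution logs.
    # Ref on a log group resource returns its name.
    Value: !Ref MyStateMachineLogGroup
```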