How long is too long? The answer may surprise you!
tl;dr: It’s 30 minutes.
To support tasks that take a long time, AWS Elastic Beanstalk provides Worker Environments, which make it easier to develop applications that consume an SQS queue. There are a lot of benefits to this, but it comes with some caveats if your jobs take longer than Beanstalk expects out of the box.
The worker tier is laid out like so: between SQS and your application sits a daemon (sqsd) that reads messages from the queue and POSTs each one to your app through an nginx proxy. This provides some really nice abstractions for development, but when it comes to long-running jobs the devil is in the timeouts.
SQS
Visibility timeout is the main setting here. It needs to be set to something greater than how long you actually expect processing to take. One notable omission out of the box for the worker environment is the ability to scale based on queue depth. Beanstalk supports creating additional resources via CloudFormation, so this can be set up with a config file in the .ebextensions folder. (source)
Resources:
  QueueDepthAlarmHigh:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: "AWS/SQS"
      MetricName: ApproximateNumberOfMessagesVisible
      Dimensions:
        - Name: QueueName
          Value: { "Fn::GetAtt" : ["AWSEBWorkerQueue", "QueueName"] }
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - Ref: ScaleOutPolicy
  QueueDepthAlarmLow:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: "AWS/SQS"
      MetricName: ApproximateNumberOfMessagesVisible
      Dimensions:
        - Name: QueueName
          Value: { "Fn::GetAtt" : ["AWSEBWorkerQueue", "QueueName"] }
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 6
      Threshold: 0
      ComparisonOperator: LessThanOrEqualToThreshold
      AlarmActions:
        - Ref: ScaleInPolicy
  ScaleOutPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName:
        Ref: AWSEBAutoScalingGroup
      ScalingAdjustment: 1
  ScaleInPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName:
        Ref: AWSEBAutoScalingGroup
      ScalingAdjustment: -1
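If Beanstalk is managing the worker queue for you, the visibility timeout itself can also live in the same folder via the aws:elasticbeanstalk:sqsd option namespace instead of being changed by hand on the queue. A minimal sketch; the 2100-second value is only an example budget for a slow job, not a recommendation:
option_settings:
  aws:elasticbeanstalk:sqsd:
    # Example value only: give each message more time than your slowest job needs.
    VisibilityTimeout: 2100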
Nginx
For really long-running jobs you may also need to extend the proxy timeouts in nginx. You can do that with a config file placed in .ebextensions, like so. (source)
files:
  "/tmp/proxy.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      proxy_connect_timeout 1800;
      proxy_send_timeout 1800;
      proxy_read_timeout 1800;
      send_timeout 1800;

container_commands:
  00-add-config:
    command: cat /tmp/proxy.conf > /var/elasticbeanstalk/staging/nginx/conf.d/00_elastic_beanstalk_proxy.conf
  01-restart-nginx:
    command: /sbin/service nginx restart
SQSD
The next timeout you’ll want to visit is the inactivity timeout setting in the worker details. This determines how long the SQS daemon will wait for the application to respond to a given message. The caveat here is that the maximum value for this setting is 30 minutes, which means that if your jobs regularly take longer than that, you are S.O.L. You could, of course, construct the application to fork off from the request thread, but then you lose the ability to properly respond with a success or failure for each message.
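As with the other settings, this one can also be pinned in source control rather than set through the console; a minimal sketch that raises it to the 30-minute ceiling (1800 seconds):
option_settings:
  aws:elasticbeanstalk:sqsd:
    # 1800 seconds is the hard maximum; jobs that run longer than this will
    # time out at the daemon no matter how generous the nginx timeouts are.
    InactivityTimeout: 1800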
Conclusion
The Elastic Beanstalk Worker Environment is nice if it fits your workloads. If, like me, you have jobs that take longer than 30 minutes, it might not be the best fit. In my case I ended up creating a Terraform module to create custom autoscaling groups and integrated directly with SQS in the application. I do, however, miss the decoupling of the application from SQS that Beanstalk provides. Luckily there are a number of open source alternatives to aws-sqsd that I may explore in the future.