AWS SQS Configuration refresher
Quick reference for AWS SQS queue configurations
FIFO vs Standard Queue
SQS offers two queue types:
- Standard Queue: The default queue type. It provides nearly unlimited throughput with at-least-once message delivery, but duplicates may occur and message order is not guaranteed. This queue is ideal for applications where high throughput and fast delivery matter more than message order.
- FIFO Queue: Ensures messages are processed exactly once and in the order they were sent, eliminating duplicates. FIFO queues have lower throughput but are crucial for applications where message order and data integrity are essential.
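To make the difference concrete, here is a minimal sketch of how the two queue types could be requested at creation time. The helper name `build_queue_params` and the queue names are made up for illustration; `FifoQueue` and `ContentBasedDeduplication` are SQS queue attributes, and the result is meant to be passed to boto3's `create_queue`.

```python
def build_queue_params(name: str, fifo: bool = False) -> dict:
    """Return keyword arguments for an SQS create_queue call (hypothetical helper)."""
    attributes = {}
    if fifo:
        # FIFO queue names must end with the .fifo suffix.
        if not name.endswith(".fifo"):
            name += ".fifo"
        attributes["FifoQueue"] = "true"
        # Ask SQS to deduplicate based on a hash of the message body.
        attributes["ContentBasedDeduplication"] = "true"
    params = {"QueueName": name}
    if attributes:
        params["Attributes"] = attributes
    return params


# Usage, assuming `sqs = boto3.client("sqs")`:
#   sqs.create_queue(**build_queue_params("orders", fifo=True))
```

A standard queue needs no extra attributes, so the helper returns only the queue name in that case.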
Dead Letter Queue
SQS processes messages asynchronously, so when a message fails, it isn’t immediately apparent. A Dead Letter Queue (DLQ) is an ordinary SQS queue that we designate to capture messages that couldn’t be processed after several attempts. Messages sent to the DLQ can be reviewed for debugging and troubleshooting. Common reasons for message failure include invalid data or insufficient processing time for consumers. A DLQ is not created automatically when you set up a source queue: you must first create a queue and then designate it as the DLQ. The DLQ’s type (standard or FIFO) must match the source queue’s type. You can associate the same DLQ with more than one source queue.
Tip: to manage failed messages, it’s advisable to set alarms on the DLQ, such as monitoring the queue size, to track error rates and receive alerts for potential issues.
To send messages to a DLQ, we must configure a redrive policy on the main queue, specifying:
- maxReceiveCount: The number of times a message can be received before SQS moves it to the DLQ.
- deadLetterTargetArn: The ARN of the DLQ that will receive failed messages.
The Maximum receives value determines when a message will be sent to the DLQ. If the ReceiveCount for a message exceeds the maximum receive count for the queue, Amazon SQS moves the message to the associated DLQ (with its original message ID).
Dead Letter Queues are crucial for preventing repeatedly failed messages from blocking the queue and degrading system performance.
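The redrive policy described above is passed to SQS as a JSON string in the `RedrivePolicy` queue attribute. The sketch below builds that attribute; the helper name `build_redrive_policy`, the default of 5 receives, and the example ARN are assumptions for illustration.

```python
import json


def build_redrive_policy(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Return the Attributes dict that attaches a DLQ to a source queue."""
    if not 1 <= max_receive_count <= 1000:
        raise ValueError("maxReceiveCount must be between 1 and 1000")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the redrive policy as a JSON string, not a nested dict.
    return {"RedrivePolicy": json.dumps(policy)}


# Usage, assuming `sqs = boto3.client("sqs")` and an existing DLQ:
#   sqs.set_queue_attributes(
#       QueueUrl=source_queue_url,
#       Attributes=build_redrive_policy(dlq_arn, max_receive_count=3),
#   )
```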
Visibility Timeout
When working with SQS, it’s important to understand the lifecycle of a message. When a consumer retrieves a message from an SQS queue, the message is not immediately deleted. Instead, it becomes temporarily invisible to other consumers for a set period, known as the visibility timeout. During this time, the consumer processes the message and must explicitly delete it using the DeleteMessage API call. If the consumer fails to delete the message within the visibility timeout, the message becomes visible again, allowing another consumer to pick it up for processing.
The visibility timeout ensures that a message is not lost when its consumer fails or crashes: the message returns to the queue and can be retried. Each time a message is received, its receive count (ApproximateReceiveCount) increases. If a redrive policy is configured, the message is moved to a Dead Letter Queue (DLQ) for further inspection once that count exceeds the maximum receive count.
Setting an appropriate visibility timeout is key to efficient message handling. The timeout should be long enough to allow the consumer to process and delete the message, but not so long that failures are delayed.
Tip: We should set the timeout to at least the maximum time the consumer is expected to take to process and delete a message. When an AWS Lambda function consumes the queue, AWS recommends a visibility timeout of at least six times the function’s timeout, which leaves room for retries if an invocation is throttled.
The visibility timeout begins when Amazon SQS returns a message. If the consumer fails to process and delete the message before the visibility timeout expires, the message becomes visible to other consumers. If a message must be received only once, your consumer must delete it within the duration of the visibility timeout.
- The Default Visibility Timeout is 30 seconds.
- Visibility Timeout can be increased if your task takes more than 30 seconds.
- The maximum Visibility Timeout is 12 hours.
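The receive, process, delete lifecycle can be sketched roughly as follows. This assumes `sqs` is a boto3 SQS client (for example `boto3.client("sqs")`) and that `handler` is a hypothetical callable that raises an exception when processing fails; the function name `consume_once` is also made up.

```python
def consume_once(sqs, queue_url: str, handler, visibility_timeout: int = 30) -> int:
    """Receive up to 10 messages, process each, and delete the ones that succeed.

    Returns the number of messages deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        # Overrides the queue's default visibility timeout for this receive.
        VisibilityTimeout=visibility_timeout,
    )
    deleted = 0
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # No delete: the message becomes visible again after the timeout.
            continue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        deleted += 1
    return deleted
```

Skipping the delete on failure is what lets the visibility timeout act as the retry mechanism.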
Batch Processing
SQS supports batch message operations, allowing you to send, receive, and delete messages in batches of up to 10 messages per API call. This can greatly reduce the number of API requests, improving efficiency and lowering costs, especially when dealing with large volumes of messages.
Batch processing is ideal for high-throughput applications where multiple messages can be handled together without needing individual acknowledgments.
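As an illustration, the sketch below splits a list of message bodies into chunks of at most 10 and sends each chunk with boto3's `send_message_batch`. It assumes `sqs` is a boto3 SQS client; the function name `send_in_batches` and the Id scheme are made up for this example.

```python
def send_in_batches(sqs, queue_url: str, bodies: list[str], batch_size: int = 10) -> list[dict]:
    """Send message bodies in batches and return the entries SQS reported as failed."""
    if not 1 <= batch_size <= 10:
        raise ValueError("SQS accepts at most 10 messages per batch request")
    failed = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        # Each entry needs an Id that is unique within the batch.
        entries = [
            {"Id": str(start + offset), "MessageBody": body}
            for offset, body in enumerate(chunk)
        ]
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        # A batch call can succeed overall while individual entries fail.
        failed.extend(response.get("Failed", []))
    return failed
```

The returned list is empty when every entry was accepted; anything in it should be retried or logged by the caller.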
Retention Period
The message retention period defines how long SQS stores messages in the queue before they are automatically deleted, even if they haven’t been processed. The default retention period is 4 days, but we can configure it anywhere between 1 minute and 14 days.
For applications where immediate processing is crucial, shorter retention periods might be more appropriate. However, if our application experiences occasional downtime or latency, we may want to increase the retention period so that messages are not deleted before consumers recover.
Tip: We should choose a message retention period based on our application’s tolerance for delays and recovery capabilities.
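The retention period is set through the `MessageRetentionPeriod` queue attribute, expressed in seconds. A small sketch that validates the range above and builds the attribute (the helper name `retention_attribute` is an assumption):

```python
from datetime import timedelta

MIN_RETENTION = timedelta(minutes=1)
MAX_RETENTION = timedelta(days=14)


def retention_attribute(period: timedelta) -> dict:
    """Return the Attributes dict for a given retention period."""
    if not MIN_RETENTION <= period <= MAX_RETENTION:
        raise ValueError("retention period must be between 1 minute and 14 days")
    # Queue attribute values are passed as strings of whole seconds.
    return {"MessageRetentionPeriod": str(int(period.total_seconds()))}


# Usage, assuming `sqs = boto3.client("sqs")`:
#   sqs.set_queue_attributes(
#       QueueUrl=queue_url,
#       Attributes=retention_attribute(timedelta(days=7)),
#   )
```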
Message Delay
Message delays allow us to control when messages become visible to consumers. We can set message timers on individual messages to make them invisible for a specified time upon arrival. For scenarios where all messages should be delayed, delay queues can be configured to apply a uniform delay (up to 15 minutes) on every message added to the queue.
Delay queues and message timers are useful for delayed or batch processing, such as event-driven architectures or workflows requiring throttled execution. These features help manage message timing, ensuring tasks are handled at the right moment.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html
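Both kinds of delay use a `DelaySeconds` value between 0 and 900 seconds. The sketch below shows the two places it can go: on an individual `send_message` call (a message timer) or as a queue attribute (a delay queue). The helper names are made up for illustration.

```python
MAX_DELAY_SECONDS = 900  # 15 minutes


def _check_delay(delay_seconds: int) -> None:
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError("delay must be between 0 and 900 seconds")


def delayed_message_params(queue_url: str, body: str, delay_seconds: int) -> dict:
    """Keyword arguments for send_message with a per-message timer."""
    _check_delay(delay_seconds)
    return {"QueueUrl": queue_url, "MessageBody": body, "DelaySeconds": delay_seconds}


def delay_queue_attribute(delay_seconds: int) -> dict:
    """Attributes dict that applies the same delay to every message in a queue."""
    _check_delay(delay_seconds)
    return {"DelaySeconds": str(delay_seconds)}


# Usage, assuming `sqs = boto3.client("sqs")`:
#   sqs.send_message(**delayed_message_params(queue_url, "hello", 120))
#   sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=delay_queue_attribute(60))
```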
Receive Message Wait Time / Long Polling
By default, SQS uses short polling: a ReceiveMessage call returns immediately, even when the queue is empty, so a consumer that polls in a loop can generate many empty responses and unnecessary cost. Long polling (set ReceiveMessageWaitTimeSeconds to a value greater than 0) reduces this overhead by letting the request wait until a message is available or the wait time expires.
Enabling long polling helps reduce the number of empty responses and lowers costs by decreasing the number of API requests when no messages are available.
Short polling occurs when the WaitTimeSeconds parameter of a ReceiveMessage request is set to 0.
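Long polling can also be requested per call through the `WaitTimeSeconds` parameter. A minimal sketch, assuming `sqs` is a boto3 SQS client and using a made-up function name `poll`:

```python
def poll(sqs, queue_url: str, wait_seconds: int = 20) -> list[dict]:
    """Receive up to 10 messages, waiting up to wait_seconds for one to arrive."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("wait time must be between 0 and 20 seconds")
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        # 0 means short polling; anything above 0 enables long polling.
        WaitTimeSeconds=wait_seconds,
    )
    # The "Messages" key is absent when nothing was received.
    return response.get("Messages", [])
```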
Some numbers to consider for SQS
- SQS is pull-based, not push-based.
- Messages can be up to 256 KB in size.
- Messages are kept in a queue from 1 minute to 14 days.
- The default retention period is 4 days.
- Standard queues guarantee that each message is delivered at least once; FIFO queues provide exactly-once processing.
- Delay queue — The default (minimum) delay for a queue is 0 seconds. The maximum is 15 minutes.
- Inflight messages per queue — For most standard queues , there can be a maximum of approximately 120,000 inflight messages (received from a queue by a consumer, but not yet deleted from the queue). You can request a limit increase.
- For FIFO queues, there can be a maximum of 20,000 inflight messages (received from a queue by a consumer, but not yet deleted from the queue).
- Queue name — A queue name can have up to 80 characters. The following characters are accepted: alphanumeric characters, hyphens, and underscores. Queue names are case-sensitive.
- The name of a FIFO queue must end with the .fifo suffix. The suffix counts towards the 80-character queue name limit.
- Message attributes — A message can contain up to 10 metadata attributes.
- Message batch — A single message batch request can include a maximum of 10 messages.
- Message throughput — Standard queues support a nearly unlimited number of transactions per second (TPS) per action.
- By default, FIFO queues support up to 3,000 messages per second with batching.
- FIFO queues support up to 300 messages per second (300 send, receive, or delete operations per second) without batching.
- Message visibility timeout — The default visibility timeout for a message is 30 seconds. The minimum is 0 seconds. The maximum is 12 hours.
- Long polling wait time — The maximum long polling wait time is 20 seconds.
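Some of these limits are easy to check before calling SQS. As one example, the queue name rules listed above can be sketched as a small validator; the function name `is_valid_queue_name` is made up, and the check covers only the rules stated in this list.

```python
import re

MAX_NAME_LENGTH = 80
_ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_queue_name(name: str, fifo: bool = False) -> bool:
    """Check a queue name against the length, character, and suffix rules."""
    if len(name) > MAX_NAME_LENGTH:
        return False
    if fifo:
        # The .fifo suffix is required and counts towards the 80 characters.
        if not name.endswith(".fifo"):
            return False
        name = name[: -len(".fifo")]
    # Only alphanumeric characters, hyphens, and underscores are accepted.
    return _ALLOWED.fullmatch(name) is not None
```

For instance, `is_valid_queue_name("orders.fifo", fifo=True)` passes, while the same name fails as a standard queue name because of the dot.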
Points to note
- Message order is not guaranteed. Use a FIFO queue if order is important.
- Avoid a very short retention period.
- Adjust the visibility timeout based on your consumer capacity.
- Don’t use Lambda reserved concurrency to control throughput.
- Only send messages that the consumer can process.
- If possible, batch messages before sending them to SQS.
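- Don’t send too many messages to SendMessageBatchCommand (at most 10 per request) or batch messages that are too big.
- Don’t forget to handle SendMessageBatchCommand results, since individual entries can fail even when the request itself succeeds.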
Happy Queueing!!