Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle timeouts more gracefully by allowing the application to shutdown #895

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions couscous.yml
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,10 @@ menu:
text: Databases
url: /docs/environment/database.html
title: Using a database from AWS Lambda
timeouts:
text: Timeouts
url: /docs/environment/timeouts.html
title: Configure and handle timeouts
custom-domains:
text: Custom domains
url: /docs/environment/custom-domains.html
Expand Down
2 changes: 2 additions & 0 deletions docs/environment/database.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,8 @@ When possible, an alternative to NAT Gateways is to split the work done by a lam

Finally, another free alternative to NAT Gateway is to access AWS services by creating "*private VPC endpoints*": this is possible for S3, API Gateway, [and more](https://docs.aws.amazon.com/en_pv/vpc/latest/userguide/vpc-endpoints-access.html).

Read more in the section about [timeouts](/docs/environment/timeouts.md).

## Creating a database

On the [RDS console](https://console.aws.amazon.com/rds/home):
Expand Down
63 changes: 63 additions & 0 deletions docs/environment/timeouts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
---
title: Timeouts
current_menu: timeouts
introduction: Configure and handle timeouts.
---

When a Lambda function times out, it is like the power to the computer is suddenly
just turned off. This does not give the application a chance to shut down properly.
This leaves you without any logs and the problem could be hard to fix.

To allow your application to shut down properly and write logs, Bref can throw an exception just before the Lambda times out.

> Note, this feature is experimental and available since Bref 1.3.

To enable this feature **in `php-XX` layers**, set the environment variable `BREF_FEATURE_TIMEOUT`:

```yaml
provider:
environment:
BREF_FEATURE_TIMEOUT: 1
```

To enable this feature **in `php-XX-fpm` layers**, call `Timeout::enableInFpm()` in your application.
For example in `index.php`:

```php
if (isset($_SERVER['LAMBDA_TASK_ROOT'])) {
\Bref\Timeout\Timeout::enableInFpm();
}
```

Whenever a timeout happens, a full stack trace will be logged, including the line that was executing.

In most cases, it is an external call to a database, cache or API that is stuck waiting.
If you are using a RDS database, [you are encouraged to read this section](database.md#accessing-the-internet).

## Catching the exception

You can catch the timeout exception to perform some cleanup, logs or even display a proper error page.

In `php-XX-fpm` layers, most frameworks will catch the `LambdaTimeout` exception automatically (like any other error).

In `php-XX` layers, you can catch it in your handlers. For example:

```php
use Bref\Context\Context;
use Bref\Timeout\LambdaTimeout;

class Handler implements \Bref\Event\Handler
{
public function handle($event, Context $context)
{
try {
// your code here
// ...
} catch (LambdaTimeout $e) {
echo 'Oops, sorry. We spent too much time on this.';
} catch (\Throwable $e) {
echo 'Some other unexpected error happened.';
}
}
}
```
9 changes: 9 additions & 0 deletions runtime/layers/fpm/bootstrap
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,15 @@ use Bref\Runtime\LambdaRuntime;
ini_set('display_errors', '1');
error_reporting(E_ALL);

/**
* Disable the Bref timeout feature in the main bootstrap process.
* Indeed it doesn't make sense here because the application code runs in the
* FPM worker process, not in this process (the bootstrap process). We don't want to
* terminate the bootstrap process while leaving the FPM worker running.
* That's why we force `BREF_FEATURE_TIMEOUT` to be disabled.
*/
putenv('BREF_FEATURE_TIMEOUT=0');

$appRoot = getenv('LAMBDA_TASK_ROOT');

if (getenv('BREF_DOWNLOAD_VENDOR')) {
Expand Down
17 changes: 15 additions & 2 deletions src/Runtime/LambdaRuntime.php
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
use Bref\Context\Context;
use Bref\Context\ContextBuilder;
use Bref\Event\Handler;
use Bref\Timeout\Timeout;
use Exception;
use Psr\Http\Server\RequestHandlerInterface;

Expand Down Expand Up @@ -42,19 +43,23 @@ final class LambdaRuntime
/** @var Invoker */
private $invoker;

/** @var bool */
private $enableTimeout;

public static function fromEnvironmentVariable(): self
{
return new self((string) getenv('AWS_LAMBDA_RUNTIME_API'));
return new self((string) getenv('AWS_LAMBDA_RUNTIME_API'), (bool) getenv('BREF_FEATURE_TIMEOUT'));
}

public function __construct(string $apiUrl)
private function __construct(string $apiUrl, bool $enableTimeout)
{
if ($apiUrl === '') {
die('At the moment lambdas can only be executed in an Lambda environment');
}

$this->apiUrl = $apiUrl;
$this->invoker = new Invoker;
$this->enableTimeout = $enableTimeout;
}

public function __destruct()
Expand Down Expand Up @@ -96,6 +101,12 @@ public function processNextEvent($handler): void
[$event, $context] = $this->waitNextInvocation();
\assert($context instanceof Context);

$remainingTimeInMillis = $context->getRemainingTimeInMillis();
if ($this->enableTimeout && 0 < $remainingTimeInMillis) {
// Throw exception one second before Lambda pulls the plug.
Timeout::enable($context->getRemainingTimeInMillis());
}

$this->ping();

try {
Expand All @@ -104,6 +115,8 @@ public function processNextEvent($handler): void
$this->sendResponse($context->getAwsRequestId(), $result);
} catch (\Throwable $e) {
$this->signalFailure($context->getAwsRequestId(), $e);
} finally {
Timeout::reset();
}
}

Expand Down
12 changes: 12 additions & 0 deletions src/Timeout/LambdaTimeout.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
<?php declare(strict_types=1);

namespace Bref\Timeout;

/**
* The application took too long to produce a response. This exception is thrown
* to give the application a chance to flush logs and shut it self down before
* the power to AWS Lambda is disconnected.
*/
class LambdaTimeout extends \RuntimeException
{
}
86 changes: 86 additions & 0 deletions src/Timeout/Timeout.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
<?php declare(strict_types=1);

namespace Bref\Timeout;

/**
* Helper class to trigger an exception just before the Lambda times out. This
* will give the application a chance to shut down.
*/
final class Timeout
{
/** @var bool */
private static $initialized = false;

/**
* Automatically setup a timeout (based on the AWS Lambda timeout).
*
* This method can only be called when running in PHP-FPM, i.e. when using a `php-XX-fpm` layer.
*/
public static function enableInFpm(): void
{
if (! isset($_SERVER['LAMBDA_INVOCATION_CONTEXT'])) {
throw new \LogicException('Could not find value for bref timeout. Are we running on Lambda?');
}

$context = json_decode($_SERVER['LAMBDA_INVOCATION_CONTEXT'], true, 512, JSON_THROW_ON_ERROR);
$deadlineMs = $context['deadlineMs'];
$remainingTimeInMillis = $deadlineMs - intval(microtime(true) * 1000);

self::enable($remainingTimeInMillis);
}

/**
* @internal
*/
public static function enable(int $remainingTimeInMillis): void
{
self::init();

$remainingTimeInSeconds = (int) floor($remainingTimeInMillis / 1000);

// The script will timeout 1 second before the remaining time
Nyholm marked this conversation as resolved.
Show resolved Hide resolved
// to allow some time for Bref/our app to recover and cleanup
$margin = 1;

$timeoutDelayInSeconds = max(1, $remainingTimeInSeconds - $margin);

// Trigger SIGALRM in X seconds
pcntl_alarm($timeoutDelayInSeconds);
}

/**
* Setup custom handler for SIGALRM.
*/
private static function init(): void
{
if (self::$initialized) {
return;
}

if (! function_exists('pcntl_async_signals')) {
trigger_error('Could not enable timeout exceptions because pcntl extension is not enabled.');
return;
}

pcntl_async_signals(true);
// Setup a handler for SIGALRM that throws an exception
// This will interrupt any running PHP code, including `sleep()` or code stuck waiting for I/O.
pcntl_signal(SIGALRM, function (): void {
throw new LambdaTimeout('Maximum AWS Lambda execution time reached');
});

self::$initialized = true;
}

/**
* Reset timeout.
*
* @internal
*/
public static function reset(): void
{
if (self::$initialized) {
pcntl_alarm(0);
}
}
}
32 changes: 31 additions & 1 deletion tests/Runtime/LambdaRuntimeTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
use Bref\Event\Sqs\SqsHandler;
use Bref\Runtime\LambdaRuntime;
use Bref\Test\Server;
use Bref\Timeout\LambdaTimeout;
use GuzzleHttp\Psr7\Response;
use PHPUnit\Framework\TestCase;
use Psr\Http\Message\ResponseInterface;
Expand All @@ -35,13 +36,40 @@ protected function setUp(): void
{
ob_start();
Server::start();
$this->runtime = new LambdaRuntime('localhost:8126');
// Mock the API with our test server
putenv('AWS_LAMBDA_RUNTIME_API=localhost:8126');
$this->runtime = LambdaRuntime::fromEnvironmentVariable();
}

protected function tearDown(): void
{
Server::stop();
ob_end_clean();
// Cleanup
putenv('AWS_LAMBDA_RUNTIME_API=');
putenv('BREF_FEATURE_TIMEOUT=');
}

public function test Lambda timeouts can be anticipated()
{
// Re-create the runtime class with timeout prevention enabled
putenv('BREF_FEATURE_TIMEOUT=1');
$this->runtime = LambdaRuntime::fromEnvironmentVariable();

$this->givenAnEvent([]);

$start = microtime(true);
$this->runtime->processNextEvent(function () {
// This 10s sleep should be interrupted
sleep(10);
});

$elapsedTime = microtime(true) - $start;
// The Lambda timeout was 2 seconds, we expect the Bref timeout to trigger 1 second before that: 1 second
$this->assertEqualsWithDelta(1, $elapsedTime, 0.2);

$this->assertInvocationErrorResult(LambdaTimeout::class, 'Maximum AWS Lambda execution time reached');
$this->assertErrorInLogs(LambdaTimeout::class, 'Maximum AWS Lambda execution time reached');
}

public function test basic behavior()
Expand Down Expand Up @@ -348,6 +376,8 @@ private function givenAnEvent($event): void
[
'lambda-runtime-aws-request-id' => '1',
'lambda-runtime-invoked-function-arn' => 'test-function-name',
// now + 2 seconds
'lambda-runtime-deadline-ms' => intval((microtime(true) + 2) * 1000),
],
json_encode($event)
),
Expand Down
16 changes: 12 additions & 4 deletions tests/Server.php
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,13 @@ public static function stop()
if (self::$started) {
self::getClient()->request('DELETE', 'guzzle-server');
}
$tries = 0;
while (self::isListening() && ++$tries < 5) {
usleep(100000);
}
if (self::isListening()) {
throw new \RuntimeException('Unable to stop node.js server');
}
self::$started = false;
}

Expand All @@ -159,11 +166,12 @@ public static function start()
if (self::$started) {
return;
}
if (! self::isListening()) {
exec('node ' . __DIR__ . '/server.js '
. self::$port . ' >> /tmp/server.log 2>&1 &');
self::wait();
if (self::isListening()) {
throw new \Exception('Server is already running');
}
exec('node ' . __DIR__ . '/server.js '
. self::$port . ' >> /tmp/server.log 2>&1 &');
self::wait();
self::$started = true;
}

Expand Down
Loading