

Getting AWS S3 Data into Honeycomb

Honeycomb provides an agentless integration for ingesting S3-based data. The integration runs as one or more Lambda functions subscribed to PutObject events on your bucket.

The source is available on GitHub, and instructions for getting started are provided here. Do you have a use case not covered here? Please open an issue.

Prerequisites  🔗

You will need permission to deploy a CloudFormation stack with an IAM role in your AWS account. You will also need permission to edit your S3 bucket events configuration.

Install  🔗

To install, use one of the AWS CloudFormation Quick-Create links below. These links launch the AWS CloudFormation console with the appropriate template and guide you through the installation process.

CloudFormation Stack Creation

Generic JSON Integration  🔗

This integration accepts lines containing arbitrary JSON. If you are already writing structured logs in JSON format, this is what you want! Launch it with the Generic JSON quick-create link.

You will need to provide the following parameters:

  • Stack Name
  • S3 Bucket Name
  • Your Honeycomb API Key (optionally encrypted)
  • Honeycomb Dataset Name

Optionally, you can supply:

  • Sample rate
  • The ID of the AWS Key Management Service key used to encrypt your API Key. If your API Key is not encrypted, do not set a value here

Example Log Format

The integration expects each line in your S3 file to contain a JSON object and nothing else.

{"field1": "data1", "field2": "data2", "field3": 12345, "field4": {"field5": false}}
{"field1": "data1", "field2": "data2", "field3": 12345, "field4": {"field5": false}}
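To sanity-check a file before uploading it, you can verify that every line parses as standalone JSON. A minimal sketch using standard tools (the file name is arbitrary, and `python3` is assumed to be on your PATH):

```shell
# Write a small sample file in the expected newline-delimited JSON format
cat > sample_events.json <<'EOF'
{"field1": "data1", "field3": 12345}
{"field2": "data2", "field4": {"field5": false}}
EOF

# Fail fast on the first line that is not valid JSON
while IFS= read -r line; do
  printf '%s' "$line" | python3 -m json.tool > /dev/null \
    || { echo "invalid JSON: $line" >&2; exit 1; }
done < sample_events.json
echo "all lines valid"
```

Note that a single pretty-printed JSON object spanning several lines will fail this check, and will also fail in the integration.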

Bucket Logs Integration  🔗

Bucket Logs are access logs for your S3 bucket. To gain insight into how your S3 buckets are accessed, you can enable access logs on one or more buckets, and then configure the Bucket Logs Integration to send that log data into Honeycomb. Launch it with the Bucket Logs quick-create link.

You will need to provide the following parameters:

  • Stack Name
  • S3 Bucket Name (this is the bucket that receives your access logs from other buckets)
  • Your Honeycomb API Key (optionally encrypted)
  • Honeycomb Dataset Name

Optionally, you can supply:

  • Sample rate
  • The ID of the AWS Key Management Service key used to encrypt your API Key. If your API Key is not encrypted, do not set a value here

Subscribing to Bucket Events  🔗

After installing the S3 integration, you will need to configure your bucket to trigger the Lambda after each PutObject event. To do this, access the S3 Console and follow these steps.

From the S3 console, select the bucket that you want to subscribe to and select Properties:

S3 Console Bucket Properties

Find Advanced Settings and click Events:

S3 Console Advanced Settings

Enable the Put and Complete Multipart Upload events, and select the Lambda function belonging to the Honeycomb S3 integration. If you have multiple integrations, remember to select the one belonging to the stack that has permission to access your bucket. You can optionally set a prefix and suffix if you only want a subset of objects to be processed by the integration; this is recommended if the bucket has multiple uses.
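If you prefer scripting this step, the same subscription can be applied with the AWS CLI instead of the console. A sketch, where the bucket name, account ID, function name, and `logs/` prefix are all hypothetical placeholders to replace with your stack's actual values:

```shell
# Hypothetical bucket and Lambda ARN; substitute the values from your stack.
# The events below match the console's "Put" and "Complete Multipart Upload".
aws s3api put-bucket-notification-configuration \
  --bucket my-log-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:S3LambdaHandler-example",
      "Events": ["s3:ObjectCreated:Put", "s3:ObjectCreated:CompleteMultipartUpload"],
      "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "logs/"}]}}
    }]
  }'
```

The `Filter` block is optional; it corresponds to the prefix/suffix fields in the console.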

S3 Console Enable Events

Encrypting Your API Key  🔗

When installing the integration, you must supply your Honeycomb API Key via a CloudFormation parameter. CloudFormation parameters are not encrypted and are plainly visible to anyone with access to your CloudFormation stacks or Lambda functions. For this reason, we strongly recommend encrypting your Honeycomb API Key. To encrypt your key, use AWS Key Management Service (KMS).

First, you’ll need to create a KMS key if you don’t have one already. The default AWS-managed account keys are not suitable for this use case.

$ aws kms create-key --description "used to encrypt Honeycomb secrets"
{
    "KeyMetadata": {
        "AWSAccountId": "123455678910",
        "KeyId": "a38f80cc-19b5-486a-a163-a4502b7a52cc",
        "Arn": "arn:aws:kms:us-east-1:123455678910:key/a38f80cc-19b5-486a-a163-a4502b7a52cc",
        "CreationDate": 1524160520.097,
        "Enabled": true,
        "Description": "used to encrypt Honeycomb secrets",
        "KeyUsage": "ENCRYPT_DECRYPT",
        "KeyState": "Enabled",
        "Origin": "AWS_KMS",
        "KeyManager": "CUSTOMER"
    }
}
$ aws kms create-alias --alias-name alias/secrets_key --target-key-id=a38f80cc-19b5-486a-a163-a4502b7a52cc

Now you’re ready to encrypt your Honeycomb API Key:

$ aws kms encrypt --key-id=a38f80cc-19b5-486a-a163-a4502b7a52cc --plaintext "thisismyhoneycombkey"
{
    "CiphertextBlob": "AQICAHge4+BhZ1sURk1UGUjTZxmcegPXyRqG8NCK8/schk381gGToGRb8n3PCjITQPDKjxuJAAAAcjBwBgkqhkiG9w0BBwagYzBhAgEAMFwGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQM0GLK36ChLOlHQiiiAgEQgC9lYlR3qvsQEhgILHhT0eD4atgdB7UAMW6TIAJw9vYsPpnbHhqhO7V8/mEa9Iej+g==",
    "KeyId": "arn:aws:kms:us-east-1:123455678910:key/a38f80cc-19b5-486a-a163-a4502b7a52cc"
}

Record the CiphertextBlob and the final segment of the Key ID (example: a38f80cc-19b5-486a-a163-a4502b7a52cc); these are the values you’ll pass to the CloudFormation templates.
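Before pasting these values into the template, you can round-trip the ciphertext through KMS to confirm it decrypts back to your original API key. A sketch (the ciphertext below is a truncated placeholder; substitute your own CiphertextBlob):

```shell
# Truncated placeholder ciphertext; paste your own CiphertextBlob here
echo "AQICAHge4+...snipped...==" | base64 --decode > ciphertext.bin

# KMS infers the key from the ciphertext itself; no --key-id is needed.
# The final output should be your plaintext Honeycomb API key.
aws kms decrypt --ciphertext-blob fileb://ciphertext.bin \
    --output text --query Plaintext | base64 --decode
```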

Troubleshooting  🔗

Integration Logs  🔗

The S3 integration is a normal Lambda function, which means you can see its metrics and log messages in the Lambda Console. Look for functions starting with S3LambdaHandler. From there, you can view error rate, latency, and CloudWatch logs.
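If you have AWS CLI v2 installed, you can also tail those CloudWatch logs from a terminal. A sketch, where the function name is a hypothetical example to replace with your stack's actual handler name:

```shell
# "S3LambdaHandler-TestStack" is a hypothetical name; list your functions
# with `aws lambda list-functions` to find the real one
aws logs tail /aws/lambda/S3LambdaHandler-TestStack --follow
```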

Missing Events  🔗

If you are receiving some events but not all (and are not sampling), your S3 files may be too large to process within the Lambda function's maximum runtime of 5 minutes. Some possible solutions:

  • Increase the LambdaMemorySize parameter in the stack creation screen. Lambda increases CPU proportionally with reserved memory, and allocating more CPU can allow the integration to process more data in less time.
  • Send smaller files more frequently. Lambda is designed to scale horizontally: it handles many smaller log files better than a few large ones.
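One way to produce smaller objects is to split large log files into fixed-size chunks before uploading them. A sketch using standard tools (file names and sizes are arbitrary):

```shell
# Generate a sample log of 1000 JSON lines
seq 1 1000 | sed 's/.*/{"n": &}/' > big.json

# Split into 250-line chunks named chunk_aa, chunk_ab, chunk_ac, chunk_ad;
# each chunk can then be uploaded as its own S3 object
split -l 250 big.json chunk_

ls chunk_*
```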

Updating/Redeploying  🔗

If you are trying to pick up a newer version of the integration, or have misconfigured an existing installation, it is best to delete the CloudFormation stack entirely and re-create it using the quick-create links.

Advanced Use  🔗

Quick-create links are great for getting started, but if you have an existing workflow for configuring infrastructure, you might want to configure the Lambdas directly to suit your needs. We’ve provided example templates for CloudFormation and Terraform in our repository to get you started.