AWS Kinesis Data Firehose- KDF
Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data and is part of the Amazon Kinesis family.
Kinesis Data Firehose automatically scales to match data throughput and requires no ongoing administration.
It is a data transfer solution that delivers real-time streaming data to destinations such as S3, Redshift, Elasticsearch, and Splunk.
It is NOT real time, but near real time, because it supports batching and buffers incoming streaming data up to a specified size (Buffer Size, in MBs) before delivering it to destinations.
Supports batching, compression, and encryption of data before loading it, which reduces storage requirements at the destination and increases security.
Data compression reduces the storage required at the destination. GZIP and ZIP compression formats are currently supported; if the data is further loaded into Redshift, only GZIP is supported.
Supports encryption of data at rest using KMS after the data has been delivered to the S3 bucket.
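The buffering, GZIP compression, and KMS encryption settings above are all specified when the delivery stream is created. A minimal sketch of the S3 destination configuration follows; the stream name, bucket, role, and key ARNs are hypothetical placeholders, and the actual `create_delivery_stream` call via boto3 is shown as a comment.

```python
def extended_s3_config(bucket_arn, role_arn, kms_key_arn,
                       buffer_mb=5, interval_s=300):
    """Build an ExtendedS3DestinationConfiguration with buffering hints,
    GZIP compression, and KMS encryption at rest (example values)."""
    return {
        "BucketARN": bucket_arn,
        "RoleARN": role_arn,
        # Firehose buffers up to buffer_mb MBs or interval_s seconds,
        # whichever is reached first, before delivering to S3.
        "BufferingHints": {"SizeInMBs": buffer_mb,
                           "IntervalInSeconds": interval_s},
        "CompressionFormat": "GZIP",  # reduces storage at the destination
        "EncryptionConfiguration": {  # SSE-KMS on the delivered objects
            "KMSEncryptionConfig": {"AWSKMSKeyARN": kms_key_arn}
        },
    }


# With the AWS SDK (boto3), this configuration would be passed as, e.g.:
#   firehose = boto3.client("firehose")
#   firehose.create_delivery_stream(
#       DeliveryStreamName="example-stream",      # hypothetical name
#       DeliveryStreamType="DirectPut",           # producers call the API directly
#       ExtendedS3DestinationConfiguration=extended_s3_config(
#           "arn:aws:s3:::example-bucket",
#           "arn:aws:iam::123456789012:role/example-firehose-role",
#           "arn:aws:kms:us-east-1:123456789012:key/example-key-id"))
```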
Supports multiple producers as data sources, including Kinesis Data Streams, Kinesis Agent, the Kinesis Data Firehose API via the AWS SDK, CloudWatch Logs, CloudWatch Events, and AWS IoT.
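A producer using the Firehose API sends records with `PutRecord` or `PutRecordBatch`; the latter accepts at most 500 records per call, so larger event sets must be chunked. A small sketch of that batching, with the boto3 send shown as a comment (the stream name is hypothetical):

```python
import json

MAX_BATCH_RECORDS = 500  # PutRecordBatch accepts at most 500 records per call


def to_batches(events, batch_size=MAX_BATCH_RECORDS):
    """Group producer events into Firehose record batches.

    Each record's Data must be a blob; a trailing newline keeps records
    separable once Firehose concatenates them in the delivered S3 object.
    """
    records = [{"Data": (json.dumps(e) + "\n").encode("utf-8")}
               for e in events]
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]


# With the AWS SDK (boto3), each batch would then be sent with, e.g.:
#   firehose = boto3.client("firehose")
#   for batch in to_batches(events):
#       firehose.put_record_batch(DeliveryStreamName="example-stream",
#                                 Records=batch)
```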
Supports both out-of-the-box data transformation and custom transformation using a Lambda function, to transform incoming source data and deliver the transformed data to destinations.
Supports source record backup with custom data transformation with Lambda, where Kinesis Data Firehose delivers the un-transformed incoming data to a separate S3 bucket.
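A transformation Lambda receives a batch of base64-encoded records and must return every `recordId` it was given, marked `Ok`, `Dropped`, or `ProcessingFailed`. A minimal sketch, where the `transformed` field is just an illustrative change:

```python
import base64
import json


def lambda_handler(event, context):
    """Sketch of a Kinesis Data Firehose transformation Lambda.

    Firehose invokes it with a batch of base64-encoded records; every
    recordId received must appear in the response with a result status.
    """
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["transformed"] = True  # example transformation
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```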
Provides at-least-once delivery semantics; Firehose can deliver duplicates in rare situations, such as a request timeout after a data delivery attempt.
Supports an interface VPC endpoint (AWS PrivateLink) to keep traffic between the VPC and Kinesis Data Firehose from leaving the Amazon network. Interface VPC endpoints do not require an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
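Such an endpoint is created through the EC2 API against Firehose's PrivateLink service name. A sketch of the endpoint parameters, with hypothetical VPC, subnet, and security group IDs and the boto3 call shown as a comment:

```python
def firehose_vpc_endpoint_params(region, vpc_id, subnet_ids, sg_ids):
    """Parameters for an interface VPC endpoint (AWS PrivateLink) to
    Kinesis Data Firehose, so traffic stays on the Amazon network."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Firehose's PrivateLink service name follows this pattern.
        "ServiceName": f"com.amazonaws.{region}.kinesis-firehose",
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": sg_ids,
        "PrivateDnsEnabled": True,
    }


# With boto3 this would be passed to EC2's create_vpc_endpoint, e.g.:
#   boto3.client("ec2").create_vpc_endpoint(**firehose_vpc_endpoint_params(
#       "us-east-1", "vpc-123", ["subnet-a"], ["sg-1"]))
```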
Kinesis Key Concepts
Kinesis Data Firehose delivery stream: the underlying entity of Kinesis Data Firehose, to which data is sent.
Record: the data a data producer sends to a Kinesis Data Firehose delivery stream.
The maximum size of a record, before Base64-encoding, is 1,000 KiB.
Data producer: producers send records to a Kinesis Data Firehose delivery stream.
Buffer size and buffer interval: Kinesis Data Firehose buffers incoming streaming data to a specified size or for a specified period of time before delivering it to destinations.
During the creation of the delivery stream, you can set the buffer size and interval.
Buffer size is measured in MBs and ranges from 1 MB to 128 MB for the S3 destination and from 1 MB to 100 MB for the Elasticsearch Service destination.
Buffer interval is measured in seconds and ranges from 60 to 900 seconds.
Firehose raises the buffer size dynamically to catch up and ensure that all data is delivered to the destination.
Buffer size is applied before compression.
Destination: where the data is delivered.
S3, Redshift, Elasticsearch, and Splunk are supported as destinations.
Kinesis Data Streams vs Kinesis Firehose
Refer blog Kinesis Data Streams vs Kinesis Firehose
Questions for AWS Certification Exam Practice
A user is creating a new service that receives updates from 3600 rental cars per hour. The car locations must be uploaded to an Amazon S3 bucket, and each location must also be verified for distance from the original rental location. Which services will automatically scale with the updates?
Amazon EC2 and Amazon EBS
Amazon Kinesis Firehose and Amazon S3
Amazon ECS and Amazon RDS
Amazon S3 Events and AWS Lambda
You will need to run ad-hoc SQL queries for mas