AWS Data Ecosystem
The data lake
Amazon Simple Storage Service (Amazon S3) is object storage with a simple web service interface to store and retrieve any amount of data from anywhere on the web. It is designed to deliver 99.999999999% durability and to scale past trillions of objects worldwide.
Customers use S3 as primary storage for cloud-native applications; as a bulk repository, or “data lake,” for analytics; as a target for backup & recovery and disaster recovery; and with serverless computing.
– SQL queries can run directly against the data
– The input data is stored in its original state
– Metadata can be attached to objects stored in S3 (this lets us query and filter things like heart rate events for a specific class directly on the S3 store)
This allows us to operate as a schema-on-read data lake: the data shapes the app, rather than the app constraining the data. We just provide an ecosystem that encourages this process. S3 is even more powerful because metadata can be stored directly on the objects themselves. This means we can tag sensor events by the member who produced them and query those results directly from the lake.
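To make the schema-on-read idea concrete, here is a minimal in-memory sketch. It does not call S3; the list of objects stands in for the lake, and the field names (`member_id`, `class_id`, `event_type`, `bpm`) and helper functions are hypothetical, chosen only for illustration. Each object keeps its raw body untouched, and the body is parsed only when it is read.

```python
import json

# Stand-in for objects in the lake: raw bodies stored exactly as received,
# with metadata attached alongside each object (hypothetical field names).
RAW_OBJECTS = [
    {"metadata": {"member_id": "m-1", "class_id": "c-7", "event_type": "heart_rate"},
     "body": '{"bpm": 142, "ts": "2017-05-01T10:00:00Z"}'},
    {"metadata": {"member_id": "m-2", "class_id": "c-7", "event_type": "heart_rate"},
     "body": '{"bpm": 150, "ts": "2017-05-01T10:00:01Z"}'},
    {"metadata": {"member_id": "m-1", "class_id": "c-9", "event_type": "heart_rate"},
     "body": '{"bpm": 120, "ts": "2017-05-02T09:00:00Z"}'},
    {"metadata": {"member_id": "m-2", "class_id": "c-7", "event_type": "cadence"},
     "body": '{"rpm": 88, "ts": "2017-05-01T10:00:02Z"}'},
]


def query(objects, **filters):
    """Yield parsed bodies of objects whose metadata matches every filter."""
    for obj in objects:
        meta = obj["metadata"]
        if all(meta.get(key) == value for key, value in filters.items()):
            # The schema is applied here, at read time, not at write time.
            yield json.loads(obj["body"])


def heart_rates_for_class(objects, class_id):
    """Return the bpm values of all heart rate events for one class."""
    events = query(objects, class_id=class_id, event_type="heart_rate")
    return [event["bpm"] for event in events]
```

Filtering on metadata first means the raw bodies of unrelated events (the cadence event above, for instance) are never parsed, which is the property that lets new event types land in the lake without changing existing readers.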
Price and calculator
US East (Ohio)
|Monthly usage||Standard Storage||Standard – Infrequent Access Storage †||Glacier Storage|
|First 50 TB / month||$0.023 per GB||$0.0125 per GB||$0.004 per GB|
|Next 450 TB / month||$0.022 per GB||$0.0125 per GB||$0.004 per GB|
|Over 500 TB / month||$0.021 per GB||$0.0125 per GB||$0.004 per GB|
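The Standard Storage column is tiered, so a monthly bill is the sum of each tier's share rather than one rate applied to the whole volume. A rough sketch of that arithmetic, using the rates from the table above and assuming 1 TB = 1,024 GB (the function and constant names are illustrative, not part of any AWS tool):

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier size in GB, price per GB-month) for Standard Storage, US East (Ohio).
STANDARD_TIERS = [
    (50 * GB_PER_TB, 0.023),   # first 50 TB
    (450 * GB_PER_TB, 0.022),  # next 450 TB
    (float("inf"), 0.021),     # everything over 500 TB
]


def tiered_cost(gb, tiers):
    """Return the monthly cost in USD of storing `gb` gigabytes."""
    remaining = gb
    total = 0.0
    for size, rate in tiers:
        used = min(remaining, size)  # portion of the volume billed in this tier
        total += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return total
```

For example, 60 TB costs 50 TB at $0.023 plus 10 TB at $0.022, which comes to about $1,402.88 per month. Infrequent Access and Glacier are flat rates in this table, so they need no tiering.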
Amazon S3 request costs are based on the request type, and are charged on the quantity of requests or the volume of data retrieved as listed in the table below.
US East (Ohio)
|For Requests Not Otherwise Specified Below|
|PUT, COPY, POST, or LIST Requests||$0.005 per 1,000 requests|
|GET and all other Requests||$0.004 per 10,000 requests|
|Delete Requests||Free †|
|For Standard – Infrequent Access Requests|
|PUT, COPY, or POST Requests||$0.01 per 1,000 requests|
|GET and all other Requests||$0.01 per 10,000 requests|
|Lifecycle Transition Requests into Standard – Infrequent Access||$0.01 per 1,000 requests|
|Data Retrievals||$0.01 per GB|
|For Glacier Requests|
|Lifecycle Transition Requests into Glacier||$0.05 per 1,000 requests|
|Glacier Retrieval Fees||See Glacier Pricing Page|
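Note that the two Standard request rates use different units: PUT-style requests are priced per 1,000 and GET-style requests per 10,000. A small sketch of the arithmetic, using the Standard rates from the table above (the function name and parameters are illustrative):

```python
# Standard request pricing, US East (Ohio), from the table above.
PUT_PRICE_PER_1000 = 0.005   # PUT, COPY, POST, or LIST
GET_PRICE_PER_10000 = 0.004  # GET and all other requests


def standard_request_cost(put_requests, get_requests):
    """Return the cost in USD for the given Standard request counts."""
    put_cost = put_requests / 1_000 * PUT_PRICE_PER_1000
    get_cost = get_requests / 10_000 * GET_PRICE_PER_10000
    return put_cost + get_cost
```

One million PUTs cost about $5.00, while ten million GETs cost about $4.00, so write-heavy ingestion tends to dominate the request bill.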
The pricing below is based on data transferred “in” to and “out” of Amazon S3 (over either Direct Connect or the public Internet). Transfers between S3 buckets or from S3 to any service(s) within the same region are free.
US East (Ohio)
|Data Transfer IN To Amazon S3|
|All data transfer in||$0.000 per GB|
|Data Transfer OUT From Amazon S3 To|
|Amazon EC2 in the same region||$0.000 per GB|
|US East (N. Virginia)||$0.010 per GB|
|Another AWS Region||$0.020 per GB|
|Amazon CloudFront||$0.000 per GB|
|Data Transfer OUT From Amazon S3 To Internet|
|First 1 GB / month||$0.000 per GB|
|Up to 10 TB / month||$0.090 per GB|
|Next 40 TB / month||$0.085 per GB|
|Next 100 TB / month||$0.070 per GB|
|Next 350 TB / month||$0.050 per GB|
|Next 524 TB / month||Contact Us|
|Next 4 PB / month||Contact Us|
|Greater than 5 PB / month||Contact Us|
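Transfer out to the Internet is tiered in the same way as storage. The sketch below assumes 1 TB = 1,024 GB and reads the "Up to 10 TB" row as covering the rest of the first 10 TB after the free gigabyte; both are assumptions about how to interpret the table. Volumes beyond the published rates fall under "Contact Us", so the function refuses to guess a price for them.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier size in GB, price per GB) for transfer out to the Internet.
TRANSFER_OUT_TIERS = [
    (1, 0.000),                   # first 1 GB is free
    (10 * GB_PER_TB - 1, 0.090),  # remainder of the first 10 TB (assumed reading)
    (40 * GB_PER_TB, 0.085),      # next 40 TB
    (100 * GB_PER_TB, 0.070),     # next 100 TB
    (350 * GB_PER_TB, 0.050),     # next 350 TB
]


def transfer_out_cost(gb):
    """Return the monthly cost in USD of transferring `gb` gigabytes out."""
    remaining = gb
    total = 0.0
    for size, rate in TRANSFER_OUT_TIERS:
        used = min(remaining, size)  # portion of the volume billed in this tier
        total += used * rate
        remaining -= used
        if remaining <= 0:
            return total
    # Anything left over is in the "Contact Us" rows, which have no listed price.
    raise ValueError("volume exceeds the published tiers (Contact Us pricing)")
```

For example, 101 GB in a month costs $9.00: the first gigabyte is free and the remaining 100 GB are billed at $0.090 each.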
The data store
Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. According to AWS, Aurora provides up to five times better performance than MySQL, with the security, availability, and reliability of a commercial database at one-tenth the cost.
The data mart(s)
A data mart is a simple form of data warehouse focused on a specific functional area or subject matter. For example, you can have specific data marts for each division in your organization or segment data marts based on regions. You can build data marts from a large data warehouse, operational stores, or a hybrid of the two. Data marts are simple to design, build, and administer. However, because data marts are focused on specific functional areas, querying across functional areas can become complex because of the distribution.
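As a toy illustration of that trade-off, the sketch below uses an in-memory SQLite database in place of a real warehouse. The table and column names (`warehouse_sales`, `mart_retail`, `mart_wholesale`) are hypothetical. Each mart is carved out of the warehouse for one division, which keeps per-division queries simple, but a question that spans divisions has to stitch the marts back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_sales (region TEXT, division TEXT, amount REAL);
INSERT INTO warehouse_sales VALUES
  ('east', 'retail', 100.0),
  ('east', 'wholesale', 250.0),
  ('west', 'retail', 75.0);

-- One data mart per functional area, built from the warehouse table.
CREATE TABLE mart_retail AS
  SELECT region, amount FROM warehouse_sales WHERE division = 'retail';
CREATE TABLE mart_wholesale AS
  SELECT region, amount FROM warehouse_sales WHERE division = 'wholesale';
""")

# A question inside one functional area is a single simple query.
retail_total = conn.execute(
    "SELECT SUM(amount) FROM mart_retail"
).fetchone()[0]

# A question across functional areas must combine the marts first.
sales_by_region = conn.execute("""
    SELECT region, SUM(amount)
    FROM (SELECT region, amount FROM mart_retail
          UNION ALL
          SELECT region, amount FROM mart_wholesale)
    GROUP BY region
    ORDER BY region
""").fetchall()
```

With only two marts the union is manageable; with one mart per division and region, every cross-area report needs a similar stitching step, which is the complexity described above.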