Database Considerations
Requirements to consider when choosing a database:
- Scalability → Solution should achieve required throughput at launch and be able to scale if needed.
- Solution should be able to scale up/down if needed.
- Automatic scaling can save the cost of having someone do this.
- Storage Requirements → How large the DB needs to be, it depends on the architecture/design of the database.
- Object size and type → Which structures/objects do you need to store.
- Durability → Assurance that data will not be lost. It’s data availability and data durability.
- Take into account if the DB needs to have backups and if there are regional requirements for data.
Amazon Relational Database Service (RDS)
To maintain full control over the database system → Install DB system on EC2 instance (unmanaged).
- Won’t be able to scale unless specified.
- Challenges → Take up time and resources → RDS solves this.
- Server maintenance and energy footprint.
- Software installation and patches.
- DB backups and high availability.
- Limits on scalability.
- Data security.
Database Instance → Isolated database environment, can contain multiple databases. Pick instance class (type of EC2 instance) and instance storage (type of storage used). Also choose DB engine (SQL only) to run.
- Engines: MySQL, Aurora, SQL Server (up to 100 db per instance), PostgreSQL, MariaDB, Oracle (1 db per instance).
Billing → DB instance hours, storage, I/O request, provisioned IOPS, backup storage, and data transfer IN/OUT (💵).
Features
-
Multi-AZ deployment → Uses Primary/Standby replication (synchronous replication). It can do automatic failover, and it provides more durability and availability. Might generate a bit more latency due to the replication.
- ❌ Standby replicas cannot serve read requests.
- Failover is triggered by loss of availability in primary AZ, loss of network connectivity to primary, compute unit failure on primary, and storage failure on primary.
- Failover is done by switching the CNAME from the primary to standby.
-
Read replicas → Uses asynchronous replication (for read heavy apps). Enhanced performance, availability, and security.
- Uses: Scaling horizontally (reads), serve read traffic if source DB is in maintenance (will serve “stale” data), business reporting, data warehouse, or disaster recovery.
- SQL Server → Supports Read Replicas in the Enterprise version.
- Up to 5 instances for a given source DB instance.
- Replica of a replica (2nd tier replica) → Aurora, MySQL, and MariaDB support this.
-
Maintenance Window → Opportunity to modify, upgrade instances or engines. Multi-AZ deployments reduce impact of maintenance.
-
Enhanced Monitoring → Enables metrics from RDS child processes
(includes threads), RDS processes
, OS processes
from an agent in the instances.
-
Automated Backups → Point-in-time recovery of the instance. Automatic daily full snapshots and transaction logs. Stored in S3.
- Retained for up to 35 days (default is 7).
- Free storage for all the DB storage (limited to the DB size and active DBs).
- Deleted if the DB is deleted.
-
Automatic Scaling → Automatically scales storage capacity in response to growing workloads, without downtime.
-
Database Snapshots → User-initiated, back up DB state as frequently as you want. Stored in S3.
- Not exported automatically to S3, must be done manually.
-
Encryption → Can use SSL for data in transit. Encryption at rest using keys in KMS.
- For SQL Server, to use SSL:
- Force SSL in all connections with
rds.force_ssl
(default false
).
- For specific clients, download the SSL certificate.
-
RDS Proxy → Fully managed, highly available DB proxy that pools and shares DB connections.
- Uses: Applications with unpredictable workloads, frequently close/open connections, keep open (idle) connections, high-availability, and improved credential management.
- Supported only in MySQL and Aurora (compatible with MySQL).
-
RDS Events → Operational events triggered, no data-modifying events.
-
Blue/Green Deployments → Copies a production (blue) database into a separate, synchronized staging (green) environment. You can make changes to the staging environment and then promote that one to production.
- Supported only in MySQL, MariaDB, and Aurora (compatible with MySQL).
-
Optimized Writes → Can get up to 2x write throughput by writing only once when flushing to durable storage.
- Uses: Write-heavy workloads.
- Supported only in MySQL.
-
Optimized Reads → Can speed up query processing by up to 50% by using NVMe-based instance storage.
- Uses: Complex queries and analytics.
- Supported only in MySQL.
-
Trusted Language Extensions → Help build extensions in PostgreSQL (supported only in PostgreSQL).
-
Compliance → HIPAA and PCI compliant.
Use cases and requirements:
Instance Types
Each family has a different purpose and best use case:
- T Family → For burstable, unpredicted workloads.
- M Family → General purpose, mid/small size workloads.
- R Family → Memory intensive workloads.
Storage Types
Storage is EBS-backed, so the EBS storage types can be used:
- General Purpose SSD → Broad range workloads.
- Provisioned IOPS SSD → Fast, predictable, consistent I/O performance.
- Magnetic Storage → Small workloads, less frequent data access. Not for production.
Cost Optimization
Amazon DynamoDB
Features:
- Global tables can be replicated across multiple regions. Multi-region, multi-master database. You choose which regions can access those tables.
- DynamoDB Stream is an ordered flow of information about changes in a table.
Recommendation is to have compound keys with partition keys that have a high-cardinality attributes (such as IDs) (source). Not following this can result in “hot” partitions, leading to throttling.