AWS Serverless: DynamoDB

DynamoDB - Section Introduction

DynamoDB: NoSQL Serverless database
-Database is managed by AWS and scales for you
-Well integrated with AWS Lambda and other AWS services
-Learn how to properly design DynamoDB tables, how to enable streams, and how to make sure DynamoDB tables are well secured

Traditional Architecture

AWS_Serverless:_DynamoDB1
We have clients that connect to an application layer, which could be made of an Elastic Load Balancer and EC2 instances that are grouped and scaled with an Auto Scaling Group
We also have a database layer, which could be Amazon RDS backed by MySQL, PostgreSQL, or these kinds of technologies
-Traditional applications leverage RDBMS databases
-These databases use the SQL query language
-Strong requirements about how the data should be modeled
-Ability to do query joins, aggregations, complex computations
-Vertical Scaling (Getting a more powerful CPU/RAM/IO)
If you want a more powerful database, you need to replace it with one that has a more powerful CPU, more RAM, or better disks with more IO
-Horizontal Scaling (Increasing reading capability by adding EC2/ RDS Read Replicas)
As for Read Replicas, there is a limit on how many you can have, so you are limited in your horizontal read scaling

NoSQL Databases

-NoSQL Databases are non-relational databases and are distributed
-NoSQL databases include MongoDB, DynamoDB, …
-NoSQL databases do not support query joins (or just limited support)
-NoSQL databases don't perform aggregations such as "SUM", "AVG", …
-All the data that is needed for a query is present in one row
-NoSQL databases scale horizontally
If you need more write or read capacity, DynamoDB can add more instances behind the scenes and scale very well
-There's no "right or wrong" for NoSQL vs SQL; they just require you to model the data differently and think about the user queries differently

Amazon DynamoDB

-Fully managed, highly available with replication across multiple AZs
-NoSQL Database - not a relational database
-Scales to massive workloads, distributed database
-Millions of requests per second, trillions of rows, 100s of TB of storage
-Fast and consistent in performance (low latency on retrieval)
-Integrated with IAM for security, authorization and administration
-Enables event-driven programming with DynamoDB Streams
-Low-cost and auto-scaling capabilities
-Standard & Infrequent Access (IA) Table Classes

DynamoDB - Basics

-DynamoDB is made of Tables
-Each table has a Primary Key (must be decided at creation time)
-Each table can have an infinite number of items (=rows)
-Each item has attributes (can be added over time - can be null)
-Maximum size of an item is 400KB
-Data types supported are:
Scalar Types - String, Number, Binary, Boolean, Null
Document Types - List, Map
Set Types - String Set, Number Set, Binary Set
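
To make these data types concrete, here is a minimal boto3 sketch (the "Users" table and all attribute names are illustrative assumptions, not from the course) that writes one item mixing scalar, document, and set types:

```python
# Minimal sketch: one item mixing scalar, document, and set types.
# Table name and attributes are illustrative only.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")

table.put_item(
    Item={
        "user_id": "john123",                 # String (partition key)
        "age": 30,                            # Number
        "is_active": True,                    # Boolean
        "nickname": None,                     # Null
        "addresses": [                        # List containing a Map
            {"city": "Seattle", "zip": "98101"},
        ],
        "favorite_games": {"chess", "go"},    # String Set
    }
)
```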

DynamoDB - Primary Keys

Option 1: Partition Key (HASH)
-Partition key must be unique for each item
-Partition key must be “diverse” so that the data is distributed
-Example: “User_ID” for a users table
AWS_Serverless:_DynamoDB2
Option 2: Partition Key + Sort Key (HASH + RANGE)
-The combination should be unique for each item
-Data is grouped by partition key
-Example: user-games table, “User_ID” for a Partition Key and “Game_ID” for Sort Key
AWS_Serverless:_DynamoDB3
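
As a rough sketch of how these two options map to the create_table API (boto3; the table and attribute names are examples, and 2 RCU / 2 WCU is used just to stay in the free tier):

```python
# Sketch of the two primary key options with boto3's create_table.
# Table/attribute names are examples only.
import boto3

dynamodb = boto3.client("dynamodb")

# Option 1: partition key only (HASH)
dynamodb.create_table(
    TableName="Users",
    AttributeDefinitions=[{"AttributeName": "user_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "user_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 2, "WriteCapacityUnits": 2},
)

# Option 2: partition key + sort key (HASH + RANGE)
dynamodb.create_table(
    TableName="UserGames",
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},
        {"AttributeName": "game_id", "KeyType": "RANGE"},
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 2, "WriteCapacityUnits": 2},
)
```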

DynamoDB - Partition Keys (Exercise)

-We’re building a movie database
-What is the best Partition Key to maximize data distribution?
movie_id, producer_name, leader_actor_name, movie_language
Answer: “movie_id” has the highest cardinality so it’s a good candidate
It is unique for each row
“movie_language” doesn’t take many values and may be skewed towards English so it’s not a great choice for the Partition Key
=> Point: Always choose the partition key with the highest cardinality, i.e. the one that can take the largest number of distinct values

DynamoDB Basics - Hands On

Create Table

The database itself is already created for us; we only create tables
1)Enter Table name, Partition Key, and Sort Key
AWS_Serverless:_DynamoDB4
Table name: Users
Partition key: user_id (type: String)
2)Settings
AWS_Serverless:_DynamoDB5
3)Table Class
AWS_Serverless:_DynamoDB6
DynamoDB Standard: recommended for most use cases
DynamoDB Standard-IA: for data that is infrequently accessed; gives you some cost optimizations
4)Read/Write Capacity Settings
Provisioned: in the free tier
Read capacity auto scaling: Off, Write capacity auto scaling: Off
=> Provisioned read and write capacity
Two provisioned capacity units are within the free tier, so set two for reads and two for writes
5)Secondary Indexes
Skip
6) Estimated Cost
AWS_Serverless:_DynamoDB7
=> Within the Free tier
7) Encryption at Rest
AWS_Serverless:_DynamoDB8
8)Overview of Users Table
AWS_Serverless:_DynamoDB9
Can confirm Partition key, Sort key, and Capacity mode
9) See Items by clicking View items
9-1) Choose user_id and define the partition key
AWS_Serverless:_DynamoDB10
9-2) Add attributes
Create through form
AWS_Serverless:_DynamoDB11
Create through JSON
AWS_Serverless:_DynamoDB12
10)First item created
AWS_Serverless:_DynamoDB13
Item John Doe created in my table
user_id, first_name, and last_name
11)Second item created with the same user_id?
AWS_Serverless:_DynamoDB14
=>This is going to be a problem, because I am using the same user_id as before and it has to be unique when I have just a partition key
12)Create second item with a new attribute
AWS_Serverless:_DynamoDB15
-Request is successful
-With an RDS database, this would have been a problem: columns not defined, some values null, and so on
-We can have John with a first name and last name, and Alice with an age and first name
=> It is completely fine to add attributes over time. The only thing that has to be non-null is the user_id
-In the example, John has a null age and Alice has a null last_name by default
This is both the risk and the power of DynamoDB: you can add attributes over time without impacting previous data
13)Create a second table, this time including a sort key
AWS_Serverless:_DynamoDB17
AWS_Serverless:_DynamoDB18
14)Table Settings
Customize Settings
AWS_Serverless:_DynamoDB19
15)Read/Write Capacity Settings
-Provisioned
-Read Capacity Auto Scaling Off & Provisioned Capacity Units 2, Write Capacity Auto Scaling Off & Provisioned Capacity Units 2
AWS_Serverless:_DynamoDB22
-Because it is serverless, there is no need to say what is happening to the table or what the underlying database is; you can create as many tables as you want within DynamoDB at the region level
16)Overview of UserPosts Table
AWS_Serverless:_DynamoDB23
17)Create item with a new attribute
AWS_Serverless:_DynamoDB24

AWS_Serverless:_DynamoDB20
18)Estimated Cost
AWS_Serverless:_DynamoDB21
19)First item created
AWS_Serverless:_DynamoDB25
20)Second item created with the same user_id
AWS_Serverless:_DynamoDB26
AWS_Serverless:_DynamoDB27
Conclusion
It is super important to choose a good partition key. If john123 is the only user posting and he has 10,000 posts, the data is going to be heavily skewed toward john123

-Even though the items have the same user_id, because the post_ts was different, we were able to enter the data into the table
-Uniqueness is on the combination of user_id and post_ts
-The data is partitioned by user_id, which is why john123 is clickable. We can query and search for john123 as a user_id
-We can sort the data by post timestamp, which is why we call it a sort key
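
The walkthrough above can be approximated in code; the following boto3 sketch is my own illustration (attribute names follow the screenshots, the timestamps are made up) of creating the UserPosts table, inserting two posts for the same user_id, and querying them sorted by post_ts:

```python
# Illustrative boto3 sketch of the UserPosts walkthrough (not the course's code).
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")

# Create the table: user_id = partition key, post_ts = sort key, 2 RCU / 2 WCU.
table = dynamodb.create_table(
    TableName="UserPosts",
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "post_ts", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},
        {"AttributeName": "post_ts", "KeyType": "RANGE"},
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 2, "WriteCapacityUnits": 2},
)
table.wait_until_exists()

# The same user_id twice is fine because the sort key (post_ts) differs;
# each item can also carry its own attributes.
table.put_item(Item={"user_id": "john123", "post_ts": "2021-01-01T10:00:00", "content": "hello"})
table.put_item(Item={"user_id": "john123", "post_ts": "2021-01-02T12:30:00", "content": "again", "likes": 3})

# Query all posts for john123; results come back sorted by post_ts.
response = table.query(KeyConditionExpression=Key("user_id").eq("john123"))
for item in response["Items"]:
    print(item["post_ts"], item["content"])
```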

DynamoDB - Read/Write Capacity Modes

-Control how you manage your table’s capacity (read/write throughput)
-Provisioned Mode (default)
You specify the number of reads/writes per second (= Read Capacity Units & Write Capacity Units)
You need to plan capacity beforehand
Pay for provisioned read & write capacity units
Example: if you want 10 read capacity units & 5 write capacity units, you pay for that every hour
-On-Demand Mode
Read/Writes automatically scale up/down with your workloads
No capacity planning needed
Pay for what you use, more expensive
-You can switch between different modes once every 24 hours
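
A hedged sketch of how the two modes look in the create_table / update_table API (boto3; the table names are placeholders):

```python
# Provisioned mode takes explicit RCUs/WCUs; on-demand mode uses
# BillingMode="PAY_PER_REQUEST". Table names are placeholders.
import boto3

dynamodb = boto3.client("dynamodb")

# Provisioned mode: you commit to 10 RCUs and 5 WCUs.
dynamodb.create_table(
    TableName="ProvisionedTable",
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)

# On-demand mode: no capacity planning, pay per request.
dynamodb.create_table(
    TableName="OnDemandTable",
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)

# Switching an existing table between modes (allowed once every 24 hours):
dynamodb.update_table(TableName="ProvisionedTable", BillingMode="PAY_PER_REQUEST")
```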

R/W Capacity Modes - Provisioned

-Table must have provisioned read and write capacity units
-Read Capacity Units (RCU) - throughput for reads
-Write Capacity Units (WCU) - throughput for writes
-Option to set up auto-scaling of throughput to meet demand
-Throughput can be exceeded temporarily using “Burst Capacity”
-If Burst Capacity has been consumed, you’ll get a “ProvisionedThroughputExceededException”
-It’s then advised to do an exponential backoff retry
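
As an illustration of the exponential backoff advice (note that the AWS SDKs, including boto3, already retry with backoff on your behalf), a simple hand-rolled version might look like this; the table name is a placeholder:

```python
# Simple illustration of exponential backoff around a write, retrying only
# on ProvisionedThroughputExceededException.
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("UserPosts")  # placeholder table

def put_with_backoff(item, max_retries=5):
    for attempt in range(max_retries):
        try:
            return table.put_item(Item=item)
        except ClientError as e:
            if e.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            # Wait 0.1s, 0.2s, 0.4s, ... before retrying.
            time.sleep(0.1 * (2 ** attempt))
    raise RuntimeError("Throughput still exceeded after retries")
```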

DynamoDB - Write Capacity Units (WCU)

The exam will ask you to perform some computations, so you need to understand the formulas to compute WCUs and RCUs
-One WCU (Write Capacity Unit) represents one write per second for an item up to 1KB in size
-If the items are larger than 1KB, more WCUs are consumed (item size is rounded up to the next 1KB)
-Example1:
We write 10 items per second, with item size 2KB
10 * (2/1) = 20WCUs
-Example2:
We write 6 items per second, with item size 4.5KB
6 * (5/1) = 30WCUs
-Example3:
We write 120 items per minute, with item size 2KB
(120/60) * (2/1) = 4WCUs
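
These computations can be captured in a tiny helper (my own sketch of the formula: WCUs = items per second * item size rounded up to the next KB):

```python
# WCU formula: items_per_second * ceil(item_size_kb)
import math

def wcus(items_per_second: float, item_size_kb: float) -> int:
    return math.ceil(items_per_second * math.ceil(item_size_kb))

print(wcus(10, 2))        # Example 1: 20 WCUs
print(wcus(6, 4.5))       # Example 2: 30 WCUs (4.5KB rounds up to 5KB)
print(wcus(120 / 60, 2))  # Example 3: 4 WCUs
```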

Strongly Consistent Read vs. Eventually Consistent Read

There are two kinds of read modes for DynamoDB: strongly consistent reads and eventually consistent reads.
Example:
-DynamoDB is indeed a serverless database, but behind the scenes there are servers. You just don't see them or manage them
-We have servers (let's consider just three servers to keep it very simple), and your data is distributed and replicated across all of them
-Your application writes to one of these servers, and internally DynamoDB replicates these writes across the other servers, such as Server 2 and Server 3
-When your application reads from DynamoDB, there is a chance that you will read not from Server 1, but from Server 2

Strongly Consistent Read vs. Eventually Consistent Read

-Eventually Consistent Read (default)
If we read just after a write, it is possible we will get some stale data because of replication
-Strongly Consistent Read
If we read just after a write, we will get the correct data
Set “ConsistentRead” parameter to True in API calls (GetItem, BatchGetItem, Query, Scan)
Consumes twice the RCU
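
For illustration, here is a hedged boto3 sketch of passing the ConsistentRead parameter on a GetItem call (the table and key values reuse the earlier UserPosts example):

```python
# Sketch: eventually consistent (default) vs. strongly consistent GetItem.
import boto3

table = boto3.resource("dynamodb").Table("UserPosts")
key = {"user_id": "john123", "post_ts": "2021-01-01T10:00:00"}

# Default: eventually consistent read (may return stale data right after a write).
eventual = table.get_item(Key=key)

# Strongly consistent read: reflects all prior successful writes, consumes 2x the RCUs.
strong = table.get_item(Key=key, ConsistentRead=True)
```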
Why wouldn't we want to do strongly consistent reads all the time?
Because it is a more expensive query and may also have higher latency

Footnote
non-relational databases:
Unlike relational databases that store data in tables with predefined schemas, NoSQL databases use a variety of data models, including key-value, document, wide-column, and graph formats
These models provide more flexibility in terms of data structure, allowing for the storage of unstructured or semi-structured data
distributed:
NoSQL databases are often designed to run on distributed systems, where data is spread across multiple servers or nodes in a network
This distribution can be across data centers or even geographical regions
Vertical Scaling:
Also known as scaling up, this involves enhancing the database server's resources (CPU, RAM, IO) to handle more load
This is often simpler but can become expensive and has physical limits
Horizontal Scaling for Reads:
Achieved by adding Read Replicas in RDS, which are read-only copies of the database
While this improves read capacity, there's a practical limit to the number of replicas you can have, thus limiting scalability