If you deal with large amounts of unstructured data for your applications, you may have come across MongoDB as a popular document database solution. But running MongoDB at scale yourself can be a challenge. This is where Amazon DocumentDB comes in – it aims to provide MongoDB-like capabilities in a fully managed package.
In this guide, we’ll dig into DocumentDB to see what makes it tick under the hood. We’ll look at how it stores data, queries documents, scales performance, and provides high availability. Along the way, we’ll compare DocumentDB capabilities and costs to MongoDB and other databases. By the end, you’ll be able to make an informed choice between DocumentDB and alternatives for your own workload.
DocumentDB Under the Hood
DocumentDB is designed as a distributed document-oriented NoSQL database, similar in many ways to MongoDB. Let’s look at some of its key technical details:
Document Data Model
Like MongoDB, DocumentDB uses a document data model. Data is stored in flexible JSON documents, not rows and columns. This allows nested, hierarchical data to be modeled naturally:
{
  "name": "John",
  "addresses": [
    {
      "street": "123 Main St",
      "city": "Anytown",
      "state": "CA"
    },
    {
      "street": "456 Oak Rd",
      "city": "Mytown",
      "state": "NY"
    }
  ]
}
Documents make it easy to represent complex real-world data without a restrictive schema.
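As a rough illustration of why the nested form is convenient, the sketch below holds the same customer document as a plain Python dictionary and picks out the addresses in one state. The helper name `addresses_in_state` is made up for this example; it is not part of any DocumentDB or MongoDB API.

```python
import json

# The customer document from above, as a plain Python dict.
customer = {
    "name": "John",
    "addresses": [
        {"street": "123 Main St", "city": "Anytown", "state": "CA"},
        {"street": "456 Oak Rd", "city": "Mytown", "state": "NY"},
    ],
}


def addresses_in_state(doc, state):
    """Return the embedded addresses whose "state" field matches (hypothetical helper)."""
    return [addr for addr in doc.get("addresses", []) if addr.get("state") == state]


if __name__ == "__main__":
    # The addresses travel with their parent document, so no join is needed.
    print(json.dumps(addresses_in_state(customer, "CA"), indent=2))
```

Because the addresses are embedded, reading a customer returns everything in one document rather than stitching rows together from several tables.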
Indexing and Querying
To optimize query performance, you create indexes on frequently queried document fields; by default only the _id field is indexed. DocumentDB keeps these indexes up to date as documents are written.
You can write ad-hoc queries, filters, sorts, and aggregations using the MongoDB query language. For example:
db.customers.find({state: "CA"})
db.orders.aggregate([
  {$match: {status: "A"}},
  {$group: {_id: "$customerId", total: {$sum: "$amount"}}}
])
DocumentDB aims to provide query latency and throughput on par with leading NoSQL and relational databases.
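To make the aggregation above concrete, here is a small pure-Python sketch of what the $match and $group stages compute. It runs against an in-memory list instead of a cluster; the sample orders and the function name `total_by_customer` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample orders; field names follow the pipeline above.
orders = [
    {"customerId": "c1", "status": "A", "amount": 50},
    {"customerId": "c1", "status": "A", "amount": 25},
    {"customerId": "c2", "status": "B", "amount": 40},
    {"customerId": "c2", "status": "A", "amount": 10},
]


def total_by_customer(docs, status="A"):
    """Filter by status, then sum "amount" per "customerId"."""
    totals = defaultdict(int)
    for doc in docs:
        if doc.get("status") == status:                 # $match stage
            totals[doc["customerId"]] += doc["amount"]  # $group with $sum
    return [{"_id": cid, "total": total} for cid, total in totals.items()]


if __name__ == "__main__":
    for row in total_by_customer(orders):
        print(row)
```

On a real cluster the database does this work server-side, and the order of the grouped results is not guaranteed unless you add a $sort stage.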
Storage Architecture
Storage is distributed across SSD-backed storage nodes for performance. The data volume is divided into 10GB segments, and each segment is replicated across Availability Zones (AZs) for durability.

(Figure omitted. Image source: AWS re:Invent 2018)
The storage system detects and repairs corrupted blocks automatically. You can scale from 10GB up to 64TB without sharding the data yourself.
Independent Scaling
A key advantage of DocumentDB is the ability to elastically scale storage and compute independently. Storage scales seamlessly in 10GB increments behind the scenes.
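The 10GB increment can be pictured with a little arithmetic. The sketch below only models the description in this section (10GB steps, 10GB minimum, 64TB ceiling); it is not an AWS API and does not describe how storage is billed, and the function name is an assumption made for the example.

```python
import math

SEGMENT_GB = 10          # increment size stated above
MAX_VOLUME_GB = 64_000   # 64TB ceiling stated above, taken as 64,000GB for simplicity


def allocated_storage_gb(data_gb):
    """Round a data size up to the next 10GB increment (illustrative model only)."""
    if data_gb < 0:
        raise ValueError("data size cannot be negative")
    if data_gb > MAX_VOLUME_GB:
        raise ValueError("exceeds the 64TB volume limit")
    segments = max(1, math.ceil(data_gb / SEGMENT_GB))  # volumes start at 10GB
    return segments * SEGMENT_GB


if __name__ == "__main__":
    for size in (0, 95, 100, 100.1):
        print(size, "->", allocated_storage_gb(size))
```

The point is that growth happens in small steps without any action on your side, in contrast to resizing disks on a self-managed cluster.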
For read scaling, you can add up to 15 read replicas to spread queries across more instances without affecting write performance.
High Availability and Durability
DocumentDB achieves high availability through synchronous replication across 3 Availability Zones. If the primary instance fails, DocumentDB automatically fails over to a replica.
The storage layer can lose up to 2 copies of your data without affecting write availability, and up to 3 copies without affecting read availability. DocumentDB also continuously verifies data integrity and self-heals any corrupt blocks it finds.
Durability is enhanced by storing 6 full copies of your data across AZs, protecting against data loss.
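The fault-tolerance figures above can be read as a quorum rule. The sketch below assumes a simple model in which writes need 4 of the 6 copies and reads need 3 of the 6; those quorum sizes are inferred from the stated tolerances (6 minus 2, and 6 minus 3) and are an assumption of this example, not a documented DocumentDB setting.

```python
TOTAL_COPIES = 6   # copies of your data, per the text above
WRITE_QUORUM = 4   # assumed: 6 copies minus the 2 losses writes can tolerate
READ_QUORUM = 3    # assumed: 6 copies minus the 3 losses reads can tolerate


def availability(copies_lost):
    """Report whether reads and writes can proceed after losing some copies."""
    if not 0 <= copies_lost <= TOTAL_COPIES:
        raise ValueError("copies_lost must be between 0 and 6")
    healthy = TOTAL_COPIES - copies_lost
    return {"writes": healthy >= WRITE_QUORUM, "reads": healthy >= READ_QUORUM}


if __name__ == "__main__":
    for lost in range(TOTAL_COPIES + 1):
        print(lost, availability(lost))
```

With two copies per AZ across three AZs, losing an entire AZ removes two copies, which under this model still leaves both reads and writes available.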
Comparing DocumentDB to MongoDB
Now that we understand how DocumentDB works under the hood, how does it compare to running open-source MongoDB yourself?
| MongoDB | DocumentDB |
|---|---|
| Self-managed | Fully managed |
| Tune deployment yourself | AWS handles management |
| Open source | Proprietary AWS service |
| Configure own scaling | Auto-scaling |
| Manual failover | Automatic failover |
| DIY security | Built-in security |
For many, the biggest benefit of DocumentDB is removing the administrative burden of managing a distributed database cluster. AWS handles all that for you.
But you do lose some control: you can’t customize the operating system or storage, for example. DocumentDB also lags MongoDB in some features like aggregations and geospatial support.
That said, DocumentDB’s auto-scaling, high availability, and security capabilities make it suitable for mission-critical apps with less operational overhead.
Alternatives to Consider
Beyond MongoDB and DocumentDB, what other database options are worth considering?
| Database | Use Cases |
|---|---|
| MongoDB Atlas | Fully managed MongoDB with more features than DocumentDB |
| DynamoDB | High performance NoSQL DB for serverless apps |
| Aurora | MySQL- and PostgreSQL-compatible relational database |
| Redshift | Petabyte-scale data warehousing |
| Elasticsearch | Search and analytics for log and text data |
Each database has its strengths based on data model (document, key-value, relational), performance patterns, and scaling needs.
For example, DynamoDB makes sense for high-throughput serverless applications, while Redshift is tailored for analytics. Elasticsearch is purpose-built for text search and logs.
When to Choose DocumentDB
| You need | DocumentDB offers |
|---|---|
| MongoDB compatibility | Works with existing MongoDB drivers and tools for supported operations |
| No admin overhead | Fully managed database |
| Fast scaling | Auto-scaling built-in |
| High availability | Multi-AZ synchronous replication |
| Strong security | Encryption, access control, standards compliance |
| Disaster recovery | Backup/restore, multi-region clusters |
Migrating to DocumentDB
If you have an existing MongoDB deployment, migrating to DocumentDB is relatively smooth:
- Use mongodump to export your MongoDB data to a dump file.
- Create a DocumentDB cluster and launch instances.
- Modify your connection strings to the new DocumentDB endpoint.
- Import the dump file into DocumentDB using mongorestore.
Most apps will work without code changes, assuming you are using supported drivers and MongoDB operations. Testing queries and performance under load is still advised post-migration.
To migrate with minimal downtime, use AWS DMS (Database Migration Service) for continuous replication from the source database.
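The dump-and-restore steps can be sketched as two command lines. The Python below only builds the argument lists and prints them; it does not run anything. The hostnames, user name, and CA bundle file name are placeholders, and the exact TLS flags can vary with your MongoDB tools version, so treat this as a starting point and check the tool help before use.

```python
import shlex


def mongodump_args(source_uri, dump_dir):
    """Arguments for exporting the source MongoDB deployment to a directory."""
    return ["mongodump", f"--uri={source_uri}", f"--out={dump_dir}"]


def mongorestore_args(host, username, ca_file, dump_dir, port=27017):
    """Arguments for loading the dump into a DocumentDB cluster over TLS.

    No --password is given, so the tool should prompt for it instead of
    leaving the password in shell history.
    """
    return [
        "mongorestore",
        f"--host={host}:{port}",
        "--ssl",
        f"--sslCAFile={ca_file}",
        f"--username={username}",
        dump_dir,
    ]


if __name__ == "__main__":
    # Placeholder endpoints; substitute your own source and cluster endpoint.
    dump = mongodump_args("mongodb://source.example.com:27017", "dump")
    restore = mongorestore_args("docdb.example.com", "appuser", "global-bundle.pem", "dump")
    print(shlex.join(dump))
    print(shlex.join(restore))
```

Building the commands as lists rather than strings avoids quoting mistakes if you later pass them to `subprocess.run`.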
Scaling DocumentDB Performance
To scale DocumentDB throughput for read-heavy workloads, you can add up to 15 read replicas. This increases the number of nodes available to serve queries and parallelize operations.

Based on benchmarks from AWS, adding read replicas can significantly reduce average query latency for read-heavy workloads.

You can monitor CPU and connections on existing instances and scale out reads before latency increases. Storage scales seamlessly in 10GB increments up to 64TB as needed.
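Read replicas only help if the application actually sends reads to them. A minimal sketch, assuming the connection options commonly used with DocumentDB (replica set name rs0, a secondaryPreferred read preference, TLS with a CA bundle, retryable writes turned off): the helper below builds such a connection string. The function name, host, and credentials are placeholders for illustration.

```python
from urllib.parse import quote_plus, urlencode


def docdb_uri(host, username, password, ca_file="global-bundle.pem", port=27017):
    """Build a connection string that prefers replicas for reads (illustrative)."""
    options = {
        "tls": "true",
        "tlsCAFile": ca_file,
        "replicaSet": "rs0",                     # assumed replica set name
        "readPreference": "secondaryPreferred",  # route reads to replicas when available
        "retryWrites": "false",
    }
    # Percent-encode credentials so characters like "@" do not break the URI.
    return (
        f"mongodb://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/?{urlencode(options)}"
    )


if __name__ == "__main__":
    print(docdb_uri("docdb.example.com", "appuser", "p@ss"))
```

With this read preference, queries go to replicas and fall back to the primary if none are available. Replica reads are eventually consistent, so keep read-after-write paths on the primary.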
Global vs Multi-AZ Deployment
For disaster recovery, you can deploy DocumentDB across multiple AWS regions or multiple AZs:
| Multi-AZ | Global Clusters |
|---|---|
| Sync replication across 3 AZs | Async replication across regions |
| Minimizes RTO on failures | Protects against region outage |
| Lower latency reads | Higher latency than Multi-AZ |
Multi-AZ is better for high availability and low RTO. Global Clusters provide geo-redundancy across regions but have higher latency than Multi-AZ.
Pick Multi-AZ for production workloads needing always-on availability. Use Global Clusters for true disaster recovery capability.
Securing DocumentDB
DocumentDB provides robust security capabilities out of the box:
- Encryption: Data at rest is encrypted with keys managed in AWS KMS, and data in transit is encrypted with TLS. Replicas, snapshots, and backups of an encrypted cluster are also encrypted.
- Network isolation: DocumentDB clusters run inside an Amazon VPC, and security groups restrict which clients can connect.
- Access control: Granular IAM policies control access to clusters and management operations.
- Audit logs: API calls can be logged for auditing and visibility.
- Compliance: DocumentDB is eligible for SOC, PCI DSS, and HIPAA workloads to help meet compliance requirements.
These built-in security features reduce the burden of hardening MongoDB and protecting data from leaks or unauthorized access.
Estimating DocumentDB Costs
When comparing DocumentDB costs, start by estimating storage and compute needs:
- Storage: Calculate current DB size and growth rate. Provision 25-50% extra.
- Compute: Gather CPU, memory, and connection metrics. Test different instance classes.
- Read replicas: Estimate required read throughput based on queries/second.
Here’s a sample monthly cost estimate for a 100GB DocumentDB cluster:
| Item | Configuration | Monthly Cost |
|---|---|---|
| Instances | 2 x db.r5.large | $308 |
| Storage | 100GB @ $0.25/GB | $25 |
| Total per month | | $333 |
Going from 2 to 4 instances (for example, a primary plus three read replicas) with 200GB of storage would be around $700/month at the same rates (4 x $154 + 200 x $0.25 = $666).
You can further optimize costs by right-sizing instances, scaling capacity appropriately, using reserved instances, and setting up auto-scaling.
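The arithmetic behind the sample estimate is easy to script. The sketch below uses only the figures from the sample table ($308 for two instances, so $154 each, and $0.25 per GB of storage); these are illustrative numbers from this article, not current AWS pricing, and the function ignores I/O and backup charges.

```python
INSTANCE_MONTHLY_USD = 308 / 2   # implied by the sample: 2 x db.r5.large = $308
STORAGE_USD_PER_GB = 0.25        # storage rate used in the sample table


def monthly_cost(instances, storage_gb):
    """Estimate monthly cost from instance count and storage (sample rates only)."""
    if instances < 1 or storage_gb < 0:
        raise ValueError("need at least one instance and non-negative storage")
    return instances * INSTANCE_MONTHLY_USD + storage_gb * STORAGE_USD_PER_GB


if __name__ == "__main__":
    print(monthly_cost(2, 100))  # the sample cluster from the table
    print(monthly_cost(4, 200))  # four instances with doubled storage
```

Swapping in the current rates for your region and instance class turns this into a quick what-if tool for comparing cluster sizes.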
Key Takeaways
To wrap up, here are the key takeaways on DocumentDB:
- ✅ Fully managed MongoDB-compatible database
- ✅ Built for scalability, high availability, strong security
- ✅ Smooth migration from existing MongoDB apps
- ✅ Automated backups, failover, and disaster recovery capabilities
- ✅ Advanced monitoring and security out of the box
- ✅ Less operational overhead than self-managed MongoDB
- ❌ Less DBA control and flexibility
- ❌ Not full feature parity with MongoDB
Ultimately, DocumentDB simplifies running production MongoDB workloads by handling the operational complexities of a distributed database for you. If your application needs enterprise-grade performance, availability, and security, DocumentDB is an excellent choice to consider.