MongoDB - Sharding
Sharding
SLIDES 2-3
Replicas are one way to scale. They create additional full copies of the data and provide resilience through automatic failover.
SLIDES 4-5
Sharding is MongoDB's method for horizontal scaling. One or more large databases are partitioned into smaller, more manageable pieces. The partitioning is driven by a carefully selected key (the shard key), and the resulting partitions are distributed across multiple servers, typically based on ranges of shard key values.
To shard, or not to shard
SLIDE 6
Sharding is rather complicated, whether in MongoDB or some other DB, so you should be sure that it’s the best option for your needs.
Storage use
Monitoring the actual disk space your databases are using is pretty simple: watch the /data directory tree (or wherever you configure MongoDB to store its data). If the space begins to run low (around 80% full), you should add more capacity, if you have the money and somewhere to add it. Transferring everything to a new volume is possible, but it can take time and could impact production. If you can't add any more, maybe sharding is the next step.
Processor power
MongoDB works best when indexes and the working set are in RAM. The working set is the data most commonly queried and updated by a particular application.
If the load grows, you may need to increase RAM, processor speed, or storage speed. Sometimes with cloud-based MongoDB services (like Atlas, AWS, IBM Cloud, or others) you may find that the largest configuration available will not meet the demand.
If you can’t address the issue, the system degrades to the point of thrashing, where every read or write requires non-RAM storage access.
Well before you get to that point, you need to consider sharding.
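A rough way to watch both storage and working-set pressure from the mongo shell is sketched below; the exact field names can vary by storage engine and version, so treat it as a starting point rather than a recipe.
// approximate on-disk footprint of the current database
db.stats()
// WiredTiger cache usage, a rough indicator of working-set pressure
db.serverStatus().wiredTiger.cache["bytes currently in the cache"]
db.serverStatus().wiredTiger.cache["maximum bytes configured"]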
Sharding architecture
SLIDE 7 (a repeat of 5)
- Shards are replica sets.
- Chunks are typically key ranges.
- mongos is a router; usually one is started per application and run on the same system as the app for efficiency. It:
  - caches the info from the config servers and uses it to route read and write operations to the correct shards
  - updates the cache when there are metadata changes for the cluster, such as chunk splits or adding a shard
  - (Shards also read chunk metadata from the config servers.)
- Config servers determine where the documents are sent (the following excerpts are from the MongoDB docs). They:
  - store the metadata for a sharded cluster. The metadata reflects state and organization for all data and components within the sharded cluster. The metadata includes the list of chunks on every shard and the ranges that define the chunks.
  - store authentication information for the cluster and also manage distributed locks
Only administrators and the mongos router should connect to the shards. If an app connects directly to a shard, it will only see that shard's segment of the data.
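For instance, with the ports used by the startup script later in these notes (27017 is the default mongos port; this is just an illustration):
mongo --port 27017   # through mongos: the full sharded view of the data
mongo --port 37017   # directly to a member of shard s0: only that shard's chunks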
Two techniques for handling sharding (sketched in the shell example below):
- range-based - uses the shard key directly to select a shard
- hash-based - hashes the shard key and uses the hash to select a shard
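A minimal sketch in the mongo shell: school.students comes from the startup script below, while school.grades is a hypothetical second collection. The choice between the two techniques is made when the collection is sharded.
sh.enableSharding("school")
// range-based: chunks are contiguous ranges of student_id values
sh.shardCollection("school.students", { student_id: 1 })
// hash-based: documents are placed by a hash of student_id
sh.shardCollection("school.grades", { student_id: "hashed" })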
Cluster balancing
SLIDE 8
MongoDB uses chunk splits and migration to maintain load balancing. Applications shouldn’t notice when this happens.
- If a chunk grows to exceed 64 MB (the default maximum chunk size), MongoDB will split it into multiple chunks.
- If one shard is getting more than its share of the activity, MongoDB will
migrate chunks between shards.
- That is, it modifies the ranges by updating the config servers and performing the migration (the shell helpers below show how to watch this).
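A few shell helpers for watching the balancer and the chunk layout, as a sketch; the config.chunks query shown (filtering on an ns field) matches 3.4-era metadata and may differ in newer versions.
// summary of shards, databases, and chunk distribution
sh.status()
// is the balancer enabled, and is a migration running right now?
sh.getBalancerState()
sh.isBalancerRunning()
// raw chunk metadata for one collection
db.getSiblingDB("config").chunks.find({ ns: "school.students" })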
Example startup script from MongoDB’s Andrew Erlichson
#!/bin/bash
# Andrew Erlichson
# Nathan Leniz - modified to update for MongoDB 3.4
# MongoDB
# script to start a sharded environment on localhost
# clean everything up
echo "killing mongod and mongos"
killall mongod
killall mongos
echo "removing data files"
rm -rf /data/config
rm -rf /data/shard*
# start a replica set and tell it that it will be shard0
echo "starting servers for shard 0"
mkdir -p /data/shard0/rs0 /data/shard0/rs1 /data/shard0/rs2
mongod --replSet s0 --logpath "s0-r0.log" --dbpath /data/shard0/rs0 --port 37017 --fork --shardsvr
mongod --replSet s0 --logpath "s0-r1.log" --dbpath /data/shard0/rs1 --port 37018 --fork --shardsvr
mongod --replSet s0 --logpath "s0-r2.log" --dbpath /data/shard0/rs2 --port 37019 --fork --shardsvr
sleep 5
# connect to one server and initiate the set
echo "Configuring s0 replica set"
mongo --port 37017 << 'EOF'
config = { _id: "s0", members:[
{ _id : 0, host : "localhost:37017" },
{ _id : 1, host : "localhost:37018" },
{ _id : 2, host : "localhost:37019" }]};
rs.initiate(config)
EOF
# start a replica set and tell it that it will be shard1
echo "starting servers for shard 1"
mkdir -p /data/shard1/rs0 /data/shard1/rs1 /data/shard1/rs2
mongod --replSet s1 --logpath "s1-r0.log" --dbpath /data/shard1/rs0 --port 47017 --fork --shardsvr
mongod --replSet s1 --logpath "s1-r1.log" --dbpath /data/shard1/rs1 --port 47018 --fork --shardsvr
mongod --replSet s1 --logpath "s1-r2.log" --dbpath /data/shard1/rs2 --port 47019 --fork --shardsvr
sleep 5
echo "Configuring s1 replica set"
mongo --port 47017 << 'EOF'
config = { _id: "s1", members:[
{ _id : 0, host : "localhost:47017" },
{ _id : 1, host : "localhost:47018" },
{ _id : 2, host : "localhost:47019" }]};
rs.initiate(config)
EOF
# start a replica set and tell it that it will be shard2
echo "starting servers for shard 2"
mkdir -p /data/shard2/rs0 /data/shard2/rs1 /data/shard2/rs2
mongod --replSet s2 --logpath "s2-r0.log" --dbpath /data/shard2/rs0 --port 57017 --fork --shardsvr
mongod --replSet s2 --logpath "s2-r1.log" --dbpath /data/shard2/rs1 --port 57018 --fork --shardsvr
mongod --replSet s2 --logpath "s2-r2.log" --dbpath /data/shard2/rs2 --port 57019 --fork --shardsvr
sleep 5
echo "Configuring s2 replica set"
mongo --port 57017 << 'EOF'
config = { _id: "s2", members:[
{ _id : 0, host : "localhost:57017" },
{ _id : 1, host : "localhost:57018" },
{ _id : 2, host : "localhost:57019" }]};
rs.initiate(config)
EOF
# now start 3 config servers
echo "Starting config servers"
mkdir -p /data/config/config-a /data/config/config-b /data/config/config-c
mongod --replSet csReplSet --logpath "cfg-a.log" --dbpath /data/config/config-a --port 57040 --fork --configsvr
mongod --replSet csReplSet --logpath "cfg-b.log" --dbpath /data/config/config-b --port 57041 --fork --configsvr
mongod --replSet csReplSet --logpath "cfg-c.log" --dbpath /data/config/config-c --port 57042 --fork --configsvr
echo "Configuring configuration server replica set"
mongo --port 57040 << 'EOF'
config = { _id: "csReplSet", members:[
{ _id : 0, host : "localhost:57040" },
{ _id : 1, host : "localhost:57041" },
{ _id : 2, host : "localhost:57042" }]};
rs.initiate(config)
EOF
# now start the mongos on a standard port
mongos --logpath "mongos-1.log" --configdb csReplSet/localhost:57040,localhost:57041,localhost:57042 --fork
echo "Waiting 60 seconds for the replica sets to fully come online"
sleep 60
echo "Connnecting to mongos and enabling sharding"
# add shards and enable sharding on the test db
mongo <<'EOF'
use admin
db.runCommand( { addshard : "s0/localhost:37017" } );
db.runCommand( { addshard : "s1/localhost:47017" } );
db.runCommand( { addshard : "s2/localhost:57017" } );
db.runCommand( { enableSharding: "school" } );
db.runCommand( { shardCollection: "school.students", key: { student_id:1 } } );
EOF
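Once the script finishes and you have inserted some test data through mongos, a quick way to confirm that documents are actually spreading across the three shards (a sketch, using the collection sharded on the script's last line):
use school
// per-shard document, chunk, and size estimates for the sharded collection
db.students.getShardDistribution()
// overall cluster layout, including chunk counts per shard
sh.status()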
Choosing shard keys
SLIDE 9
As always, think about which key(s) will naturally arise as the most commonly used. Go back to the business rules, but simulations on a test machine with realistic usage flows are also helpful.
For example, does the nature of the input data and/or the query patterns come in cycles? Does the frequency of queries about completed sales rise above the usual top queries near the end of a quarter?
Shard keys are required
Documents going into the database should include the shard key. If one doesn't, mongos will send the document to every shard. If you insert a lot of these no-shard-key documents, you will have to do a COLLSCAN on each shard's replica set to find them and then delete, repair, and re-insert them. This "scatter/gather" approach can become very expensive.
It would be better to put data lacking the shard key into a different database.
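The same scatter/gather cost shows up in queries: a filter that includes the shard key can be routed to a single shard, while one that omits it has to be sent to every shard. A hedged sketch using explain() on the students collection (the stage names in the comments are typical but can differ by version):
// targeted: mongos can route this to one shard (winning plan stage: SINGLE_SHARD)
db.students.find({ student_id: 12345 }).explain()
// scatter/gather: every shard must be consulted (winning plan stage: SHARD_MERGE)
db.students.find({ GPA: { $gt: 3.5 } }).explain()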
Shard keys are messy to change
Once chosen, the shard key cannot be changed without rebuilding the database (and all the shards).
Sufficient cardinality
{studentID: "F01012", entryYear: 2017, department: "COSC", GPA: 3.7}
The sharding process will be unable to spread data across all of the shards if the shard key yields fewer chunks than there are shards. If GPA is used as the shard key, with chunks of 1.0 (e.g., GPA≥4, 3≤GPA<4, etc.), you obtain five chunks. If you need to shard across eight servers, MongoDB has no way to place data on more than five of them.
You can improve cardinality by using a compound (multi-part) key, as long as the better key comes first. For example, if studentID is monotonically increasing, combining it with the department could be better:
{department: "...", studentID: "..."}
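In shell terms that might look like the following sketch; the registrar.students namespace is hypothetical, chosen so it doesn't collide with the script's school.students collection, which is already sharded on student_id.
sh.enableSharding("registrar")
// compound shard key: department first, then the monotonically increasing studentID
sh.shardCollection("registrar.students", { department: 1, studentID: 1 })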
Any unique attribute must be part of the shard key
Consider how MongoDB would handle this. If the attribute is not part of the shard key, then it has no influence on which shard the document is stored on. In that situation, MongoDB can't ensure the uniqueness you wanted, since the shards are fully independent.
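This is why a sharded collection only supports unique indexes on _id or on index key patterns prefixed by the shard key. A sketch, using a hypothetical registrar.people collection:
// allowed: the unique constraint is on the shard key itself (third argument = unique)
sh.shardCollection("registrar.people", { studentID: 1 }, true)
// rejected on a sharded collection: email is unrelated to the shard key,
// so no single shard could enforce uniqueness for it
db.getSiblingDB("registrar").people.createIndex({ email: 1 }, { unique: true })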
Avoid potential hot spots
If the distribution of the shard keys is not relatively uniform, some shards could receive more use than others. Shard key choices that have sufficient cardinality but still lead to these hot spots are monotonically increasing keys. Examples include _id, timestamps, auto-increment values, etc.
{invoiceNumber: ..., totalPrice: ..., buyerID: ..., ...}
Here, invoiceNumber is likely to be monotonically increasing, so it would cause hot spots. Combining it with buyerID might be a better choice, as long as buyerID comes first:
{buyerID:..., invoiceNumber:..., totalPrice: ...}
If that is still insufficient, then try totalPrice instead, or even in combination with buyerID.
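A sketch of that choice in the shell; the sales.invoices namespace is hypothetical.
sh.enableSharding("sales")
// buyerID first spreads the inserts across buyers; invoiceNumber keeps the key selective
sh.shardCollection("sales.invoices", { buyerID: 1, invoiceNumber: 1 })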
Shards maintain their own indexes
Otherwise, shard performance would be like that of any other unindexed database. Out of the box, each shard indexes only _id and the shard key; there is no global index spanning shards, since maintaining one would require inter-shard communication. Any additional secondary indexes are built locally on each shard.
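For ordinary (non-unique) secondary indexes, a createIndex issued through mongos is simply built on each shard's copy of the data. A sketch, reusing the GPA field from the earlier student example:
// built independently on every shard that holds school.students chunks
db.getSiblingDB("school").students.createIndex({ GPA: 1 })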