Sharding

SLIDES 2-3

Replica sets are one way to scale. They create additional full copies of the data and provide resilience through automatic failover, but every member still holds the entire data set.

SLIDES 4-5

Sharding is MongoDB’s method for horizontal scaling. One or more large databases are partitioned into smaller, more manageable pieces. The partitioning is driven by a carefully selected key (the shard key), and the resulting pieces are distributed across multiple servers, typically based on ranges of shard key values.

To shard, or not to shard

SLIDE 6

Sharding is rather complicated, whether in MongoDB or some other DB, so you should be sure that it’s the best option for your needs.

Storage use

Monitoring the actual disk space your databases are using is pretty simple: watch the /data directory tree (or wherever you configure MongoDB to store its data). If space begins to run low (around 80% full), add more capacity if you have the budget and somewhere to add it. Transferring everything to a new volume is possible, but it takes time and can impact production. If you can’t add any more, sharding may be the next step.
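You can also check space use from inside the mongo shell; a minimal sketch (the database name school is just an example):

use school
// report sizes in megabytes; dataSize is the logical data,
// storageSize and indexSize are what is actually on disk
db.stats(1024 * 1024)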

Processor power

MongoDB works best when the indexes and the working set are in RAM. The working set is the data most commonly queried and updated by a particular application.

If the load grows, you may need to increase RAM, processor speed, or storage speed. With cloud-based MongoDB services (such as Atlas, AWS, IBM Cloud, and others) you may find that even the largest available configuration will not meet the demand.

If you can’t address the issue, the system degrades to the point of thrashing, where every read or write requires non-RAM storage access.
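One rough check on whether the working set still fits in RAM is to compare the WiredTiger cache usage with its configured maximum; a sketch from the mongo shell (the exact statistic names can vary by version):

var cache = db.serverStatus().wiredTiger.cache;
// bytes the cache currently holds vs. the most it is allowed to hold
print(cache["bytes currently in the cache"]);
print(cache["maximum bytes configured"]);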

Well before you reach that point, you should consider sharding.

Sharding architecture

SLIDE 7 (a repeat of 5)

[CPShardArch: sharded cluster architecture diagram]

  • Each shard is a replica set.
  • Chunks are typically ranges of shard key values.
  • mongos is a router
    • Usually one is started per application and run on the same system for efficiency.
    • It caches the metadata from the config servers and uses it to route read and write operations to the correct shards.
    • It updates that cache when the cluster metadata changes, such as a chunk split or the addition of a shard.
    • Shards also read chunk metadata from the config servers.
  • Config servers
    • determine where documents are sent (the following excerpts are from the MongoDB docs):
    • "store the metadata for a sharded cluster. The metadata reflects state and organization for all data and components within the sharded cluster. The metadata includes the list of chunks on every shard and the ranges that define the chunks."
    • "store authentication information for the cluster and also manage distributed locks."

Only administrators and the mongos router should connect to the shards. If an app connects directly to a shard, it will only see that shard’s segment of the data.
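For example, connecting to the mongos (not to a shard) and asking for the cluster status shows every shard and the chunk layout; a minimal sketch, run after connecting with the mongo shell:

sh.status()                              // shards, sharded databases, and chunk ranges
db.getSiblingDB("config").shards.find()  // the same shard list, read from the cluster metadata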

Two techniques for handling sharding:

  • range-based - uses the shard key directly to select a shard
  • hash-based - hashes the shard key and uses the hash to select a shard (see the sketch below)
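Both are chosen when the collection is first sharded; a sketch using the school.students collection that the startup script below shards (use one form or the other, not both):

// range-based: chunks are contiguous ranges of student_id values
sh.shardCollection("school.students", { student_id: 1 })
// hash-based: a hash of student_id decides the chunk, spreading nearby keys apart
sh.shardCollection("school.students", { student_id: "hashed" })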

Cluster balancing

SLIDE 8

MongoDB uses chunk splits and migration to maintain load balancing. Applications shouldn’t notice when this happens.

  • If a chunk grows to exceed the chunk size (64 MB by default), MongoDB splits it into multiple chunks.
  • If one shard is getting more than its share of the activity, MongoDB migrates chunks between shards.
    • That is, it modifies the ranges by updating the config servers and then performs the migration (see the sketch below).
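You can watch splits and migrations from the mongos; a sketch (school.students is the collection sharded in the script below):

sh.getBalancerState()                                        // true if the balancer is enabled
sh.status()                                                  // chunk ranges and the shard that owns each
db.getSiblingDB("school").students.getShardDistribution()    // documents and chunks per shard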

Example startup script from MongoDB’s Andrew Erlichson

#!/bin/bash

# Andrew Erlichson
# Nathan Leniz - modified to update for MongoDB 3.4
# MongoDB
# script to start a sharded environment on localhost

# clean everything up
echo "killing mongod and mongos"
killall mongod
killall mongos
echo "removing data files"
rm -rf /data/config
rm -rf /data/shard*


# start a replica set and tell it that it will be shard0
echo "starting servers for shard 0"
mkdir -p /data/shard0/rs0 /data/shard0/rs1 /data/shard0/rs2
mongod --replSet s0 --logpath "s0-r0.log" --dbpath /data/shard0/rs0 --port 37017 --fork --shardsvr
mongod --replSet s0 --logpath "s0-r1.log" --dbpath /data/shard0/rs1 --port 37018 --fork --shardsvr
mongod --replSet s0 --logpath "s0-r2.log" --dbpath /data/shard0/rs2 --port 37019 --fork --shardsvr

sleep 5
# connect to one server and initiate the set
echo "Configuring s0 replica set"
mongo --port 37017 << 'EOF'
config = { _id: "s0", members:[
{ _id : 0, host : "localhost:37017" },
{ _id : 1, host : "localhost:37018" },
{ _id : 2, host : "localhost:37019" }]};
rs.initiate(config)
EOF

# start a replica set and tell it that it will be shard1
echo "starting servers for shard 1"
mkdir -p /data/shard1/rs0 /data/shard1/rs1 /data/shard1/rs2
mongod --replSet s1 --logpath "s1-r0.log" --dbpath /data/shard1/rs0 --port 47017 --fork --shardsvr
mongod --replSet s1 --logpath "s1-r1.log" --dbpath /data/shard1/rs1 --port 47018 --fork --shardsvr
mongod --replSet s1 --logpath "s1-r2.log" --dbpath /data/shard1/rs2 --port 47019 --fork --shardsvr

sleep 5

echo "Configuring s1 replica set"
mongo --port 47017 << 'EOF'
config = { _id: "s1", members:[
{ _id : 0, host : "localhost:47017" },
{ _id : 1, host : "localhost:47018" },
{ _id : 2, host : "localhost:47019" }]};
rs.initiate(config)
EOF

# start a replica set and tell it that it will be shard2
echo "starting servers for shard 2"
mkdir -p /data/shard2/rs0 /data/shard2/rs1 /data/shard2/rs2
mongod --replSet s2 --logpath "s2-r0.log" --dbpath /data/shard2/rs0 --port 57017 --fork --shardsvr
mongod --replSet s2 --logpath "s2-r1.log" --dbpath /data/shard2/rs1 --port 57018 --fork --shardsvr
mongod --replSet s2 --logpath "s2-r2.log" --dbpath /data/shard2/rs2 --port 57019 --fork --shardsvr

sleep 5

echo "Configuring s2 replica set"
mongo --port 57017 << 'EOF'
config = { _id: "s2", members:[
{ _id : 0, host : "localhost:57017" },
{ _id : 1, host : "localhost:57018" },
{ _id : 2, host : "localhost:57019" }]};
rs.initiate(config)
EOF


# now start 3 config servers
echo "Starting config servers"
mkdir -p /data/config/config-a /data/config/config-b /data/config/config-c
mongod --replSet csReplSet --logpath "cfg-a.log" --dbpath /data/config/config-a --port 57040 --fork --configsvr
mongod --replSet csReplSet --logpath "cfg-b.log" --dbpath /data/config/config-b --port 57041 --fork --configsvr
mongod --replSet csReplSet --logpath "cfg-c.log" --dbpath /data/config/config-c --port 57042 --fork --configsvr

echo "Configuring configuration server replica set"
mongo --port 57040 << 'EOF'
config = { _id: "csReplSet", members:[
{ _id : 0, host : "localhost:57040" },
{ _id : 1, host : "localhost:57041" },
{ _id : 2, host : "localhost:57042" }]};
rs.initiate(config)
EOF

# now start the mongos on a standard port
mongos --logpath "mongos-1.log" --configdb csReplSet/localhost:57040,localhost:57041,localhost:57042 --fork
echo "Waiting 60 seconds for the replica sets to fully come online"
sleep 60
echo "Connnecting to mongos and enabling sharding"

# add shards and enable sharding on the test db
mongo <<'EOF'
use admin
db.runCommand( { addshard : "s0/localhost:37017" } );
db.runCommand( { addshard : "s1/localhost:47017" } );
db.runCommand( { addshard : "s2/localhost:57017" } );
db.runCommand( { enableSharding: "school" } );
db.runCommand( { shardCollection: "school.students", key: { student_id:1 } } );
EOF
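After the script finishes, one way to confirm that sharding is working is to push some test documents through the mongos and see how they spread across the shards; a sketch (the student documents are invented for illustration):

mongo <<'EOF'
use school
// insert a batch of test documents
for (var i = 0; i < 100000; i++) {
    db.students.insert({ student_id: i, GPA: Math.random() * 4.0 });
}
db.students.getShardDistribution();   // documents and chunks per shard
sh.status();                          // chunk ranges for school.students
EOF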

Choosing shard keys

SLIDE 9

As always, think about which key(s) will naturally be the most commonly used. Go back to the business rules, but simulations on a test machine with realistic usage flows are also helpful.

For example, do the input data and/or the query patterns come in cycles? Does the frequency of queries about completed sales rise above the usual top queries near the end of a quarter?

Shard keys are required

Documents going into the database should include the shard key. If a document lacks the shard key, mongos cannot route it by key value. If you insert a lot of these no-shard-key documents, you will have to run a collection scan (COLLSCAN) on each shard’s replica set to find them so they can be deleted, repaired, and re-inserted. This “scatter/gather” approach can become very expensive.

It would be better to put data lacking the shard key into a different database.
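A quick way to see whether a query will be targeted to a single shard or scattered to all of them is to run explain() through the mongos; a sketch, assuming school.students is sharded on student_id as in the startup script above:

use school
// includes the shard key: mongos routes this to exactly one shard
db.students.find({ student_id: 12345 }).explain()
// omits the shard key: mongos must scatter the query to every shard and gather the results
db.students.find({ GPA: { $gt: 3.5 } }).explain()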

Shard keys are messy to change

Once identified, the shard key cannot be changed without rebuilding the database (and all the shards).

Sufficient cardinality

{studentID: "F01012", entryYear: 2017, department: "COSC", GPA: 3.7}

The sharding process cannot distribute data across n shards if the shard key has fewer than n distinct values. If GPA is used as the shard key with chunks 1.0 wide (e.g., GPA ≥ 4, 3 ≤ GPA < 4, etc.), you get only five chunks. If you need to shard across eight servers, MongoDB cannot populate all of them: there are fewer chunks than shards.

You can improve cardinality with a multi-part (compound) key, as long as the better-distributed key comes first. For example, if studentID is monotonically increasing, combining it with the department could be better: {department: "...", studentID: "..."}
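A sketch of sharding on such a compound key (the collection name and fields follow the student example above):

// department spreads documents across shards; studentID keeps the key selective
// enough for MongoDB to split chunks within a department
sh.shardCollection("school.students", { department: 1, studentID: 1 })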

Any unique attribute must be part of the shard key

Consider how MongoDB would handle this. If an attribute is not part of the shard key, it has no influence on which shard a document is stored on. In that situation, MongoDB can’t enforce the uniqueness you wanted, because the shards are fully independent.
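For example, MongoDB only accepts a unique index on a sharded collection when the shard key is a prefix of that index; a sketch (the email field is invented for illustration):

sh.shardCollection("school.students", { studentID: 1 })
// allowed: the unique index is prefixed by the shard key
db.getSiblingDB("school").students.createIndex({ studentID: 1, email: 1 }, { unique: true })
// rejected: enforcing uniqueness on email alone would require checking every shard
db.getSiblingDB("school").students.createIndex({ email: 1 }, { unique: true })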

Avoid potential hot spots

If the distribution of shard key values is not relatively uniform, some shards will receive more use than others. A common shard key choice that has sufficient cardinality but still creates hot spots is a monotonically increasing key: every new document lands in the chunk holding the highest values, so one shard takes all of the inserts.

Examples include _id, timestamps, auto-increment values, etc.

{invoiceNumber: ..., totalPrice: ..., buyerID: ..., ...}

Here, invoiceNumber is likely to be monotonically increasing, so it would cause hot spots. Combining it with buyerID might be a better choice, as long as buyerID comes first:

{buyerID: ..., invoiceNumber: ..., totalPrice: ...}

If that is still insufficient, then try totalPrice instead, or even in combination with buyerID.
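Two sketches of what those choices would look like (the sales.invoices namespace is invented for illustration; pick one):

// compound key: buyerID spreads the writes, invoiceNumber keeps the key selective
sh.shardCollection("sales.invoices", { buyerID: 1, invoiceNumber: 1 })
// hashed key: hashing the monotonically increasing value scatters new invoices across shards
sh.shardCollection("sales.invoices", { invoiceNumber: "hashed" })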

Shards maintain their own indexes

Without its own local indexes, a shard’s query performance would be no better than that of any unindexed database.

Uniqueness, however, can only be enforced on _id and on indexes prefixed by the shard key, since enforcing it on other fields would require inter-shard communication.

Backup diagram

summary diagram