Parallel and distributed systems
Database schema before class: database_day19_end_of_class.sql
Slides from class
Database notes
Centralized database
A single database that serves an organization.
Parallel database
A database that can execute multiple tasks in parallel, allowing it to make use of the multiple CPU cores and multiple disks that are standard on modern database servers (https://www.quora.com/What-is-the-difference-between-parallel-and-distributed-databases). The computational resources are often located in one physical location (e.g., one data center). The primary goal of a parallel database system is typically to reduce the execution time of large tasks. Examples include Hadoop MapReduce and Hive.
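As a rough illustration of the split, work in parallel, then combine pattern behind parallel execution, here is a minimal Python sketch. The function names (partial_sum, parallel_sum) and the worker count are made up for this example. It uses threads only to show the structure; a real parallel database spreads the work across CPU cores and disks.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Work done independently on one partition of the data."""
    return sum(chunk)

def parallel_sum(values, workers=4):
    """Split values into chunks, sum each chunk in a worker, then combine."""
    # Ceiling division so every value lands in some chunk (hypothetical sizing rule).
    size = max(1, -(-len(values) // workers))
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map runs partial_sum on each chunk; the outer sum combines the results.
        return sum(pool.map(partial_sum, chunks))

print(parallel_sum(list(range(1001))))
```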
Distributed database
A database that is distributed across multiple hosts (https://www.quora.com/What-is-the-difference-between-parallel-and-distributed-databases), normally at different geographical sites. The primary goals of a distributed database are often to reduce the execution time of large tasks and to provide high availability. Examples include MongoDB and Cassandra.
High availability
A system that strives to be always available (to have minimal downtime).
Five nines
Means 99.999% uptime, or a little over five minutes of downtime per year.
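The downtime figure follows from simple arithmetic: multiply the number of minutes in a year by the fraction of time the system is allowed to be down. A small Python sketch (the function name is made up for this example, and a year is assumed to be 365.25 days):

```python
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60

def downtime_minutes_per_year(availability_percent):
    """Yearly downtime budget, in minutes, for a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for uptime in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(uptime)
    print(f"{uptime}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Each extra nine cuts the allowed downtime by a factor of ten.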
MongoDB
A NoSQL database. MongoDB (from "humongous") is a scalable, high-performance, open source, schema-free, document-oriented database.
Cluster
The set of nodes comprising a sharded MongoDB deployment. A sharded cluster consists of config servers, shards, and one or more mongos routing processes. Also called a sharded cluster.
mongo
The MongoDB shell. The mongo process starts an interactive shell connected to either a mongod or mongos instance. The shell has a JavaScript interface.
mongod
The MongoDB database server. The mongod process starts the MongoDB server as a daemon. The MongoDB server handles data requests, manages data formats, and runs background management operations.
mongos
The MongoDB sharded cluster query router. The mongos process starts the MongoDB router as a daemon. The MongoDB router acts as an interface between an application and a MongoDB sharded cluster and handles all routing and load balancing across the cluster.
Config server
A mongod instance that stores all the metadata associated with a sharded cluster.
Sharding
A database architecture that partitions data by key ranges and distributes the data among two or more database instances. Sharding enables horizontal scaling. A deployment that uses sharding is called a sharded cluster.
Shard
A single mongod instance or replica set that stores some portion of a sharded cluster’s total data set. In production, all shards should be replica sets.
Shard key
The field MongoDB uses to distribute documents among members of a sharded cluster.
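To make partitioning by key ranges concrete, here is a toy Python sketch of how a router could map a shard key value to a shard. The split points, shard names, and the shard_for function are invented for illustration; this is not how mongos is implemented.

```python
from bisect import bisect_right

# Hypothetical key ranges: values below "h" go to shard0,
# values from "h" up to (but not including) "q" go to shard1,
# and everything from "q" upward goes to shard2.
SPLIT_POINTS = ["h", "q"]
SHARDS = ["shard0", "shard1", "shard2"]

def shard_for(shard_key_value):
    """Return the name of the shard whose key range contains the value."""
    return SHARDS[bisect_right(SPLIT_POINTS, shard_key_value)]

for name in ("alice", "mallory", "zed"):
    print(name, "is stored on", shard_for(name))
```

Adding a shard in this sketch means adding a split point, which is the sense in which sharding scales horizontally.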
Replication
A feature allowing multiple database servers to share the same data, thereby ensuring redundancy and facilitating load balancing.
Replication lag
The length of time between the last operation in the primary’s oplog and the last operation applied to a particular secondary. In general, you want to keep replication lag as small as possible.
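As a sketch of the definition above, replication lag is the difference between two timestamps. The function name and the sample times below are made up for this example.

```python
from datetime import datetime, timedelta

def replication_lag(primary_last_op, secondary_last_applied):
    """Time by which a secondary trails the primary (never negative)."""
    return max(primary_last_op - secondary_last_applied, timedelta(0))

# Hypothetical timestamps: the secondary is 12 seconds behind the primary.
primary_time = datetime(2024, 1, 1, 12, 0, 30)
secondary_time = datetime(2024, 1, 1, 12, 0, 18)
print(replication_lag(primary_time, secondary_time).total_seconds(), "seconds of lag")
```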
Replica set
A cluster of MongoDB servers that implements replication and automated failover. This is MongoDB’s recommended replication strategy.
Oplog
A capped collection that stores an ordered history of logical writes to a MongoDB database. The oplog is the basic mechanism enabling replication in MongoDB.
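The idea of a capped, ordered log of writes that a secondary replays can be sketched in a few lines of Python. ToyOplog, apply_entries, and the "set" and "delete" operation names are invented for this example and do not reflect MongoDB's actual oplog format.

```python
from collections import deque

class ToyOplog:
    """A fixed-size, ordered log of writes; the oldest entries roll off when full."""

    def __init__(self, max_entries):
        self.entries = deque(maxlen=max_entries)  # capped: old entries are discarded
        self.next_ts = 1                          # simple increasing timestamp

    def append(self, op, key, value=None):
        self.entries.append((self.next_ts, op, key, value))
        self.next_ts += 1

    def since(self, ts):
        """Entries newer than ts, in the order they were written."""
        return [entry for entry in self.entries if entry[0] > ts]

def apply_entries(data, entries):
    """Replay logged writes against a secondary's copy of the data."""
    for _ts, op, key, value in entries:
        if op == "set":
            data[key] = value
        elif op == "delete":
            data.pop(key, None)
    return data

oplog = ToyOplog(max_entries=100)
oplog.append("set", "a", 1)
oplog.append("set", "b", 2)
oplog.append("delete", "a")
print(apply_entries({}, oplog.since(0)))
```

Because the log is capped, a secondary that falls too far behind can find that the entries it still needs have already rolled off.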
Failover
The process that allows a secondary member of a replica set to become primary in the event of a failure.
Primary
In a replica set, the primary is the member that receives all write operations.
Secondary
A replica set member that replicates the contents of the primary's data set. Secondary members may handle read requests, but only the primary can handle write operations.
Election
The process by which members of a replica set select a primary on startup and in the event of a failure.
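One way to picture an election is as a majority vote: a candidate becomes primary only if more than half of the voting members vote for it. The Python sketch below shows just that counting rule. It is a simplification with a made-up function name, not MongoDB's actual election protocol.

```python
def has_majority(votes_received, voting_members):
    """True if a candidate holds a strict majority of the votes."""
    return votes_received > voting_members // 2

# With 3 voting members, 2 votes are enough to win; 1 vote is not.
print(has_majority(2, 3))
print(has_majority(1, 3))
# With 4 voting members, 2 votes are a tie, so no primary is chosen.
print(has_majority(2, 4))
```

The strict majority rule is why an odd number of voting members is a common choice: it avoids ties.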
CAP theorem
Given three properties of computing systems (consistency, availability, and partition tolerance), a distributed computing system can provide any two of these properties at the same time, but never all three.
Note: definitions from mongodb.com