MongoDB tutorial and references

Introduction

MongoDB is a collection-oriented, schema-free document database.

By collection-oriented, we mean that data is grouped into sets called 'collections'. Each collection has a unique name in the database, and can contain an unlimited number of documents. Collections are analogous to tables in an RDBMS, except that they don't have any defined schema.

By schema-free, we mean that the database doesn't need to know anything about the structure of the documents that you store in a collection. In fact, you can store documents with different structure in the same collection if you so choose.

By document, we mean that we store data that is a structured collection of key-value pairs, where keys are strings, and values are any of a rich set of data types, including arrays and documents. We call this data format "BSON" for "Binary Serialized dOcument Notation."
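
The three ideas above can be illustrated without a server. The following is a minimal sketch in plain JavaScript, where an array stands in for a collection; the names `people` and `describe` and the sample values are made up for illustration and are not part of MongoDB.

```javascript
// Plain JavaScript model of a "collection": an array of documents.
// This is only an illustration; it does not talk to a MongoDB server.
const people = [
  // Two documents in the same collection with different structure.
  { name: "alice", languages: ["en", "fr"] },
  { name: "bob", address: { city: "Portland", zip: "97201" }, age: 40 },
];

// Keys are strings; values may be scalars, arrays, or nested documents.
function describe(doc) {
  const keys = Object.keys(doc).sort().join(", ");
  return doc.name + " has keys: " + keys;
}

const descriptions = people.map(describe);
descriptions.forEach((line) => console.log(line));
```

Nothing forces the two documents to share keys, which is what "schema-free" means in practice.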

MongoDB is a server process that runs on Linux, Windows and OS X. It can be run as either a 32-bit or a 64-bit application. We recommend running in 64-bit mode, since MongoDB is limited to a total data size of about 2GB for all databases in 32-bit mode.

The MongoDB process listens on port 27017 by default (note that this can be set at start time - please see Command Line Parameters for more information).

MongoDB stores its data in files (default location is /data/db/), and uses memory-mapped files to manage that data efficiently.

Example data for MongoDB

 bin/mongo --shell slides.js

/**
 * Slides for presentation on Mastering the MongoDB Shell.
 * Copyright 2010 Mike Dirolf (http://dirolf.com)
 * This work is licensed under the Creative Commons
 * Attribution-Noncommercial-Share Alike 3.0 United States License. To
 * view a copy of this license, visit
 * http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a
 * letter to Creative Commons, 171 Second Street, Suite 300, San
 * Francisco, California, 94105, USA.
 * Originally given at MongoSF on 4/30/2010.
 * Modified and given at MongoNYC on 5/21/2010.
 * To use: run `mongo --shell slides.js`
 */
db = db.getSisterDB("shell");
db.dropDatabase();

// some sample data
for (var i = 0; i < 1000; i += 1) {
    db.data.save({
        x: i
    });
}

// "slides"
db.deck.save({
    slide: 0,
    welcome: "to MongoNYC!",
    hashtag: "#mongonyc",
    mirror: "http://confmirror.10gen.com/"
});
db.deck.save({
    slide: 1,
    title: "Mastering the MongoDB Shell",
    who: "Mike Dirolf, 10gen",
    handle: "@mdirolf"
});
db.deck.save({
    slide: 2,
    question: "what is the shell?",
    answer: "a better Powerpoint?"
});
db.deck.save({
    slide: 3,
    question: "what is the shell?",
    answer: "a full JavaScript environment"
});
db.deck.save({
    slide: 4,
    question: "what is the shell?",
    answer: "a reference MongoDB client"
});
db.deck.save({
    slide: 5,
    "use cases": ["administrative scripting", "exploring and debugging", "learning (and teaching!)"]
});
db.deck.save({
    slide: 6,
    repl: ["arrows for history", "^L"]
});
db.deck.save({
    slide: 7,
    "getting help": ["help", "db.help", "db.foo.help"]
});
db.deck.save({
    slide: 8,
    "show": ["dbs", "collections", "users", "profile"]
});
db.deck.save({
    slide: 9,
    navigating: "databases",
    how: "'use' or 'db.getSisterDB'"
});
db.deck.save({
    slide: 10,
    navigating: "collections",
    how: "dots, brackets, or 'db.getCollection'",
    note: "careful with names like foo-bar"
});
db.deck.save({
    slide: 11,
    "basic operations": ["insert", "findOne", "find", "remove"]
});
db.deck.save({
    slide: 12,
    "fun with cursors": ["auto-iteration", "it"]
});
db.deck.save({
    slide: 13,
    "error checking": "auto 'db.getLastError'"
});
db.deck.save({
    slide: 14,
    "commands": ["count", "stats", "repairDatabase"],
    "meta": "listCommands"
});
db.deck.save({
    slide: 15,
    "pro tip!": "viewing JS source"
});
db.deck.save({
    slide: 16,
    "getting help": "--help"
});
db.deck.save({
    slide: 17,
    scripting: "run .js files",
    tools: ["--eval", "--shell", "runProgram"]
});
db.deck.save({
    slide: 18,
    warning: "dates in JS suck"
});
db.deck.save({
    slide: 19,
    warning: "array iteration in JS sucks"
});
db.deck.save({
    slide: 20,
    warning: "numeric types in JS suck"
});
db.deck.save({
    slide: 21,
    so: "why JS?"
});
db.deck.save({
    slide: 22,
    homework: ["convince 2 friends to try MongoDB", "send feedback @mdirolf"]
});
db.deck.save({
    slide: 23,
    url: "github.com/mdirolf/shell_presentation",
    questions: "?"
});

// current slide
var current = 0;

// print current slide and advance
var next = function() {
    var slide = db.deck.findOne({
        slide: current
    });
    if (slide) {
        current++;
        delete slide._id;
        delete slide.slide;
        print(tojson(slide, null, false));
    } else {
        print("The End!");
    }
};

// go to slide and print
var go = function(n) {
    current = n;
    next();
};

// repeat the previous slide
var again = function() {
    current--;
    next();
};

Basic Commands of MongoDB

Privileged Commands

Certain operations are for the database administrator only. These privileged operations may only be performed on the special database named admin.

Start the server like this:

 bin/mongod --dbpath /path/to/data

Stop it with Ctrl-C or with kill (but don’t use kill -9, which doesn’t give the server a chance to shut down cleanly and flush data to disk).

Viewing stats without the mongo console:

Visit http://localhost:28017 to get an overview of what’s going on. The status page is served on the port the server is running on plus 1000, so the default port 27017 gives 28017.

You can also query http://localhost:28017/_status to see the same data that db.serverStatus() returns, but in JSON format.
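
The port arithmetic can be written down as a small helper. This is a sketch only: `statusUrl` and `DEFAULT_MONGOD_PORT` are hypothetical names introduced for illustration, and the code merely builds the URL string without contacting a server.

```javascript
// Default port that mongod listens on, as stated earlier in this document.
const DEFAULT_MONGOD_PORT = 27017;

// The HTTP status page is served 1000 ports above the server port.
// `host` is optional and defaults to localhost.
function statusUrl(mongodPort, host) {
  const statusPort = mongodPort + 1000;
  return "http://" + (host || "localhost") + ":" + statusPort;
}

console.log(statusUrl(DEFAULT_MONGOD_PORT));   // http://localhost:28017
console.log(statusUrl(3000) + "/_status");     // http://localhost:4000/_status
```

So a mongod started with --port 3000 would be expected to expose its status page on port 4000.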

Using the client command console:

 bin/mongo

use admin;
db.addUser('dummy', 'dummy') // create new user with password "dummy"
db.runCommand("shutdown"); // shut down the database

 bin/mongo -u dummy -p dummy admin

Create a new Database:

use new_database
db.addUser('dummy', 'differentpassword')

 bin/mongo -u dummy -p differentpassword new_database

Deleting a database

use [db name]
db.dropDatabase()

Working with Collections (Tables):

show collections

List all the available databases:

show dbs

To get a list of data in a specific collection, use:

db.[collection name].find()

Listing global functions and properties:

for(var o in this) {
print(o);
}

__quiet chatty friendlyEqual doassert assert argumentsToArray isString isNumber
isObject tojson tojsonObject shellPrint printjson shellPrintHelper shellHelper
help Random killWithUris Geo connect MR MapReduceResult __lastres__
sleep _sleep quit _quit getMemInfo _getMemInfo _srand __srand _rand __rand
_isWindows __isWindows _startMongoProgram __startMongoProgram
runProgram _runProgram runMongoProgram _runMongoProgram stopMongod _stopMongod
stopMongoProgram _stopMongoProgram stopMongoProgramByPid _stopMongoProgramByPid
rawMongoProgramOutput _rawMongoProgramOutput
clearRawMongoProgramOutput _clearRawMongoProgramOutput
removeFile _removeFile listFiles _listFiles resetDbpath _resetDbpath
copyDbpath _copyDbpath _parsePath _parsePort createMongoArgs
startMongodTest startMongod startMongodNoReset startMongos
startMongoProgram startMongoProgramNoConnect myPort
ShardingTest printShardingStatus MongodRunner ReplPair ToolTest ReplTest
allocatePorts SyncCCTest db hex_md5 _hex_md5 version _version
i current next __iscmd__ ___it___ it

Listing properties of db:

for(var o in db){
print(o);
}

_mongo _name shellPrint getMongo getSisterDB getName stats getCollection
commandHelp runCommand _dbCommand _adminCommand addUser removeUser __pwHash auth
createCollection getProfilingLevel dropDatabase shutdownServer
cloneDatabase cloneCollection copyDatabase repairDatabase help
printCollectionStats setProfilingLevel eval dbEval groupeval groupcmd group
_groupFixParms resetError forceError getLastError getLastErrorObj
getLastErrorCmd getPrevError getCollectionNames tojson toString
currentOp currentOP killOp killOP getReplicationInfo printReplicationInfo
printSlaveReplicationInfo serverBuildInfo serverStatus version printShardingStatus

db.shell.help()

DBCollection help
db.foo.count()
db.foo.dataSize()
db.foo.distinct( key ) - eg. db.foo.distinct( 'x' )
db.foo.drop() drop the collection
db.foo.dropIndex(name)
db.foo.dropIndexes()
db.foo.ensureIndex(keypattern,options) - options should be an object with these possible fields: name, unique, dropDups
db.foo.reIndex()
db.foo.find( [query] , [fields]) - first parameter is an optional query filter. second parameter is optional set of fields to return.
    e.g. db.foo.find( { x : 77 } , { name : 1 , x : 1 } )
db.foo.find(...).count()
db.foo.find(...).limit(n)
db.foo.find(...).skip(n)
db.foo.find(...).sort(...)
db.foo.findOne([query])
db.foo.findAndModify( { update : ... , remove : bool [, query: {}, sort: {}, 'new': false] } )
db.foo.getDB() get DB object associated with collection
db.foo.getIndexes()
db.foo.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )
db.foo.mapReduce( mapFunction , reduceFunction , <optional params> )
db.foo.remove(query)
db.foo.renameCollection( newName , <droptarget> ) renames the collection.
db.foo.runCommand( name , <options> ) runs a db command with the given name where the 1st param is the collection name
db.foo.save(obj)
db.foo.stats()
db.foo.storageSize() - includes free space allocated to this collection
db.foo.totalIndexSize() - size in bytes of all the indexes
db.foo.totalSize() - storage allocated for all data and indexes
db.foo.update(query, object[, upsert_bool, multi_bool])
db.foo.validate() - SLOW
db.foo.getShardVersion() - only for use with sharding

db.help()

DB methods:
db.addUser(username, password[, readOnly=false])
db.auth(username, password)
db.cloneDatabase(fromhost)
db.commandHelp(name) returns the help for the command
db.copyDatabase(fromdb, todb, fromhost)
db.createCollection(name, { size : ..., capped : ..., max : ... } )
db.currentOp() displays the current operation in the db
db.dropDatabase()
db.eval(func, args) run code server-side
db.getCollection(cname) same as db['cname'] or db.cname
db.getCollectionNames()
db.getLastError() - just returns the err msg string
db.getLastErrorObj() - return full status object
db.getMongo() get the server connection object
db.getMongo().setSlaveOk() allow this connection to read from the nonmaster member of a replica pair
db.getName()
db.getPrevError()
db.getProfilingLevel()
db.getReplicationInfo()
db.getSisterDB(name) get the db at the same server as this one
db.killOp(opid) kills the current operation in the db
db.printCollectionStats()
db.printReplicationInfo()
db.printSlaveReplicationInfo()
db.printShardingStatus()
db.removeUser(username)
db.repairDatabase()
db.resetError()
db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into { cmdObj : 1 }
db.serverStatus()
db.setProfilingLevel(level,) 0=off 1=slow 2=all
db.shutdownServer()
db.stats()
db.version() current version of the server

help()

HELP
show dbs                     show database names
show collections             show collections in current database
show users                   show users in current database
show profile                 show most recent system.profile entries with time >= 1ms
use <db name>                set current database to <db name>
db.help()                    help on DB methods
db.foo.help()                help on collection methods
db.foo.find()                list objects in collection foo
db.foo.find( { a : 1 } )     list objects in foo where a == 1
it                           result of the last line evaluated; use to further iterate

If I insert the same document twice, why does it not raise an error?

Using PyMongo for example, why does inserting the same document (read same _id) more than once not raise an error? When we need to detect if a document already exists in the database, we could try catching DuplicateKeyError on insert. Below we explicitly insert the document with the same _id twice but the exception is never raised. Why?

import pymongo

# db is assumed to be an existing pymongo database handle
try:
    doc = {'_id': '123123'}
    db.foo.insert(doc)
    db.foo.insert(doc)
except pymongo.errors.DuplicateKeyError as error:
    print("Same _id inserted twice:", error)

The answer is that DuplicateKeyError is only raised if we do the insert in safe mode, i.e. db.foo.insert(doc, safe=True). We see the error in MongoDB's interactive shell but not with most drivers because the shell does a safe insert by default, whereas with most drivers it is the developer's choice whether or not to use a safe insert.

MongoDB's interactive shell:

doc = {_id: 123421, name: 'test'};
// { "_id" : 123421, "name" : "test" }
db.deck.insert(doc)
db.deck.insert(doc)
// E11000 duplicate key error index: shell.deck.$_id_ dup key: { : 123421.0 }
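
The difference between a safe and an unsafe insert can be modelled in a few lines. The sketch below is a toy in-memory stand-in, not a real driver: `ToyCollection`, its `lastError` field and the `{ safe: true }` option are hypothetical names chosen to mirror the behaviour described above.

```javascript
// Toy in-memory collection; a stand-in for a server, not a real driver API.
class ToyCollection {
  constructor() {
    this.docs = new Map();   // _id -> document
    this.lastError = null;   // what a getLastError-style check would report
  }

  // options.safe mimics "safe mode": check the outcome and throw on failure.
  insert(doc, options) {
    const safe = Boolean(options && options.safe);
    if (this.docs.has(doc._id)) {
      this.lastError = "E11000 duplicate key error, _id: " + doc._id;
    } else {
      this.docs.set(doc._id, doc);
      this.lastError = null;
    }
    if (safe && this.lastError !== null) {
      throw new Error(this.lastError);
    }
  }
}

const foo = new ToyCollection();
const doc = { _id: "123123" };

foo.insert(doc);
foo.insert(doc);                       // duplicate, but nothing is thrown
console.log("unsafe insert, error only visible on request:", foo.lastError);

let caught = null;
try {
  foo.insert(doc, { safe: true });     // duplicate, and this time it throws
} catch (err) {
  caught = err.message;
}
console.log("safe insert threw:", caught);
```

In both cases the duplicate is rejected and only one document is stored; what changes is whether the caller is told about it.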

Backup

  1. http://effectif.com/mongodb/mongo-administration

There are basically two approaches to backing up a Mongo database:

  1. mongodump and mongorestore are the classic approach. mongodump dumps the contents of the database to files. The backup is stored in the same format as Mongo uses internally, so it is very efficient. But it’s not a point-in-time snapshot.
  2. To get a point-in-time snapshot, shut the database down, copy the disk files (e.g. with cp) and then start mongod up again.

Alternatively, rather than shutting mongod down before making your point-in-time snapshot, you could just stop it from accepting writes:

db._adminCommand({fsync: 1, lock: 1})

{
    "info" : "now locked against writes, use db.$cmd.sys.unlock.findOne() to unlock",
    "ok" : 1
}

To unlock the database again, you need to switch to the admin database and then unlock it:

use admin
// switched to db admin
db.$cmd.sys.unlock.findOne()
// { "ok" : 1, "info" : "unlock requested" }

If you don’t switch to the admin database first you’ll get an unauthorized error:

db._adminCommand({fsync: 1, lock: 1})

{
    "info" : "now locked against writes, use db.$cmd.sys.unlock.findOne() to unlock",
    "ok" : 1
}

db.$cmd.sys.unlock.findOne()
// { "err" : "unauthorized" }

You can take a point-in-time snapshot from a slave just as easily as from your master database, which avoids downtime. This is one of the reasons that running a slave is so strongly recommended…

What RAID should I use?

  1. http://www.mongodb.org/display/DOCS/Developer+FAQ

We recommend not using RAID-5, but rather, RAID-10 or the like. Both will work of course.

Replication

Do it (did you read the previous section?). Seriously.

Start your master and slave up like this:

 mongod --master --oplogSize 500
 mongod --slave --source localhost:27017 --port 3000 --dbpath /data/slave

When seeding a new slave server from master use the --fastsync option.

You can see what’s going on with these two commands:

db.printReplicationInfo() // tells you how long your oplog will last
db.printSlaveReplicationInfo() // tells you how far behind the slave is

If the slave isn’t keeping up, how do you find out what’s going on? Check the mongo log for any recent errors. Try connecting with the mongo console. Try running queries from the console to see if everything is working. Run the status commands above to try and find out which database is taking up resources. If you can’t work it out hop on the IRC channel; Mathias says they’ll be very responsive.
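
To make "how long your oplog will last" concrete, here is a back-of-the-envelope sketch. It assumes a steady write rate, and the function names and the 50 MB/hour figure are made up for illustration; the real numbers come from db.printReplicationInfo() and db.printSlaveReplicationInfo().

```javascript
// Rough estimate: hours of writes that fit in the oplog before the
// oldest entries are overwritten. Assumes a steady write rate.
function oplogWindowHours(oplogSizeMB, writeRateMBPerHour) {
  if (writeRateMBPerHour <= 0) return Infinity;  // no writes, oplog never wraps
  return oplogSizeMB / writeRateMBPerHour;
}

// A slave that is further behind than the oplog window can no longer
// catch up from the oplog alone and would need to be reseeded.
function slaveCanCatchUp(lagHours, windowHours) {
  return lagHours < windowHours;
}

// --oplogSize 500 (megabytes) with a hypothetical 50 MB/hour of writes.
const windowHours = oplogWindowHours(500, 50);
console.log(windowHours);                      // 10
console.log(slaveCanCatchUp(2, windowHours));  // true
console.log(slaveCanCatchUp(12, windowHours)); // false
```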

What is the difference between MongoDB and RDBMSs?

  1. http://sunoano.name/ws/mongodb.html#faqs

MySQL, PostgreSQL, ...

Server:Port
 - Database
  - Table
   - Row

MongoDB

Server:Port
 - Database
  - Collection
   - Document

The concepts of server and database are very similar. But the concepts of table and collection are quite different. In RDBMSs a table is a rectangle. It is all columns and rows. Each row has a fixed number of columns; if we add a new column, we add that column to each and every row.

In MongoDB a collection is more like a really big box and each document is like a little bag of stuff in that box. Each bag contains whatever it needs in a totally flexible manner (read schema-less). However, schema-less does not equal type-less, i.e. it is just that any document has its own schema, which it may or may not share with any other document. In practice it is normal to have the same schema for all the documents in a collection.
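
The rectangle-versus-bag contrast can be sketched in plain JavaScript. This is an in-memory illustration only; `addColumn`, `addField` and the sample data are hypothetical and do not correspond to any database API.

```javascript
// A "table": every row has exactly the same columns.
const columns = ["id", "name"];
const rows = [[1, "alice"], [2, "bob"]];

// Adding a column means touching each and every row.
function addColumn(cols, tableRows, name, defaultValue) {
  return {
    columns: cols.concat([name]),
    rows: tableRows.map((row) => row.concat([defaultValue])),
  };
}

// A "collection": each document carries its own keys.
const documents = [{ _id: 1, name: "alice" }, { _id: 2, name: "bob" }];

// Adding a field touches only the document that needs it.
function addField(docs, id, key, value) {
  return docs.map((d) =>
    d._id === id ? Object.assign({}, d, { [key]: value }) : d
  );
}

const widerTable = addColumn(columns, rows, "email", null);
const updatedDocs = addField(documents, 2, "email", "bob@example.com");
console.log(widerTable.rows);   // both rows gain an email slot
console.log(updatedDocs);       // only the second document gains an email key
```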

Clone Database

  1. http://www.mongodb.org/display/DOCS/Clone+Database

MongoDB includes commands for copying a database from one server to another.

// copy an entire database from one name on one server to another
// name on another server.  omit <from_hostname> to copy from one
// name to another on the same server.
db.copyDatabase(<from_dbname>, <to_dbname>, <from_hostname>);

// if you must authenticate with the source database
db.copyDatabase(<from_dbname>, <to_dbname>, <from_hostname>, <username>, <password>);

// in "command" syntax (runnable from any driver):
db.runCommand( { copydb : 1, fromdb : ..., todb : ..., fromhost : ... } );

// command syntax for authenticating with the source:
n = db.runCommand( { copydbgetnonce : 1, fromhost: ... } );
db.runCommand( { copydb : 1, fromhost: ..., fromdb: ..., todb: ..., username: ..., nonce: n.nonce, key: <hash of username, nonce, password > } );

// clone the current database (implied by 'db') from another host
var fromhost = ...;
print("about to get a copy of database " + db + " from " + fromhost);
db.cloneDatabase(fromhost);

// in "command" syntax (runnable from any driver):
db.runCommand( { clone : fromhost } );

What might be a use case for --fork?

The --fork switch makes mongod fork itself off the process that launched it, so it keeps running in the background after that process exits. One use case where this is handy is when we SSH into a remote machine, start MongoDB, and leave right away once the new mongod has started.

The fastest way to do so is simply to append /path/to/mongod --fork to the SSH command line, since this skips creating an intermediate shell and brings up a remote mongod right away.

Are there any reasons not to use MongoDB?

http://sunoano.name/ws/mongodb.html#faqs

  1. We need transactions (read ACID).
  2. Our data is very relational.
  3. Related to 2, we want to be able to do joins on the server (and cannot use embedded objects / arrays instead).
  4. We need triggers on our tables. There might be triggers available soon, however.
  5. Related to 4, we rely on triggers (or similar functionality) to do cascading updates or deletes. As with 4, this issue probably goes away once triggers are available.
  6. We need the database to enforce referential integrity (MongoDB has no notion of this at all).
  7. We need 100% per-node durability.

Use Cases

http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

It may be helpful to look at some particular problems and consider how we could solve them.

  1. If we were building Lotus Notes, we would use Couch, as its programmer versioning reconciliation/MVCC model fits perfectly. Any problem where data is offline for hours and then back online would fit this. In general, if we need several eventually consistent master-master replica databases, geographically distributed and often offline, we would use Couch.
  2. If we had very high performance requirements, we would use Mongo. For example, web site user profile object storage and caching of data from other sources.
  3. If we were building a system with very critical transactions, such as bond trading, we would not use MongoDB for those transactions, although we might use it in a hybrid setup for other data elements of the system. For something like this we would likely choose a traditional RDBMS.
  4. For a problem with very high update rates, we would use Mongo, as it is good at that. For example, updating real-time analytics counters for a web site (page views, visits, etc.).

Generally, we find MongoDB to be a very good fit for building web infrastructure.

Ruby driver tutorial for MongoDB

  1. http://api.mongodb.org/ruby/1.0.3/index.html
  2. http://www.mongodb.org/display/DOCS/Ruby+Tutorial
  3. http://github.com/mongodb/mongo-ruby-driver

 gem update --system
 gem install mongo
 gem install bson
 gem install bson_ext

Making a Connection

# db = Mongo::Connection.new.db("shell")
# db = Mongo::Connection.new("localhost").db("shell")
db = Mongo::Connection.new("localhost", 27017).db("shell")

Listing All Databases

m = Mongo::Connection.new # (optional host/port args)
m.database_names.each { |name| puts name }
m.database_info.each { |info| puts info.inspect }

Dropping a Database

m.drop_database('database_name')

Authentication (Optional)

auth = db.authenticate(my_user_name, my_password)

Getting a Collection

deck = db.collection("deck")

Inserting a Document

deck = db['deck']
doc = {"name" => "MongoDB", "type" => "database", "count" => 1, "info" => {"x" => 203, "y" => '102'}}
deck.insert(doc)
100.times { |i| deck.insert("i" => i) }

Finding the First Document In a Collection using find_one()

my_doc = deck.find_one()
puts my_doc.inspect

Note the _id element has been added automatically by MongoDB to your document. Remember, MongoDB reserves element names that start with _ for internal use.

Counting Documents in a Collection

puts deck.count()

Using a Cursor to get all of the Documents

deck.find.each { |row| puts row.inspect }

Getting a Single Document with a Query:

deck.find("i" => 71).each { |row| puts row.inspect }

Getting a Set of Documents With a Query:

deck.find("i" => {"$gt" => 50}).each { |row| puts row }
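
To show what such a query document means, the following sketch evaluates equality and range conditions against in-memory documents in plain JavaScript. It is not how the server or the Ruby driver is implemented; `operators`, `matches` and `find` are hypothetical helpers, and only a small subset of comparison operators is covered.

```javascript
// Comparison operators supported by this sketch (a small subset).
const operators = {
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
};

// True when `doc` satisfies every condition in `query`.
function matches(doc, query) {
  return Object.keys(query).every((field) => {
    const condition = query[field];
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { $gt: 20, $lte: 30 }: all operators must hold.
      return Object.keys(condition).every((op) => {
        if (!Object.prototype.hasOwnProperty.call(operators, op)) {
          throw new Error("unsupported operator: " + op);
        }
        return operators[op](value, condition[op]);
      });
    }
    return value === condition;  // plain equality, e.g. { i: 71 }
  });
}

// Same shape as the tutorial's sample data: documents { i: 0 } .. { i: 99 }.
const sample = [];
for (let i = 0; i < 100; i += 1) sample.push({ i: i });

const find = (query) => sample.filter((doc) => matches(doc, query));
console.log(find({ i: 71 }).length);                    // 1
console.log(find({ i: { $gt: 50 } }).length);           // 49
console.log(find({ i: { $gt: 20, $lte: 30 } }).length); // 10
```

Several operators on the same field are combined with a logical AND, which is why the last query selects 21 through 30.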

Querying with Regular Expressions:

deck.find("question" => /w/i).each { |row| puts row }

Creating An Index:

To create an index, you specify an index name and an array of field names to be indexed, or a single field name.

deck.create_index("i")
# deck.create_index([["i", Mongo::ASCENDING]])

Getting a List of Indexes on a Collection:

deck.index_information

Ruby driver example

require 'rubygems'  # not necessary for Ruby 1.9
require 'mongo'

# Load slides.js first, then run the Ruby code below.
db = Mongo::Connection.new.db("mydb")
db = Mongo::Connection.new("localhost").db("mydb")
m = Mongo::Connection.new("localhost", 27017)
db = m.db("shell")
puts db.inspect
m.database_names.each { |name| puts name }
m.database_info.each { |info| puts info.inspect }
db.collection_names.each { |name| puts name }

deck = db['deck']
doc = {"name" => "MongoDB", "type" => "database", "count" => 1, "info" => {"x" => 203, "y" => '102'}}
deck.insert(doc)
100.times { |i| deck.insert("i" => i) }

my_doc = deck.find_one()
puts my_doc.inspect
puts deck.count()

deck.find.each { |row| puts row.inspect }
deck.find("i" => 71).each { |row| puts row.inspect }
deck.find("i" => {"$gt" => 90}).each { |row| puts row }
deck.find("i" => {"$gt" => 20, "$lte" => 30}).each { |row| puts row }
deck.find("question" => /w/i).each { |row| puts row }

deck.create_index([["i", Mongo::ASCENDING]])
p deck.index_information
db.validate_collection('deck')

References

  1. http://www.mongodb.org/display/DOCS/Developer+Zone