Leveraging MongoDB - NoSQL Database for Large Scale High Performance Applications

With the emergence of new data sources such as social media, mobile applications, and sensor-equipped “Internet of Things” networks, enterprise applications are adopting disruptive technologies to manage higher volumes of information, especially when the data arrives at much faster rates and is more complex and dynamic than data from existing transactional sources.

Scalable storage, powerful data processing and embedded analytics engines allow you to easily access, enrich and analyze big data in all its variety and volume, and to deliver real-time insights into areas such as operational performance, customer satisfaction, and competitor behavior.

The dominant back end of enterprise applications has been a relational database, whether Oracle, SQL Server or DB2. But many things have changed since the advent of relational databases, and two limitations now stand out:

Scalability: Scaling write operations is hard, expensive, or impossible. Vertical scaling (upgrading equipment) is either limited or very expensive, yet it is often the only way you can scale. Horizontal scaling (adding new nodes to the cluster) is either unavailable or can only be implemented partially.

Flexibility: For an RDBMS, the schema is created together with the database, and changing this structure later may take significant time and effort.

NoSQL databases, by contrast, are optimized for flexible schema design and lightning-fast data processing, while being equally good at compressed data storage and updates.

MongoDB, one of the leading document-oriented NoSQL databases, is well known for its fast performance, flexible schema, scalability and strong indexing capabilities. At the core of this fast performance lie MongoDB indexes, which support efficient execution of queries by avoiding full-collection scans and hence limiting the number of documents MongoDB has to examine.
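To make the difference between a full-collection scan and an index lookup concrete, here is a minimal pure-Python sketch. It does not talk to a MongoDB server, and it stands in a hash map for the B-tree structure that MongoDB indexes actually use. The `courses` data and all function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory "collection"; no MongoDB server is involved.
courses = [
    {"_id": 1, "title": "Algebra", "language": "en"},
    {"_id": 2, "title": "Geometrie", "language": "de"},
    {"_id": 3, "title": "Statistics", "language": "en"},
    {"_id": 4, "title": "Calcul", "language": "fr"},
]

def find_with_scan(docs, field, value):
    """Full-collection scan: every document is examined."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined

def build_index(docs, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        index[doc.get(field)].append(position)
    return index

def find_with_index(docs, index, value):
    """Index lookup: only the matching documents are examined."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)

language_index = build_index(courses, "language")
scan_hits, scan_examined = find_with_scan(courses, "language", "de")
index_hits, index_examined = find_with_index(courses, language_index, "de")
print(scan_examined, index_examined)  # 4 1
```

Both calls return the same document, but the scan examines all four documents while the index lookup examines one. Against a real deployment, the comparable step would be creating an index on the queried field and checking the documents-examined figure that the explain output reports.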

When should we really use MongoDB?

We know that consistency and isolation are very valuable ACID properties. But low latency, high-speed data writes, availability, and not losing data even if the primary server goes down are also important. These changing requirements lead to different tradeoffs when designing the database architecture for next-generation applications. MongoDB is a good fit in the following situations:

High write load: MongoDB favors a high insert rate over transaction safety. If you need to load large volumes of records in near real time, MongoDB should fit. But if your application processes $1M transactions, you need to use it carefully, with additional “consistency triggers”.

Horizontal scalability and high availability: With MongoDB, you can easily build a clustered topology with replication and sharding to increase availability and capacity. Moreover, recovery from a node (or data center) failure is fast, safe and automatic.

Schema is not stable: Adding new columns to an RDBMS can lock the entire database in some systems, or create major load and performance degradation in others. This usually happens when the table is larger than 1GB. Because MongoDB is schema-less, adding a new field does not affect old rows (or documents) and is instant. You also do not need a DBA to modify your schema when the application changes.
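As a rough illustration of why a new field does not touch existing documents, the sketch below models a collection as a plain Python list of dictionaries. The `learners` data, the `timezone` field, and the `timezone_of` helper are hypothetical, and no MongoDB server is involved.

```python
# Documents written before the application needed a "timezone" field.
learners = [
    {"_id": 1, "name": "Asha"},
    {"_id": 2, "name": "Ben"},
]

# A new document simply carries the extra field; older ones are not rewritten.
learners.append({"_id": 3, "name": "Chen", "timezone": "Asia/Shanghai"})

def timezone_of(doc, default="UTC"):
    """Read the optional field, falling back to a default for older documents."""
    return doc.get("timezone", default)

timezones = [timezone_of(doc) for doc in learners]
print(timezones)  # ['UTC', 'UTC', 'Asia/Shanghai']
```

The tradeoff is that the application, rather than the database, becomes responsible for handling documents that lack the newer field.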

Eventual consistency: If you need to load large volumes of records that each carry low business value, and can therefore tolerate eventual consistency, MongoDB should fit.

Location-based queries: MongoDB is a great choice if you have to develop an app with location-based features such as location tracking or location-based event tracking. MongoDB is among the few NoSQL databases with geospatial features.
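A location-based query in MongoDB is typically expressed as a filter document using the `$near` operator on a field covered by a `2dsphere` index. The sketch below only builds that filter as a Python dictionary and does not connect to a server. The `near_query` helper, the `location` field name, and the sample coordinates are assumptions made for illustration.

```python
def near_query(field, longitude, latitude, max_distance_m):
    """Build a MongoDB-style `$near` filter around a GeoJSON point."""
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    return {
        field: {
            "$near": {
                # GeoJSON lists longitude first, then latitude.
                "$geometry": {"type": "Point", "coordinates": [longitude, latitude]},
                "$maxDistance": max_distance_m,  # metres
            }
        }
    }

# Index specification the queried field would need on a real collection.
location_index = [("location", "2dsphere")]

# Hypothetical search: events within 5 km of a point in central London.
query = near_query("location", -0.1276, 51.5072, 5000)
print(query)
```

With a driver such as PyMongo, a dictionary of this shape would be passed as the filter of a find call on a collection whose `location` field holds GeoJSON points.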

MongoDB for Global e-Learning Platform

To demonstrate how an RDBMS solution can be augmented with a NoSQL platform like MongoDB, let’s take a look at this case study of an e-learning company that uses MongoDB alongside its RDBMS.

Here are some of the important operational aspects where the use of MongoDB increases efficiency, adds features and improves the overall capacity of the platform:

  • Storage Engine – With WiredTiger as MongoDB's storage engine, the database application can achieve up to ten times better throughput and saturate all available CPU cores. There is also scope for up to 90% data compression using the relevant compression module. The upgrade is fully backward compatible and requires no server downtime.
  • Schema Design – By studying data types and common queries, it is possible to convert your RDBMS schema to a MongoDB schema. With careful tuning and de-normalization of collections, you can get optimum performance. Query optimization can also be carried out with the MongoDB profiler and the explain method, and transaction support can be implemented using two-phase commits.
  • Data Migration – Using a Java-based custom ETL tool, an interface can be established between your RDBMS platform and MongoDB. Using migration scripts, data can be fetched from SQL and fed into MongoDB collections in an efficient, fault-tolerant manner. Hosting the RDBMS on a read-optimized instance can further speed up the migration.
  • Sharding – The use of “tag aware” sharding with a well-chosen shard key can bring geographical awareness to your data.
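The two-phase commit approach mentioned under Schema Design can be sketched as follows. This is a simplified, in-memory illustration of the general pattern and not the case study's actual implementation: plain dictionaries stand in for an accounts collection and a transactions collection, and the names `accounts`, `transactions`, and `transfer` are hypothetical.

```python
# Dictionaries standing in for two MongoDB collections.
accounts = {
    "A": {"balance": 100, "pending": []},
    "B": {"balance": 20, "pending": []},
}
transactions = {}

def transfer(txn_id, source, destination, amount):
    """Move `amount` between two accounts using a two-phase commit pattern."""
    # Record the intent first so an interrupted transfer can be found and resumed.
    transactions[txn_id] = {
        "source": source,
        "destination": destination,
        "amount": amount,
        "state": "initial",
    }
    txn = transactions[txn_id]

    # Phase 1: apply the change to each account and mark it as pending there.
    txn["state"] = "pending"
    for name, delta in ((source, -amount), (destination, amount)):
        account = accounts[name]
        if txn_id not in account["pending"]:  # guard keeps a retry idempotent
            account["balance"] += delta
            account["pending"].append(txn_id)
    txn["state"] = "applied"

    # Phase 2: clear the pending markers and close the transaction.
    for name in (source, destination):
        accounts[name]["pending"].remove(txn_id)
    txn["state"] = "done"
    return txn["state"]

final_state = transfer("t1", "A", "B", 30)
print(final_state, accounts["A"]["balance"], accounts["B"]["balance"])  # done 70 50
```

Because each transaction records its state, a recovery job could look for transactions stuck in the pending or applied state and either resume them or roll them back. That recovery logic is omitted here.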

After MongoDB was implemented, performance tests showed response times coming down to about 3 seconds on all parameters. Portal dashboard operations performed 75% better, and searches ran 90% faster.

A few of CIGNEX Datamatics' MongoDB implementations are:

  • US-based start-up – Building a social listening platform that leverages social media and unstructured data analytics to collect supporting evidence
  • GPS solutions provider – Real-time analysis of data accumulated from 200,000+ GPS-based devices (Internet of Things) to provide operational intelligence and efficiencies for the construction and agriculture verticals
  • World’s largest chemical company – Patent search solution resulting in a 10x increase in performance and a 20x reduction in TCO

Summary

Overall, MongoDB provides up to ten times better performance, reduces your storage requirements by 80% with compression algorithms, and can reduce operational overhead by up to a whopping 95%.

With a rich document model, powerful analytical capabilities over high volumes of multi-structured data sets, and broad integration with leading BI and analytics tools, MongoDB provides a foundation for evolving data visualization to support real-time analytics in big data applications. Learn more about our MongoDB Consulting Services and connect with us today to supercharge your business operations with higher speed, better efficiency and improved productivity.