QUESTIONS ON DOCUMENT-BASED DATABASE

Started by
2 comments, last by hplus0603 3 years, 11 months ago

hi.

im wondering to use document-based database such as mongodb or couchdb for my next project.

i searched alot on this subject that what i lose and what i get in distinguish with sql.

things that i look in document-based is:

  • i dont want to struggle with complex queries and indexing
  • speed as its said its much faster than sql
  • extending when there is no basic model for user progress

i need database for these subjects:

  • user data and progress
  • log( i need to log all important user actions for user support and basic cheat detection)
  • report and metrics(arpu,install,paidrated …)

for logging as its big amount of data, i want to know can i simply remote data for example for more than past 2 month or its a heavy task for db like mongo and does all records have the date log on them or i have to handle date on records?

as i used sql for my past project, i had to query complex metrics on database that was supporting users live. for a database with 10 million user records sometimes it toke 10 minutes to calculate paidrate that is a very long time. i know with better indexing and more efficient query it can be much less but as i noticed, sql is all about exprience. these are the reasons that make me to look at document based as an alternative for some one that is not much exprienced database programmer.

i have more questions.

  • sql supports stored procedures that are faster than basic queries because they are parsed and planed before. as o looked, mongodb has stored js. use of them for performance is a must or not?
  • i know there is no best practice for document-based databases but i know there is 2 typed of recrding. embedded and also flat records with referencing. can you tell me which one is more performant
  • can i use document-based as main database with no supporting database?
  • i use redis for leaderboard. i joined the redis result with sql to get final leaderboard data. join in sql is faster or finding 100 records in mongo one by one?
  • it said that mongo gived indexing hints based on database statistics. are these hints enough for making most performant condition?

Advertisement

Document stores make it easy to get started, and make it very hard to extend and grow a project once it's established.

Most of the “benefits” of document store databases, end up being myths in practice. Here are some examples:

  1. “Schema-less” means that you can add a new property without having to update some database schema. Instead, you need to update every single place of your code that uses this object, to pay attention to that property. “Schema-less” also means that you can remove any property you want, without updating the database – but it, again, means that you have to update all your code to now be prepared for that property not being there. “Schema-less” databases actually just mean that the schema is implicit, inside all of your code, and finding all the dependencies ends up being hard.
  2. “In memory” means that operation costs at scale are astronomical. All databases have a “long tail” of data that's no longer active, but that you don't want to delete yet. Wasting RAM on this data is dumb. Trying to solve it with virtual memory, like tools like MongoDB or Redis might do, is dumb, because that's a less efficient model than a database with a nice, unified, buffer cache for database tables. A relational database is totally fine answering queries from RAM, if you have sufficient memory in the database server.
  3. “No indexes” means that writes are fast, but it means that it's really hard to query your data for anything other than the primary key / id. You can get the same performance benefits from a relational database, by not declaring indices on your tables, because the write cost of indices comes from using the indices. The fact that a document store doesn't support secondary indices doesn't make it “faster,” it makes it “more limited.”
  4. Redis, specifically, is “fast” because it's single threaded, but that means its scalability top is very abrupt. A SQL database may be “slower” (or at least, higher latency) for the small, simple use cases, but then has a much higher scalability ceiling once your data volumes get large.
  5. MongoDB, specifically, uses memory mapped files, and does not support consistent checkpoint backups. This means that your database WILL have inconsistencies if you ever lose power or otherwise lose the database process. It's design-in to the database. If you don't care about the consistency of your data, how about just storing it in memcached? When it's gone, it's gone!
  6. “Complicated queries” is a myth. SQL is no more complicated than the Redis API, or the MongoDB query selectors. In fact, there are more, and better, tutorials for SQL than there are for Redis and Mongo. (That being said – there are also more BAD tutorials for SQL, just because it's bigger and more widely used.)
  7. MongoDB provides a “change stream” to listen in to whatever changes in the database. Unfortunately, this serializes all updates, and limits performance to a hard ceiling. You'll do better by listening to a MySQL row based replication stream (although even that is a scalability limitation.)

I have shipped applications on Redis, Scylla, and other document databases. I've also, several times, lost data to MongoDB instances in applications and services I use and manage. There are cases where Redis is the right choice, mainly in caching, There are cases where ScyllaDB will perform OK, as long as you understand the trade-offs involved. There are, as far as I can tell, no cases where MongoDB is the right choice. For my current system, I'm using Postgres/Aurora for business objects, Kafka for buffering, and Snowflake as analytics. Previously, I've used MySQL (good) Hadoop (miserable) and Spark (alright) as well as SQL Server (good) as well.

enum Bool { True, False, FileNotFound };

Regarding MongoDB, there is also this: http://jepsen.io/analyses/mongodb-4.2.6​ showing that the MongoDB transactions don't actually work in case of network failures.

If you really want to go key/value or document store, the good news is that FoundationDB has been open sourced by Apple. It used to be a commercial product. It's quite robust, has good multi-node scalability, and has a subscription/notify mechanism that scales better than Mongo. If I had to use an in-memory store, and could run it ephemerally, I'd probably use Redis. If I had to use a kv-store that's persistent with good transactions, I'd probably use FoundationDB. If I had to run a database that's the highest possible performance for “low to middle end” hardware, I'd use MySQL / MariaDB. And what I actually do right now, because my main goal is correctness and robustness, is to use Postgres on AWS Aurora.

enum Bool { True, False, FileNotFound };

This topic is closed to new replies.

Advertisement