
Uses of NoSQL database in data science

  • How can NoSQL databases like MongoDB be used for data analysis? What are the features in them that can make data analysis faster and powerful?
      June 11, 2019 5:03 PM IST
  • SQL (Structured Query Language) is used by most relational database management systems to manage databases that store data in tabular form. NoSQL refers to non-SQL or non-relational database design. It still provides an organized way of storing data, but not in tabular form.

    The common structures adopted by NoSQL databases to store data are key-value pairs, wide columns, graphs, and documents. Several NoSQL databases are used in the data science ecosystem. One of the popular ones is MongoDB, which stores data as documents.
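
    As a rough illustration of what "data as documents" looks like, here is a minimal sketch in plain Python. It does not use MongoDB or any driver; the `order` record and its field names are made up for the example, and a dict simply stands in for a JSON-style document.

```python
import json

# Hypothetical order record; the field names are illustrative only.
order = {
    "_id": 1001,
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.5},
        {"sku": "B-7", "qty": 1, "price": 20.0},
    ],
}

# Nested fields are read directly from the document, with no join
# against separate customer or line-item tables.
total = sum(item["qty"] * item["price"] for item in order["items"])

# The document serializes to JSON text and back without losing structure.
restored = json.loads(json.dumps(order))
```

    In a tabular design the same record would typically be split across several tables; here the nesting travels with the record.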
      August 10, 2021 3:00 PM IST
  • To be perfectly honest, most NoSQL databases are not very well suited to big data applications. For the vast majority of big data applications, the performance of MongoDB compared to a relational database like MySQL is poor enough to warrant staying away from something like MongoDB entirely.

    With that said, there are a couple of really useful properties of NoSQL databases that certainly work in your favor when you're working with large data sets, though the chance of those benefits outweighing the generally poor performance of NoSQL compared to SQL for read-intensive operations (most similar to typical big data use cases) is low.

    No Schema - If you're working with a lot of unstructured data, it might be hard to actually decide on and rigidly apply a schema. NoSQL databases in general are very supportive of this, and will allow you to insert schema-less documents on the fly, which is certainly not something an SQL database will support.
    JSON - If you happen to be working with JSON-style documents instead of CSV files, then you'll see a lot of advantage in using something like MongoDB as the database layer. Generally the workflow savings don't outweigh the increased query times though.
    Ease of Use - I'm not saying that SQL databases are always hard to use, or that Cassandra is the easiest thing in the world to set up, but in general NoSQL databases are easier to set up and use than SQL databases. MongoDB is a particularly strong example of this, known for being one of the easiest database layers to use (outside of SQLite). SQL also involves a lot of normalization, and there's a large legacy of SQL best practices that generally bogs down the development process.

    Personally, if you're picking a backend for your data science applications, I'd suggest you also check out graph databases such as Neo4j, which show really good performance for certain types of queries.
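
    To make the "No Schema" point above concrete, here is a small sketch using only the Python standard library. SQLite stands in for "an SQL database" and a plain list stands in for a document collection; neither is MongoDB, and the `events` records and column names are assumptions made up for the example. A relational table can of course be altered to add the column, so the contrast is about the up-front commitment, not about what is impossible.

```python
import sqlite3

# Hypothetical event records with differing fields.
events = [
    {"user_id": "u1", "action": "click"},
    {"user_id": "u2", "action": "view", "referrer": "newsletter"},
]

# Relational side: the table has a fixed set of columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, action TEXT)")
conn.execute("INSERT INTO events (user_id, action) VALUES (?, ?)", ("u1", "click"))

try:
    # The second record carries a field the table was never told about.
    conn.execute(
        "INSERT INTO events (user_id, action, referrer) VALUES (?, ?, ?)",
        ("u2", "view", "newsletter"),
    )
    rejected = False
except sqlite3.OperationalError:
    # SQLite refuses the insert because the column does not exist.
    rejected = True

# Document side: a list of dicts accepts both shapes as they are.
collection = []
collection.extend(events)

# Missing fields are handled at read time, for example with dict.get.
referrers = [doc.get("referrer") for doc in collection]
```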
      June 11, 2019 5:07 PM IST
  • Consider, try, and perhaps even use multiple databases. It's not just a "performance" issue at play here. It's really going to come down to your requirements. How much data are you talking about? What kind of data? How fast do you need it? Are you more read heavy or write heavy?

    Here's one thing you can't do in a SQL database: Calculate sentiment. http://www.slideshare.net/shift8/mongodb-machine-learning

    Of course the speed in that case may not be fast enough for your needs, but it is something that's possible. With some caching of specific aggregate values, it was quite acceptable even. Why would you do this? Convenience.

    Convenience really is something that you're going to be persuaded by. That's exactly why (in my opinion) NoSQL databases were created. Performance too of course, but I'm trying to discount benchmarks and focus more on other concerns.

    MongoDB (and some other NoSQL databases) have some very powerful features such as built-in map/reduce. This could save both cost and time over using something like Hadoop. Or it could provide a prototype or MVP to launch a larger business.
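
    For anyone who hasn't seen the map/reduce pattern, here is a minimal pure-Python sketch of the idea. It is not MongoDB's map/reduce API, just the same three steps (map, group by key, reduce) run over a list of dicts; the `reviews` data and the function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical review documents; the field names are illustrative only.
reviews = [
    {"product": "kettle", "stars": 4},
    {"product": "kettle", "stars": 5},
    {"product": "toaster", "stars": 2},
]


def map_review(doc):
    """Map step: emit one (key, value) pair per document."""
    return doc["product"], doc["stars"]


def reduce_stars(values):
    """Reduce step: collapse all values for one key into their mean."""
    return reduce(lambda a, b: a + b, values) / len(values)


# Shuffle step: group the emitted values by key.
grouped = defaultdict(list)
for key, value in map(map_review, reviews):
    grouped[key].append(value)

average_stars = {key: reduce_stars(values) for key, values in grouped.items()}
```

    A database with map/reduce built in runs the equivalent steps next to the data, which is where the convenience comes from.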

    What about graph databases? They're "NoSQL" too. Look at databases like OrientDB. If you want to argue performance... I don't think you're gonna show me a SQL database that's faster there =) ...and graph databases have some really amazing applications depending on what you need to do.

    Rule of technology (and the internet): don't get too comfortable with one thing. You're gonna be limited and set yourself up for failure.

      January 15, 2022 1:03 PM IST
  • One benefit of the schema-free NoSQL approach is that you don't commit prematurely and you can apply the right schema at query time using an appropriate tool like Apache Drill. See this presentation for details. MySQL wouldn't be my first choice in a big data setting.
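
    A minimal sketch of the "apply the schema at query time" idea, in plain Python. This is not Apache Drill and does not use its API; `raw_lines`, `schema`, and `apply_schema` are hypothetical names for the example. The records are stored as they arrived, and field names, types, and defaults are only decided when the data is read.

```python
import json

# Raw lines stored as-is; the two records differ in fields and types.
raw_lines = [
    '{"id": "1", "temp": "21.5", "unit": "C"}',
    '{"id": 2, "temp": 19, "site": "roof"}',
]

# The schema is chosen at query time: field name -> (cast function, default).
schema = {"id": (int, None), "temp": (float, None), "site": (str, "unknown")}


def apply_schema(line, schema):
    """Parse one raw record and project it onto the requested schema."""
    doc = json.loads(line)
    row = {}
    for field, (cast, default) in schema.items():
        # Cast fields that are present; fall back to the default otherwise.
        row[field] = cast(doc[field]) if field in doc else default
    return row


rows = [apply_schema(line, schema) for line in raw_lines]
```

    A different question about the same raw data can use a different `schema` without rewriting anything that was stored.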
      January 17, 2022 1:55 PM IST