
Docker data volumes and scaling in a distributed system

  • Docker data volumes live on the host or, when using boot2docker, inside the local boot2docker VM.

    Keeping the MongoDB data in a data container and running MongoDB itself in another container seems to be the way to go. Will this scale on Google Compute Engine, Azure Virtual Machines, or similar platforms? I mean, what if all of this runs within ONE virtual machine, such as boot2docker or a single VM in the cloud? Normally you would scale by creating new VM instances, but how is this possible with Docker?

    Sorry to ask this on StackExchange, but dba has no category for this, while StackExchange has a category for Docker.

      October 5, 2021 1:25 PM IST
  • To use a data volume in Docker, you first need to create a container to host the volume. This is pretty basic. Just use a command like:

    docker create -v /some/directory --name mydatacontainer debian

    This command tells Docker to create a new container named mydatacontainer based on the Debian Docker image. (You could use any of Docker’s other OS images here, too.) Meanwhile, the -v flag sets up a data volume mounted at /some/directory inside the container.

    To repeat: That means the data is stored at /some/directory inside the container called mydatacontainer — not at /some/directory on your host system.

    The beauty of this, of course, is that we can now write data to /some/directory inside this container, and it will stay there for as long as the container and its volume are not removed. The data container does not even need to be running.
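
As a rough sketch of how a second container could then use that volume: the --volumes-from flag mounts the data container's volumes into another container at the same path. The container name myapp and the file hello.txt are made up for illustration. DOCKER defaults to "echo docker" here, so the commands are only printed unless you set DOCKER=docker.

```shell
#!/bin/sh
# Dry run by default: prints the docker commands instead of running them.
# Set DOCKER=docker to execute them against a real Docker daemon.
DOCKER="${DOCKER:-echo docker}"

# Create (but do not start) the data container that owns the volume.
create_data_container() {
    $DOCKER create -v /some/directory --name mydatacontainer debian
}

# Start a second container (hypothetical name "myapp") that mounts
# every volume of mydatacontainer at the same path and writes a file to it.
run_with_volume() {
    $DOCKER run --rm --name myapp --volumes-from mydatacontainer debian \
        sh -c 'echo hello > /some/directory/hello.txt'
}

create_data_container
run_with_volume
```

Because myapp is started with --rm it disappears after writing the file, but the file stays in the volume owned by mydatacontainer.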

      October 20, 2021 1:02 PM IST
  • In production you would have a large number of (virtual) machines, each running Docker. To scale MongoDB you would have multiple pairs of {mongodb, mongodb-data} containers, where each pair runs on the same machine (required for sharing volumes).
    The problems you need to solve:
    1) configuring MongoDB in a way that makes sense for scaling purposes (sharding, replica sets, etc.)
    2) firewall permissions, IP addresses, ports, etc. to allow Docker containers to talk to each other across hosts.
    Docker doesn't solve 1) for you because it's application-specific. MongoDB will have a different way of doing this than, say, Couchbase. 2) is not solved by Docker yet, but I think it might be someday.
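
A minimal sketch of what one such pair on a single host could look like, assuming the official mongo image (which keeps its data in /data/db) and a replica set named rs0. The container names and the start_mongo_pair helper are hypothetical. This only covers the Docker side: the replica set still has to be initiated from a MongoDB shell, and the hosts must be able to reach each other on the published port, which is problem 2). As written it prints the commands; set DOCKER=docker to run them.

```shell
#!/bin/sh
# Dry run by default: prints the docker commands instead of running them.
DOCKER="${DOCKER:-echo docker}"

# Start one {mongodb, mongodb-data} pair on the current host.
# $1 is the replica set name passed to mongod (assumed: rs0).
start_mongo_pair() {
    replset="$1"
    # Data-only container: owns the /data/db volume and is never started.
    $DOCKER create -v /data/db --name mongodb-data mongo /bin/true
    # MongoDB container: reuses that volume and publishes the default port.
    $DOCKER run -d --name mongodb --volumes-from mongodb-data \
        -p 27017:27017 mongo mongod --replSet "$replset"
}

start_mongo_pair rs0
```

You would run the same helper once on each machine, then join the resulting mongod instances into the replica set from one of them.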
      October 6, 2021 9:27 AM IST
  • You can use Rancher to solve 2), as it automatically manages IP addresses using an IPsec private network. I believe Kubernetes also solves problem 2).

     
      October 7, 2021 12:50 PM IST