
How do I run TensorFlow on Google cloud?

  •   August 16, 2021 3:27 PM IST
    0
    1. Open Cloud Shell.
    2. Set your default Compute Engine zone and your default project.
    3. Create the initial VM instance from an Ubuntu Wily image.
    4. Use SSH to connect to the VM.
    5. Install pip.
    6. Install TensorFlow.
    7. Type exit to return to Cloud Shell.
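The steps above can be sketched as gcloud commands. The zone, project ID, and instance name below are placeholders, and a current Ubuntu LTS image family is substituted for the old Wily image:

```shell
# Run from Cloud Shell. Zone, project ID, and instance name
# are placeholders -- substitute your own values.
gcloud config set compute/zone us-central1-b
gcloud config set project my-tensorflow-project

# Create the VM instance from an Ubuntu image
gcloud compute instances create tf-vm \
    --image-family=ubuntu-2204-lts \
    --image-project=ubuntu-os-cloud

# Connect over SSH
gcloud compute ssh tf-vm

# On the VM: install pip, then TensorFlow
sudo apt-get update && sudo apt-get install -y python3-pip
pip3 install tensorflow

# Return to Cloud Shell
exit
```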
     
     
  • Step 1: Set up a Google Cloud Account

    The first thing you need to do is set up a Google Cloud account.

    Go to https://cloud.google.com/ and sign in using your Gmail account. If you have a school or organization account, it may lead to collaboration issues later, so I would strongly recommend creating an account with your personal Gmail account. If you don’t have a Gmail account, now may be a good time to create one.

    While signing in, Google will ask you to share your credit card details. You can enter them without worry: your card won’t be charged unless you have used up all $300 worth of free credits.

    Step 2: Create a project

    Once you have signed in, you will be taken to the console screen.

    If a project is not assigned automatically, you should go ahead and create one: click the “Create New Project” icon in the banner. The project ID is assigned automatically, though you can change it if you like. I decided to stay with the default project ID assigned to me.

    Step 3: Deploy Deep Learning Virtual Machine

    Now that you have an account and project, you can deploy a marketplace solution.

    Your billing will only start once you deploy the solution.

    To set up a deep learning marketplace solution, search for “Deep Learning VM” in the search bar. This should take you to the landing page for “Deep Learning VM”.

    The advantage of using the Deep Learning VM is that we don’t have to install Python or TensorFlow, since they are part of a pre-packaged image developed by Google. Once you’re on this page, just hit the “Launch Compute Engine” button. This page also shows the number of past deployments you have for this engine (3 in my case).

    Once you launch the compute engine, you will be taken to the configuration page, where you can set a name for the environment, select the zone for the machine, and choose the number of CPUs and GPUs you want.

    It is important to note the zone you select for your deployment, since the machine configurations available depend on it. For example, some zones restrict the number of CPUs and GPUs you can access.

    Depending on the kind of machine you choose, you can see the billing amount on the right-hand side change accordingly.

    For example, if you choose 16 CPUs and 0 GPUs, you can see that you will be charged $392.36 per month if you use 730 hours per month. If you hit “Details”, it will give you the breakdown of the billing. Generally, GPUs are more expensive than CPUs, so if you don’t need GPUs, it is better to skip them altogether. You also need to request GPU quota in the zone of your deployment (which I will talk about in detail in Step 5).

    For now, choose Zone: us-west1-b, 16 CPUs, and GPUs as “None”. The next thing to choose is the size of your hard drive. “Standard Persistent Disk” should be good for most projects, but you can expand the disk if you have a lot of data. Keep in mind that larger disk sizes lead to bigger bills, so it is best to be parsimonious about the requirements; they can always be modified later if required (covered in Step 6).
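As a sketch of modifying the disk later from the CLI (disk name and zone below are placeholders; persistent disks can only be enlarged, never shrunk):

```shell
# Grow the boot disk of an existing instance to 200 GB.
# "my-deep-learning-vm" and the zone are placeholders.
gcloud compute disks resize my-deep-learning-vm \
    --size=200GB --zone=us-west1-b
```

Recent images typically grow the filesystem automatically on the next boot after a resize.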

    Once you have selected the config, hit “Deploy”. Based on your selection, it may take 5 to 10 minutes for the deployment to finish. If you get an error after deploying, check whether you selected GPUs by mistake: selecting GPUs without an assigned quota leads to an error. Just create another deployment without any GPUs and you should be good to go.

    Now there are three different ways of running code on this VM. The easiest is the Jupyter Notebook GUI, which runs on localhost:8080 on your machine. To access it, you need to install the Google Cloud SDK and SSH into the VM.

    You can install the Google Cloud SDK from Google’s downloads page. Initialize it and connect it to the Google account and project you created earlier. Initialization options should show up automatically after installation; if they don’t, you can run the command gcloud init. Make sure that you connect with the same email and project ID as before.

    Once you have the Google Cloud SDK installed and configured, copy the SSH link that shows up on your deployment page and paste it into the SDK shell. The SSH link is under the header “Create an SSH connection to your machine”.
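Alternatively, the same connection, including the port forwarding that makes Jupyter reachable at localhost:8080, can be opened directly; the instance name and zone below are placeholders:

```shell
# SSH into the VM, forwarding its port 8080 (Jupyter) to
# localhost:8080. Instance name and zone are placeholders.
gcloud compute ssh my-deep-learning-vm \
    --zone=us-west1-b \
    -- -L 8080:localhost:8080
```

Everything after the bare `--` is passed straight to the underlying ssh client.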

    If you have successfully created an SSH connection, a PuTTY screen will pop up.

    Step 4: Access Jupyter Notebook GUI

    Once you have your SSH set up, you are just one click away from your Jupyter GUI. Go back to the deployment manager and hit the localhost:8080 button.

    Voilà! That will take you to the Jupyter Notebook instance deployed on your 16 CPUs. You can use it like any local machine.

    Additionally, you can also run Python batch jobs in the PuTTY terminal or by hitting the SSH button on the Compute Engine VM page. More on that in the next step.

    Step 5: Add GPUs to Virtual Machine

    Before we add GPUs, we need to request GPU quota in the same zone our instance is deployed in.

    Just search for “Quotas” in the search bar and that should take you to the Quotas page under “IAM & admin”.

    Here, under “Metrics”, first select “None” and then search for GPU. Based on the GPUs present in your zone, select the name of the GPU; we also need to select “GPUs (all regions)”. For example, since our zone is us-west1-b, you can select “NVIDIA P100 GPUs” and “GPUs (all regions)”.

    Once you have selected both GPU metrics, hit the “Edit Quotas” button on top.

    Make sure that the GPU you select is in the same zone as your instance, or else your deployment will not be able to access it.

    This will open a form where you need to share your phone number and the reason for the request. As soon as you submit the form, you will get an email from Google saying that your request is under process and will take 2–3 business days.

    Although their email says 2–3 business days, the request is approved within a couple of hours. Once your quota request is approved, you can edit your virtual machine to include more GPUs.
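While the quota edit itself goes through the console form, the current quotas for a region can be checked from the CLI; this one-liner assumes a Unix-like shell with grep available:

```shell
# Show current quotas (including GPU quotas) for the region;
# grep pulls out the GPU entries with their limit and usage.
gcloud compute regions describe us-west1 | grep -B1 -A1 GPU
```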

    Step 6: Change Virtual Machine configuration

    To add the requested GPUs, you need to edit the instance on the Compute Engine page. Open the menu in your Google Cloud console and hit “Compute Engine”.

    The VM instances page lists the VMs we have created across various solutions on Google Cloud Platform. An important thing to note here is that we should stop all instances if we do not want to be billed for the machines. Even if we don’t run any code, Google charges us for running instances.

    Hence it is important to stop all instances when we aren’t running anything.
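Stopping and restarting can also be done from the CLI; instance name and zone below are placeholders:

```shell
# Stop the instance when you are done (VM billing stops; the
# persistent disk still accrues a small charge while stopped).
gcloud compute instances stop my-deep-learning-vm --zone=us-west1-b

# Start it again when needed
gcloud compute instances start my-deep-learning-vm --zone=us-west1-b
```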

    Once you have stopped the instance, you can edit it. If your quota request is approved, you should be able to add more GPUs and deploy the solution again in no time.

    Another neat trick is to “Enable connection to serial ports” and “Allow full access to cloud APIs” (under Access scopes) to enable your instance to talk to buckets and vice versa.
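A sketch of the same two settings from the CLI, assuming the instance is already stopped (changing access scopes requires that); instance name and zone are placeholders:

```shell
# Enable the serial console via instance metadata
gcloud compute instances add-metadata my-deep-learning-vm \
    --zone=us-west1-b --metadata=serial-port-enable=TRUE

# Grant full access to Cloud APIs via the cloud-platform scope.
# Omitting --service-account makes the instance use the
# project's default service account.
gcloud compute instances set-service-account my-deep-learning-vm \
    --zone=us-west1-b --scopes=cloud-platform
```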

    Once your config has been modified by adding a GPU, you can run a deep learning model in either the Jupyter Lab UI or the PuTTY terminal. You will notice that it is much faster now that we have added a GPU to our system. This also means our bill is higher, so keep checking the “Billing” page to make sure you don’t run out of credits.


    Additional hacks

    Move data from bucket to VM

    You can copy data from your bucket to the instance you just created using the gsutil tool in GCP.

    You can use either the PuTTY terminal or the SSH window on the Compute Engine page to run the command.
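A sketch of the gsutil commands; bucket, directory, and file names below are placeholders:

```shell
# Bucket -> VM: copy one object into the VM's home directory
gsutil cp gs://my-bucket/training-data.csv ~/data/

# VM -> bucket: copy results back the other way
gsutil cp ~/results/model.h5 gs://my-bucket/results/

# Copy a whole directory, recursively (-r) and in parallel (-m)
gsutil -m cp -r gs://my-bucket/dataset ~/data/
```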

    This post was edited by Viaan Prakash at August 26, 2021 1:56 PM IST
      August 26, 2021 1:55 PM IST
    0
  • The good news is that after the setup, you won't need to make any changes to your TensorFlow code to run it on the cloud!
    1. Create a GCP Project.
    2. Enable AI Platform Services.
    3. Create a Service Account.
    4. Download an authorization key.
    5. Create a Google Cloud Storage Bucket.
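The steps above can be sketched with the Cloud SDK; every project ID, service-account name, key filename, and bucket name below is a placeholder:

```shell
# Placeholders throughout -- substitute your own names.
gcloud projects create my-ml-project
gcloud config set project my-ml-project

# Enable AI Platform services
gcloud services enable ml.googleapis.com

# Create a service account and download an authorization key
gcloud iam service-accounts create my-training-sa
gcloud iam service-accounts keys create key.json \
    --iam-account=my-training-sa@my-ml-project.iam.gserviceaccount.com

# Create a Cloud Storage bucket in a chosen location
gsutil mb -l us-central1 gs://my-ml-project-bucket
```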
      January 29, 2022 2:51 PM IST
    0