gcloud ai-platform jobs submit training JOB1 \
  --module-name=trainer.cnn_with_keras \
  --package-path=./trainer \
  --job-dir=gs://mykerasstorage \
  --region=europe-north1 \
  --config=gs://mykerasstorage/trainer/cloudml-gpu.yaml
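If you submit from your local terminal (after `gcloud init`), the command can point at local paths for both the package and the config file. The sketch below assembles the same command as an argument list so the paths are explicit. It assumes that `--config` is read from the local filesystem rather than from a gs:// URI, which would be consistent with the "no such file" error; the project folder name `keras-cloud-tutorial` and the `python_version` default are assumptions for illustration.

```python
import shlex
from pathlib import Path


def build_submit_command(job_name, project_root, bucket, region, python_version="3.5"):
    """Return the argv list for `gcloud ai-platform jobs submit training`."""
    root = Path(project_root)
    return [
        "gcloud", "ai-platform", "jobs", "submit", "training", job_name,
        "--module-name=trainer.cnn_with_keras",
        # Local directory containing __init__.py and cnn_with_keras.py.
        f"--package-path={root / 'trainer'}",
        # Only the job output directory lives in Cloud Storage.
        f"--job-dir=gs://{bucket}",
        f"--region={region}",
        # Assumption: a local YAML path, not a gs:// URI.
        f"--config={root / 'trainer' / 'cloudml-gpu.yaml'}",
        f"--python-version={python_version}",
    ]


cmd = build_submit_command("JOB1", "keras-cloud-tutorial", "mykerasstorage", "europe-north1")
print(shlex.join(cmd))
# To actually submit: subprocess.run(cmd, check=True)
```

Printing the joined command lets you paste it into the terminal and check it before running anything.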
But I get errors. First, the cloudml-gpu.yaml file can't be found; it says "no such folder or file". When I remove that flag, I get an error saying the __init__.py file is missing, but it isn't. It is empty, though (which it was when I downloaded it from the tutorial's GitHub repository). I am guessing I haven't uploaded it the right way.
Any suggestions on how I should do this? The tutorial itself doesn't say anything about this.
I have read in another guide that it is possible to let gcloud package and upload the job directly, but I am not sure how to do this or where to run the commands: in my local terminal with the gcloud command, or in Cloud Shell in the browser? And how do I define the path where my Python files are located?
I should mention that I am working on a Mac and am pretty new to Keras and Python.
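import os
import sys

# URLFilePath and URLFile are assumed to be defined earlier in the script.
try:
    if os.stat(URLFilePath + URLFile).st_size > 0:
        print("Processing...")
    else:
        print("Empty URL file ... exiting")
        sys.exit(1)
except OSError:
    # os.stat raises OSError (FileNotFoundError) when the file does not exist.
    print("URL file missing ... exiting")
    sys.exit(1)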
gsutil mb -l europe-north1 gs://keras-cloud-tutorial
gcloud init
However, you can submit the job from Cloud Shell too, if you download the needed files there. The only files we need from the repository are the trainer folder and the setup.py file. So, if we put them in a folder named keras-cloud-tutorial, we will have this file structure:

keras-cloud-tutorial/
├── setup.py
└── trainer
    ├── __init__.py
    ├── cloudml-gpu.yaml
    └── cnn_with_keras.py
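Before submitting, it can help to confirm locally that every file in this layout exists. The sketch below checks the four paths from the tree above; an empty __init__.py still counts, since it only has to exist for trainer to be treated as a package. The helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Files the layout above expects, relative to the project root.
EXPECTED_FILES = [
    "setup.py",
    "trainer/__init__.py",
    "trainer/cloudml-gpu.yaml",
    "trainer/cnn_with_keras.py",
]


def missing_files(project_root):
    """Return the expected paths that are not regular files under project_root."""
    root = Path(project_root)
    # is_file() is True for empty files, so an empty __init__.py passes.
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


print(missing_files("keras-cloud-tutorial"))
```

An empty list means the layout matches; anything printed is a path to create or move.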
A likely reason for the ImportError: No module named eager error is that you might have changed the runtimeVersion inside the cloudml-gpu.yaml file. Eager was introduced in TensorFlow 1.5, so if you have specified an earlier version, this error is expected. The cloudml-gpu.yaml file should therefore look like this:

trainingInput:
  scaleTier: CUSTOM
  # standard_gpu provides 1 GPU. Change to complex_model_m_gpu for 4 GPUs
  masterType: standard_gpu
  runtimeVersion: "1.5"
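The check described here is simply "runtime version at least 1.5". A minimal sketch, assuming the YAML has already been loaded into a Python dict (written out by hand below to stay within the standard library); the helper names are hypothetical.

```python
def parse_version(version):
    """Turn a string such as "1.5" or "1.14" into a tuple of ints, e.g. (1, 14)."""
    return tuple(int(part) for part in version.split("."))


# Per the answer above, eager first shipped with TensorFlow 1.5.
EAGER_MIN_VERSION = (1, 5)


def runtime_has_eager(runtime_version):
    return parse_version(runtime_version) >= EAGER_MIN_VERSION


# Hand-written mirror of cloudml-gpu.yaml.
config = {
    "trainingInput": {
        "scaleTier": "CUSTOM",
        "masterType": "standard_gpu",
        "runtimeVersion": "1.5",
    }
}

print(runtime_has_eager(config["trainingInput"]["runtimeVersion"]))
```

Versions are compared as integer tuples rather than strings, because as strings "1.10" sorts before "1.5".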
The setup.py file should look like this:

from setuptools import setup, find_packages

setup(name='trainer',
      version='0.1',
      packages=find_packages(),
      description='Example on how to run keras on gcloud ml-engine',
      author='Username',
      author_email='user@gmail.com',
      install_requires=[
          'keras==2.1.5',
          'h5py'
      ],
      zip_safe=False)
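Note that this setup.py pins keras==2.1.5, while the Conda environment mentioned in the comments uses Keras 2.2.5, so the cloud job and the local run may not use the same Keras. A minimal sketch for spotting such differences, assuming only exact `==` pins matter; the helper names and the hand-written `local_env` dict are illustrative.

```python
def parse_pins(requirements):
    """Map each requirement name to its pinned version, or None if unpinned."""
    pins = {}
    for req in requirements:
        name, sep, version = req.partition("==")
        pins[name.strip().lower()] = version.strip() if sep else None
    return pins


def pin_mismatches(requirements, installed):
    """Return {name: (pinned, installed)} for pinned packages whose local version differs."""
    mismatches = {}
    for name, pinned in parse_pins(requirements).items():
        local = installed.get(name)
        if pinned is not None and local != pinned:
            mismatches[name] = (pinned, local)
    return mismatches


install_requires = ["keras==2.1.5", "h5py"]
# Versions from the local Conda environment described in the question.
local_env = {"keras": "2.2.5", "tensorflow": "1.14.0"}
print(pin_mismatches(install_requires, local_env))
```

In a real environment, `local_env` could be filled with `importlib.metadata.version(name)` (Python 3.8+) instead of typing the versions by hand.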
I have got it halfway working now: instead of uploading the files, I ran the submit commands from my local terminal. However, there was an error while the job was running, and it ended with "job failed".
It seems it was trying to import something from the TensorFlow backend (from tensorflow.python.eager import context), but this raised ImportError: No module named eager.
I have tried "pip install tf-nightly", which was suggested elsewhere, but it says I don't have permission, or I lose the connection to Cloud Shell (exactly when I try to run the command).
I have also tried making a virtual environment locally with Conda to match the one on gcloud, with Python 3.5, TensorFlow 1.14.0 and Keras 2.2.5, which should be supported by gcloud.
The Python program works fine in this environment locally, but I still get ImportError: No module named eager when I run the job on gcloud.
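As far as I understand it, the cloud job runs against the TensorFlow bundled with the configured runtime version, not against the local Conda environment, so a local run passing proves little about the cloud side. One way to see what the cloud runtime actually provides is to log module availability at the top of the trainer. A minimal sketch; `module_available` is a hypothetical helper built on `importlib.util.find_spec`.

```python
import importlib.util


def module_available(name):
    """True if `name` (possibly dotted, e.g. 'tensorflow.python.eager') can be found."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # For dotted names, find_spec imports the parent packages first;
        # a missing parent raises ModuleNotFoundError (a subclass of ImportError).
        return False


for name in ["tensorflow", "tensorflow.python.eager", "keras"]:
    print(name, module_available(name))
```

The printed lines then show up in the job logs, next to the traceback.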
I am passing the flag --python-version 3.5 when submitting the job, but when I run "python -V" in Google Cloud Shell, it says Python 2.7. Could this be the issue? I have not found a way to update the Python version from the Cloud Shell prompt, but the documentation says Google Cloud should support Python 3.5. If this is the issue anyway, do you have any suggestions on how to upgrade the Python version on Google Cloud?
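The "python -V" output in Cloud Shell describes only the interpreter on the Cloud Shell machine. My understanding (an assumption, not confirmed in this thread) is that the training job's interpreter is chosen by --python-version together with the runtime version. Logging the version from inside the trainer shows which one the job really used. A minimal sketch; `python_at_least` is a hypothetical helper.

```python
import platform
import sys


def python_at_least(minimum, version_info=sys.version_info):
    """Compare the (major, minor) part of version_info with `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


print("Python", platform.python_version())
if not python_at_least((3, 5)):
    print("Warning: this interpreter is older than Python 3.5")
```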
It is also possible to create a new job manually in the Google Cloud web interface. When I do this, I get a different error message: ERROR: Could not find a version that satisfies the requirement cnn_with_keras.py (from versions: none) and No matching distribution found for cnn_with_keras.py. Here, cnn_with_keras.py is my Python code from the tutorial, which runs fine locally.
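My reading of this error (an assumption) is that pip was asked to install "cnn_with_keras.py" as if it were a package. The web form most likely expects a built package for the package field (for example the .tar.gz produced by `python setup.py sdist`) and a dotted module name for the module field, the same form as --module-name on the command line. A minimal sketch that converts a file path into that dotted name; `module_name_from_file` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def module_name_from_file(path):
    """Convert e.g. 'trainer/cnn_with_keras.py' to 'trainer.cnn_with_keras'."""
    p = PurePosixPath(path)  # also collapses a leading './'
    if p.suffix != ".py":
        raise ValueError(f"expected a .py file, got {path!r}")
    return ".".join(p.with_suffix("").parts)


print(module_name_from_file("trainer/cnn_with_keras.py"))
```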
Really don't know what to do next. Any suggestions or tips would be very helpful!