TensorFlow: InternalError: Blas SGEMM launch failed

  • When I run sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}) I get InternalError: Blas SGEMM launch failed. Here is the full error and stack trace:
    InternalError                             Traceback (most recent call last)
    <ipython-input-9-a3261a02bdce> in <module>()
          1 batch_xs, batch_ys = mnist.train.next_batch(100)
    ----> 2 sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    
    /usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
        338     try:
        339       result = self._run(None, fetches, feed_dict, options_ptr,
    --> 340                          run_metadata_ptr)
        341       if run_metadata:
        342         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
    
    /usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
        562     try:
        563       results = self._do_run(handle, target_list, unique_fetches,
    --> 564                              feed_dict_string, options, run_metadata)
        565     finally:
        566       # The movers are no longer used. Delete them.
    
    /usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
        635     if handle is None:
        636       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
    --> 637                            target_list, options, run_metadata)
        638     else:
        639       return self._do_call(_prun_fn, self._session, handle, feed_dict,
    
    /usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
        657       # pylint: disable=protected-access
        658       raise errors._make_specific_exception(node_def, op, error_message,
    --> 659                                             e.code)
        660       # pylint: enable=protected-access
        661 
    
    InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_4, Variable/read)]]
    Caused by op u'MatMul', defined at:
      File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/usr/local/lib/python2.7/dist-packages/ipykernel/__main__.py", line 3, in <module>
        app.launch_new_instance()
      File "/usr/local/lib/python2.7/dist-packages/traitlets/config/application.py", line 596, in launch_instance
        app.start()
      File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelapp.py", line 442, in start
        ioloop.IOLoop.instance().start()
      File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py", line 162, in start
        super(ZMQIOLoop, self).start()
      File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 883, in start
        handler_func(fd_obj, events)
      File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
        return fn(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
        self._handle_recv()
      File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
        self._run_callback(callback, msg)
      File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
        callback(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
        return fn(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 276, in dispatcher
        return self.dispatch_shell(stream, msg)
      File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
        handler(stream, idents, msg)
      File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 391, in execute_request
        user_expressions, allow_stdin)
      File "/usr/local/lib/python2.7/dist-packages/ipykernel/ipkernel.py", line 199, in do_execute
        shell.run_cell(code, store_history=store_history, silent=silent)
      File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2723, in run_cell
        interactivity=interactivity, compiler=compiler, result=result)
      File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2825, in run_ast_nodes
        if self.run_code(code, result):
      File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2885, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-4-d7414c4b6213>", line 4, in <module>
        y = tf.nn.softmax(tf.matmul(x, W) + b)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1036, in matmul
        name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 911, in _mat_mul
        transpose_b=transpose_b, name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
        self._traceback = _extract_stack()

    Stack: EC2 g2.8xlarge machine, Ubuntu 14.04
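
    SGEMM is the single-precision general matrix multiply routine (from cuBLAS here, since the node runs on /gpu:0); for this node it computes tf.matmul(x, W). Below is a small diagnostic sketch, assuming the message format shown above, that parses the reported shapes and checks that they agree. (100, 784) times (784, 10) gives (100, 10), so the operands themselves are valid. That suggests the problem is on the GPU/cuBLAS side rather than in the model definition. The regex and the helper name check_sgemm_dims are illustrative only and are not part of TensorFlow.

```python
import re

# Matches the shape/dimension part of a "Blas SGEMM launch failed" message
# (assumes the exact format shown in the traceback above).
_SGEMM_RE = re.compile(
    r"a\.shape=\((\d+), (\d+)\), b\.shape=\((\d+), (\d+)\), "
    r"m=(\d+), n=(\d+), k=(\d+)"
)


def check_sgemm_dims(message):
    """Parse an SGEMM error message and check that the GEMM dimensions agree.

    SGEMM computes C = A @ B in single precision, with A of shape (m, k),
    B of shape (k, n) and C of shape (m, n); no transposes are assumed
    (the node above has transpose_a=false, transpose_b=false).
    Returns (m, n, k) if consistent, otherwise raises ValueError.
    """
    match = _SGEMM_RE.search(message)
    if match is None:
        raise ValueError("no SGEMM shape information found")
    a_rows, a_cols, b_rows, b_cols, m, n, k = map(int, match.groups())
    if a_cols != b_rows:
        raise ValueError("inner dimensions differ: %d vs %d" % (a_cols, b_rows))
    if (m, n, k) != (a_rows, b_cols, a_cols):
        raise ValueError("m, n, k do not match the operand shapes")
    return m, n, k


msg = ("Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), "
       "m=100, n=10, k=784")
print(check_sgemm_dims(msg))  # (100, 10, 784)
```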

      December 17, 2020 3:03 PM IST
  • For me, I got this error when using Keras with TensorFlow as the backend. The cause was that the deep learning environment in Anaconda had not been activated properly, so TensorFlow didn't load properly either. I noticed this because the last time I activated my deep learning environment (which is called dl), the prompt in my Anaconda Prompt changed to this:

    (dl) C:\Users\georg\Anaconda3\envs\dl\etc\conda\activate.d>set "KERAS_BACKEND=tensorflow"

    Before that, the prompt had shown only (dl). To get rid of the error, I closed my Jupyter notebook and Anaconda Prompt and relaunched them, repeating this several times.
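
    If you suspect the same cause, you can check from inside the notebook which interpreter and environment variables the kernel actually sees. A minimal sketch follows; the helper name describe_environment is hypothetical. With the setup in this answer you would expect conda_env to be dl and keras_backend to be tensorflow, but those values are assumptions based on the prompt shown above.

```python
import os
import sys


def describe_environment():
    """Collect the settings that decide which conda env and Keras backend this kernel uses."""
    return {
        # Interpreter actually running this kernel; inside an activated env
        # it normally lives under ...\Anaconda3\envs\<name>\
        "python": sys.executable,
        # Set by conda activation; None if no environment was activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Set by the activate.d script in this answer (assumption); if unset,
        # Keras uses the backend named in its keras.json config file.
        "keras_backend": os.environ.get("KERAS_BACKEND"),
    }


for key, value in sorted(describe_environment().items()):
    print("%-14s %s" % (key, value))
```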
      August 28, 2021 1:07 PM IST
  • Old question, but this may help others.
    Try closing interactive sessions that are active in other processes (in IPython Notebook, just restart the kernels). This helped me!

    Additionally, I use this code to close local sessions in this kernel during experiments:
    # Close the session left over from an earlier run in this kernel, if any
    if 'session' in locals() and session is not None:
        print('Close interactive session')
        session.close()
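
    The snippet above only closes a session bound to the name session. Below is a slightly more general sketch that closes every such object in the notebook's global namespace. It assumes TensorFlow 1.x, where sessions are instances of classes named Session or InteractiveSession. Matching on the class name is a heuristic chosen here so that the helper (close_sessions is a hypothetical name) needs no TensorFlow import. It only affects the current kernel, so sessions held by other processes still require restarting those kernels, as described above.

```python
def close_sessions(namespace):
    """Close every TensorFlow 1.x-style session object found in a namespace dict.

    Heuristic (assumption): objects are matched by class name
    ("Session" / "InteractiveSession"), not via a TensorFlow API.
    Returns the variable names whose sessions were closed.
    """
    closed = []
    for name, obj in list(namespace.items()):  # copy: we modify the dict below
        if type(obj).__name__ in ("Session", "InteractiveSession"):
            print("Close interactive session: %s" % name)
            obj.close()
            namespace[name] = None  # drop the reference to the closed session
            closed.append(name)
    return closed


# In a notebook cell:
# close_sessions(globals())
```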
      December 23, 2020 12:26 PM IST
  • I started getting this error after recently switching to Windows 10; I never encountered it on Windows 7.

    The error occurs if I load my GPU TensorFlow model while another GPU program is running: my JCuda model, loaded as a socket server. If I close my other GPU program(s), the TensorFlow model loads successfully.

    The JCuda program is small, only around 70 MB, while the TensorFlow model is more than 500 MB. But I am using a GTX 1080 Ti, which has plenty of memory (11 GB), so this is probably not an out-of-memory problem; it may be some tricky internal issue in TensorFlow related to the OS or CUDA. (PS: I am using CUDA 8.0.44 and haven't installed a newer version.)
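
    To see which other programs hold the GPU before loading the TensorFlow model, you can query nvidia-smi. Below is a minimal sketch, assuming nvidia-smi is on the PATH; the helper names are hypothetical. A JCuda server runs inside a JVM, so it would presumably show up as a java process. On Windows in WDDM mode the memory column may read [N/A], which the parser maps to None.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse `nvidia-smi --query-compute-apps=pid,process_name,used_memory
    --format=csv,noheader,nounits` output into (pid, name, MiB-or-None) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        # Split off pid (first field) and memory (last field) so that
        # commas inside the process name do not break parsing.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        apps.append((int(pid), name.strip(), int(used) if used.isdigit() else None))
    return apps


def gpu_compute_apps():
    """Return processes currently holding a CUDA context, or [] if nvidia-smi is unavailable."""
    try:
        out = subprocess.check_output([
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ])
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_compute_apps(out.decode("utf-8"))


for pid, name, mib in gpu_compute_apps():
    memory = "n/a" if mib is None else "%d MiB" % mib
    print("%6d  %s  %s" % (pid, name, memory))
```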

      September 16, 2021 1:33 PM IST
  • In my case, it was enough to open the Jupyter notebooks in separate servers.

    This error only occurs for me if I try to use more than one TensorFlow/Keras model in the same server. It doesn't matter whether I open one notebook, execute it, then close it and open another: if they are loaded in the same Jupyter server, the error always happens.

      September 17, 2021 1:22 PM IST