
Install BigDL in Data Science Experience on Cloud

  • I would like to use Intel BigDL in notebooks on Data Science Experience on Cloud.

    How can I install it?

      August 31, 2021 12:19 PM IST
  •   December 29, 2021 1:05 PM IST
  • If your notebooks are backed by an Apache Spark as a Service instance in DSX, installing BigDL is simple, but you first have to collect some version information.
    1. Which Spark version? Currently, 2.1 is the latest supported by DSX.
      With Python, you can only install BigDL for one Spark version per service.
    2. Which BigDL version? Currently, 0.3.0 is the latest, and it supports Spark 2.1.
      If in doubt, check the BigDL download page. The Spark fix level (the third version component, as in 2.1.1) does not matter.
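A notebook can report its own Spark version. The sketch below assumes a Python notebook with the usual predefined SparkContext `sc`, whose `version` attribute holds the running Spark version, and uses a hypothetical helper, `spark_minor_version`, to drop the fix level, since only major.minor matters for the BigDL artifact:

```python
def spark_minor_version(full_version):
    """Reduce a Spark version string such as '2.1.1' to '2.1'.

    The fix level (third component) is dropped because only
    major.minor determines which BigDL JAR is needed.
    """
    parts = full_version.strip().split(".")
    if len(parts) < 2:
        raise ValueError("unexpected Spark version: %r" % (full_version,))
    return ".".join(parts[:2])


# In a DSX notebook (assumption: `sc` is predefined):
#     spark_minor_version(sc.version)
print(spark_minor_version("2.1.1"))  # prints 2.1
```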

    With this information, you can determine the URL of the required BigDL JAR file in the Maven repository. For the example versions (BigDL 0.3.0 with Spark 2.1), the download URL is:
    https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-SPARK_2.1/0.3.0/bigdl-SPARK_2.1-0.3.0-jar-with-dependencies.jar

    For other versions, replace 0.3.0 and 2.1 in that URL as required. Note that both versions appear twice, once in the path and once in the filename.
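Because the URL follows a fixed pattern, it can be built from the two versions instead of being edited by hand. This is a minimal sketch with a hypothetical helper name, `bigdl_jar_url`; it only reproduces the pattern shown above and does not check whether Maven actually publishes an artifact for a given version pair:

```python
MAVEN_BASE = "https://repo1.maven.org/maven2/com/intel/analytics/bigdl"


def bigdl_jar_url(spark_version, bigdl_version):
    """Build the Maven URL of the BigDL JAR with dependencies.

    spark_version is major.minor only (e.g. '2.1'); bigdl_version is e.g. '0.3.0'.
    Both versions appear twice: once in the path and once in the file name.
    """
    artifact = "bigdl-SPARK_%s" % spark_version
    filename = "%s-%s-jar-with-dependencies.jar" % (artifact, bigdl_version)
    return "/".join([MAVEN_BASE, artifact, bigdl_version, filename])


print(bigdl_jar_url("2.1", "0.3.0"))
```

Passing other versions updates both occurrences of each version at once, which avoids the easy mistake of changing only the path or only the file name.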

    Installing for Python

    You need the JAR and the matching Python package. The Python package depends only on the version of BigDL, not on the Spark version. The installation steps can be executed from a Python notebook:

    1. Install the JAR.

      !(export sv=2.1 bv=0.3.0 ; cd ~/data/libs/ && wget https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-SPARK_${sv}/${bv}/bigdl-SPARK_${sv}-${bv}-jar-with-dependencies.jar)
      

      Here, the versions of Spark (sv) and BigDL (bv) are defined as environment variables, so you can easily adjust them without having to change the URL.

    2. Install the Python module.

      !pip install bigdl==0.3.0 --no-deps | cat
      

      If you want to switch your notebooks between Python versions, execute this step once with each Python version. (Without --no-deps, a conflicting version of pyspark would be installed.)
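If you prefer plain Python over shell escapes, the two steps can be sketched with the standard library. This is an alternative to the commands above, not the procedure the answer describes: it assumes the same `~/data/libs` directory, downloads with `urlretrieve` instead of `wget`, and runs pip through `sys.executable`, so the module is installed for the interpreter of the current kernel (run it once per Python version, as with step 2). The helper names are hypothetical:

```python
import os
import subprocess
import sys

try:  # Python 3
    from urllib.request import urlretrieve
except ImportError:  # Python 2
    from urllib import urlretrieve

# JAR directory used by the wget command in step 1 (assumed unchanged).
LIBS_DIR = os.path.expanduser("~/data/libs")


def jar_url(sv, bv):
    # Same URL pattern as the wget command in step 1.
    base = "https://repo1.maven.org/maven2/com/intel/analytics/bigdl"
    name = "bigdl-SPARK_%s-%s-jar-with-dependencies.jar" % (sv, bv)
    return "%s/bigdl-SPARK_%s/%s/%s" % (base, sv, bv, name)


def pip_command(bv):
    # sys.executable is the kernel's own interpreter, so the module is
    # installed for the Python version this notebook runs on.
    # --no-deps keeps pip from pulling in a conflicting pyspark.
    return [sys.executable, "-m", "pip", "install", "bigdl==%s" % bv, "--no-deps"]


def install_bigdl(sv, bv, libs_dir=LIBS_DIR):
    """Download the JAR into libs_dir, then pip-install the matching module."""
    url = jar_url(sv, bv)
    target = os.path.join(libs_dir, url.rsplit("/", 1)[1])
    urlretrieve(url, target)                # step 1: install the JAR
    subprocess.check_call(pip_command(bv))  # step 2: install the module
    return target


# Usage (downloads and installs; restart the kernel afterwards):
# install_bigdl("2.1", "0.3.0")
```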

    After restarting the notebook kernel, BigDL is ready for use.

    (Not) Installing for Scala

    If you install the JAR as described above for Python, it is also available in Scala kernels.

    If you want to use BigDL exclusively with Scala, it is better not to install the JAR at all. Instead, use the %AddJar magic at the beginning of the notebook, ideally in the very first code cell, to avoid class loading issues.

    %AddJar https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-SPARK_2.1/0.3.0/bigdl-SPARK_2.1-0.3.0-jar-with-dependencies.jar
    

    By not installing the JAR, you gain the flexibility of using different versions of Spark and BigDL in different Scala notebooks sharing the same service. As soon as you install a JAR, you are likely to run into conflicts between the installed JAR and the one you pull in with %AddJar.

    Cleanup

    If you want to install a different version of BigDL, you have to clean up first. The following commands check what is installed and remove it. Execute them from a Python notebook.

    • Check which BigDL JAR is installed. If the output is empty, none is installed.

      !find ~/data/libs -name bigdl-\*
      
    • Check which BigDL Python module is installed. If the output is empty, the module is not installed.

      !pip freeze | grep -i BigDL
      
    • Remove installed BigDL JARs.

      !find ~/data/libs -name bigdl-\* -exec rm -vf {} +
      
    • Remove the installed BigDL Python module for the current Python version.

      !rm -rf ~/.local/lib/python${_py_version_}/site-packages/{bigdl,BigDL}*
      

      If re-installation fails with a "multiple dist-info directories" message, also execute:

      !rm -rf $PIP_BUILD
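The two checks and the JAR removal can also be sketched in Python. This assumes Python 3.8 or later (for `importlib.metadata`) and the same `~/data/libs` directory; the function names are hypothetical, and removing the Python module itself is left to the `rm` command above:

```python
import fnmatch
import os
from importlib.metadata import distributions

LIBS_DIR = os.path.expanduser("~/data/libs")


def find_bigdl_files(libs_dir=LIBS_DIR):
    """Like `find libs_dir -name bigdl-*`, restricted to regular files."""
    matches = []
    for root, _dirs, files in os.walk(libs_dir):  # yields nothing if libs_dir is missing
        for name in fnmatch.filter(files, "bigdl-*"):
            matches.append(os.path.join(root, name))
    return sorted(matches)


def installed_bigdl_packages():
    """Like `pip freeze | grep -i BigDL`: name==version of matching distributions."""
    found = []
    for dist in distributions():
        name = dist.metadata.get("Name") or ""
        if "bigdl" in name.lower():
            found.append("%s==%s" % (name, dist.version))
    return sorted(found)


def remove_bigdl_files(libs_dir=LIBS_DIR):
    """Delete every file found by find_bigdl_files and report it, like `rm -v`."""
    removed = find_bigdl_files(libs_dir)
    for path in removed:
        os.remove(path)
        print("removed %s" % path)
    return removed


# An empty list means nothing is installed:
# print(find_bigdl_files())
# print(installed_bigdl_packages())
```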
      September 7, 2021 5:00 PM IST