
How to Enable Python3 Support on HDP 2.6


HDP 2.6 (Hortonworks Data Platform) does not support Python3, even though Python3 (or Anaconda3) is widely used by data scientists.

How to enable Python3 support on HDP 2.6?


Solution

  • The Python2 restriction is in a few files:

    • /usr/bin/hdp-select
    • /etc/hadoop/conf/topology_script.py

    The 2to3 tool can convert these Python files to Python3 syntax:

    2to3 -w /usr/bin/hdp-select
    2to3 -w /etc/hadoop/conf/topology_script.py
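One way to see which scripts still contain Python2-only syntax, before or after running 2to3, is to try compiling them with Python3. The sketch below assumes it is run with python3; the helper names `is_py3_compatible` and `check_files` and the script name `check_py3.py` are hypothetical.

```python
import sys


def is_py3_compatible(source, filename="<script>"):
    """Return True if `source` compiles under the running Python3 interpreter.

    This only checks syntax: code that compiles can still fail or behave
    differently at runtime.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        return False
    return True


def check_files(paths):
    """Map each path to True if the file already compiles under Python3."""
    results = {}
    for path in paths:
        with open(path) as handle:
            results[path] = is_py3_compatible(handle.read(), path)
    return results


if __name__ == "__main__":
    # e.g. python3 check_py3.py /usr/bin/hdp-select /etc/hadoop/conf/topology_script.py
    for path, ok in sorted(check_files(sys.argv[1:]).items()):
        print(path + (": compiles" if ok else ": needs conversion"))
```

Passing this check is necessary but not sufficient: it says nothing about how the converted code behaves when it is still run by Python2.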
    

    Then make a few manual changes to /etc/hadoop/conf/topology_script.py so that it supports both Python2 and Python3.
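The following is a minimal sketch of common patterns for a script that has to run on both Python2 and Python3, written as a rack-awareness script in the style of topology_script.py. It is not HDP's actual file: the mapping-file path, the `network_topology` section name and the function names are assumptions for illustration.

```python
import sys

try:  # Python3 module name
    import configparser
except ImportError:  # Python2 module name
    import ConfigParser as configparser

DEFAULT_RACK = "/default-rack"  # Hadoop's default rack name
MAPPING_FILE = "/etc/hadoop/conf/topology_mappings.data"  # assumed path
SECTION = "network_topology"  # assumed section name


def load_rack_map(path=MAPPING_FILE, section=SECTION):
    """Read host/IP -> rack pairs; return an empty map if the section is missing."""
    parser = configparser.ConfigParser()
    parser.read(path)  # read() silently skips files that do not exist
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))


def get_racks(hosts, rack_map):
    """Resolve each host or IP to a rack, falling back to the default rack."""
    if not hosts:
        return DEFAULT_RACK
    # str.join works on both versions, unlike Python2's string.join().
    return " ".join(rack_map.get(host, DEFAULT_RACK) for host in hosts)


if __name__ == "__main__":
    # A single string argument makes print() behave the same on Python2 and 3.
    print(get_racks(sys.argv[1:], load_rack_map()))
```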

    However, after these changes, Knox is not able to restart.

    On investigation: although the change makes the code runnable on both Python2 and Python3, the behavior is slightly different.

    The following command is executed during a Knox restart:

    ambari-python-wrapper /usr/bin/hdp-select packages
    

    The original script prints something like:

    Packages:
      accumulo-client
      accumulo-gc
    
    ...
    

    After the 2to3 change, the script prints something like:

    Packages:
    (' ', 'accumulo-client')
    (' ', 'accumulo-gc')      
    

    The two outputs are clearly different, and HDP uses this stdout as an interface, so the changed format breaks the Knox restart.
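HDP's own parsing code is not shown here. The hypothetical `parse_packages` below is only a minimal sketch of a consumer that reads package names from the `Packages:` listing, to illustrate why the changed format breaks anything that relies on it.

```python
def parse_packages(stdout):
    """Hypothetical consumer: collect the package names listed under 'Packages:'."""
    packages = []
    in_list = False
    for line in stdout.splitlines():
        if line.strip() == "Packages:":
            in_list = True
            continue
        if in_list and line.strip():
            packages.append(line.strip())
    return packages


original = "Packages:\n  accumulo-client\n  accumulo-gc\n"
converted = "Packages:\n(' ', 'accumulo-client')\n(' ', 'accumulo-gc')\n"

print(parse_packages(original))   # ['accumulo-client', 'accumulo-gc']
print(parse_packages(converted))  # ["(' ', 'accumulo-client')", "(' ', 'accumulo-gc')"]
```

A lookup for `knox-server` against the second list finds nothing, since every entry is the text of a tuple rather than a package name.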

    The effect can be narrowed down to the following code: under Python2, the two print statements produce different output.

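    pkg = "knox-server"
    print " ", pkg     # Python2 prints "  knox-server"; SyntaxError on Python3
    
    print(" ", pkg)    # Python2 prints "(' ', 'knox-server')"; Python3 prints "  knox-server"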
    

    Rewriting the print statement so that it passes a single string, for example print("  " + pkg), gives the same output on Python2 and Python3 and fixes this problem.
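A minimal sketch of output logic that keeps the original layout on both versions, assuming a hypothetical `format_package_lines` helper (this is not hdp-select's real code). Each line is built as one string, so print() behaves identically under Python2 and Python3. Adding `from __future__ import print_function` at the top of the script is another common way to give Python2 the Python3 print semantics.

```python
def format_package_lines(packages):
    """Build the listing in the original layout: a header, then indented names."""
    # "  " reproduces Python2's `print " ", pkg`: the " " item plus the single
    # space that print inserts between comma-separated items.
    return ["Packages:"] + ["  " + pkg for pkg in packages]


if __name__ == "__main__":
    for line in format_package_lines(["accumulo-client", "accumulo-gc", "knox-server"]):
        # One string argument: identical output on Python2 and Python3.
        print(line)
```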