Hadoop, Big Data, Cloud Computing... these are the buzzwords these days.
Earlier, people ran Big Data and Hadoop workloads on their own servers.
Nowadays many cloud services are available, and AWS is one of them.
EMR (Elastic MapReduce) is the service AWS provides for Hadoop. It comes with the platforms and software that Hadoop requires.
In this blog we are going to see how to upgrade the Python version used by Spark on EMR.
Background:
Currently both Python 2 and Python 3 are in use, and most organizations are using Python 3 these days. When we create an EMR cluster it comes with both Python 2 and Python 3, but the default version used by Spark is Python 2.
How to check the Python version used by Spark?
Running the following command in the console starts the PySpark shell, which shows the Python version used by Spark when it starts up.
command : "pyspark"
From the output we can see that the default version is Python 2.7.
Now let us see which versions of Python are available on EMR, using the following steps:
First, go to the /usr/bin folder with the following command. This is generally where the installed programs are kept.
command: "cd /usr/bin"
Then run the list command. It shows everything available in the folder, including all the Python versions.
command : "ls"
In this blog I want to upgrade to Python 3, so I am checking whether that version is available.
I can see Python 3 here.
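The same check can be done in one step without changing folders. This is only a minimal sketch; list_pythons is a hypothetical helper name, not part of EMR or any AWS tooling.

```shell
# Hypothetical helper: list the Python executables in a folder
# (defaults to /usr/bin, the folder inspected above).
list_pythons() {
  dir="${1:-/usr/bin}"
  # Keep only the entries whose names start with "python".
  ls "$dir" | grep '^python'
}

list_pythons /usr/bin || echo "no python found in /usr/bin"
```

If python3 shows up in the list, that version is available for the upgrade.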
What to do now?
We just need to point the PYSPARK_PYTHON environment variable at Python 3.
AWS provides a document for this.
You can also follow my video for the upgrade.
command: sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh
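The sed command above appends a new export line to the end of spark-env.sh every time it runs, so running it twice leaves two copies of the line. A variant that is safe to re-run could look like the sketch below. The name set_pyspark_python is a hypothetical helper, not an AWS or Spark tool.

```shell
# Hypothetical helper: add the PYSPARK_PYTHON export to a spark-env.sh
# file only if that exact line is not already there.
set_pyspark_python() {
  conf="$1"      # e.g. /etc/spark/conf/spark-env.sh
  python="$2"    # e.g. /usr/bin/python3
  line="export PYSPARK_PYTHON=$python"
  # -x matches whole lines, -F treats the line as plain text, not a regex.
  if ! grep -qxF "$line" "$conf"; then
    echo "$line" >> "$conf"
  fi
}
```

The original command uses sudo, so on the cluster this function would have to run from a root shell to be allowed to write to the file.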
We can check the updated Python version by running the pyspark command again.
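The setting can also be confirmed without opening the PySpark shell, by reading it back from the file. A minimal sketch, where spark_python is a hypothetical helper name:

```shell
# Hypothetical helper: print the interpreter that a spark-env.sh file
# assigns to PYSPARK_PYTHON, or "(not set)" if there is no such line.
spark_python() {
  conf="${1:-/etc/spark/conf/spark-env.sh}"
  # Later lines override earlier ones when the file is sourced,
  # so only the last matching export counts.
  value=$(grep '^export PYSPARK_PYTHON=' "$conf" 2>/dev/null | tail -n 1 | cut -d= -f2- || true)
  echo "${value:-(not set)}"
}
```

Called with no argument it reads /etc/spark/conf/spark-env.sh, the file edited above.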
Now we upgrade all the packages with the following command.
command: "sudo yum update"
If the above command works, that is fine. If it throws the following error, we need to make some changes.
ERROR:
13 packages excluded due to repository priority protections
No packages marked for update
If this error appears, go to /etc/yum/pluginconf.d and edit the priorities.conf file to turn off the priorities plugin. The steps are as follows.
commands:
cd /etc/yum/pluginconf.d
sudo vi priorities.conf
Edit: enabled = 0
After this, try updating all the software again with sudo yum update.
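The edit in vi can also be made without an editor. This is a minimal sketch, assuming the file contains an "enabled = 1" line; disable_yum_priorities is a hypothetical helper name.

```shell
# Hypothetical helper: switch "enabled = 1" to "enabled = 0" in a
# yum plugin config file, keeping a .bak copy of the original.
disable_yum_priorities() {
  conf="$1"    # e.g. /etc/yum/pluginconf.d/priorities.conf
  cp "$conf" "$conf.bak"
  # Rewrite from the backup so the result does not depend on sed -i,
  # whose syntax differs between GNU and BSD sed.
  sed 's/^enabled[[:space:]]*=[[:space:]]*1[[:space:]]*$/enabled = 0/' "$conf.bak" > "$conf"
}
```

Like the sudo vi step, this needs root on the cluster. The .bak copy makes it easy to switch the plugin back on after the update.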
Now all the packages are updated... :)
The next thing we need to do is update pip and pandas.
We do that with the following commands.
Updating pip : sudo python3 -m pip install --upgrade pip
Updating pandas : sudo python3 -m pip install --upgrade pandas -t /usr/lib/python3.6/dist-packages/
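After the upgrade it is worth confirming that Python 3 actually picks up the new versions. A minimal sketch; module_version is a hypothetical helper name, and it prints "not installed" when the module cannot be imported.

```shell
# Hypothetical helper: print the version that python3 reports for a module.
module_version() {
  # Import the module and print its __version__ attribute if it has one.
  python3 -c "import $1; print(getattr($1, '__version__', 'unknown'))" 2>/dev/null \
    || echo "not installed"
}

module_version pip
module_version pandas
```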
Now our EMR cluster with the upgraded Python version is ready to use...
Happy Learning :)