Spark Standalone

I will explain how to properly set up Spark in standalone mode. Sometimes we want to develop POCs or try small changes, and we don't want to test them on production or wait for a release. For those cases, and for learning purposes, I prefer to try Spark stuff on my laptop. First of all, download the latest (or desired) stable version from Spark – Downloads; here I will use the pre-built version for Hadoop. Once we download it, we can proceed to set up the environment.
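The download step can also be scripted. A small sketch, assuming the Apache release archive layout (the exact URL is an assumption; verify it against the Downloads page):

```shell
# Build the package name from version variables (values taken from this walkthrough)
SPARK_VERSION=1.6.1
HADOOP_VERSION=2.6
PKG="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${PKG}"
echo "$URL"
# Fetch it into Downloads (uncomment when online):
# wget -P "$HOME/Downloads" "$URL"
```

Bumping the two variables is then enough to repeat the whole setup for a newer release.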
#- Create a hadoop folder in your $HOME to group all the tools you set up
mkdir -p $HOME/hadoop
#- Copy the tarball from the Downloads folder to the hadoop folder
cp $HOME/Downloads/spark-1.6.1-bin-hadoop2.6.tgz $HOME/hadoop
#- Uncompress tgz
tar -zxf $HOME/hadoop/spark-1.6.1-bin-hadoop2.6.tgz -C $HOME/hadoop
#- Create a symlink pointing to the current Spark folder, so commands don't need to reference the exact running version
ln -s $HOME/hadoop/spark-1.6.1-bin-hadoop2.6/ $HOME/hadoop/spark
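With the symlink in place, it is convenient to export SPARK_HOME and put the Spark binaries on the PATH. This is a hypothetical convenience step, not something Spark itself requires:

```shell
# Point SPARK_HOME at the version-agnostic symlink created above
export SPARK_HOME="$HOME/hadoop/spark"
# Make bin/ (spark-shell, spark-submit) and sbin/ (daemon scripts) directly callable
export PATH="$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin"
# To persist across sessions, append the two exports to your shell profile, e.g. ~/.bashrc
```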
#- Configure logs
cp $HOME/hadoop/spark/conf/log4j.properties.template $HOME/hadoop/spark/conf/log4j.properties
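A common tweak after copying the template is to lower console verbosity from INFO to WARN, which quiets the spark-shell output considerably. A sketch (the stub line only exists so the snippet runs standalone if the template copy was skipped):

```shell
CONF="$HOME/hadoop/spark/conf/log4j.properties"
mkdir -p "$(dirname "$CONF")"
# Stub so the snippet is self-contained; normally the copied template provides this line
[ -f "$CONF" ] || echo 'log4j.rootCategory=INFO, console' > "$CONF"
# Switch the root logger from INFO to WARN
sed -i 's/^log4j.rootCategory=INFO/log4j.rootCategory=WARN/' "$CONF"
grep '^log4j.rootCategory' "$CONF"
```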
#- Configure properties; Spark ships only spark-defaults.conf.template, so we create spark-defaults.conf ourselves
cat <<EOT >> $HOME/hadoop/spark/conf/spark-defaults.conf
# Example settings: enable event logging into the events folder created below
spark.eventLog.enabled true
spark.eventLog.dir $HOME/hadoop/spark/logs/events
EOT
#- Create logs folder
mkdir -p $HOME/hadoop/spark/logs/events
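With the config and folders in place, the standalone daemons can be started. A sketch assuming the layout above; note that in Spark 1.6 the worker launcher is named start-slave.sh:

```shell
# The standalone master listens on port 7077 by default
MASTER_URL="spark://$(hostname):7077"
echo "$MASTER_URL"
# Start the daemons and connect a shell (run these once Spark is unpacked):
#   $HOME/hadoop/spark/sbin/start-master.sh
#   $HOME/hadoop/spark/sbin/start-slave.sh "$MASTER_URL"
#   $HOME/hadoop/spark/bin/spark-shell --master "$MASTER_URL"
# Master web UI: http://localhost:8080
```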