A distributed data warehouse for the Data Lake: installing CDH
We continue sharing our experience of building a data warehouse, which we began describing in the previous post. This time we want to talk about how we handled the installation of CDH.
Spark 2 is installed by a separate procedure described in the documentation (www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html). You need to download the Spark 2 CSD file (available on the Version and Packaging Information page: cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html), install it on the host running Cloudera Manager, and then restart the server. That is exactly what we do: download the file, transfer it to the target host, and run the commands from the instructions:
mv SPARK2_ON_YARN-???.cloudera1.jar /opt/cloudera/csd/
chown cloudera-scm:cloudera-scm /opt/cloudera/csd/SPARK2_ON_YARN-???.cloudera1.jar
chmod 644 /opt/cloudera/csd/SPARK2_ON_YARN-???.cloudera1.jar
systemctl restart cloudera-scm-server
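If the restart behaves unexpectedly, a few quick checks help confirm that the CSD landed where Cloudera Manager expects it. A hedged sketch, assuming the default CSD and log locations (the jar version in your setup will differ from the placeholder pattern):

```shell
# Sanity checks after installing the CSD and restarting Cloudera Manager.
# Paths are the usual defaults; adjust to your deployment.
CSD_DIR=/opt/cloudera/csd
LOG=/var/log/cloudera-scm-server/cloudera-scm-server.log

# The CSD jar should be owned by cloudera-scm:cloudera-scm with mode 644
if ls "$CSD_DIR"/SPARK2_ON_YARN-*.jar >/dev/null 2>&1; then
    stat -c '%U:%G %a %n' "$CSD_DIR"/SPARK2_ON_YARN-*.jar
else
    echo "no SPARK2 CSD jar found in $CSD_DIR"
fi

# After the restart, the server should be active and its log should
# mention the newly picked-up CSD
systemctl is-active cloudera-scm-server 2>/dev/null \
    || echo "cloudera-scm-server is not running on this host"
grep -i 'SPARK2_ON_YARN' "$LOG" 2>/dev/null | tail -n 3
```

If the jar shows the wrong owner or mode, re-run the `chown`/`chmod` commands above before restarting again.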
When Cloudera Manager comes back up, everything is ready for installing Spark 2. On the main screen, click the arrow to the right of the cluster name and choose "Add Service" from the drop-down menu:
In the list of services available for installation, select the one that we need:
On the next tab, choose the set of dependencies for the new service; for example, the option with the wider list of dependencies:
Next comes the tab for choosing roles and the hosts they will be installed on, similar to the one shown during the CDH installation. We recommend placing a single History Server instance on one of the master nodes and the Gateway role on all cluster servers:
After choosing the roles, you are asked to review and confirm the configuration changes that installing the service will make to the cluster. Everything here can be left at the defaults:
Confirming the changes starts the installation of the service in the cluster. If everything was done correctly, the installation completes successfully:
Congratulations! Spark 2 was successfully installed in the cluster:
To complete the installation, you must restart the cluster. After that, everything is ready to go.
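Once the cluster is back up, a quick smoke test from any host carrying the Gateway role confirms the installation end to end. A sketch, assuming the standard SPARK2 parcel layout (the examples jar path may differ in your setup):

```shell
# Smoke test for a freshly installed Spark 2 (run on a host with the Gateway role)
if command -v spark2-submit >/dev/null 2>&1; then
    SPARK2_PRESENT=yes
    spark2-submit --version
    # Run the bundled SparkPi example on YARN as an end-to-end check;
    # the jar lives in the standard SPARK2 parcel location
    spark2-submit --master yarn --deploy-mode client \
        --class org.apache.spark.examples.SparkPi \
        /opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-examples_*.jar 10
else
    SPARK2_PRESENT=no
    echo "spark2-submit not found: is the Gateway role deployed on this host?"
fi
```

A successful SparkPi run ends with an approximation of pi in the driver output, which confirms that both the Gateway client configuration and the YARN integration work.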
Errors can occur while installing the service. For example, on one of our environments the Spark 2 Gateway role failed to deploy. We solved the problem by copying the contents of /var/lib/alternatives/spark2-conf from a host where the role had installed successfully into the same file on the problem machine. To diagnose installation errors, it is convenient to use the log files of the corresponding processes, which are stored under /var/run/cloudera-scm-agent/process/.
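The workaround above can be scripted roughly as follows. A sketch only: the source host name is a placeholder, and root SSH access between the two hosts is assumed:

```shell
# Copy the working spark2-conf alternatives record from a healthy host
# to the machine where the Gateway role failed (run on the problem host).
GOOD_HOST=cdh-master-1.example.com   # placeholder: a host where the role deployed fine
ALT_FILE=/var/lib/alternatives/spark2-conf
PROC_DIR=/var/run/cloudera-scm-agent/process

scp -o ConnectTimeout=5 "root@${GOOD_HOST}:${ALT_FILE}" "${ALT_FILE}" \
    || echo "scp failed: check connectivity and run this on the problem host"

# The agent keeps per-process logs here; the newest directories correspond
# to the most recent deployment attempts
if [ -d "$PROC_DIR" ]; then
    ls -t "$PROC_DIR" | head -n 5
else
    echo "no agent process directory on this host"
fi
```

After copying the file, redeploy the Gateway role from Cloudera Manager and check the newest process directory for its stdout/stderr logs.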
That's all for today. In the next post of the series we will cover administering a CDH cluster.