Apache Zeppelin

In this post I would like to show you a tool called Apache Zeppelin.

Zeppelin is an Apache project that provides a live notebook environment able to run multiple languages like Python and Scala, as well as tools like Apache Spark, Apache Ignite, and more. Each of these abilities is called an interpreter. In their own words - "Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more".

Running Apache Zeppelin is very simple: just download the zipped binaries from the download section of the Zeppelin project page - https://zeppelin.apache.org/download.html.
On Linux, open a terminal in the extracted Zeppelin home folder (e.g. zeppelin-0.8.2-bin-all) and run:
$ ./bin/zeppelin.sh
This will start the web server that serves the Zeppelin GUI, by default at http://localhost:8080.
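The same bin folder also contains a daemon script, so if you prefer to run Zeppelin in the background you can use:
$ ./bin/zeppelin-daemon.sh start
and later stop it with:
$ ./bin/zeppelin-daemon.sh stop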

The main screen shows you the available notebooks.

Let's create a new notebook by clicking Create new note, entering a notebook name, and pressing Create.


Alternatively, you can click one of the existing notebooks to open and display it.


A notebook is a set of paragraphs that can use different interpreters:

Python 

A line starting with the % sign indicates which interpreter Zeppelin will use to run the paragraph's code. For example, %python tells Zeppelin to execute the following code using the Python interpreter. Let's try adding this code:
%python
print("hello world")
Pressing the play button or Shift+Enter will execute the paragraph and display its output below it.
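
Output is not limited to plain text. Zeppelin's display system renders paragraph output that starts with %table as a table, with tab-separated columns and newline-separated rows, so a %python paragraph can produce a grid like this:
%python
# Zeppelin renders output beginning with %table as a table;
# columns are separated by \t and rows by \n
print("%table name\tage\nArike\t28\nBob\t32")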

PySpark

Create a new paragraph and enter the following code:
%pyspark
# a list of (name, age, score) tuples
list_of_persons = [('Arike', 28, 78.6), ('Bob', 32, 45.32), ('Corry', 65, 98.47)]
# distribute the list as an RDD, then convert it to a DataFrame with named columns
df = sc.parallelize(list_of_persons).toDF(['name', 'age', 'score'])
# print the inferred schema and show the rows
df.printSchema()
df.show()
%pyspark starts a Spark session on the local Spark that ships with Zeppelin, or on a Spark cluster if you configure one in the interpreter settings, and then executes the code.
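
Because df stays registered in the Spark session, other interpreters in the same notebook can use it. For example, registering the DataFrame as a temporary view lets a %sql paragraph query it with Spark SQL (a small sketch; the view name persons is just an illustration):
%pyspark
# expose the DataFrame to Spark SQL under a temporary view name
df.createOrReplaceTempView("persons")
Then, in a new paragraph:
%sql
SELECT name, score FROM persons WHERE age < 60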

Angular 

Another example of a useful interpreter is %angular, which renders AngularJS markup directly in the paragraph.
Modified from: https://www.w3schools.com/angular/ 
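
As an illustration, a minimal %angular paragraph modified from that w3schools example could look like the sketch below; the name variable is just an example, and the two-way binding relies on the paragraph's Angular scope:
%angular
<div>
  <p>Name: <input type="text" ng-model="name"></p>
  <p>Hello {{name}}!</p>
</div>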

These are just the very basics of Apache Zeppelin; there are additional interpreters and many more features, which we will explore in future posts.
