Spark - Get Spark Session

In order to start any Spark job, you must first get a SparkSession object. It is the main entry point you interface with when writing Spark code.

You can create a Spark session with the code below.
  • The master method sets the Spark master URL. To run Spark locally, use 'local'. If you wish to specify the number of cores Spark can use, add it in square brackets (local[4] uses 4 cores).
  • The appName method sets a name that will be shown in the Spark web UI. It can be any string, but it should be something that identifies the job.
  • The getOrCreate method creates a new Spark session, or returns the existing one if it has already been created.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master('local') \
    .appName('Your App Name') \
    .getOrCreate()
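
If a session already exists, calling getOrCreate again simply returns it instead of creating a new one. A quick check, assuming the spark object from the snippet above is still active:
# getOrCreate returns the already running session instead of a new one
spark2 = SparkSession.builder.getOrCreate()
print(spark2 is spark)  # True - the existing session is reused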

Once you have the Spark session object you can start working with it, for example by creating a DataFrame from a list of values or by reading CSV files.
l = [
    (1, 'hello', 123.33),
    (2, 'beautiful', 622.3),
    (3, 'world', 72.2)
]
df = spark.createDataFrame(l)
df.show()

# the data.csv file is located one folder above this script, therefore the .. in the file path
df = spark.read.csv('../data.csv', header=True)
df.show()

And the output should look like:
+---+---------+------+
| _1|       _2|    _3|
+---+---------+------+
|  1|    hello|123.33|
|  2|beautiful| 622.3|
|  3|    world|  72.2|
+---+---------+------+

+------+---+----------+----------+--------------------+-----+
|  Name|Age|Birth Date|        Id|             Address|Score|
+------+---+----------+----------+--------------------+-----+
| Arike| 28|1991-02-14|3698547891|New York, NY 1002...| 78.6|
|   Bob| 32|1987-06-07|6984184782|1843-1701 S Osage...|45.32|
| Corry| 65|1954-12-26|9782472174|R. Olavo Bilac, 1...|98.47|
| David| 18|2001-10-22|2316324177|20-16 B5036, Wirk...| 3.aN|
+------+---+----------+----------+--------------------+-----+
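
As you can see, a DataFrame created from a plain list of tuples gets the default column names _1, _2, _3. If you want meaningful names you can pass a list of column names as the second argument to createDataFrame, and for CSV files you can add inferSchema=True so Spark detects the column types instead of reading everything as strings. A short sketch (the column names here are just examples):
# pass column names so the DataFrame is not stuck with _1, _2, _3
df = spark.createDataFrame(l, ['id', 'word', 'value'])
df.show()

# inferSchema=True lets Spark guess numeric types instead of strings
df = spark.read.csv('../data.csv', header=True, inferSchema=True)
df.show()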


