Tutorial: Apache Spark Development Setup on Ubuntu/Windows (Eclipse, Git & Maven)

Linux

This guide explains how to set up an Apache Spark development environment on Linux (Ubuntu) using the Eclipse IDE, Git, and Maven. By the end, you'll be able to clone a Spark project, import it into Eclipse, build it with Maven, and run it with spark-submit.

1. Verify Spark Installation

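This guide assumes Spark 3.5.6 (the build for Hadoop 3) is already extracted to ~/spark-3.5.6-bin-hadoop3 and that its bin directory is on your PATH. Run the bundled SparkPi example to confirm that Spark works, adjusting the path if you installed it elsewhere:

spark-submit --class org.apache.spark.examples.SparkPi \
  --master 'local[*]' \
  ~/spark-3.5.6-bin-hadoop3/examples/jars/spark-examples_2.12-3.5.6.jar 100

The final argument (100) is the number of partitions SparkPi splits the work into. Among the log messages you should see a line such as "Pi is roughly 3.14...". The quotes around local[*] stop shells such as zsh from treating the brackets as a filename pattern.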
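To turn that visual check into something a script can test, a small bash sketch can pull the estimate out of SparkPi's output and check that it is plausible. It assumes the result line keeps the "Pi is roughly <value>" wording and that Spark's log messages go to stderr (the default console logging setup), so only stdout needs to be piped. The function names and the 3.0 to 3.3 tolerance are illustrative choices, not part of Spark.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for smoke-testing a Spark install with SparkPi.

# Reads SparkPi output on stdin and prints the number from the first
# "Pi is roughly <value>" line; prints nothing if that line is absent.
# It reads all input (no early exit) so the writer never hits a closed pipe.
sparkpi_estimate() {
  awk '/Pi is roughly/ && !found { print $4; found = 1 }'
}

# Exits 0 when stdin contains a SparkPi estimate between 3.0 and 3.3.
# The tolerance is an arbitrary smoke-test bound, not a Spark setting.
sparkpi_ok() {
  local estimate
  estimate=$(sparkpi_estimate)
  [[ -n "$estimate" ]] || return 1
  awk -v x="$estimate" 'BEGIN { exit !(x + 0 > 3.0 && x + 0 < 3.3) }'
}

# Usage (Spark's logs on stderr are discarded, the result line is piped):
#   spark-submit --class org.apache.spark.examples.SparkPi --master 'local[*]' \
#     ~/spark-3.5.6-bin-hadoop3/examples/jars/spark-examples_2.12-3.5.6.jar 100 \
#     2>/dev/null | sparkpi_ok && echo "Spark works"
```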

2. Install Git and Maven

Refresh the package index and install Git and Maven:

sudo apt update
sudo apt install -y git maven
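After installing, it is worth confirming that the tools the later steps rely on are actually on your PATH. The bash sketch below is one way to do that; the function name is hypothetical. The usage line also checks javac on the assumption that you build with a full JDK, since Maven's compiler plugin cannot compile with only a Java runtime.

```bash
#!/usr/bin/env bash
# Hypothetical helper: checks that each named command is on PATH.
# Prints where each one was found and returns 1 if any is missing.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'ok       %-12s %s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'missing  %s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: require_tools git mvn java javac spark-submit && mvn -v
```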

3. Download and Install Eclipse IDE

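  • Download the "Eclipse IDE for Java Developers" package for Linux x86_64 from the Eclipse Downloads page (choose Download Packages rather than the installer).
  • From the directory where you saved it, extract the .tar.gz file into your home directory (the archive unpacks into a folder named eclipse):
tar -xzf eclipse-java-*-linux-gtk-x86_64.tar.gz -C ~
  • Launch Eclipse:
~/eclipse/eclipse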
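If you reinstall or upgrade Eclipse later, extracting a newer archive on top of an existing ~/eclipse folder leaves files from the old version behind. Here is a hedged bash sketch that wraps the extraction and refuses to overwrite an existing install. It assumes, as with the official Linux packages, that the archive's top-level folder is named eclipse; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extracts an Eclipse .tar.gz into PARENT_DIR
# (default: $HOME), producing PARENT_DIR/eclipse. Assumes the archive's
# top-level folder is "eclipse"; refuses to overwrite an existing one.
install_eclipse() {
  local tarball=$1 parent=${2:-$HOME}
  if [[ ! -f "$tarball" ]]; then
    echo "no such file: $tarball" >&2
    return 1
  fi
  if [[ -e "$parent/eclipse" ]]; then
    echo "$parent/eclipse already exists; move or delete it first" >&2
    return 1
  fi
  mkdir -p "$parent" && tar -xzf "$tarball" -C "$parent"
}

# Usage (replace <version> with the release you downloaded):
#   install_eclipse ~/Downloads/eclipse-java-<version>-linux-gtk-x86_64.tar.gz
```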

4. Clone a Sample Spark Project

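Use Git to clone an example Spark project (the chapter 1 examples from Jean-Georges Perrin's Spark in Action, 2nd edition) and change into its folder:

git clone https://github.com/jgperrin/net.jgp.books.spark.ch01.git
cd net.jgp.books.spark.ch01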
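By default, git clone names the new folder after the last part of the URL minus the .git suffix, which is why this repository lands in a folder named net.jgp.books.spark.ch01. If you clone several sample repositories, a small bash sketch can compute that name and change into it after a successful clone. The helper names are hypothetical, and the name logic only covers ordinary URLs like this one (git's own rules handle more edge cases).

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the folder name `git clone URL` creates by
# default, i.e. the last path component with a trailing slash and ".git"
# removed. Covers ordinary https:// and user@host:path URLs only.
repo_dir_from_url() {
  local url=${1%/}
  url=${url##*/}
  printf '%s\n' "${url%.git}"
}

# Clones URL, then changes into the new folder only if the clone succeeded.
clone_and_enter() {
  git clone "$1" && cd "$(repo_dir_from_url "$1")"
}

# Usage: clone_and_enter https://github.com/jgperrin/net.jgp.books.spark.ch01.git
```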

5. Import Project into Eclipse

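  • Open Eclipse.
  • Select File → Import…, then Maven → Existing Maven Projects, and click Next.
  • Set Root Directory to the folder you cloned with Git (the one containing pom.xml) and click Finish.
  • Eclipse's built-in Maven support (m2e) reads pom.xml and downloads the project's dependencies.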

6. Build with Maven

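From a terminal in the project folder (the one containing pom.xml):

mvn clean install

This compiles the code, runs any tests, packages the application jar into target/, and also installs it into your local Maven repository (~/.m2). If you only need the jar, mvn clean package is enough.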
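The jar Maven builds lands in target/, by default named <artifactId>-<version>.jar, sometimes next to extra jars that plugins add (sources, javadoc, or the original- jar the Shade plugin keeps). A hedged bash sketch for picking out the application jar so later commands don't need its exact file name; the function name and the list of skipped patterns are assumptions, so extend them to match your pom.xml.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the single application jar in TARGET_DIR
# (default: target), skipping sources/javadoc/test jars and the
# "original-" jar left by the Maven Shade plugin. Fails otherwise.
find_app_jar() {
  local target=${1:-target} jar
  local -a found=()
  for jar in "$target"/*.jar; do
    [[ -f "$jar" ]] || continue   # the glob stays literal when nothing matches
    case ${jar##*/} in
      original-*|*-sources.jar|*-javadoc.jar|*-tests.jar) continue ;;
    esac
    found+=("$jar")
  done
  if (( ${#found[@]} != 1 )); then
    echo "expected one application jar in $target, found ${#found[@]}" >&2
    return 1
  fi
  printf '%s\n' "${found[0]}"
}

# Usage (from the project folder, after mvn clean install):
#   jar=$(find_app_jar) && echo "built $jar"
```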

7. Run Spark Applications

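Once built, run an application with spark-submit, replacing the placeholders with the fully qualified name of the class that contains main() and the jar Maven produced in target/. If the application reads input files by relative path, run the command from the project folder:

spark-submit --class <YourMainClass> --master 'local[*]' target/<your-jar-file>.jar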
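Typing the full spark-submit line repeatedly gets tedious, so a small bash wrapper can fill in the local master and forward any application arguments. This is a sketch: the function name and the DRY_RUN switch (which prints the command instead of running it) are conveniences invented here, not spark-submit features, and the class and jar names in the usage line are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: submits MAIN_CLASS from JAR to a local master that
# uses all cores, forwarding any extra arguments to the application.
# With DRY_RUN=1 it only prints the command it would run.
submit_local() {
  if (( $# < 2 )); then
    echo "usage: submit_local MAIN_CLASS JAR [APP_ARGS...]" >&2
    return 2
  fi
  local main_class=$1 jar=$2
  shift 2
  if [[ ! -f "$jar" ]]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  # Array form keeps 'local[*]' and any arguments with spaces intact.
  local -a cmd=(spark-submit --class "$main_class" --master 'local[*]' "$jar" "$@")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (placeholder names): submit_local com.example.MyApp target/my-app-1.0.jar
```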

You now have a complete Spark development setup on Linux with Eclipse, Git, and Maven 🚀

Windows

This part of the guide covers the same setup on Windows, using Chocolatey to install Maven.

1. Run a Spark Example (Quick Test)

Before setting up the development environment, verify your Spark installation by running the SparkPi example. Replace <your-user> with your Windows user name and adjust the path to wherever you extracted Spark:

spark-submit --class org.apache.spark.examples.SparkPi --master local[*] C:\Users\<your-user>\spark-3.5.6\examples\jars\spark-examples_2.12-3.5.6.jar 100

2. Install Eclipse IDE

  • Download Eclipse IDE from Eclipse Downloads.
  • Choose the “Eclipse IDE for Java Developers” package.
  • Install and launch Eclipse.

3. Install Chocolatey (Windows Only)

Chocolatey is a package manager for Windows; this guide uses it to install Maven. Run its official install command in PowerShell (the -bor 3072 part enables TLS 1.2 for the download):

# Open PowerShell as Administrator
Set-ExecutionPolicy Bypass -Scope Process -Force; `
  [System.Net.ServicePointManager]::SecurityProtocol = `
  [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; `
  iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

4. Install Maven (Windows)

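In the same Administrator PowerShell window, install Maven (-y accepts the confirmation prompt). Then open a new terminal so the updated PATH is picked up, and confirm that Maven runs:

choco install maven -y
mvn -v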
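Maven runs on whichever JDK JAVA_HOME (or PATH) points to, and mvn -v reports it on its "Java version:" line. A hedged bash sketch, usable from Git Bash (installed in the next step) or on Linux, that extracts that major version and checks it against the Java versions the Spark 3.5 documentation lists (8, 11, and 17). The function names are hypothetical, and the parsing assumes the usual mvn -v output format.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking which JDK Maven uses.

# Reads `mvn -v` output on stdin and prints the Java major version from the
# first "Java version: ..." line: 17 for "17.0.8", 8 for the legacy "1.8.0_382".
mvn_java_major() {
  awk -F'[ ,]+' '/^Java version:/ && !done {
    split($3, v, ".")
    print (v[1] == "1" ? v[2] : v[1])
    done = 1
  }'
}

# Succeeds for the Java major versions listed for Spark 3.5 (8, 11, 17).
spark35_supports_java() {
  case $1 in
    8|11|17) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: major=$(mvn -v | mvn_java_major) && spark35_supports_java "$major" && echo "JDK $major OK"
```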

5. Install Git (Windows)

  • Download Git for Windows from the Git Downloads page (git-scm.com).
  • Run the installer with the default settings; this also installs Git Bash.

6. Clone Spark Example Project

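Use Git (from Git Bash or PowerShell) to clone the sample Spark project:

git clone https://github.com/jgperrin/net.jgp.books.spark.ch01.git

Importing the project into Eclipse, building it with Maven, and running it with spark-submit work the same way as in steps 5 to 7 of the Linux section.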
