Tutorial: Apache Spark + (R, Scala, IntelliJ & Zeppelin) on Ubuntu & Windows
This guide covers installing and running R with SparkR, Scala with sbt, IntelliJ IDEA, and Apache Zeppelin on Ubuntu.
1. R on Ubuntu
Install R and development tools:
sudo apt update
sudo apt install --yes r-base r-base-dev
Start R from terminal:
R
Using SparkR
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
# Initialize Spark session
sparkR.session(master = "local[*]", appName = "RWithSpark")
# Test DataFrame API
df <- as.DataFrame(faithful)
head(df)
# Stop session
sparkR.session.stop()
2. Scala on Ubuntu
Install Scala with Coursier:
curl -fL https://github.com/coursier/coursier/releases/latest/download/cs-x86_64-pc-linux.gz | gzip -d > cs && chmod +x cs && ./cs setup
Add to ~/.bashrc:
export PATH="$HOME/.local/share/coursier/bin:$PATH"
Then reload your shell:
source ~/.bashrc
Verify installation:
scala -version
sbt --version
Create a Spark Project with Scala
mkdir ~/spark-scala-example
cd ~/spark-scala-example
Create build.sbt:
name := "spark-scala-example"
version := "0.1"
scalaVersion := "2.12.18"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.5.1" % "provided",
"org.apache.spark" %% "spark-sql" % "3.5.1" % "provided"
)
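Note on the "provided" scope: it keeps the Spark jars out of your packaged jar, because spark-submit supplies them at runtime. A side effect is that plain sbt run fails with a NoClassDefFoundError for Spark classes. A common workaround (a sketch for sbt 1.x, from sbt's runTask defaults) is to put provided dependencies back on the run classpath only:

```scala
// build.sbt addition: make `sbt run` see "provided" dependencies
Compile / run := Defaults.runTask(
  Compile / fullClasspath,
  Compile / run / mainClass,
  Compile / run / runner
).evaluated
```

With this in place, sbt run can launch the app directly during development, while sbt package still produces a slim jar for spark-submit.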
Create src/main/scala/WordCount.scala:
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("WordCountExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // encoders needed by Dataset flatMap/groupByKey

    val data = spark.read.textFile("README.md")
    val counts = data.flatMap(_.split(" "))
      .groupByKey(identity)
      .count()

    counts.show(20, truncate = false)
    spark.stop()
  }
}
Build & Run
sbt package
spark-submit --class WordCount target/scala-2.12/spark-scala-example_2.12-0.1.jar
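For intuition, the flatMap → groupByKey → count pipeline above is the classic word count. A minimal plain-Python sketch of the same logic (no Spark involved; the sample lines are made up):

```python
from collections import Counter

lines = [
    "spark makes word count easy",
    "word count with spark",
]

# flatMap(_.split(" ")) — split every line into words and flatten the result
words = [w for line in lines for w in line.split(" ")]

# groupByKey(identity).count() — tally occurrences per distinct word
counts = Counter(words)

for word, n in counts.most_common():
    print(f"{word}\t{n}")
```

The Spark version distributes exactly this computation across partitions; locally, with a small file, both produce the same tallies.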
3. IntelliJ IDEA on Ubuntu
- Download IntelliJ IDEA Community Edition from the App Center or the JetBrains website.
- Install and open IntelliJ IDEA.
- Go to File → Settings → Plugins, search for Scala, and install the Scala plugin.
3.1 Create a New sbt Project
- Click New Project → choose Scala.
- Select sbt as the build tool.
- Set Project JDK to Java 17 (installed earlier).
- Set the project name: spark-scala-example.
3.2 Add build.sbt
Inside your project root, create/edit build.sbt:
name := "spark-scala-example"
version := "0.1"
scalaVersion := "2.12.18" // Must match your Spark build
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.5.1" % "provided",
"org.apache.spark" %% "spark-sql" % "3.5.1" % "provided"
)
3.3 Add Example Scala File
Create the folder structure:
mkdir -p src/main/scala
Create src/main/scala/WordCount.scala:
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("WordCountExample")
      .master("local[*]") // run locally
      .getOrCreate()
    import spark.implicits._ // encoders needed by Dataset flatMap/groupByKey

    val data = spark.read.textFile("README.md") // any text file
    val counts = data.flatMap(_.split(" "))
      .groupByKey(identity)
      .count()

    counts.show(20, truncate = false)
    spark.stop()
  }
}
Go to Run → Edit Configurations, click Modify Options, select Add VM Options, then paste the following (adjust the paths to match your own project and Spark installation):
--add-exports java.base/sun.nio.ch=ALL-UNNAMED \
-cp /home/sthithapragna/IdeaProjects/untitled/target/scala-2.12/classes:\
/home/sthithapragna/spark-3.5.6-bin-hadoop3/jars/*
3.4 Build & Run the Project
Click the Run button or press Shift+F10.
4. Apache Zeppelin on Ubuntu
Download and extract Zeppelin:
wget https://dlcdn.apache.org/zeppelin/zeppelin-0.12.0/zeppelin-0.12.0-bin-all.tgz
tar -xvzf zeppelin-0.12.0-bin-all.tgz
cd zeppelin-0.12.0-bin-all
Install all interpreters
./bin/install-interpreter.sh --all
Start Zeppelin
bin/zeppelin-daemon.sh start
Access Zeppelin at: http://localhost:8080
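Optionally, confirm the UI is reachable before opening a browser. A small Python sketch (the URL and port assume the default Zeppelin configuration used above):

```python
import urllib.request

def zeppelin_up(url: str = "http://localhost:8080", timeout: float = 5.0) -> bool:
    """Return True if a web UI answers an HTTP request at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        return False

print("Zeppelin up?", zeppelin_up())
```

If this prints False, check the daemon's status with bin/zeppelin-daemon.sh status and the logs under the logs/ directory.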
✅ You now have R, Scala, IntelliJ IDEA, and Apache Zeppelin ready to use on Ubuntu 🚀
🔹 Tutorial: R, Scala, IntelliJ & Zeppelin on Windows
This guide walks you through installing and running R with SparkR, Scala + sbt, IntelliJ IDEA, and Apache Zeppelin on Windows.
1. R on Windows
Download and install R and RStudio.
Using SparkR
Open RStudio and run the following commands:
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
# Initialize Spark session
sparkR.session(master = "local[*]", appName = "RWithSpark")
# Test DataFrame API
df <- as.DataFrame(faithful)
head(df)
# Stop Spark session
sparkR.session.stop()
2. Scala + sbt on Windows
Download & install Scala and sbt:
Verify installation:
scala -version
sbt --version
Create a Sample Spark Project
mkdir spark-scala-example
cd spark-scala-example
Create build.sbt:
name := "spark-scala-example"
version := "0.1"
scalaVersion := "2.12.18"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.5.1" % "provided",
"org.apache.spark" %% "spark-sql" % "3.5.1" % "provided"
)
Add src/main/scala/WordCount.scala:
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("WordCountExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // encoders needed by Dataset flatMap/groupByKey

    val data = spark.read.textFile("README.md")
    val counts = data.flatMap(_.split(" "))
      .groupByKey(identity)
      .count()

    counts.show(20, truncate = false)
    spark.stop()
  }
}
Build & Run
sbt package
spark-submit --class WordCount target/scala-2.12/spark-scala-example_2.12-0.1.jar
3. IntelliJ IDEA on Windows
Download IntelliJ IDEA (Community Edition recommended) from the JetBrains website.
3.1 Create a New sbt Project
- Click New Project → choose Scala.
- Select sbt as the build tool.
- Set Project JDK to Java 17 (installed earlier).
- Set the project name: spark-scala-example.
3.2 Add build.sbt
Inside your project root, create/edit build.sbt:
name := "spark-scala-example"
version := "0.1"
scalaVersion := "2.12.18" // Must match your Spark build
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.5.1" % "provided",
"org.apache.spark" %% "spark-sql" % "3.5.1" % "provided"
)
3.3 Add Example Scala File
Create the folder structure:
mkdir -p src/main/scala
Create src/main/scala/WordCount.scala:
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("WordCountExample")
      .master("local[*]") // run locally
      .getOrCreate()
    import spark.implicits._ // encoders needed by Dataset flatMap/groupByKey

    val data = spark.read.textFile("README.md") // any text file
    val counts = data.flatMap(_.split(" "))
      .groupByKey(identity)
      .count()

    counts.show(20, truncate = false)
    spark.stop()
  }
}
Go to Run → Edit Configurations, click Modify Options, select Add VM Options, then paste the following (adjust the paths to match your own project and Spark installation):
--add-exports java.base/sun.nio.ch=ALL-UNNAMED ^
-cp C:\Users\sthithapragna\IdeaProjects\spark-scala-examples\target\scala-2.12\classes;C:\Users\sthithapragna\spark-3.5.6\jars\*
3.4 Build & Run the Project
Click the Run button or press Shift+F10.
4. Apache Zeppelin on Windows (via Docker)
Create folders for notebooks and logs:
mkdir C:\docker\zeppelin\notebooks
mkdir C:\docker\zeppelin\logs
Run Zeppelin with Spark mounted:
docker pull apache/zeppelin:0.12.0
docker run --name zeppelin ^
-p 8080:8080 ^
-e SPARK_HOME=/opt/spark ^
-e ZEPPELIN_ADDR=0.0.0.0 ^
-e ZEPPELIN_LOG_DIR=/logs ^
-v "C:\Users\sthithapragna\spark-3.5.6:/opt/spark" ^
-v "C:\docker\zeppelin\notebooks:/zeppelin/notebook" ^
-v "C:\docker\zeppelin\logs:/logs" ^
apache/zeppelin:0.12.0
Open a shell inside the running container (docker exec -it zeppelin bash) and prepare Spark there:
cd /tmp
curl -O https://downloads.apache.org/spark/spark-3.5.6/spark-3.5.6-bin-hadoop3.tgz
tar xvf spark-3.5.6-bin-hadoop3.tgz
ln -s spark-3.5.6-bin-hadoop3 spark
export SPARK_HOME=/tmp/spark
Configure Zeppelin Interpreter
In Zeppelin UI → Interpreter → spark & spark-submit, set:
spark.home = /tmp/spark
master = local[*]
Verify in Zeppelin
Open: http://localhost:8080/classic
%pyspark
spark
%pyspark
df = spark.createDataFrame(
    [
        ("sue", 32),
        ("li", 3),
        ("bob", 75),
        ("heo", 13),
    ],
    ["first_name", "age"],
)
df.show()
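To sanity-check what df.show() should display, the same rows can be inspected in plain Python, outside Spark and Zeppelin (a quick sketch over the identical data):

```python
rows = [("sue", 32), ("li", 3), ("bob", 75), ("heo", 13)]
columns = ["first_name", "age"]

# Mimic df.show(): header row followed by one line per tuple
print(" | ".join(columns))
for first_name, age in rows:
    print(f"{first_name} | {age}")

# A simple follow-up check: how many rows have age > 21?
adults = [r for r in rows if r[1] > 21]
print("rows with age > 21:", len(adults))
```

If the Zeppelin paragraph renders these four rows, the Spark interpreter is wired up correctly.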
✅ You now have R, Scala, IntelliJ IDEA, and Zeppelin set up on Windows 🚀
