π E03 β Hands-On Exercises: Spark for BI (DAU, Avg Cart Size, Top 10 Items)
Exercise 1: Create Sample Activity Data
Objective: Set up a Spark DataFrame simulating ecommerce activity.
import spark.implicits._
import org.apache.spark.sql.functions._
case class Activity(userId: String, cartId: String, itemId: String)
val activities = Seq(
Activity("u1", "c1", "i1"),
Activity("u1", "c1", "i2"),
Activity("u2", "c2", "i1"),
Activity("u3", "c3", "i3"),
Activity("u4", "c4", "i3")
)
val df = spark.createDataFrame(activities)
.toDF("user_id", "cart_id", "item_id")
df.show()
Expected Outcome:
+-------+-------+-------+ |user_id|cart_id|item_id| +-------+-------+-------+ | u1| c1| i1| | u1| c1| i2| | u2| c2| i1| | u3| c3| i3| | u4| c4| i3| +-------+-------+-------+
Exercise 2: Daily Active Users (DAU)
Objective: Count unique users who visited.
df.select("user_id").distinct().count()
Expected Outcome:
4 (there are 4 unique users in the sample data).Exercise 3: Average Cart Size
Objective: Calculate the average number of items per cart.
val avg_cart_items = df
.select("cart_id","item_id")
.groupBy("cart_id")
.agg(count(col("item_id")) as "total")
.agg(avg(col("total")) as "avg_cart_items")
avg_cart_items.show()
Expected Outcome:
+--------------+ |avg_cart_items| +--------------+ | 1.25| +--------------+
Exercise 4: Inspect Execution Plan
Objective: View the query plan for average cart size computation.
avg_cart_items.explain()
Expected Outcome: Spark prints a physical plan (DAG) showing stages, aggregations, and shuffles.
Exercise 5: Top 10 Most Added Items
Objective: Identify the most frequently added items across all carts.
df.select("item_id")
.groupBy("item_id")
.agg(count(col("item_id")) as "total")
.sort(desc("total"))
.limit(10)
.show()
Expected Outcome:
+-------+-----+ |item_id|total| +-------+-----+ | i3| 2| | i1| 2| | i2| 1| +-------+-----+
Exercise 6: Daily Active Users with Spark SQL
Objective: Use Spark SQL to count distinct users.
df.createOrReplaceTempView("activities")
spark.sql("""
select count(distinct(user_id)) as unique_users
from activities
""").show()
Expected Outcome:
+------------+ |unique_users| +------------+ | 4| +------------+

Comments
No comments yet. Be the first!
You must log in to comment.