Apache Spark MasterClass Chapter 2 – Episode 7
-
MySQL auth plugins: In MySQL 8, when would you switch a user from caching_sha2_password to mysql_native_password? What are the security and compatibility tradeoffs?
MySQL 8.0 defaults to caching_sha2_password. Switch to mysql_native_password only when legacy clients or drivers (older JDBC/ODBC connectors, certain apps) can’t handle the default. Trade-offs: caching_sha2_password uses salted SHA-256 hashing and expects TLS or RSA key exchange for a full authentication, while mysql_native_password relies on an older SHA-1-based scheme, is deprecated in later 8.0 releases, and is removed in MySQL 9.0. If you do switch, scope it to a single user, enable TLS, and plan an upgrade path for the clients.
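A minimal sketch of scoping the plugin change to one account. The account name legacy_app, the host pattern 10.0.0.% and the helper names are hypothetical, and the quoting assumes the default SQL mode (backslash escapes enabled). The helper only builds the statement text; you would run it from an admin session on a server version that still ships mysql_native_password.

```python
def sql_quote(value: str) -> str:
    """Quote a value as a MySQL single-quoted string literal.

    Assumes the default SQL mode, where backslash is an escape character.
    """
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def alter_user_native_password(user: str, host: str, password: str) -> str:
    """Build an ALTER USER statement that switches ONE account to
    mysql_native_password and requires TLS for that account."""
    return (
        f"ALTER USER {sql_quote(user)}@{sql_quote(host)} "
        f"IDENTIFIED WITH mysql_native_password BY {sql_quote(password)} "
        "REQUIRE SSL;"
    )


if __name__ == "__main__":
    # Hypothetical account and placeholder password, for illustration only.
    print(alter_user_native_password("legacy_app", "10.0.0.%", "change-me"))
```

Keeping the change to a single user@host pair means every other account stays on the stronger default plugin.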
-
Initial hardening: Walk through your checklist after sudo apt install mysql-server on Ubuntu.
Run mysql_secure_installation (remove anonymous users, disallow remote root, remove test DB, reload privileges). Change/confirm root auth method (e.g., ALTER USER …). Create a dedicated, least-privileged app user. Review bind-address in mysqld.cnf (default 127.0.0.1 for dev). Turn on backups and slow-query log; consider TLS if remote connections are required.
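The bind-address review can be scripted. This is a sketch only: the function names are hypothetical, and it assumes the file is plain INI-style text with a [mysqld] section, as the Ubuntu mysqld.cnf normally is.

```python
import configparser
import ipaddress


def read_bind_address(cnf_text: str):
    """Return bind-address from the [mysqld] section, or None if it is not set."""
    # Drop '!include'-style directives, which configparser cannot parse.
    lines = [ln for ln in cnf_text.splitlines() if not ln.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string("\n".join(lines))
    if not parser.has_section("mysqld"):
        return None
    return parser.get("mysqld", "bind-address", fallback=None)


def is_loopback_only(cnf_text: str) -> bool:
    """True only when bind-address is explicitly a loopback address."""
    value = read_bind_address(cnf_text)
    if value is None:
        return False  # unset: MySQL 8.0 listens on all interfaces by default
    if value == "localhost":
        return True
    try:
        return ipaddress.ip_address(value).is_loopback
    except ValueError:
        return False  # '*', hostnames, or address lists: not provably loopback


if __name__ == "__main__":
    sample = "[mysqld]\nuser = mysql\nbind-address = 127.0.0.1\n"
    print(read_bind_address(sample), is_loopback_only(sample))
```

A False result is a prompt to check the firewall and TLS settings before exposing the port, not proof that the server is reachable from outside.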
-
Sakila ingestion troubleshooting: If loading Sakila via mysql -u root -h 127.0.0.1 -p < file.sql fails, how do you diagnose path issues, server connectivity, or SQL mode problems?
Check the file path and permissions; try quoting the path. Confirm the server is listening (ss -ltnp
-
MongoDB install paths: Explain when you’d use dpkg -i on a single .deb vs adding the official apt repo and installing packages via apt. What do you gain/lose?
dpkg -i is quick for a single .deb but doesn’t resolve dependencies automatically (you may need apt -f install). Adding the official apt repo lets you apt install with dependencies resolved and receive updates via apt upgrade. For anything beyond one-off testing, prefer the official repo.
-
Mongo service fails to start: systemctl start mongod shows failed. What logs and checks do you run?
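Check journalctl -u mongod and /var/log/mongodb/mongod.log. Validate that the data directory exists and is owned by the service user: /var/lib/mongodb on Debian/Ubuntu packages, /var/lib/mongo on RHEL-family packages, or /data/db when mongod runs without a config file (on Ubuntu, chown -R mongodb:mongodb /var/lib/mongodb). Ensure port 27017 isn’t already in use (ss -ltnp). Verify repo/GPG key health if installation errors preceded the failure. Check SELinux/AppArmor and unit file correctness.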
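The port check can also be done from Python when ss is not at hand. A minimal sketch, with a hypothetical helper name; it only tells you whether something accepts TCP connections on the port, not which process owns it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 27017 is the MongoDB default port named in the answer above.
    print("27017 in use:", port_in_use(27017))
```

If the port is busy while mongod is stopped, another process (or a stale mongod) is holding it; ss -ltnp run as root shows the owning PID.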
-
mongosh vs legacy mongo: What changed with mongosh? How do you ensure compatibility?
mongosh is the modern Node.js-based shell with better UX, completions, and support for new server features; the old mongo shell is deprecated and no longer ships with MongoDB 6.0 and later. For teams, standardize on mongosh, document installs, and verify scripts that used the old shell (adjust CLI flags or migrate scripts to drivers).
-
Cassandra tarball setup: You’ve unpacked Cassandra to ~/cassandra. What prerequisites do you confirm, and which configs do you tune first?
Confirm supported Java (e.g., OpenJDK 17 for 5.x), correct ulimit/nofile, disable swap or set vm.swappiness, and ensure adequate RAM/disk. Initial configs: cassandra.yaml (listen/broadcast addresses, seeds, data dirs), heap in jvm-server.options (≈1/4 system RAM, within caps), and GC settings. For dev: RF=1; for multi-node: plan RF and topology.
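The quarter-of-RAM heap rule is easy to work through. This sketch assumes a cap of 8192 MB, which is an illustrative default here and not a value taken from the text above; the function names are hypothetical. Adjust the cap for your Cassandra version and GC choice.

```python
def suggested_heap_mb(system_ram_mb: int, fraction: float = 0.25,
                      cap_mb: int = 8192) -> int:
    """Return roughly `fraction` of system RAM, limited to `cap_mb`.

    The 8192 MB cap is an assumption for illustration only.
    """
    return int(min(system_ram_mb * fraction, cap_mb))


def jvm_heap_flags(heap_mb: int) -> list:
    """Format matching min/max heap flags for jvm-server.options."""
    return [f"-Xms{heap_mb}M", f"-Xmx{heap_mb}M"]


if __name__ == "__main__":
    for ram_mb in (8192, 16384, 65536):
        heap = suggested_heap_mb(ram_mb)
        print(ram_mb, "MB RAM ->", " ".join(jvm_heap_flags(heap)))
```

Setting -Xms and -Xmx to the same value avoids heap resizing at runtime, which is why both flags are emitted together.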
-
Cluster health & ops: Interpret nodetool status output: what do UN, UJ, DN, UL mean, and what are the recovery steps?
First letter: U/D = Up/Down. Second: N/J/L/M = Normal/Joining/Leaving/Moving. UN healthy; UJ joining/streaming; DN down; UL up but leaving. To recover: check logs, fix cause (disk, network, heap), then restart. Avoid forced rebuilds unless needed; use nodetool repair/rebuild appropriately.
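The two-letter codes can be decoded mechanically. A small sketch with hypothetical helper names; the sample lines in the usage part are illustrative, not captured from a real cluster.

```python
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "J": "Joining", "L": "Leaving", "M": "Moving"}


def decode(code: str):
    """Split a nodetool status code such as 'UN' into (status, state)."""
    if len(code) != 2 or code[0] not in STATUS or code[1] not in STATE:
        raise ValueError(f"not a nodetool status code: {code!r}")
    return STATUS[code[0]], STATE[code[1]]


def nodes_needing_attention(status_lines):
    """Return (code, address) for every node row whose code is not 'UN'.

    Header and separator lines are skipped because their first token
    is not a valid two-letter code.
    """
    flagged = []
    for line in status_lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue
        try:
            decode(tokens[0])
        except ValueError:
            continue
        if tokens[0] != "UN":
            flagged.append((tokens[0], tokens[1]))
    return flagged


if __name__ == "__main__":
    sample = [
        "--  Address   Load",
        "UN  10.0.0.1  1.2 GiB",
        "DN  10.0.0.2  1.1 GiB",
    ]
    print(nodes_needing_attention(sample))
```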
-
Data modeling: Compare Cassandra’s wide-column model to MySQL’s relational model and MongoDB’s document model.
MySQL (relational): strong schema, joins, transactions—OLTP, normalized data (e.g., order management). MongoDB (document): hierarchical, flexible schema—event payloads, catalogs, user profiles. Cassandra (wide-column): time-series and large-scale write-heavy workloads with predictable queries (IoT metrics, timelines). Choose based on query patterns and SLAs.
-
Replication & consistency (Cassandra): Explain SimpleStrategy vs NetworkTopologyStrategy and effect of CLs.
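SimpleStrategy ignores datacenter and rack layout, so it only suits single-DC dev or test clusters. NetworkTopologyStrategy is for multi-DC and sets RF per DC. Consistency levels: ONE/LOCAL_ONE give the lowest latency but the weakest consistency; QUORUM/LOCAL_QUORUM wait for a majority of replicas (floor(RF/2) + 1) and balance latency against consistency; ALL gives the strongest consistency but the lowest availability, since a single unreachable replica fails the request. LOCAL_QUORUM is a common choice in multi-DC clusters.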
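A worked version of the quorum arithmetic, as a sketch with hypothetical function names. Here rf means the replication factor in scope for the level: the local DC's RF for LOCAL_* levels, the sum across DCs for QUORUM and ALL.

```python
def quorum(rf: int) -> int:
    """Majority of replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def replicas_required(consistency_level: str, rf: int) -> int:
    """Number of replicas that must respond for the given level."""
    cl = consistency_level.upper()
    if cl in ("ONE", "LOCAL_ONE"):
        return 1
    if cl in ("QUORUM", "LOCAL_QUORUM"):
        return quorum(rf)
    if cl == "ALL":
        return rf
    raise ValueError(f"level not covered by this sketch: {consistency_level}")


def reads_see_latest_write(read_cl: str, write_cl: str, rf: int) -> bool:
    """Read and write replica sets overlap when R + W > RF."""
    return replicas_required(read_cl, rf) + replicas_required(write_cl, rf) > rf


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print("RF", rf, "-> quorum", quorum(rf))
    print(reads_see_latest_write("QUORUM", "QUORUM", 3))
    print(reads_see_latest_write("ONE", "ONE", 3))
```

With RF=3, QUORUM reads and writes each touch 2 replicas, so any read overlaps the latest write and one node can be down without failing requests.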
-
Securing services: Outline steps for MySQL, MongoDB, Cassandra.
MySQL: least-privilege users, strong auth plugin, TLS, bind-address/firewall, rotate creds, enable logs. MongoDB: enable auth, create users/roles per DB, set bindIp, enable TLS, consider keyFile/x.509 for clustering. Cassandra: enable role-based auth, enforce TLS client-to-node & node-to-node, restrict JMX, firewall ports (7000/7001/7199/9042), avoid public exposure.
-
JDBC with PySpark: Show how you’d read a MySQL table and contrast with MongoDB/Cassandra.
MySQL (JDBC): spark.read.format("jdbc").option("url","jdbc:mysql://127.0.0.1:3306/sakila").option("dbtable","film").option("user","msqluser").option("password","*****").option("driver","com.mysql.cj.jdbc.Driver").load(). MongoDB: use Mongo Spark Connector with .format("mongodb"). Cassandra: use Spark-Cassandra connector with spark.read.format("org.apache.spark.sql.cassandra"). All require correct connector JARs/packages.
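The JDBC read is easier to reuse when the options live in one place. A sketch under assumptions: the helper names are hypothetical, spark is an existing SparkSession with MySQL Connector/J on its classpath, and the password comes from an environment variable called MYSQL_PASSWORD (also an assumption) instead of a literal.

```python
import os


def mysql_jdbc_options(host: str, port: int, database: str, table: str,
                       user: str, password: str) -> dict:
    """Collect the JDBC options used by spark.read.format("jdbc")."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def read_mysql_table(spark, options: dict):
    """Load a MySQL table as a DataFrame through the JDBC data source.

    `spark` must be a live SparkSession; it is passed in so this module
    imports cleanly even where PySpark is not installed.
    """
    return spark.read.format("jdbc").options(**options).load()


if __name__ == "__main__":
    opts = mysql_jdbc_options(
        "127.0.0.1", 3306, "sakila", "film",
        "msqluser", os.environ.get("MYSQL_PASSWORD", ""),
    )
    # Print everything except the password.
    print({k: v for k, v in opts.items() if k != "password"})
```

The MongoDB and Cassandra reads follow the same shape, with a different format string and connector-specific options in place of the JDBC URL.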
-
Windows constraints: Cassandra doesn’t run natively. How do you productionize with Docker or WSL?
Prefer Docker: docker run --name cassandra -d -p 9042:9042 cassandra; mount a volume by adding -v C:\data\cassandra:/var/lib/cassandra before the image name; allocate memory/CPU to the container. For WSL2: run Docker with the WSL 2 backend or use a Linux VM. In prod, use Linux nodes; Windows is for dev/test via containers.
-
Backup & restore: Compare MySQL, MongoDB, Cassandra approaches.
MySQL: mysqldump (portable, slower) or XtraBackup (hot, incremental). MongoDB: mongodump/mongorestore for small/medium; snapshots for large sets. Cassandra: nodetool snapshot creates hard links; back up SSTables; restore via streaming/sstableloader. Choice depends on dataset size, uptime, RTO.
-
Troubleshooting ports & firewalls: Defaults and safe verification?
Defaults: MySQL 3306, MongoDB 27017, Cassandra CQL 9042 (plus internode 7000/7001, JMX 7199). Verify listeners with ss -ltnp or netstat. Open safely with ufw allow 3306/tcp or iptables, restrict to source CIDRs/security groups, and always combine with auth/TLS.
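Restricting a rule to a source CIDR can be sketched as a small command builder. The service keys and function name are hypothetical; the helper only produces the ufw command text and validates the CIDR, it does not touch the firewall.

```python
import ipaddress

# Default ports from the answer above; the key names are illustrative.
DEFAULT_PORTS = {
    "mysql": 3306,
    "mongodb": 27017,
    "cassandra_cql": 9042,
    "cassandra_internode": 7000,
    "cassandra_internode_tls": 7001,
    "cassandra_jmx": 7199,
}


def ufw_allow_rule(service: str, source_cidr: str) -> str:
    """Build a ufw rule that opens a service port to one source network only."""
    # strict=True rejects CIDRs with host bits set, e.g. 10.0.0.5/24.
    network = ipaddress.ip_network(source_cidr, strict=True)
    port = DEFAULT_PORTS[service]
    return f"ufw allow from {network} to any port {port} proto tcp"


if __name__ == "__main__":
    # 10.0.0.0/24 is a placeholder application subnet.
    print(ufw_allow_rule("mysql", "10.0.0.0/24"))
```

Compared with a bare ufw allow 3306/tcp, the from clause keeps the port closed to every network except the one that needs it.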
