Apache Spark MasterClass Chapter 2 – Episode 7

  1. MySQL auth plugins: In MySQL 8, when would you switch a user from caching_sha2_password to mysql_native_password? What are the security and compatibility tradeoffs?

    MySQL 8 defaults to caching_sha2_password. Switch to mysql_native_password only when legacy clients/drivers (older JDBC/ODBC, certain apps) can’t handle the default. Trade-offs: caching_sha2_password is stronger (better hashing & auth flow) while mysql_native_password is older and less secure. If switching, scope it to a single user, enable TLS, and plan an upgrade path for clients.

  2. Initial hardening: Walk through your checklist after sudo apt install mysql-server on Ubuntu.

    Run mysql_secure_installation (remove anonymous users, disallow remote root, remove test DB, reload privileges). Change/confirm root auth method (e.g., ALTER USER …). Create a dedicated, least-privileged app user. Review bind-address in mysqld.cnf (default 127.0.0.1 for dev). Turn on backups and slow-query log; consider TLS if remote connections are required.

  3. Sakila ingestion troubleshooting: If loading Sakila via mysql -u root -h 127.0.0.1 -p < file.sql fails, how do you diagnose path issues, server connectivity, or SQL mode problems?

    Check the file path and permissions; try quoting the path. Confirm the server is listening (ss -ltnp

  4. MongoDB install paths: Explain when you’d use dpkg -i on a single .deb vs adding the official apt repo and installing packages via apt. What do you gain/lose?

    dpkg -i is quick for a single .deb but doesn’t resolve dependencies automatically (you may need apt -f install). Adding the official apt repo lets you apt install with dependencies resolved and receive updates via apt upgrade. For anything beyond one-off testing, prefer the official repo.

  5. Mongo service fails to start: systemctl start mongod shows failed. What logs and checks do you run?

    Check journalctl -u mongod and /var/log/mongodb/mongod.log. Validate that the data directory exists and is owned by the service user (/var/lib/mongodb for the Ubuntu/Debian packages, /data/db when mongod runs without a config file; fix ownership with chown -R mongodb:mongodb). Ensure port 27017 isn’t in use (ss -ltnp). Verify repo/GPG key health if installation errors preceded it. Check SELinux/AppArmor and unit file correctness.
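
    The data-directory check above can be scripted. The following is a minimal sketch, not MongoDB tooling: the function names are hypothetical, and the path and user you pass in are assumptions about your install (for example /var/lib/mongodb and mongodb on Ubuntu).

```python
import os
import pwd


def owner_of(path):
    """Return the owning user's name, or the numeric uid if it has no passwd entry."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def data_dir_problems(path, expected_user):
    """Return a list of human-readable problems found with a data directory."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    owner = owner_of(path)
    if owner != expected_user:
        problems.append(f"{path} is owned by {owner}, expected {expected_user}")
    return problems
```

    Calling data_dir_problems("/var/lib/mongodb", "mongodb") returns an empty list when the directory exists with the expected owner; any returned message points at what to fix before restarting mongod.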

  6. mongosh vs legacy mongo: What changed with mongosh? How do you ensure compatibility?

    mongosh is the modern Node.js-based shell with better UX, completions, and support for new server features; the old mongo is deprecated. For teams, standardize on mongosh, document installs, and verify scripts that used the old shell (adjust CLI flags or migrate scripts to drivers).

  7. Cassandra tarball setup: You’ve unpacked Cassandra to ~/cassandra. What prerequisites do you confirm, and which configs do you tune first?

    Confirm supported Java (e.g., OpenJDK 17 for 5.x), correct ulimit/nofile, disable swap or set vm.swappiness, and ensure adequate RAM/disk. Initial configs: cassandra.yaml (listen/broadcast addresses, seeds, data dirs), heap in jvm-server.options (≈1/4 system RAM, within caps), and GC settings. For dev: RF=1; for multi-node: plan RF and topology.
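
    The "about 1/4 of system RAM, within caps" heap rule can be sketched as simple arithmetic. This is only an illustration: the function names are hypothetical, and the 8192 MB default cap is an assumption, since the answer does not say which cap applies to your environment.

```python
def suggested_heap_mb(system_ram_mb, cap_mb=8192):
    """Return a quarter of system RAM in MB, limited to cap_mb.

    cap_mb defaults to 8192 as an assumed ceiling; adjust it for your GC and workload.
    """
    if system_ram_mb <= 0:
        raise ValueError("system_ram_mb must be positive")
    return min(system_ram_mb // 4, cap_mb)


def heap_option_lines(heap_mb):
    """Build matching -Xms/-Xmx JVM flags so the heap does not resize at runtime."""
    return [f"-Xms{heap_mb}M", f"-Xmx{heap_mb}M"]
```

    For a 16 GB machine, heap_option_lines(suggested_heap_mb(16384)) yields the two flags you would place in jvm-server.options.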

  8. Cluster health & ops: Interpret nodetool status output—what do UN, UJ, DN, UL mean, and recovery steps?

    First letter: U/D = Up/Down. Second: N/J/L/M = Normal/Joining/Leaving/Moving. UN healthy; UJ joining/streaming; DN down; UL up but leaving. To recover: check logs, fix cause (disk, network, heap), then restart. Avoid forced rebuilds unless needed; use nodetool repair/rebuild appropriately.
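
    The two-letter codes can be decoded mechanically. The sketch below assumes each node row in the nodetool status output starts with the code followed by the address; the function names are hypothetical helpers, not part of Cassandra.

```python
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "J": "Joining", "L": "Leaving", "M": "Moving"}


def decode_node_state(code):
    """Translate a two-letter nodetool status code such as 'UN' into words."""
    if len(code) != 2 or code[0] not in STATUS or code[1] not in STATE:
        raise ValueError(f"not a nodetool status code: {code!r}")
    return f"{STATUS[code[0]]}/{STATE[code[1]]}"


def nodes_needing_attention(status_output):
    """Return (address, description) for every node row that is not UN."""
    flagged = []
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        code, address = fields[0], fields[1]
        try:
            description = decode_node_state(code)
        except ValueError:
            continue  # header or legend line, not a node row
        if code != "UN":
            flagged.append((address, description))
    return flagged
```

    Feeding it the captured text of nodetool status gives a short list of nodes to investigate first, which is where the log and disk/network/heap checks start.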

  9. Data modeling: Compare Cassandra’s wide-column model to MySQL’s relational model and MongoDB’s document model.

    MySQL (relational): strong schema, joins, transactions—OLTP, normalized data (e.g., order management). MongoDB (document): hierarchical, flexible schema—event payloads, catalogs, user profiles. Cassandra (wide-column): time-series and large-scale write-heavy workloads with predictable queries (IoT metrics, timelines). Choose based on query patterns and SLAs.

  10. Replication & consistency (Cassandra): Explain SimpleStrategy vs NetworkTopologyStrategy and effect of CLs.

    SimpleStrategy is single-DC (dev only). NetworkTopologyStrategy is for multi-DC and sets RF per DC. Consistency levels: ONE/LOCAL_ONE give the lowest latency but the weakest consistency guarantees; QUORUM/LOCAL_QUORUM balance latency and consistency; ALL gives the highest consistency and the lowest availability. LOCAL_QUORUM is a common choice in multi-DC deployments.
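
    How many replicas a quorum needs follows from the replication factor: a majority, i.e. floor(RF / 2) + 1. A small sketch of that arithmetic, plus the usual overlap rule that reads see the latest write when read replicas + write replicas exceed RF (function names are illustrative):

```python
def quorum(replication_factor):
    """Number of replicas that must respond for QUORUM (a strict majority)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, read_replicas, write_replicas):
    """True when the read and write replica sets are guaranteed to overlap."""
    return read_replicas + write_replicas > replication_factor
```

    With RF=3, QUORUM reads and writes each need 2 replicas and overlap; ONE for both (1 + 1) does not. For LOCAL_QUORUM, apply the same formula to the RF of the local datacenter.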

  11. Securing services: Outline steps for MySQL, MongoDB, Cassandra.

    MySQL: least-privilege users, strong auth plugin, TLS, bind-address/firewall, rotate creds, enable logs. MongoDB: enable auth, create users/roles per DB, set bindIp, enable TLS, consider keyFile/x.509 for clustering. Cassandra: enable role-based auth, enforce TLS client-to-node & node-to-node, restrict JMX, firewall ports (7000/7001/7199/9042), avoid public exposure.

  12. JDBC with PySpark: Show how you’d read a MySQL table and contrast with MongoDB/Cassandra.

    MySQL (JDBC): spark.read.format("jdbc").option("url","jdbc:mysql://127.0.0.1:3306/sakila").option("dbtable","film").option("user","msqluser").option("password","*****").option("driver","com.mysql.cj.jdbc.Driver").load(). MongoDB: use Mongo Spark Connector with .format("mongodb"). Cassandra: use Spark-Cassandra connector with spark.read.format("org.apache.spark.sql.cassandra"). All require correct connector JARs/packages.
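
    The MySQL read can be split into a small options builder and a reader function, which keeps credentials out of the call chain. This is a sketch: both function names are hypothetical, and it assumes an existing SparkSession with the MySQL Connector/J JAR available to Spark.

```python
def mysql_jdbc_options(host, port, database, table, user, password):
    """Build the option dict for Spark's JDBC data source against MySQL."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def read_mysql_table(spark, options):
    """Load a MySQL table as a DataFrame using an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()
```

    Usage would look like read_mysql_table(spark, mysql_jdbc_options("127.0.0.1", 3306, "sakila", "film", user, password)), with user and password read from your own secret store or environment rather than hard-coded.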

  13. Windows constraints: Cassandra doesn’t run natively. How do you productionize with Docker or WSL?

    Prefer Docker: docker run --name cassandra -d -p 9042:9042 cassandra; mount a volume (-v C:\data\cassandra:/var/lib/cassandra); allocate memory/CPU. For WSL2: run Docker with the WSL 2 backend or in a Linux VM. In prod, use Linux nodes; Windows is for dev/test via containers.

  14. Backup & restore: Compare MySQL, MongoDB, Cassandra approaches.

    MySQL: mysqldump (portable, slower) or XtraBackup (hot, incremental). MongoDB: mongodump/mongorestore for small/medium; snapshots for large sets. Cassandra: nodetool snapshot creates hard links; back up SSTables; restore via streaming/sstableloader. Choice depends on dataset size, uptime, RTO.

  15. Troubleshooting ports & firewalls: Defaults and safe verification?

    Defaults: MySQL 3306, MongoDB 27017, Cassandra CQL 9042 (plus internode 7000/7001, JMX 7199). Verify listeners with ss -ltnp or netstat. Open safely with ufw allow 3306/tcp or iptables, restrict to source CIDRs/security groups, and always combine with auth/TLS.
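
    Besides ss -ltnp on the server, reachability can be verified from the client side with a plain TCP connect. A minimal sketch using only the standard library; the DEFAULT_PORTS mapping and function name are illustrative, and a successful connect only proves the port is reachable, not that auth or TLS is configured.

```python
import socket

# Default client-facing ports named in the answer above.
DEFAULT_PORTS = {"mysql": 3306, "mongodb": 27017, "cassandra-cql": 9042}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

    Running is_port_open("127.0.0.1", DEFAULT_PORTS["mysql"]) from the application host quickly separates "service not listening" from "firewall blocking" when compared with the ss output on the server.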
