Tutorial: Hadoop & Hive Setup on Ubuntu/Windows with PostgreSQL Metastore
This guide covers installation and setup of Hadoop, Hive, and PostgreSQL as a Hive Metastore on Ubuntu.
1. Install and Configure Hadoop
Download Hadoop:
https://hadoop.apache.org/releases.html
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz
tar -xvzf hadoop-3.4.0.tar.gz
mv hadoop-3.4.0 ~/hadoop
Update Hadoop Configuration
- Edit hadoop-env.sh → set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64/
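If you are unsure which JDK path to use on your machine, it can be derived from the java binary on PATH (a sketch; prints a hint instead if no Java is installed):

```shell
# Resolve the real JDK home by following symlinks from the java on PATH.
java_bin=$(command -v java || true)
if [ -n "$java_bin" ]; then
  readlink -f "$java_bin" | sed 's:/bin/java$::'
else
  echo "java not found - install a JDK first"
fi
```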
- Edit core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9745</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/sthithapragna/hadoop/tmp</value>
</property>
</configuration>
- Edit hdfs-site.xml (NameNode + DataNode paths):
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/sthithapragna/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/sthithapragna/hdfs/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
- Edit mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Edit yarn-site.xml (yarn.nodemanager.aux-services enables the shuffle service that MapReduce jobs on YARN need):
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>localhost:9099</value>
</property>
</configuration>
Update .bashrc
export HADOOP_HOME=/home/sthithapragna/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
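After running `source ~/.bashrc`, a quick sanity check that the directories behind the exports actually exist (a sketch; the paths are the ones assumed in this guide, so adjust for your home directory):

```shell
# Print ok/missing for each directory the exports point at.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-17-openjdk-amd64}"
for d in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin" "$JAVA_HOME"; do
  if [ -d "$d" ]; then echo "ok: $d"; else echo "missing: $d"; fi
done
```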
Enable SSH for Hadoop
sudo apt update
sudo apt install openssh-server -y
sudo systemctl enable ssh
sudo systemctl start ssh
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost
Format & Start Hadoop
hdfs namenode -format
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
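Once both scripts finish, `jps` should list the Hadoop daemons; a quick check with a fallback message when nothing is running:

```shell
# NameNode, DataNode, ResourceManager and NodeManager should all appear
# once DFS and YARN are up; otherwise print a hint.
jps 2>/dev/null | grep -E 'NameNode|DataNode|ResourceManager|NodeManager' \
  || echo "no Hadoop daemons found - check \$HADOOP_HOME/logs"
```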
2. Install and Configure Hive
Download and extract Hive:
wget https://dlcdn.apache.org/hive/hive-4.1.0/apache-hive-4.1.0-bin.tar.gz
tar -zxvf apache-hive-4.1.0-bin.tar.gz
mv apache-hive-4.1.0-bin ~/hive
Update Hive configuration:
cd ~/hive/conf
mv hive-default.xml.template hive-site.xml
mv hive-env.sh.template hive-env.sh
Edit hive-site.xml (temporary Derby DB config):
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/home/sthithapragna/hive/metastore/metastore_db;create=true</value>
Update .bashrc
export HIVE_HOME=/home/sthithapragna/hive
export PATH=$PATH:$HIVE_HOME/bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
Prepare HDFS directories
hdfs dfs -mkdir -p /user/hive/warehouse /tmp/hive
hdfs dfs -chmod -R 777 /user/hive /tmp/hive
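To confirm the warehouse and scratch directories landed with the expected permissions (a sketch; it needs the HDFS daemons from step 1 to be running and falls back to a hint otherwise):

```shell
# Recursively list both directories; fall back to a hint if HDFS is down.
hdfs dfs -ls -R /user/hive /tmp/hive 2>/dev/null \
  || echo "HDFS not reachable - is the NameNode running?"
```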
3. Configure PostgreSQL as Hive Metastore
Install PostgreSQL:
sudo apt install postgresql -y
sudo systemctl status postgresql
Create Hive Metastore user + DB:
sudo -i -u postgres
psql
CREATE USER psqluser WITH PASSWORD 'sthithapragna';
CREATE DATABASE pymetastore OWNER psqluser;
\q
Edit pg_hba.conf to allow local connections (change auth to trust):
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
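The location of pg_hba.conf varies by PostgreSQL version (e.g. /etc/postgresql/16/main/pg_hba.conf); the server itself can report it, and it must be reloaded for the edit to take effect. A sketch, with a fallback when the server is not reachable:

```shell
# Ask the server where pg_hba.conf lives, then reload the configuration.
psql -U postgres -t -w -c "SHOW hba_file;" 2>/dev/null \
  || echo "psql not reachable - run this after PostgreSQL is installed"
sudo -n systemctl reload postgresql 2>/dev/null \
  || echo "reload skipped - run: sudo systemctl reload postgresql"
```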
Update Hive to use PostgreSQL
Edit hive-site.xml:
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://localhost:5432/pymetastore</value>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
<name>javax.jdo.option.ConnectionUserName</name>
<value>psqluser</value>
<name>javax.jdo.option.ConnectionPassword</name>
<value>sthithapragna</value>
Initialize Hive Metastore
- Download the PostgreSQL JDBC driver from https://jdbc.postgresql.org/download/
- Copy the driver jar into Hive's lib directory:
cp postgresql-42.7.2.jar ~/hive/lib/
schematool -dbType postgres -initSchema --verbose
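If initialization succeeded, the metastore tables (DBS, TBLS, COLUMNS_V2, and several dozen others) now exist in pymetastore. A quick count, as a sketch with a fallback when the database is not reachable; with the trust auth configured earlier, no password prompt is expected:

```shell
# Count metastore tables created by schematool in the public schema.
psql -h localhost -U psqluser -d pymetastore -t -w \
  -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';" \
  2>/dev/null || echo "pymetastore not reachable - check pg_hba.conf and the service"
```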
Start Hive Services
mkdir -p $HIVE_HOME/logs
$HIVE_HOME/bin/hive --service metastore 1>$HIVE_HOME/logs/metastore.out 2>$HIVE_HOME/logs/metastore.log &
$HIVE_HOME/bin/hiveserver2 1>$HIVE_HOME/logs/hive.out 2>$HIVE_HOME/logs/hive.log &
Connect with Beeline
beeline --verbose=true -u "jdbc:hive2://127.0.0.1:10000/default"
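A minimal end-to-end smoke test can be run through Beeline non-interactively (a sketch; smoke_test_db matches the example database queried in section 4, and the fallback fires when HiveServer2 is not up):

```shell
# Create a database and table, insert a row, and read it back through HiveServer2.
beeline -u "jdbc:hive2://127.0.0.1:10000/default" -e "
  CREATE DATABASE IF NOT EXISTS smoke_test_db;
  CREATE TABLE IF NOT EXISTS smoke_test_db.t (id INT, name STRING);
  INSERT INTO smoke_test_db.t VALUES (1, 'hello');
  SELECT * FROM smoke_test_db.t;
" 2>/dev/null || echo "HiveServer2 not reachable on port 10000"
```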
4. Handy Queries in PostgreSQL
- List Hive databases:
SELECT "NAME", "DB_LOCATION_URI" FROM "DBS";
- Show all tables in a database (example: smoke_test_db):
SELECT "TBL_NAME", "TBL_TYPE"
FROM "TBLS" t
JOIN "DBS" d ON t."DB_ID"=d."DB_ID"
WHERE d."NAME"='smoke_test_db';
- Show columns of a table:
SELECT c."COLUMN_NAME", c."TYPE_NAME"
FROM "COLUMNS_V2" c
JOIN "SDS" s ON c."CD_ID"=s."CD_ID"
JOIN "TBLS" t ON s."SD_ID"=t."SD_ID"
JOIN "DBS" d ON t."DB_ID"=d."DB_ID"
WHERE d."NAME"='smoke_test_db' AND t."TBL_NAME"='your_table';
✅ You now have Hadoop, Hive, and PostgreSQL Metastore configured on Ubuntu 🚀
This guide covers installation and setup of Hadoop and Hive on Windows, using PostgreSQL as the Hive Metastore database.
1. Install and Configure Hadoop
Verify Java installation:
java -version
Download Hadoop binaries:
https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
Download hadoop.dll and winutils.exe from:
https://github.com/cdarlint/winutils/tree/master/hadoop-3.3.6/bin
Place them in C:\hadoop-3.3.6\bin\.
Edit Configuration Files
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/C:/Users/sthithapragna/hadoop-3.3.6/tmp</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/C:/Users/sthithapragna/hadoop-3.3.6/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/C:/Users/sthithapragna/hadoop-3.3.6/data/datanode</value>
</property>
</configuration>
Set Java Path in hadoop-env.cmd (no quotes: cmd's set command includes them in the value, and the 8.3 short path Progra~1 avoids the space in "Program Files"):
set JAVA_HOME=C:\Progra~1\Java\jre1.8.0_451
Set Environment Variables (quote values containing spaces; note that setx truncates values longer than 1024 characters, so a long PATH is safer to edit via the System Properties dialog):
setx JAVA_HOME "C:\Program Files\Java\jdk-1.8"
setx HADOOP_HOME C:\hadoop-3.3.6
setx HADOOP_OPTS "-Djava.library.path=%HADOOP_HOME%\bin"
setx PATH "%PATH%;%JAVA_HOME%\bin;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin"
Check Diagnostics
winutils
hadoop version
Create Directories and Set Permissions
New-Item -ItemType Directory -Force C:\hadoop-3.3.6\data\namenode
New-Item -ItemType Directory -Force C:\hadoop-3.3.6\data\datanode
New-Item -ItemType Directory -Force C:\hadoop-3.3.6\tmp
winutils chmod -R 777 C:\hadoop-3.3.6\data
winutils chmod -R 777 C:\hadoop-3.3.6\tmp
Format HDFS and Start Hadoop
hdfs namenode -format
start-dfs.cmd
start-yarn.cmd
Check if services are running:
netstat -ano | findstr ":9000"
netstat -ano | findstr ":9870"
Cluster health:
hdfs dfsadmin -report
yarn node -list
2. Install and Configure Hive
Ensure prerequisites:
- Hadoop HDFS and YARN are running (start-dfs.cmd / start-yarn.cmd)
- PostgreSQL is installed (metastore user & DB created)
- Java JDK (not just a JRE) is installed
- winutils.exe and hadoop.dll are in %HADOOP_HOME%\bin
Download Hive
https://dlcdn.apache.org/hive/
Extract Hive to C:\hive-4.0.1.
Set Environment Variables
setx HIVE_HOME C:\hive-4.0.1
setx PATH "%PATH%;%HIVE_HOME%\bin"
Configure PostgreSQL Metastore
In PostgreSQL:
CREATE USER psqluser WITH PASSWORD 'sthithapragna';
CREATE DATABASE pymetastore OWNER psqluser;
\c pymetastore
Copy JDBC driver to:
C:\hive-4.0.1\lib
Create hive-site.xml
Save under C:\hive-4.0.1\conf:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://localhost:5432/pymetastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>psqluser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>sthithapragna</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
</configuration>
Initialize Schema
schematool -dbType postgres -initSchema
Start Hive Services
hive --service metastore
hive --service hiveserver2
Check services:
netstat -ano | findstr "9083"
netstat -ano | findstr "10000"
Configure Beeline
Edit beeline.cmd and replace with:
@echo off
set JAVA_OPTS=-Djline.terminal=none
set HIVE_HOME=C:\hive-4.0.1
set HADOOP_HOME=C:\hadoop-3.3.6
set CLASSPATH=%HIVE_HOME%\conf;%HIVE_HOME%\lib\*;%HADOOP_HOME%\share\hadoop\common\*;%HADOOP_HOME%\share\hadoop\common\lib\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\hdfs\lib\*;%HADOOP_HOME%\share\hadoop\mapreduce\*;%HADOOP_HOME%\share\hadoop\mapreduce\lib\*;%HADOOP_HOME%\share\hadoop\yarn\*;%HADOOP_HOME%\share\hadoop\yarn\lib\*
java %JAVA_OPTS% -cp "%CLASSPATH%" org.apache.hive.beeline.BeeLine %*
Connect with Beeline
beeline -u jdbc:hive2://localhost:10000
Expected prompt:
0: jdbc:hive2://localhost:10000>
✅ Hadoop and Hive are now configured on Windows 🚀
