Lightweight Virtual Machine Hadoop Distributed File System Deployment
If you just want to experiment with the Hadoop Distributed File System (HDFS), there is no need to set it up permanently on your local machine. Furthermore, if Windows is your primary operating system, you do not have to install Linux on your machine in order to give HDFS a try.
You can set up HDFS on a virtual machine and interact with it from your local operating system. However, running a virtual machine can be frustrating if you don't have much memory, so in this tutorial we use a lightweight operating system for the virtual machine: Arch Linux.
Prerequisites:
- Download and install VirtualBox for your host machine.
- Download Arch Linux: use any of the HTTP direct download links to download the "archlinux-****.iso" file.
Add networking interfaces from the general network preferences:
- NAT Networks
- Host-only Networks
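If you prefer the command line, both interfaces can also be created with VBoxManage (a sketch; the network name and CIDR below are arbitrary examples, not required values):

```shell
# Create a NAT network for outbound internet access from the VM
VBoxManage natnetwork add --netname hdfs-nat --network "10.0.2.0/24" --enable

# Create a host-only interface (typically vboxnet0) so the host can reach the VM over SSH
VBoxManage hostonlyif create
```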
Create a new virtual machine and set up the operating system and memory:
Click on Settings to set up the created VM instance:
- Storage
- Network
- Shared Folders (optional but preferred)
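The VM creation and the Settings steps above can likewise be scripted (a sketch, assuming the networks created earlier; the VM name, disk size, and host path are examples):

```shell
# Register a 64-bit Arch Linux VM with modest memory
VBoxManage createvm --name arch-hdfs --ostype ArchLinux_64 --register
VBoxManage modifyvm arch-hdfs --memory 1024 \
    --nic1 natnetwork --nat-network1 hdfs-nat \
    --nic2 hostonly --hostonlyadapter2 vboxnet0

# Storage: an 8 GB virtual disk plus the Arch installer ISO
VBoxManage createmedium disk --filename arch-hdfs.vdi --size 8192
VBoxManage storagectl arch-hdfs --name SATA --add sata
VBoxManage storageattach arch-hdfs --storagectl SATA --port 0 --device 0 --type hdd --medium arch-hdfs.vdi
VBoxManage storageattach arch-hdfs --storagectl SATA --port 1 --device 0 --type dvddrive --medium archlinux.iso

# Optional (but preferred) shared folder between host and guest
VBoxManage sharedfolder add arch-hdfs --name shared --hostpath /path/to/shared
```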
The following section is based on David Goguen's tutorial:
How to Install Arch Linux
I suggest that you watch the whole video first (at an increased playback speed), then continue with this tutorial. There is only a slight variation in some steps.
Install the operating system on the virtual hard disk:
Partition the /dev/sda Hard Disk:
- Swap Partition
- Root Partition
- Swap Partition Type
- Save Created Partitions
- Format and Mount Root Partition
- Make and Enable Swap Area on Swap Partition
- Install OS on Root Partition
- Configure OS
- Users
- Language
- Time Zone
- Hostname
- Bootloader
- FSTAB
- Boot Order
- Login
- Internet
- Refresh Packages
- SSH
- Headers and Guest Additions
- Java
- Shared Folder
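The installation outline above condenses roughly to the following commands, run from the Arch live ISO (a sketch following the standard Arch installation procedure; package names reflect current Arch repositories and may differ over time, and the hostname and time zone are examples):

```shell
# Partition /dev/sda into swap (sda1) and root (sda2), e.g. interactively with fdisk, then:
mkswap /dev/sda1 && swapon /dev/sda1
mkfs.ext4 /dev/sda2 && mount /dev/sda2 /mnt

# Install the base system and generate the fstab
pacstrap /mnt base linux
genfstab -U /mnt >> /mnt/etc/fstab

# Configure inside the new system
arch-chroot /mnt
echo arch-hdfs > /etc/hostname
ln -sf /usr/share/zoneinfo/UTC /etc/localtime
passwd

# Packages the rest of the tutorial relies on: bootloader, SSH, Java, headers, guest additions
pacman -S --noconfirm grub openssh jdk8-openjdk linux-headers virtualbox-guest-utils
grub-install /dev/sda && grub-mkconfig -o /boot/grub/grub.cfg
systemctl enable sshd vboxservice
```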
In this section, we are going to download the Hadoop binaries and configure HDFS. For convenience, you can download the Hadoop files into the shared folder and edit them using a text editor with a GUI.
Get an HTTP mirror link for the binary tarball from hadoop.apache.org/releases.html (this tutorial uses Hadoop 2.7.3).
- Download Hadoop Binaries
- Pseudo-Distributed Configuration
Edit the following files and copy the corresponding content:
- core-site.xml
- yarn-site.xml
- mapred-site.xml
- hdfs-site.xml
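For reference, a minimal pseudo-distributed configuration for these four files might look like the following (the values match the standard Hadoop 2.7.3 single-node setup; adjust them to your installation):

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```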
- Hadoop Variables
- Hadoop Directory
- Passphraseless SSH
- Format HDFS
- Start HDFS and YARN
- Check out the Web UIs for HDFS and YARN
- Test HDFS
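The remaining steps, from the Hadoop variables through testing, correspond roughly to these commands (a sketch following the standard single-node setup guide; the JAVA_HOME and HADOOP_HOME paths are examples for this Arch install):

```shell
# Hadoop environment variables (add to ~/.bashrc; paths are examples)
export JAVA_HOME=/usr/lib/jvm/default
export HADOOP_HOME=$HOME/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Passphraseless SSH to localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

# Format HDFS and start the daemons
hdfs namenode -format
start-dfs.sh
start-yarn.sh

# Web UIs: NameNode at http://localhost:50070, ResourceManager at http://localhost:8088

# Quick test: create a user directory, put a file in, and list it
hdfs dfs -mkdir -p /user/$(whoami)
hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/$(whoami)/
hdfs dfs -ls /user/$(whoami)
```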
Use the following pom.xml to automatically download all of the requirements:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>edu.umkc.sce.csee.dbis.hadoop</groupId>
  <artifactId>Hadoop</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>Hadoop</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.3</version>
    </dependency>
  </dependencies>
</project>
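With the hadoop-client dependency from this pom in place, a short client can read a file back from HDFS (a minimal sketch; the NameNode address and file path are assumptions matching the pseudo-distributed configuration, so substitute your VM's host-only IP and a file you actually uploaded):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the VM's NameNode; this host-only IP is an example
        conf.set("fs.defaultFS", "hdfs://192.168.56.101:9000");
        // Open a file previously uploaded with `hdfs dfs -put` and print it line by line
        try (FileSystem fs = FileSystem.get(conf);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(fs.open(new Path("/user/arch/core-site.xml"))))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```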
- End
Can't create symlinks in virtualbox shared folders
Hadoop - MapReduce
HADOOP 2.7.0 SINGLE NODE CLUSTER SETUP ON UBUNTU 15.04
Hadoop: How to read a file from HDFS in Hadoop classes in Java
Hadoop: Setting up a Single Node Cluster. 1
Hadoop: Setting up a Single Node Cluster. 2
How to Install Arch Linux
Installation steps for Arch Linux guests
Running Hadoop on Ubuntu Linux (Single-Node Cluster)
Setting Up VirtualBox Shared Folders
VirtualBox: mount.vboxsf Question 28328775