Integrating LVM with Hadoop and providing the Elasticity to the Slave Storage

Rutujakonde
7 min read · Nov 4, 2020

Introduction

LVM (Logical Volume Management) is a storage device management technology that gives users the power to pool and abstract the physical layout of component storage devices for easier and more flexible administration. Utilizing the device mapper Linux kernel framework, the current iteration, LVM2, can be used to gather existing storage devices into groups and allocate logical units from the combined space as needed.

The main advantages of LVM are increased abstraction, flexibility, and control. Logical volumes can have meaningful names like “databases” or “root-backup”. Volumes can be resized dynamically as space requirements change and migrated between physical devices within the pool on a running system or exported easily. LVM also offers advanced features like snapshotting, striping, and mirroring.

LVM Storage Management Structures

LVM functions by layering abstractions on top of physical storage devices. The basic layers that LVM uses, starting with the most primitive, are:

Physical Volumes:

LVM utility prefix: pv...

Description: Physical block devices or other disk-like devices (for example, other devices created by device mapper, like RAID arrays) are used by LVM as the raw building material for higher levels of abstraction. Physical volumes are regular storage devices; LVM writes a header to the device to mark it for management.

Volume Groups:

LVM utility prefix: vg...

Description: LVM combines physical volumes into storage pools known as volume groups. Volume groups abstract the characteristics of the underlying devices and function as a unified logical device with combined storage capacity of the component physical volumes.

Logical Volumes:

LVM utility prefix: lv... (generic LVM utilities might begin with lvm...)

Description: A volume group can be sliced up into any number of logical volumes. Logical volumes are functionally equivalent to partitions on a physical disk, but with much more flexibility. Logical volumes are the primary component that users and applications will interact with.
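On a system where LVM is already in use, each of these layers can be inspected with the short-prefix tools named above. A minimal sketch (the guard is only there so the commands are skipped on machines without root or the LVM2 tools):

```shell
#!/bin/sh
# Inspect each LVM layer; needs root and the LVM2 tools installed.
LVM_AVAILABLE=no
if command -v pvs >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    LVM_AVAILABLE=yes
    pvs   # physical volumes (pv... prefix)
    vgs   # volume groups    (vg... prefix)
    lvs   # logical volumes  (lv... prefix)
else
    echo "skipping: LVM tools not available or not running as root" >&2
fi
```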

Objective

To achieve elasticity in the data node by increasing its capacity on the fly (without touching the previously present data).

Setting up Logical volume as a Data Node directory

STEP I : Attach two Hard Disks to the Slave

Our first task is to create an LVM setup inside VirtualBox. For this we first need hard disks; in the case of VirtualBox, these are virtual hard disks that we attach to the VM.

  • Go to VirtualBox, right-click on the OS and open the machine settings
  • Click on Storage
  • Click on the add hard disk option

Here, I created the first Hard Disk of 10 GB size. Similarly, also create the second Hard Disk. In my case, I created HD1 (10 GB) and HD2 (20 GB).

By using the command fdisk -l, we can check all attached volumes and their sizes.

STEP II : Creating Physical Volume (PV), Volume Group (VG) and Logical Volume (LV)

  • First we need to create a PV (physical volume): use the pvcreate /dev/sdb command
  • Now we are going to create a VG (volume group) with the vgcreate taskvg /dev/sdb command (taskvg is just a name; you can give any name)
  • Now let's find out how much space we can allocate to our LV (logical volume). Use the vgdisplay taskvg command to show information about the VG

Here, we have approximately 10 GB of free space available (slightly less than 10 GB).

  • Now it's time to create the LV (logical volume) from the VG created above: use the lvcreate --size 9.9G --name datalv taskvg command

--size: defines the size of the LV; in my case it's 9.9 GB (G = GB, M = MB, K = KB)

--name: you can give any name (here I gave datalv)

taskvg: the name of the VG from which the LV takes its storage

When we create or attach a hard disk, its device node may not be ready immediately; the udevadm settle command waits for udev to finish processing the device events so the new /dev entries are in place.

Use the lvdisplay /dev/taskvg/datalv command to display information about the LV that we created.

Here, the LV Size is 9.90 GB
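The whole of Step II can be sketched as one script. The /dev/sdb device and the taskvg/datalv names come from this walkthrough; adjust them to your own VM. The guard skips the privileged commands when run without root or without the disk:

```shell
#!/bin/sh
# Step II sketch: PV -> VG -> LV, using the names from this article.
DISK=/dev/sdb     # second virtual hard disk (adjust to your VM)
VG=taskvg
LV=datalv
LV_PATH="/dev/$VG/$LV"

if [ "$(id -u)" -eq 0 ] && [ -b "$DISK" ]; then
    pvcreate "$DISK"                          # initialise the disk as a PV
    vgcreate "$VG" "$DISK"                    # pool the PV into a VG
    lvcreate --size 9.9G --name "$LV" "$VG"   # carve a 9.9 GB LV out of the VG
    udevadm settle                            # wait for the /dev node to appear
    lvdisplay "$LV_PATH"                      # confirm the LV size
else
    echo "skipping: needs root and a real $DISK" >&2
fi
```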

STEP III : Format and mount the LV

  • Format the LV with ext4 (fourth extended filesystem), the standard filesystem on Linux, using the mkfs.ext4 /dev/taskvg/datalv command
  • Mount the partition on the directory created for the datanode. You can get the directory name from the hdfs-site.xml file present in the /etc/hadoop folder

In my case, the directory name is dn1 present in / drive
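For reference, the datanode directory is set in hdfs-site.xml. A minimal sketch, assuming the Hadoop 1.x property name dfs.data.dir (on Hadoop 2.x and later the property is dfs.datanode.data.dir) and the /dn1 path used in this setup:

```xml
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value>
  </property>
</configuration>
```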

Now mount the LV on this directory:
use the mount /dev/taskvg/datalv /dn1/ command

  • To check whether the LV is successfully mounted, use the df -hT command
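Step III as one script, reusing the LV path from Step II and the /dn1 datanode directory from hdfs-site.xml (again guarded so it is skipped without root or the device):

```shell
#!/bin/sh
# Step III sketch: format the LV with ext4 and mount it on the datanode dir.
DEV=/dev/taskvg/datalv   # LV created in Step II
MNT=/dn1                 # datanode directory from hdfs-site.xml

if [ "$(id -u)" -eq 0 ] && [ -b "$DEV" ]; then
    mkfs.ext4 "$DEV"     # lay down an ext4 filesystem (destroys existing data!)
    mkdir -p "$MNT"
    mount "$DEV" "$MNT"  # mount the LV on the datanode directory
    df -hT "$MNT"        # verify the mount and its size
else
    echo "skipping: needs root and $DEV" >&2
fi
```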

STEP IV : Start the service of datanode as well as namenode

After successfully setting up the cluster, check the datanode report with the command hadoop dfsadmin -report

Here, the Configured Capacity is 9.68 GB. But now a certain use-case has come up and I require 20 GB more storage in this cluster, without touching the previous data. That means I need to attach 20 GB more on the fly.
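A sketch of starting the daemons, assuming the Hadoop 1.x hadoop-daemon.sh scripts are on the PATH (start-dfs.sh run on the namenode is an equivalent alternative); the commands are skipped when Hadoop is not installed:

```shell
#!/bin/sh
# Step IV sketch: bring up the HDFS daemons and check cluster capacity.
HADOOP_PRESENT=no
if command -v hadoop-daemon.sh >/dev/null 2>&1; then
    HADOOP_PRESENT=yes
    hadoop-daemon.sh start namenode    # on the master
    hadoop-daemon.sh start datanode    # on the slave
    hadoop dfsadmin -report            # configured capacity of the cluster
else
    echo "skipping: hadoop-daemon.sh not on PATH" >&2
fi
```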

STEP V : Add another Hard Disk to vg

  • To extend the VG we first need a new PV, and for a new PV we need a new hard disk; we already have /dev/sdc, so use the pvcreate /dev/sdc command
  • Now use the vgextend taskvg /dev/sdc command
  • Use the lvextend --size +20G /dev/taskvg/datalv command to extend the size of the LV by 20 GB
  • Now use the vgdisplay taskvg command to see the size available after extending the VG

As approximately 10 GB of space was created previously and we have now extended the VG by approximately 20 GB, the VG totals roughly 30 GB, almost all of which is now allocated to the LV.
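Step V as one script, with the same guard pattern; /dev/sdc is the second 20 GB virtual hard disk attached in Step I:

```shell
#!/bin/sh
# Step V sketch: grow the VG with the second disk, then grow the LV online.
DISK=/dev/sdc             # second (20 GB) virtual hard disk
VG=taskvg
LV_PATH=/dev/taskvg/datalv

if [ "$(id -u)" -eq 0 ] && [ -b "$DISK" ]; then
    pvcreate "$DISK"                 # turn the new disk into a PV
    vgextend "$VG" "$DISK"           # add the PV to the existing VG
    lvextend --size +20G "$LV_PATH"  # grow the LV by 20 GB while mounted
    vgdisplay "$VG"                  # check the VG's size and free space
else
    echo "skipping: needs root and a real $DISK" >&2
fi
```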

STEP VI : Format the extended volume

  • We extended the LV, but we didn't extend the filesystem on it, and users can't store data in the unformatted (newly extended) portion. That is why the df -hT command shows that our LV was apparently not extended: it reports only the portion of the LV in which users can store data, not the actual size of the LV.
  • Can't we just unmount and format the LV again? Yes, but then all our data would be lost. To solve this we can use the resize2fs command: it reads the existing filesystem and grows it over the newly added blocks, keeping the data intact.
  • Now grow the filesystem over the extended portion by using the command resize2fs /dev/taskvg/datalv

(Note: do not reformat the entire LV with mkfs, as that would recreate the filesystem and delete the inode table along with the previously stored data. Only grow the filesystem over the newly attached space.)

Now use the df -hT command to see whether the size has increased. Here, a 30 GB filesystem is mounted on /dn1.
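Step VI as one script, with the same guard; note that resize2fs can grow a mounted ext4 filesystem online, so no unmount is needed:

```shell
#!/bin/sh
# Step VI sketch: grow the ext4 filesystem over the extended LV, data intact.
DEV=/dev/taskvg/datalv

if [ "$(id -u)" -eq 0 ] && [ -b "$DEV" ]; then
    resize2fs "$DEV"   # grow ext4 to fill the LV; no reformat, no data loss
    df -hT "$DEV"      # should now report roughly 30 GB
else
    echo "skipping: needs root and $DEV" >&2
fi
```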

STEP VII : Check the dfsadmin report

Now 29.37 GB (approximately 30 GB) of storage is being shared.

Conclusion

We finally achieved elasticity in the data node by increasing its capacity on the fly, using the concept of LVM.

THANK YOU :)
