Integrating LVM with Hadoop and providing the Elasticity to the Slave Storage
Introduction
LVM (Logical Volume Management) is a storage device management technology that gives users the power to pool and abstract the physical layout of component storage devices for easier and more flexible administration. Utilizing the device mapper Linux kernel framework, the current iteration, LVM2, can be used to gather existing storage devices into groups and allocate logical units from the combined space as needed.
The main advantages of LVM are increased abstraction, flexibility, and control. Logical volumes can have meaningful names like “databases” or “root-backup”. Volumes can be resized dynamically as space requirements change and migrated between physical devices within the pool on a running system or exported easily. LVM also offers advanced features like snapshotting, striping, and mirroring.
LVM Storage Management Structures
LVM functions by layering abstractions on top of physical storage devices. The basic layers that LVM uses, starting with the most primitive, are:
- Physical Volumes:
  - LVM utility prefix: pv...
  - Description: Physical block devices or other disk-like devices (for example, other devices created by device mapper, like RAID arrays) are used by LVM as the raw building material for higher levels of abstraction. Physical volumes are regular storage devices. LVM writes a header to the device to allocate it for management.
- Volume Groups:
  - LVM utility prefix: vg...
  - Description: LVM combines physical volumes into storage pools known as volume groups. Volume groups abstract the characteristics of the underlying devices and function as a unified logical device with the combined storage capacity of the component physical volumes.
- Logical Volumes:
  - LVM utility prefix: lv... (generic LVM utilities might begin with lvm...)
  - Description: A volume group can be sliced up into any number of logical volumes. Logical volumes are functionally equivalent to partitions on a physical disk, but with much more flexibility. Logical volumes are the primary component that users and applications will interact with.
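Each of these layers has a matching set of reporting utilities, so the whole hierarchy can be inspected directly. A minimal sketch (these commands come with the lvm2 package and generally need root privileges):

```shell
# List physical volumes (the pv... layer)
pvs
# List volume groups (the vg... layer)
vgs
# List logical volumes (the lv... layer)
lvs
# More verbose per-object views of the same three layers
pvdisplay
vgdisplay
lvdisplay
```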
Objective
To achieve elasticity in the data node by increasing the capacity of the data node on the fly (without touching the previously present data).
Setting up Logical volume as a Data Node directory
STEP I : Attach two Hard Disks to the Slave
Our first task is to create an LVM setup inside VirtualBox. For this we first need hard disks; in the case of VirtualBox, these are virtual hard disks that we attach to the VM.
- Go to VirtualBox, right-click on the OS, and open the machine settings
- Click on Storage
- Click on the add hard disk option
Here, I created the first Hard Disk of 10 GB size. Similarly, also create the second Hard Disk. In my case, I created HD1 (10 GB) and HD2 (20 GB).
By using the command fdisk -l , we can check all the attached volumes and their sizes.
STEP II : Creating Physical Volume (PV), Volume Group (VG) and Logical Volume (LV)
- First we need to create a PV (physical volume). Use the
pvcreate /dev/sdb
command
- Now we are going to create a VG (volume group) with the
vgcreate taskvg /dev/sdb
command (taskvg is just a name for user reference; you can give any name)
- Now let's find out how much space we can allocate to our LV (logical volume). Use the
vgdisplay taskvg
command to show the information about the VG
Here, we have approximately 10 GB of free space available (slightly less than 10 GB).
- Now it's time to create the LV (logical volume) from the VG created above. Use the
lvcreate --size 9.9G --name datalv taskvg
command
--size : defines the size of the LV; in my case it is 9.9 GB (G = GB, M = MB, K = KB)
--name : you can give any name (here I gave datalv)
taskvg : the name of the VG from which the LV takes its storage
When we create or attach a hard disk, its device node may not appear immediately. Use the udevadm settle command to wait until udev has finished processing the new device events.
lvdisplay /dev/taskvg/datalv
command to display the information about the LV that we created
Here, the LV Size is 9.90 GB
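Putting Step II together, the whole sequence looks like the following (run as root; /dev/sdb is the first attached disk in this walkthrough):

```shell
# Label the raw disk as an LVM physical volume
pvcreate /dev/sdb
# Pool it into a new volume group named taskvg
vgcreate taskvg /dev/sdb
# Check the free space available in the volume group
vgdisplay taskvg
# Carve a 9.9 GB logical volume named datalv out of taskvg
lvcreate --size 9.9G --name datalv taskvg
# Verify the new logical volume
lvdisplay /dev/taskvg/datalv
```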
STEP III : Format and mount the LV
- Format the LV in ext4 (fourth extended filesystem) format, which is the standard format in Linux. Use the
mkfs.ext4 /dev/taskvg/datalv
command
- Mount the partition on the directory which was created for the datanode. You can get the directory name from the hdfs-site.xml file present in the /etc/hadoop folder
In my case, the directory name is dn1, present in the / drive
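For reference, the datanode directory is set by the dfs.data.dir property in hdfs-site.xml (called dfs.datanode.data.dir in newer Hadoop releases). A sketch of the relevant fragment, assuming the /dn1 directory used here:

```xml
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value>
  </property>
</configuration>
```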
Now mount the LV on this directory
use mount /dev/taskvg/datalv /dn1/
command
- To check whether the LV is successfully mounted, use the
df -hT
command
STEP IV : Start the service of datanode as well as namenode
After successfully creating the cluster, check the datanode's report with the command hadoop dfsadmin -report
Here, the Configured Capacity is 9.68 GB. But now a use-case comes up where I require 20 GB more hard disk space in this cluster, without touching the previous data. That means I need to attach 20 GB more on the fly.
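For completeness, in a Hadoop 1.x-style setup (the dfsadmin syntax above suggests one) the two daemons in Step IV are typically started as follows; exact script names can vary by Hadoop version:

```shell
# On the master node: start the namenode daemon
hadoop-daemon.sh start namenode
# On the slave node: start the datanode daemon
hadoop-daemon.sh start datanode
# Then check the cluster report from any node
hadoop dfsadmin -report
```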
STEP V : Add another Hard Disk to vg
- To extend the VG we first need a new PV, and for a new PV we need a new hard disk; we already have /dev/sdc attached. Use the
pvcreate /dev/sdc
command
- Now use the
vgextend taskvg /dev/sdc
command
- Use the
lvextend --size +20G /dev/taskvg/datalv
command to extend the size of the LV by 20 GB
- Now use the
vgdisplay taskvg
command to check the size available after extending the VG
As 10 GB (approx) of space was allocated previously and we have now extended the LV by 20 GB (approx), the available space shown is now 2.99 GB (approx).
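The extension sequence from this step, gathered in one place (run as root; /dev/sdc is the second attached disk in this walkthrough):

```shell
# Label the second disk as a physical volume
pvcreate /dev/sdc
# Add it to the existing volume group
vgextend taskvg /dev/sdc
# Grow the logical volume by 20 GB within the enlarged group
lvextend --size +20G /dev/taskvg/datalv
# Confirm the new allocation in the volume group
vgdisplay taskvg
```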
STEP VI : Format the extended volume
- We extended the LV, but we did not format the new space, and data cannot be stored in an unformatted (i.e. newly extended) region. That is why the df -hT command still shows the old size: it reports only the portion of the LV that carries a filesystem and can hold user data, not the actual size of the LV.
- Can't we just unmount the LV and format it again? Yes, we can, but the problem is that all our data would then be lost. To solve this issue we can use the resize2fs command, which reads the existing filesystem from its starting blocks, finds its format type, and extends that format over the unformatted blocks.
- Now format only the extended portion by using the command
resize2fs /dev/taskvg/datalv
( Note : Do not format the entire partition, as that would also delete the inode table of the previously formatted space. So format only the newly attached portion. )
Now use the
df -hT
command to see whether the size has increased or not. Here, a 30 GB size hard disk is mounted on /dn1
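The reason resize2fs can safely grow a filesystem that already holds data can be demonstrated without touching real disks. The sketch below is my own illustration, not part of the original setup: it builds an ext4 filesystem inside an ordinary file, grows the backing file, and lets resize2fs extend the filesystem over the new space. It requires the e2fsprogs tools but no root privileges.

```shell
# Create a 16 MB file-backed "disk" and format it as ext4
truncate -s 16M /tmp/lvm_demo.img
mkfs.ext4 -q -F /tmp/lvm_demo.img

# Grow the backing file to 32 MB; the filesystem still thinks it is 16 MB
truncate -s 32M /tmp/lvm_demo.img

# Extend the existing filesystem over the new blocks; existing data stays intact
resize2fs /tmp/lvm_demo.img

# The superblock now reports the larger block count
dumpe2fs -h /tmp/lvm_demo.img | grep 'Block count'
```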
STEP VII : Check the dfsadmin report
Now 29.37 GB (approximately 30 GB) of storage is being shared.
Conclusion
We finally achieved elasticity in the data node by increasing its capacity on the fly, using the concept of LVM.
THANK YOU :)