Thursday, January 1, 2015

Hadoop: Finding Where a File's Split Blocks Are Stored


1. Assume there are three machines, with the following entries in /etc/hosts (edit it with vi /etc/hosts):

172.17.0.2 namenode
172.17.0.3 secondarynamenode
172.17.0.4 slave1

2. Assume a 186 MB file has been uploaded to the /tmp directory in HDFS.
3. List it with bin/hadoop fs -ls /tmp:

root@namenode:/usr/local/hadoop/tmp/hdfs/namenode/current# hadoop fs -ls /tmp
Found 1 items
-rw-r--r--   2 root supergroup  195257604 2014-12-25 01:31 /tmp/hadoop-2.6.0.tar.gz

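4. Run the fsck command. It is mainly used to check the health of the whole file system, and with -files -blocks -locations it also reports how many blocks the file was split into and which datanodes hold each block. As the deprecation notice in the output indicates, hdfs fsck is the preferred form of this command.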

root@namenode:/usr/local/hadoop/tmp/hdfs/namenode/current# hadoop fsck /tmp/hadoop-2.6.0.tar.gz -files -blocks -locations
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Connecting to namenode via http://namenode:50070
FSCK started by root (auth:SIMPLE) from /172.17.0.2 for path /tmp/hadoop-2.6.0.tar.gz at Thu Dec 25 02:50:41 UTC 2014
/tmp/hadoop-2.6.0.tar.gz 195257604 bytes, 2 block(s):  OK
0. BP-1142070096-172.17.0.2-1419470024422:blk_1073741825_1001 len=134217728 repl=2 [172.17.0.4:50010, 172.17.0.3:50010]
1. BP-1142070096-172.17.0.2-1419470024422:blk_1073741826_1002 len=61039876 repl=2 [172.17.0.3:50010, 172.17.0.4:50010]

     Status: HEALTHY
     Total size:    195257604 B
     Total dirs:    0
     Total files:   1
     Total symlinks:                0
     Total blocks (validated):      2 (avg. block size 97628802 B)
     Minimally replicated blocks:   2 (100.0 %)
     Over-replicated blocks:        0 (0.0 %)
     Under-replicated blocks:       0 (0.0 %)
     Mis-replicated blocks:         0 (0.0 %)
     Default replication factor:    2
     Average block replication:     2.0
     Corrupt blocks:                0
     Missing replicas:              0 (0.0 %)
     Number of data-nodes:          2
     Number of racks:               1
     FSCK ended at Thu Dec 25 02:50:41 UTC 2014 in 2 milliseconds
     The filesystem under path '/tmp/hadoop-2.6.0.tar.gz' is HEALTHY
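
The per-block lines in the fsck output above can also be pulled out mechanically. The following is a minimal sketch, not part of the original session: it assumes the line format is exactly as printed above, and summarize_blocks is a hypothetical helper name. The two sample lines are copied from the output rather than read from a live cluster.

```shell
#!/bin/sh
# Sample block lines copied from the fsck output above.
fsck_lines='0. BP-1142070096-172.17.0.2-1419470024422:blk_1073741825_1001 len=134217728 repl=2 [172.17.0.4:50010, 172.17.0.3:50010]
1. BP-1142070096-172.17.0.2-1419470024422:blk_1073741826_1002 len=61039876 repl=2 [172.17.0.3:50010, 172.17.0.4:50010]'

# Hypothetical helper: read fsck block lines on stdin and print
# "<block id> <length in bytes> <datanode list>" for each one.
summarize_blocks() {
  sed -n 's/^[0-9][0-9]*\. [^:]*:\(blk_[0-9_]*\) len=\([0-9]*\) repl=[0-9]* \[\(.*\)\]$/\1 \2 \3/p'
}

printf '%s\n' "$fsck_lines" | summarize_blocks
```

On a real cluster the same function could be fed from the fsck command itself instead of the sample text, provided the output format matches.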

From the fsck output, we can see that the file was split into two blocks:
blk_1073741825_1001, 134217728 bytes (128 MB), stored on 172.17.0.4 and 172.17.0.3
blk_1073741826_1002, 61039876 bytes (about 58 MB), stored on 172.17.0.3 and 172.17.0.4
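
The two block lengths follow directly from the file size. As a rough sketch, assuming the block size is 134217728 bytes (128 MB, inferred from the length of the first block in the fsck output), the arithmetic can be reproduced like this. split_into_blocks is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: block size is 134217728 bytes (128 MB), taken from the
# length of the first block reported by fsck.
file_size=195257604
block_size=134217728

# Hypothetical helper: print the length of each block that a file of
# size $1 would occupy when split into blocks of at most $2 bytes.
split_into_blocks() {
  remaining=$1
  size=$2
  index=0
  while [ "$remaining" -gt 0 ]; do
    if [ "$remaining" -gt "$size" ]; then len=$size; else len=$remaining; fi
    # Integer division, so the MB figure is rounded down.
    echo "block $index: $len bytes ($((len / 1048576)) MB)"
    remaining=$((remaining - len))
    index=$((index + 1))
  done
}

split_into_blocks "$file_size" "$block_size"
```

The first block fills a whole 128 MB block, and the remaining 195257604 - 134217728 = 61039876 bytes become the second block.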
5. We want to look at where the block files are stored on slave1 (172.17.0.4), so first log in to slave1 over ssh:

root@namenode:/usr/local/hadoop/tmp/hdfs/namenode/current# ssh root@slave1
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
* Documentation:  https://help.ubuntu.com/
Last login: Thu Dec 25 02:45:28 2014 from namenode
root@slave1:~#

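6. When configuring the datanode, we set its storage path (typically the dfs.datanode.data.dir property in hdfs-site.xml) to file:/usr/local/hadoop/tmp/hdfs/datanode, so change into that directory first: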

root@slave1:~# cd /usr/local/hadoop/tmp/hdfs/datanode/
root@slave1:/usr/local/hadoop/tmp/hdfs/datanode#

7. Next, find the files belonging to block blk_1073741825:

root@slave1:/usr/local/hadoop/tmp/hdfs/datanode# find . -name "blk_1073741825*"
./current/BP-1142070096-172.17.0.2-1419470024422/current/finalized/subdir0/subdir0/blk_1073741825
./current/BP-1142070096-172.17.0.2-1419470024422/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta

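The result shows two files: the one ending in .meta is the block's metadata file, and the other is the block itself, that is, one piece of the split file.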
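
The two on-disk names can be derived from the block identifier that fsck printed. This is a small sketch based only on the names seen in the find output above: the data file drops the trailing number (which HDFS calls the generation stamp), while the metadata file keeps it and adds .meta. The variable names are illustrative.

```shell
#!/bin/sh
# Block identifier as printed by fsck: blk_<block id>_<generation stamp>.
fsck_block="blk_1073741825_1001"

# Strip the trailing "_<generation stamp>" to get the data file name.
data_file="${fsck_block%_*}"
# The metadata file keeps the full identifier and appends ".meta".
meta_file="${fsck_block}.meta"

echo "data file: $data_file"
echo "meta file: $meta_file"
```

These names match the two paths returned by find, so the data file name can be passed straight to find -name.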
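8. We can verify this by checking the file's size with the du command.
     Note: du walks the file system and reports the disk space the files actually use.
          -s: print only the total, not the usage of each subdirectory
          -h: print sizes in a human-readable format (MB, GB, ...)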


root@slave1:/usr/local/hadoop/tmp/hdfs/datanode# du -sh ./current/BP-1142070096-172.17.0.2-1419470024422/current/finalized/subdir0/subdir0/blk_1073741825
128M    ./current/BP-1142070096-172.17.0.2-1419470024422/current/finalized/subdir0/subdir0/blk_1073741825
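
Because du -sh rounds to a human-readable figure, an exact byte comparison against the len= value from fsck is a stricter check. The following is a sketch under stated assumptions: check_block_size is a hypothetical helper, and the demonstration uses a small temporary stand-in file instead of the real block.

```shell
#!/bin/sh
# Hypothetical helper: compare the exact size of file $1 in bytes with
# the expected length $2. Prints the result; returns 1 on a mismatch.
check_block_size() {
  actual=$(wc -c < "$1" | tr -d ' ')
  if [ "$actual" -eq "$2" ]; then
    echo "match: $1 is $actual bytes"
  else
    echo "mismatch: $1 is $actual bytes, expected $2"
    return 1
  fi
}

# Stand-in demonstration with a 5-byte temporary file (made-up data).
# On the datanode, pass the blk_1073741825 path and 134217728 instead.
tmp=$(mktemp)
printf 'hello' > "$tmp"
check_block_size "$tmp" 5
rm -f "$tmp"
```

Run against the block file on slave1, a match with 134217728 would confirm that the file is the first block reported by fsck.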
