Distcp Source Code Analysis
Published: [ 2013/11/4 11:11:21 ]
Overview
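DistCp is Hadoop's tool for copying large amounts of data. It lives under hadoop tools, and its implementation is only about 1300 lines of code. It copies data within an HDFS cluster or between HDFS clusters. Instead of copying file by file from a single client, DistCp runs the copy as a MapReduce (MR) job. The work is therefore spread over many map tasks that read from and write to HDFS in parallel.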
Usage
DistCp's usage is as follows:
OPTIONS:
-p[rbugp] Preserve status
r: replication number
b: block size
u: user
g: group
p: permission
-p alone is equivalent to -prbugp
-i Ignore failures
-log <logdir> Write logs to <logdir>
-m <num_maps> Maximum number of simultaneous copies
-overwrite Overwrite destination
-update Overwrite if src size different from dst size
-f <urilist_uri> Use list at <urilist_uri> as src list
-filelimit <n> Limit the total number of files to be <= n
-sizelimit <n> Limit the total size to be <= n bytes
-delete Delete the files existing in the dst but not in src
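Of these options, -p, -m and -overwrite are the most commonly used.
-p preserves file attributes on the copy. It can be limited to the subset listed above (r, b, u, g, p), and -p alone preserves all of them.
-m sets the maximum number of maps, that is, how many copies run simultaneously.
-overwrite overwrites files that already exist at the destination.
-delete removes files that exist in dst but not in src, which is the diff between dst and src.
-update needs particular care. As its usage line says, it overwrites a destination file only when its size differs from the source file. distcp therefore treats two files of the same size as identical, and it will not recopy a file whose contents changed but whose size did not.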
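The following minimal, dependency-free sketch makes the difference between the default behaviour, -overwrite, -update and -delete concrete. It is not DistCp's code. The class CopyPlanSketch and its methods are hypothetical, and files are modelled as a map from relative path to size. -update is modelled purely as a size comparison, as in the usage line above. As a simplification, the sketch also assumes that an existing destination file is skipped when neither -overwrite nor -update is given, and it checks -update before -overwrite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of -overwrite / -update / -delete decisions; not DistCp source.
public class CopyPlanSketch {

    // Returns the relative paths that would be copied, in sorted order.
    // src and dst map relative path -> file size in bytes.
    public static List<String> filesToCopy(Map<String, Long> src, Map<String, Long> dst,
                                           boolean overwrite, boolean update) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(src).entrySet()) {
            Long dstLen = dst.get(e.getKey());
            if (dstLen == null) {
                result.add(e.getKey());          // not at the destination yet: copy
            } else if (update) {
                if (!dstLen.equals(e.getValue())) {
                    result.add(e.getKey());      // -update: copy only when sizes differ
                }
            } else if (overwrite) {
                result.add(e.getKey());          // -overwrite: replace unconditionally
            }
            // assumed default: an existing destination file is skipped
        }
        return result;
    }

    // -delete: relative paths present in dst but absent from src, in sorted order.
    public static List<String> filesToDelete(Map<String, Long> src, Map<String, Long> dst) {
        List<String> result = new ArrayList<>();
        for (String path : new TreeMap<>(dst).keySet()) {
            if (!src.containsKey(path)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> src = Map.of("a.txt", 10L, "b.txt", 20L, "c.txt", 30L);
        Map<String, Long> dst = Map.of("a.txt", 10L, "b.txt", 25L, "old.txt", 5L);
        System.out.println(filesToCopy(src, dst, false, true)); // [b.txt, c.txt]
        System.out.println(filesToDelete(src, dst));            // [old.txt]
    }
}
```

With these sample maps, -update copies b.txt (its size differs) and c.txt (it is missing) but leaves a.txt alone. -delete would remove old.txt.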
Source code analysis
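DistCp implements the org.apache.hadoop.util.Tool interface and is launched through ToolRunner. ToolRunner parses the generic Hadoop options and then calls the tool's run method. Hadoop's Tool declares a single method, "int run(String[] args)". The signature "int run(InputStream in, OutputStream out, OutputStream err, String... arguments);" belongs to javax.tools.Tool, which is a different interface.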
Most of the work of preparing the job, including handling the source and destination paths, is done in the setup method:
private static void setup(Configuration conf, JobConf jobConf,
    final Arguments args)
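As the name suggests, setup does DistCp's preparation work before the job is submitted. Its main task is to walk the source paths and generate the list of files for the Mappers to copy. It writes this list into two files in the job directory, "_distcp_src_files" and "_distcp_dst_files". Both are SequenceFiles, that is, files of serialized Key/Value records, and they hold the source and destination file lists respectively.
In _distcp_src_files, the key is the file size (0 for a directory). The value is a Writable FilePair, which holds the source file's org.apache.hadoop.fs.FileStatus together with its destination path.
In _distcp_dst_files, the key is the destination path and the value is the path of the corresponding source FileStatus.
By the end of setup, DistCp has built the complete list of files that the job has to copy.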
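A small in-memory sketch can illustrate the shape of the two lists. Plain Java types stand in for the Hadoop ones. The hypothetical record SrcStatus replaces FileStatus, and a local FilePair record replaces DistCp's Writable FilePair. Two lists of Map.Entry replace the SequenceFiles. Using the source path as the _distcp_dst_files value is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of _distcp_src_files / _distcp_dst_files; not DistCp source.
public class CopyListSketch {

    // Stand-in for a source FileStatus: full path, length in bytes, directory flag.
    public record SrcStatus(String path, long length, boolean isDir) {}

    // Stand-in for DistCp's Writable FilePair: source status plus relative destination path.
    public record FilePair(SrcStatus input, String output) {}

    // key = size (0 for directories), value = FilePair
    public final List<Map.Entry<Long, FilePair>> srcFiles = new ArrayList<>();
    // key = relative destination path, value = source path (assumed)
    public final List<Map.Entry<String, String>> dstFiles = new ArrayList<>();
    // running total of bytes to copy (directories contribute 0)
    public long totalBytes = 0;

    // Records one entry found while walking the source tree.
    public void add(SrcStatus status, String relativeDst) {
        long size = status.isDir() ? 0L : status.length();
        srcFiles.add(Map.entry(size, new FilePair(status, relativeDst)));
        dstFiles.add(Map.entry(relativeDst, status.path()));
        totalBytes += size;
    }

    public static void main(String[] args) {
        CopyListSketch lists = new CopyListSketch();
        lists.add(new SrcStatus("/src/logs", 0, true), "logs");
        lists.add(new SrcStatus("/src/logs/a.log", 1024, false), "logs/a.log");
        System.out.println(lists.srcFiles.get(0).getKey()); // 0 (directory)
        System.out.println(lists.totalBytes);                // 1024
    }
}
```

Keying each source entry by its size is what makes the list easy to cut into size-balanced pieces later. Directories contribute nothing to that total.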
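By default, DistCp divides work in units of 268435456 bytes (256MB). The number of maps is the total size of the files to copy divided by 256MB, with a minimum of one. The -sizelimit option, by contrast, caps the total number of bytes that are copied. DistCp computes its own InputSplits over _distcp_src_files, cutting the list into splits of roughly equal total size, one per map. If -m is set, it caps the number of maps. When the 256MB rule would produce more maps than the -m value, only the -m number of maps is used, and each map simply covers more data.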
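The arithmetic in this paragraph can be sketched as follows. Only the 268435456-byte unit, the -m cap and the at-least-one rule come from the text. The class MapCountSketch, its parameters and the greedy splitStarts helper are hypothetical. The cap DistCp applies when -m is omitted is not modelled. splitStarts is a simplified stand-in for cutting the size-keyed source list into consecutive splits of roughly equal bytes, not DistCp's actual split code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of map-count and split arithmetic; not DistCp source.
public class MapCountSketch {

    public static final long BYTES_PER_MAP = 256L * 1024 * 1024; // 268435456

    // One map per 256MB of data, capped by maxMaps (standing in for -m), at least one.
    public static int mapCount(long totalBytes, int maxMaps) {
        int numMaps = (int) Math.min(totalBytes / BYTES_PER_MAP, maxMaps);
        return Math.max(numMaps, 1);
    }

    // Greedily groups file sizes (in list order) into at most numSplits consecutive splits,
    // each holding roughly total / numSplits bytes. Returns the index where each split starts.
    // Precondition: numSplits >= 1.
    public static List<Integer> splitStarts(long[] sizes, int numSplits) {
        long total = 0;
        for (long s : sizes) {
            total += s;
        }
        long target = Math.max(1, total / numSplits);
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        long acc = 0;
        for (int i = 0; i < sizes.length; i++) {
            acc += sizes[i];
            // close the current split once it reaches the target, if entries remain
            if (acc >= target && i + 1 < sizes.length && starts.size() < numSplits) {
                starts.add(i + 1);
                acc = 0;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(mapCount(1L << 30, 20));  // 1 GB -> 4 maps
        System.out.println(mapCount(10L << 30, 20)); // 10 GB -> 40 by size, capped to 20
        System.out.println(splitStarts(new long[] {100, 100, 100, 100}, 2)); // [0, 2]
    }
}
```

For example, 1 GB of data gives 4 maps with -m 20. 10 GB would give 40 maps by size alone, so the cap reduces it to 20, and each map then covers about 512MB.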