Importing Data into Hive

Reposted from: http://blog.csdn.net/yfkiss/article/details/7776406

Data can be imported into a Hive table in several ways.

1. Importing via an external table

Create an external table in Hive, specifying an HDFS path at creation time. Copying data into that HDFS path then simultaneously completes the insertion of the data into the external table.

For example, create a file test.txt:

$ cat test.txt
1	hello
2	world
3	test
4	case

Fields are separated by '\t'.

Start Hive:

$ hive

Create the external table:

hive> CREATE EXTERNAL TABLE MYTEST(num INT, name STRING)
    > COMMENT 'this is a test'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE
    > LOCATION '/data/test';
OK
Time taken: 0.714 seconds

hive> show tables;
OK
mytest
partition_test
partition_test_input
test
Time taken: 0.07 seconds

hive> desc mytest;
OK
num	int
name	string
Time taken: 0.121 seconds

Copy the data to HDFS:

$ hadoop fs -put test.txt /data/test

View the table data in Hive:

hive> select * from mytest;
OK
1	hello
2	world
3	test
4	case
Time taken: 0.375 seconds

hive> select num from mytest;
Total MapReduce jobs = 1
Launching Job 1 out of 1
......
Total MapReduce CPU Time Spent: 510 msec
OK
1
2
3
4
Time taken: 27.157 seconds

This approach is typically used when some historical data already sits on HDFS and we want to run Hive operations over it; it avoids the overhead of copying the data.
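A concrete consequence of this design, worth noting (standard Hive external-table semantics, not shown in the transcript above): dropping an external table removes only its metastore entry, never the files under LOCATION, which is why pointing an external table at existing HDFS data is cheap and safe.

```sql
-- Sketch, assuming the MYTEST table from the example above.
-- DROP on an EXTERNAL table deletes only the table metadata;
-- the files under /data/test stay on HDFS untouched.
DROP TABLE MYTEST;
```

Afterwards, hadoop fs -ls /data/test should still show test.txt in place.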

2. Importing from the local filesystem

When the data is not on HDFS, it can be loaded into a Hive table directly from the local filesystem. The file /home/work/test.txt has the same content as above.

Create the table:

hive> CREATE TABLE MYTEST2(num INT, name STRING)
    > COMMENT 'this is a test2'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 0.077 seconds

Load the data into the table:

hive> LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE MYTEST2;
Copying data from file:/home/work/test.txt
Copying file: file:/home/work/test.txt
Loading data to table default.mytest2
OK
Time taken: 0.24 seconds

View the data:

hive> select * from MYTEST2;
OK
1	hello
2	world
3	test
4	case
Time taken: 0.11 seconds

The local data imported this way can be a single file, a directory, or a wildcard pattern. Note that if a directory is given, it must not contain subdirectories; likewise, a wildcard can only match files.
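As a sketch of the wildcard form (the directory /home/work/data and its files are hypothetical, used only for illustration):

```sql
-- Load every .txt file in a local directory; the wildcard may only
-- match files, and the directory must contain no subdirectories.
LOAD DATA LOCAL INPATH '/home/work/data/*.txt' INTO TABLE MYTEST2;
```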

3. Importing from HDFS

The test.txt file above has already been uploaded to /data/test, so the following commands load the data directly into a Hive table:

hive> CREATE TABLE MYTEST3(num INT, name STRING)
    > COMMENT "this is a test3"
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 4.735 seconds

hive> LOAD DATA INPATH '/data/test/test.txt' INTO TABLE MYTEST3;
Loading data to table default.mytest3
OK
Time taken: 0.337 seconds

hive> select * from MYTEST3;
OK
1	hello
2	world
3	test
4	case
Time taken: 0.227 seconds
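One difference from the LOCAL variant, not shown in the transcript but standard Hive behavior: without the LOCAL keyword, LOAD DATA moves the source file into the table's directory instead of copying it, so test.txt disappears from /data/test. A quick way to confirm (the warehouse path below assumes the default configuration):

```shell
# test.txt is no longer at its original HDFS location...
hadoop fs -ls /data/test
# ...it has been moved under the table's warehouse directory
hadoop fs -ls /user/hive/warehouse/mytest3
```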

4. Importing data from another table:

hive> CREATE EXTERNAL TABLE MYTEST4(num INT);
OK
Time taken: 0.091 seconds

hive> FROM MYTEST3 test3
    > INSERT OVERWRITE TABLE MYTEST4
    > select test3.num where name = 'world';
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201207230024_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207230024_0002
Kill Command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-07-23 18:59:02,365 Stage-1 map = 0%, reduce = 0%
2012-07-23 18:59:08,417 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:09,435 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:10,445 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:11,455 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:12,470 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:13,489 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:14,508 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.62 sec
MapReduce Total cumulative CPU time: 620 msec
Ended Job = job_201207230024_0002
Ended Job = -174856900, job is filtered out (removed at runtime).
Moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000
Loading data to table default.mytest4
Deleted hdfs://localhost:9000/user/hive/warehouse/mytest4
Table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]
1 Rows loaded to mytest4
MapReduce Jobs Launched:
Job 0: Map: 1  Accumulative CPU: 0.62 sec  HDFS Read: 242  HDFS Write: 2  SUCESS
Total MapReduce CPU Time Spent: 620 msec
OK
Time taken: 30.663 seconds

hive> select * from mytest4;
OK
2
Time taken: 0.103 seconds
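The FROM ... INSERT syntax above also supports multi-table insert, where several INSERT clauses are fed by a single scan of the source table. A sketch (MYTEST5 is a hypothetical extra table with one INT column, not part of the original example):

```sql
-- One pass over MYTEST3 populates two target tables.
FROM MYTEST3 test3
INSERT OVERWRITE TABLE MYTEST4
  SELECT test3.num WHERE test3.name = 'world'
INSERT OVERWRITE TABLE MYTEST5
  SELECT test3.num WHERE test3.name = 'hello';
```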
