Hive UDF test

Hive has no built-in dual table, so create a one-row table to test a UDF against:

CREATE TABLE dual (dummy STRING);
INSERT INTO TABLE dual select 'x' from another_table limit 1;
select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) from dual;
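
The last argument of regexp_extract picks the capture group to return. A small sketch of how the index behaves, assuming the one-row dual table created above (the column aliases are only illustrative):

```sql
-- index 0 returns the whole match, 1 and 2 return the first and second groups
select regexp_extract('foothebar', 'foo(.*?)(bar)', 0) as whole_match,   -- 'foothebar'
       regexp_extract('foothebar', 'foo(.*?)(bar)', 1) as first_group,   -- 'the'
       regexp_extract('foothebar', 'foo(.*?)(bar)', 2) as second_group   -- 'bar'
from dual;
```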

http://stackoverflow.com/questions/9795668/does-hive-have-something-equivalent-to-dual

https://issues.apache.org/jira/browse/HIVE-3298

http://stackoverflow.com/questions/17425492/hive-insert-query-like-sql

http://my.oschina.net/repine/blog/193867#OSC_h3_15

INSERT INTO and INSERT OVERWRITE

 

  • INSERT INTO appends rows to a table; existing data is kept.
  • INSERT OVERWRITE replaces the data in a table; each new set of inserted rows replaces whatever the table held before.

 

-- copy rows from a table with the same definition
insert into table text_table select * from default.tab1;

-- the VALUES clause specifies all the columns of one or more rows directly
insert into val_example values (1,true,100.0);
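
For comparison, a minimal sketch of the overwrite form, reusing the text_table and default.tab1 names from the example above and assuming both tables have the same definition:

```sql
-- replace everything currently in text_table with the rows of default.tab1
insert overwrite table text_table select * from default.tab1;

-- running it a second time leaves the same rows in place;
-- insert into would have appended a second copy instead
insert overwrite table text_table select * from default.tab1;
```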

 

union all

The SQL UNION ALL operator combines the result sets of two or more SELECT statements. It returns every row from every query and keeps duplicates, so a row that appears in more than one of the SELECT statements appears more than once in the result.
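
A small sketch of the duplicate-keeping behaviour, assuming the one-row dual table from the top of these notes. The union is wrapped in a subquery because older Hive versions only accept UNION ALL there; the alias u and the column name letter are made up for the example:

```sql
-- three SELECTs, two of them produce the same row
select u.letter
from (
  select 'a' as letter from dual
  union all
  select 'a' as letter from dual
  union all
  select 'b' as letter from dual
) u;
-- returns three rows: 'a' twice and 'b' once (row order is not guaranteed)
```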

hive

http://en.wikipedia.org/wiki/Apache_Hive

HDFS = Hadoop Distributed File System

Hadoop = HDFS + MapReduce

Hive = a SQL layer over HDFS or other compatible file systems

 

Apache Giraph is an Apache project for graph processing on big data. Giraph uses Apache Hadoop's MapReduce implementation to process graphs. Facebook used Giraph, with some performance improvements, to analyze one trillion edges using 200 machines in 4 minutes.