precision and recall

Precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved.
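
The two definitions can be checked with a small calculation. The sketch below is illustrative only: the function name `precision_recall`, the document ids, and the choice to return 0.0 for an empty denominator are assumptions, not part of any particular library.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents overall.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d3", "d7", "d8", "d9"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.75 0.5
```

Here precision is 3/4 because three of the four retrieved documents are relevant, and recall is 3/6 because only three of the six relevant documents were retrieved.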


Hive UDF test

Hive has no built-in dual table, so create a one-row table to test UDFs against:

CREATE TABLE dual (dummy STRING);
INSERT INTO TABLE dual select 'x' from another_table limit 1;
-- returns 'the' (capture group 1)
select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) from dual;
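
The last argument of regexp_extract selects which capture group to return. As a rough cross-check outside Hive, the same pattern can be tried with Python's `re` module. This is only a sketch: Hive uses Java regular expressions rather than Python's engine, but for this simple pattern the group numbering should behave the same way.

```python
import re

# Same pattern and subject string as the Hive query.
match = re.search(r"foo(.*?)(bar)", "foothebar")

print(match.group(0))  # foothebar  (index 0: the whole match)
print(match.group(1))  # the        (index 1: first capture group)
print(match.group(2))  # bar        (index 2: second capture group)
```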


insert into and insert overwrite


  • INSERT INTO appends data to a table.
  • INSERT OVERWRITE replaces the data in a table; each new set of inserted rows replaces any existing data in the table.


-- Append rows from a table with the same definition
insert into table text_table select * from default.tab1;

-- The VALUES clause is a general-purpose way to specify all the columns of a row or multiple rows
insert into val_example values (1,true,100.0);


stop words

In computing, stop words are words which are filtered out before or after processing of natural language data (text).[1] There is no single definitive list of stop words that all tools use, and such a filter is not always applied. Some tools specifically avoid removing them to support phrase search.

Any group of words can be chosen as the stop words for a given purpose. For some search engines, these are some of the most common, short function words, such as the, is, at, which, and on. In this case, stop words can cause problems when searching for phrases that include them, particularly in names such as ‘The Who’, ‘The The’, or ‘Take That’. Other search engines remove some of the most common words (including lexical words, such as “want”) from a query in order to improve performance.
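
The phrase-search problem is easy to reproduce with a naive filter. The sketch below is illustrative only: the stop list is just the five example words from the paragraph above, and `remove_stop_words` is a hypothetical helper, not the API of any search engine or NLP library.

```python
# Illustrative stop list only; real tools each use their own list.
STOP_WORDS = {"the", "is", "at", "which", "on"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in stop_words]


print(remove_stop_words("The cat is on the mat"))  # ['cat', 'mat']
print(remove_stop_words("The Who"))                # ['who']
print(remove_stop_words("The The"))                # []
```

With this list, a query for ‘The Who’ collapses to a single common word and a query for ‘The The’ disappears entirely, which is why some tools keep stop words when phrase search matters.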

bug bash

In software development, a bug bash is a procedure where all the developers, testers, program managers, usability researchers, designers, documentation folks, and even sometimes marketing people, put aside their regular day-to-day duties and “pound on the product”—that is, each exercises the product in every way they can think of. Because each person will use the product in slightly different (or very different) ways, and the product is getting a great deal of use in a short amount of time, this approach may reveal bugs relatively quickly.