Why does partition elimination not happen for this query?

I have a Hive table that is partitioned by year, month, day and hour. I need to run a query against it to fetch the data for the last 7 days. This is on Hive 0.14.0.2.2.4.2-2. My query currently looks like this:

    SELECT COUNT(column_name)
    FROM table_name
    WHERE year >= year(date_sub(from_unixtime(unix_timestamp()), 7))
      AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7))
      AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));

This takes a very long time. When I substitute the actual numbers for the expressions above, for example:

    SELECT COUNT(column_name)
    FROM table_name
    WHERE year >= 2017 AND month >= 2 AND day >= 13

it finishes in a few minutes. Is there any way to change the script above so that the query actually contains only the numbers instead of the functions?

I tried using set, like this:

    set yearLimit = year(date_sub(from_unixtime(unix_timestamp()), 7));

    SELECT COUNT(column_name)
    FROM table_name
    WHERE year >= ${hiveconf:yearLimit}
      AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7))
      AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));

but this does not solve the problem.

Solution

    select count (column_name)
    from   table_name
    where  year  >= year  (date_sub (current_date,7))
      and  month >= month (date_sub (current_date,7))
      and  day   >= day   (date_sub (current_date,7))
    ;
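If current_date is not available in the Hive version at hand (as far as I can tell the Hive UDF manual lists it from 1.2.0 onward, while the question is on 0.14), one alternative is to compute the numbers outside Hive and submit a query that contains only literals, which is what the question asks for. The following is a minimal Python sketch under that assumption; `build_query` and its parameters are hypothetical names, and how the resulting string is handed to Hive is left open.

```python
from datetime import date, timedelta


def build_query(today, days_back=7):
    """Build the question's query with literal numbers instead of function calls."""
    cutoff = today - timedelta(days=days_back)
    return (
        "SELECT COUNT(column_name) FROM table_name "
        f"WHERE year >= {cutoff.year} AND month >= {cutoff.month} AND day >= {cutoff.day}"
    )


# Assumed run date; 7 days earlier is 2017-02-13, the literals used in the question.
query = build_query(date(2017, 2, 20))
print(query)
```

The predicate shape is taken unchanged from the question. It selects the intended range only while the 7-day window stays inside a single month; a window that crosses a month or year boundary needs a different predicate.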

What went wrong with the original query?

unix_timestamp()

Gets the current Unix timestamp in seconds. This function is not deterministic and its value is not fixed for the scope of a query execution, and therefore it prevents proper optimization of queries; this has been deprecated since 2.0 in favour of the CURRENT_TIMESTAMP constant.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

(I've just changed the documentation a little bit 🙂)

Since the value of unix_timestamp() may change during execution, the expression has to be evaluated for each row, which prevents partition elimination.
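To make that reasoning concrete, here is a toy model in Python. It is not how Hive's planner is implemented; it only illustrates the idea that a filter can be applied to partition values before reading data when its right-hand side is a plan-time constant. The function `partitions_to_scan` and the `deterministic` flag are hypothetical names, and the fixed "current date" is an assumption chosen to match the demo output.

```python
from datetime import date, timedelta


def partitions_to_scan(partitions, cutoff, deterministic):
    """Return the partitions a scan would have to read (toy model only).

    partitions    -- list of (year, month, day) tuples standing in for partitions
    cutoff        -- date whose parts form the right-hand sides of the predicate
    deterministic -- whether the cutoff expression may be folded to a constant
    """
    if not deterministic:
        # The cutoff cannot be treated as a constant at plan time, so the
        # filter is left for row-by-row evaluation and every partition is read.
        return list(partitions)
    # The cutoff is a plan-time constant, so the predicate can be checked
    # against the partition values before any data is read.
    return [
        (y, m, d)
        for (y, m, d) in partitions
        if y >= cutoff.year and m >= cutoff.month and d >= cutoff.day
    ]


today = date(2017, 2, 21)  # assumed current date
# 100 daily partitions ending today, like the demo table.
partitions = [
    (d.year, d.month, d.day)
    for d in (today - timedelta(days=n) for n in range(100))
]
cutoff = today - timedelta(days=7)

all_parts = partitions_to_scan(partitions, cutoff, deterministic=False)
pruned = partitions_to_scan(partitions, cutoff, deterministic=True)
print(len(all_parts), len(pruned))
```

With the non-deterministic flag all 100 partitions are returned; with the constant cutoff only the 8 partitions from 2017-02-14 to 2017-02-21 remain.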

Why didn't using SET work?

set is nothing but a text replacement mechanism.
Nothing is calculated during the set.
The only thing that happens is that the variables are assigned a piece of text.
Before the query is executed, the variable placeholders (${hiveconf:...}) are replaced by the assigned text.
Only then is the query parsed and executed.

    hive> set a=sele;
    hive> set b=ct 1+;
    hive> set c=1;
    hive> ${hiveconf:a}${hiveconf:b}${hiveconf:c};
    OK
    2
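The same behaviour can be mimicked in a few lines of Python. This is only a sketch of plain text substitution, not Hive's actual implementation (which also handles other namespaces and more edge cases); `substitute` is a hypothetical helper.

```python
import re


def substitute(text, variables):
    """Replace each ${hiveconf:name} placeholder with the raw text stored under name."""
    return re.sub(
        r"\$\{hiveconf:(\w+)\}",
        lambda match: variables[match.group(1)],
        text,
    )


# The session above: three text fragments glued together into one statement.
fragments = {"a": "sele", "b": "ct 1+", "c": "1"}
statement = substitute("${hiveconf:a}${hiveconf:b}${hiveconf:c}", fragments)
print(statement)  # select 1+1

# The attempt from the question: the variable holds the expression text,
# not a computed year.
variables = {"yearLimit": "year(date_sub(from_unixtime(unix_timestamp()), 7))"}
query = substitute("select ... where year >= ${hiveconf:yearLimit}", variables)
print(query)
```

The second result still contains unix_timestamp(), so the query that gets parsed is the same as the original one and is just as hard to optimize.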

Demo

    create table table_name (column_name int) partitioned by (year int,month int,day int);

    set hive.exec.dynamic.partition.mode=nonstrict;

    insert into table_name partition (year,month,day)
    select  pos
           ,year(dt)
           ,month(dt)
           ,day(dt)
    from   (select  pe.pos
                   ,date_sub (current_date,pe.pos) as dt
            from    (select 1) x
                    lateral view posexplode (split (space (99),' ')) pe
            ) t
    ;

    explain dependency
    select count (column_name)
    from   table_name
    where  year  >= year  (date_sub (from_unixtime (unix_timestamp ()),7))
      and  month >= month (date_sub (from_unixtime (unix_timestamp ()),7))
      and  day   >= day   (date_sub (from_unixtime (unix_timestamp ()),7))
    ;

{"input_partitions":[{"partitionName":"default@table_name@year=2016/month=11/day=14"},{"partitionName":"default@table_name@year=2016/month=11/day=15"},{"partitionName":"default@table_name@year=2016/month=11/day=16"},{"partitionName":"default@table_name@year=2016/month=11/day=17"},{"partitionName":"default@table_name@year=2016/month=11/day=18"},{"partitionName":"default@table_name@year=2016/month=11/day=19"},{"partitionName":"default@table_name@year=2016/month=11/day=20"},{"partitionName":"default@table_name@year=2016/month=11/day=21"},{"partitionName":"default@table_name@year=2016/month=11/day=22"},{"partitionName":"default@table_name@year=2016/month=11/day=23"},{"partitionName":"default@table_name@year=2016/month=11/day=24"},{"partitionName":"default@table_name@year=2016/month=11/day=25"},{"partitionName":"default@table_name@year=2016/month=11/day=26"},{"partitionName":"default@table_name@year=2016/month=11/day=27"},{"partitionName":"default@table_name@year=2016/month=11/day=28"},{"partitionName":"default@table_name@year=2016/month=11/day=29"},{"partitionName":"default@table_name@year=2016/month=11/day=30"},{"partitionName":"default@table_name@year=2016/month=12/day=1"},{"partitionName":"default@table_name@year=2016/month=12/day=10"},{"partitionName":"default@table_name@year=2016/month=12/day=11"},{"partitionName":"default@table_name@year=2016/month=12/day=12"},{"partitionName":"default@table_name@year=2016/month=12/day=13"},{"partitionName":"default@table_name@year=2016/month=12/day=14"},{"partitionName":"default@table_name@year=2016/month=12/day=15"},{"partitionName":"default@table_name@year=2016/month=12/day=16"},{"partitionName":"default@table_name@year=2016/month=12/day=17"},{"partitionName":"default@table_name@year=2016/month=12/day=18"},{"partitionName":"default@table_name@year=2016/month=12/day=19"},{"partitionName":"default@table_name@year=2016/month=12/day=2"},{"partitionName":"default@table_name@year=2016/month=12/day=20"},{"partitionName":"default@table_name@year=2016/month=12/day=21"},{"partitionName":"default@table_name@year=2016/month=12/day=22"},{"partitionName":"default@table_name@year=2016/month=12/day=23"},{"partitionName":"default@table_name@year=2016/month=12/day=24"},{"partitionName":"default@table_name@year=2016/month=12/day=25"},{"partitionName":"default@table_name@year=2016/month=12/day=26"},{"partitionName":"default@table_name@year=2016/month=12/day=27"},{"partitionName":"default@table_name@year=2016/month=12/day=28"},{"partitionName":"default@table_name@year=2016/month=12/day=29"},{"partitionName":"default@table_name@year=2016/month=12/day=3"},{"partitionName":"default@table_name@year=2016/month=12/day=30"},{"partitionName":"default@table_name@year=2016/month=12/day=31"},{"partitionName":"default@table_name@year=2016/month=12/day=4"},{"partitionName":"default@table_name@year=2016/month=12/day=5"},{"partitionName":"default@table_name@year=2016/month=12/day=6"},{"partitionName":"default@table_name@year=2016/month=12/day=7"},{"partitionName":"default@table_name@year=2016/month=12/day=8"},{"partitionName":"default@table_name@year=2016/month=12/day=9"},{"partitionName":"default@table_name@year=2017/month=1/day=1"},{"partitionName":"default@table_name@year=2017/month=1/day=10"},{"partitionName":"default@table_name@year=2017/month=1/day=11"},{"partitionName":"default@table_name@year=2017/month=1/day=12"},{"partitionName":"default@table_name@year=2017/month=1/day=13"},{"partitionName":"default@table_name@year=2017/month=1/day=14"},{"partitionName":"default@table_name@year=2017/month=1/day=15"},{"partitionName":"default@table_name@year=2017/month=1/day=16"},{"partitionName":"default@table_name@year=2017/month=1/day=17"},{"partitionName":"default@table_name@year=2017/month=1/day=18"},{"partitionName":"default@table_name@year=2017/month=1/day=19"},{"partitionName":"default@table_name@year=2017/month=1/day=2"},{"partitionName":"default@table_name@year=2017/month=1/day=20"},{"partitionName":"default@table_name@year=2017/month=1/day=21"},{"partitionName":"default@table_name@year=2017/month=1/day=22"},{"partitionName":"default@table_name@year=2017/month=1/day=23"},{"partitionName":"default@table_name@year=2017/month=1/day=24"},{"partitionName":"default@table_name@year=2017/month=1/day=25"},{"partitionName":"default@table_name@year=2017/month=1/day=26"},{"partitionName":"default@table_name@year=2017/month=1/day=27"},{"partitionName":"default@table_name@year=2017/month=1/day=28"},{"partitionName":"default@table_name@year=2017/month=1/day=29"},{"partitionName":"default@table_name@year=2017/month=1/day=3"},{"partitionName":"default@table_name@year=2017/month=1/day=30"},{"partitionName":"default@table_name@year=2017/month=1/day=31"},{"partitionName":"default@table_name@year=2017/month=1/day=4"},{"partitionName":"default@table_name@year=2017/month=1/day=5"},{"partitionName":"default@table_name@year=2017/month=1/day=6"},{"partitionName":"default@table_name@year=2017/month=1/day=7"},{"partitionName":"default@table_name@year=2017/month=1/day=8"},{"partitionName":"default@table_name@year=2017/month=1/day=9"},{"partitionName":"default@table_name@year=2017/month=2/day=1"},{"partitionName":"default@table_name@year=2017/month=2/day=10"},{"partitionName":"default@table_name@year=2017/month=2/day=11"},{"partitionName":"default@table_name@year=2017/month=2/day=12"},{"partitionName":"default@table_name@year=2017/month=2/day=13"},{"partitionName":"default@table_name@year=2017/month=2/day=14"},{"partitionName":"default@table_name@year=2017/month=2/day=15"},{"partitionName":"default@table_name@year=2017/month=2/day=16"},{"partitionName":"default@table_name@year=2017/month=2/day=17"},{"partitionName":"default@table_name@year=2017/month=2/day=18"},{"partitionName":"default@table_name@year=2017/month=2/day=19"},{"partitionName":"default@table_name@year=2017/month=2/day=2"},{"partitionName":"default@table_name@year=2017/month=2/day=20"},{"partitionName":"default@table_name@year=2017/month=2/day=21"},{"partitionName":"default@table_name@year=2017/month=2/day=3"},{"partitionName":"default@table_name@year=2017/month=2/day=4"},{"partitionName":"default@table_name@year=2017/month=2/day=5"},{"partitionName":"default@table_name@year=2017/month=2/day=6"},{"partitionName":"default@table_name@year=2017/month=2/day=7"},{"partitionName":"default@table_name@year=2017/month=2/day=8"},{"partitionName":"default@table_name@year=2017/month=2/day=9"}],"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}

    explain dependency
    select count (column_name)
    from   table_name
    where  year  >= year  (date_sub (current_date,7))
      and  month >= month (date_sub (current_date,7))
      and  day   >= day   (date_sub (current_date,7))
    ;

{"input_partitions":[{"partitionName":"default@table_name@year=2017/month=2/day=14"},{"partitionName":"default@table_name@year=2017/month=2/day=15"},{"partitionName":"default@table_name@year=2017/month=2/day=16"},{"partitionName":"default@table_name@year=2017/month=2/day=17"},{"partitionName":"default@table_name@year=2017/month=2/day=18"},{"partitionName":"default@table_name@year=2017/month=2/day=19"},{"partitionName":"default@table_name@year=2017/month=2/day=20"},{"partitionName":"default@table_name@year=2017/month=2/day=21"}],"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}
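The two outputs are easier to compare by counting partitions than by reading them. A small Python sketch for that, assuming the `explain dependency` output has been captured as a JSON string; `input_partition_names` is a hypothetical helper and the sample string is shortened to two partitions for illustration.

```python
import json


def input_partition_names(explain_output):
    """Return the partition names listed under input_partitions."""
    doc = json.loads(explain_output)
    return [entry["partitionName"] for entry in doc["input_partitions"]]


# Shortened sample in the same shape as the output above.
sample = (
    '{"input_partitions":['
    '{"partitionName":"default@table_name@year=2017/month=2/day=14"},'
    '{"partitionName":"default@table_name@year=2017/month=2/day=15"}],'
    '"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}'
)

names = input_partition_names(sample)
print(len(names))
for name in names:
    print(name)
```

Applied to the full outputs of the demo, the first query lists 100 input partitions and the second lists 8.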