Obtenga sobrevuelo en el time superpuesto

Tengo una tabla con sesiones en línea como esta (las filas vacías son solo para una mejor visibilidad):

ip_address | start_time | stop_time ------------|------------------|------------------ 10.10.10.10 | 2016-04-02 08:00 | 2016-04-02 08:12 10.10.10.10 | 2016-04-02 08:11 | 2016-04-02 08:20 10.10.10.10 | 2016-04-02 09:00 | 2016-04-02 09:10 10.10.10.10 | 2016-04-02 09:05 | 2016-04-02 09:08 10.10.10.10 | 2016-04-02 09:05 | 2016-04-02 09:11 10.10.10.10 | 2016-04-02 09:02 | 2016-04-02 09:15 10.10.10.10 | 2016-04-02 09:10 | 2016-04-02 09:12 10.66.44.22 | 2016-04-02 08:05 | 2016-04-02 08:07 10.66.44.22 | 2016-04-02 08:03 | 2016-04-02 08:11 

Y necesito el time en línea "envolvente":

 ip_address | full_start_time | full_stop_time ------------|------------------|------------------ 10.10.10.10 | 2016-04-02 08:00 | 2016-04-02 08:20 10.10.10.10 | 2016-04-02 09:00 | 2016-04-02 09:15 10.66.44.22 | 2016-04-02 08:03 | 2016-04-02 08:11 

Tengo esta consulta que devuelve el resultado deseado:

 WITH t AS -- Determine full time-range of each IP (SELECT ip_address, MIN(start_time) AS min_start_time, MAX(stop_time) AS max_stop_time FROM IP_SESSIONS GROUP BY ip_address), t2 AS -- compose ticks (SELECT DISTINCT ip_address, min_start_time + (LEVEL-1) * INTERVAL '1' MINUTE AS ts FROM t CONNECT BY min_start_time + (LEVEL-1) * INTERVAL '1' MINUTE <= max_stop_time), t3 AS -- get all "online" ticks (SELECT DISTINCT ip_address, ts FROM t2 JOIN IP_SESSIONS USING (ip_address) WHERE ts BETWEEN start_time AND stop_time), t4 AS (SELECT ip_address, ts, LAG(ts) OVER (PARTITION BY ip_address ORDER BY ts) AS previous_ts FROM t3), t5 AS (SELECT ip_address, ts, SUM(DECODE(previous_ts,NULL,1,0 + (CASE WHEN previous_ts + INTERVAL '1' MINUTE <> ts THEN 1 ELSE 0 END))) OVER (PARTITION BY ip_address ORDER BY ts ROWS UNBOUNDED PRECEDING) session_no FROM t4) SELECT ip_address, MIN(ts) AS full_start_time, MAX(ts) AS full_stop_time FROM t5 GROUP BY ip_address, session_no ORDER BY 1,2; 

Sin embargo, estoy preocupado por el performance. La tabla tiene cientos de millones de filas y la resolución de time es de milisegundos (no un minuto como se da en el ejemplo). Por lo tanto, CTE t3 va a ser enorme. ¿Alguien tiene una solución que evita el autoensamblarse y "CONECTAR POR"?

Una única function analítica inteligente sería genial.

Prueba este, también. Lo probé lo mejor que pude, creo que cubre todas las posibilidades, incluidos los intervalos adyacentes coalescentes (10:15 a 10:30 y 10:30 a 10:40 se combinan en un solo intervalo, de 10:15 a 10:40 ) También debería ser bastante rápido, no usa mucho.

 with m as ( select ip_address, start_time, max(stop_time) over (partition by ip_address order by start_time rows between unbounded preceding and 1 preceding) as m_time from ip_sessions union all select ip_address, NULL, max(stop_time) from ip_sessions group by ip_address ), n as ( select ip_address, start_time, m_time from m where start_time > m_time or start_time is null or m_time is null ), f as ( select ip_address, start_time, lead(m_time) over (partition by ip_address order by start_time) as stop_time from n ) select * from f where start_time is not null / 

Pruebe esta solución, funciona para sus ejemplos, pero puede haber algunos casos que no noté. Sin connection, sin auto-unión.

 with io as ( select * from ( select ip_address, t1, io, sum(io) over (partition by ip_address order by t1) sio from ( select ip_address, start_time t1, 1 io from ip_sessions union all select ip_address, stop_time, -1 io from ip_sessions ) ) where (io = 1 and sio = 1) or (io = -1 and sio = 0) ) select ip_address, t1, t2 from ( select io.*, lead(t1) over (partition by ip_address order by t1) as t2 from io) where io = 1 

Datos de testing:

 create table ip_sessions (ip_address varchar2(15), start_time date, stop_time date); insert into ip_sessions values ('10.10.10.10', timestamp '2016-04-02 08:00:00', timestamp '2016-04-02 08:12:00'); insert into ip_sessions values ('10.10.10.10', timestamp '2016-04-02 08:11:00', timestamp '2016-04-02 08:20:00'); insert into ip_sessions values ('10.10.10.10', timestamp '2016-04-02 09:00:00', timestamp '2016-04-02 09:10:00'); insert into ip_sessions values ('10.10.10.10', timestamp '2016-04-02 09:05:00', timestamp '2016-04-02 09:08:00'); insert into ip_sessions values ('10.10.10.10', timestamp '2016-04-02 09:02:00', timestamp '2016-04-02 09:15:00'); insert into ip_sessions values ('10.10.10.10', timestamp '2016-04-02 09:10:00', timestamp '2016-04-02 09:12:00'); insert into ip_sessions values ('10.66.44.22', timestamp '2016-04-02 08:05:00', timestamp '2016-04-02 08:07:00'); insert into ip_sessions values ('10.66.44.22', timestamp '2016-04-02 08:03:00', timestamp '2016-04-02 08:11:00'); 

Salida:

 IP_ADDRESS T1 T2 ----------- ------------------- ------------------- 10.10.10.10 2016-04-02 08:00:00 2016-04-02 08:20:00 10.10.10.10 2016-04-02 09:00:00 2016-04-02 09:15:00 10.66.44.22 2016-04-02 08:03:00 2016-04-02 08:11:00 

Creo que usar lag() y sum acumulativa tendrá un performance mucho mejor:

 select ip_address, min(start_time) as full_start_time, max(end_time) as full_end_time from (select t.*, sum(case when prev_et >= start_time then 0 else 1 end) over (partition by ip_address order by start_time) as grp from (select s.*, lag(end_time) over (partition by ip_address order by end_time) as prev_et from ip_seesions s) ) t group by grp, ip_address order by 1, 2; 

da resultado:

 ip_address | full_start_time | full_stop_time ------------|------------------|------------------ 10.10.10.10 | 2016-04-02 08:00 | 2016-04-02 09:15 10.10.10.10 | 2016-04-02 09:05 | 2016-04-02 09:12 10.66.44.22 | 2016-04-02 08:03 | 2016-04-02 08:11 10.66.44.22 | 2016-04-02 08:05 | 2016-04-02 08:07 

Al final terminé con una function que cumple mis requisitos. Creo que va en una dirección similar a la respuesta de Ponder Stibbons.

 CREATE OR REPLACE TYPE SESSION_REC AS OBJECT (START_TIME TIMESTAMP_UNCONSTRAINED, STOP_TIME TIMESTAMP_UNCONSTRAINED); CREATE OR REPLACE TYPE SESSION_TYPE AS TABLE OF SESSION_REC; CREATE OR REPLACE TYPE TIMESTAMP_TAB AS TABLE OF TIMESTAMP_UNCONSTRAINED; CREATE OR REPLACE FUNCTION ENVELOP_SESSIONS(v_ipaddress IN VARCHAR2) RETURN SESSION_TYPE PIPELINED IS rec SESSION_REC; startTimes TIMESTAMP_TAB; stopTimes TIMESTAMP_TAB; TYPE ActionRecType IS RECORD (TS TIMESTAMP_UNCONSTRAINED, ACTION INTEGER); TYPE ActionTableType IS TABLE OF ActionRecType; actions ActionTableType; onlineCount INTEGER := 0; BEGIN SELECT START_TIME, STOP_TIME BULK COLLECT INTO startTimes, stopTimes FROM IP_SESSIONS WHERE IP_ADDRESS = v_ipaddress; WITH t AS (SELECT COLUMN_VALUE AS ts, 1 AS action FROM TABLE(startTimes) UNION ALL SELECT COLUMN_VALUE AS ts, -1 AS action FROM TABLE(stopTimes)) SELECT ts, action BULK COLLECT INTO actions FROM t ORDER BY ts, action; IF actions.COUNT > 0 THEN FOR i IN actions.FIRST..actions.LAST LOOP IF onlineCount = 0 AND actions(i).ACTION = 1 THEN -- session starts rec := SESSION_REC(actions(i).TS, NULL); ELSIF onlineCount = 1 AND actions(i).ACTION = -1 THEN -- session ends rec := SESSION_REC(rec.START_TIME, actions(i).TS); PIPE ROW(rec); END IF; onlineCount := onlineCount + actions(i).ACTION; END LOOP; END IF; RETURN; END ENVELOP_SESSIONS; 
    Intereting Posts