T-SQL: cuente el número de fallas antes del primer éxito (2)

Tengo un DB con events para bashs de tareas y sus resultados (fracaso o éxito). Para cada usuario, me gustaría contar el número de fallas antes del primer éxito. Los fallos y éxitos posteriores no deberían afectar el resultado: solo estoy interesado en el primer éxito de una tarea determinada. Además, el DB contiene filas con otros events que deberían ignorarse.

¿Cómo formulo esto en T-SQL en una database de Vertica?

(Eventualmente me gustaría calcular el número promedio de bashs por tarea, pero mantengamos eso fuera del scope de esta pregunta para mantener las cosas manejables).

Esta es una actualización de la pregunta aquí: T-SQL: cuente el número de fallas hasta el primer éxito

En la pregunta original, di datos de muestra mal construidos que no reflejaban completamente mi escenario de uso, y conduje a respuestas que no eran aplicables con mis datos reales y que no pude verificar.

La solución no debe depender del order de las filas; es posible que las filas no se llenen en el order de la timestamp.

Aquí está la configuration de DB:

CREATE TABLE events { eventID int -- unused in this example, should be excluded from output , eventName varchar(256) , userName varchar(256) , timestamp timestamp , taskName varchar(256) , sessionID int -- unused in this example, should be excluded from output }; INSERT INTO events VALUES (2363460186192576512, 'beginSession', 'John', '2017-08-14 09:46:46.712', NULL, 145031357) , (2363460852537008128, 'success', 'John', '2017-08-14 09:49:32.471', 'TaskOne', 145031357) , (2363461162974437376, 'success', 'John', '2017-08-14 09:50:48.781', 'TaskOne', 145031357) , (2363460390131740672, 'fail', 'John', '2017-08-14 09:47:37.349', 'TaskOne', 145031357) , (2363460556662710272, 'fail', 'John', '2017-08-14 09:48:23.024', 'TaskOne', 145031357) , (2363460730671505408, 'fail', 'John', '2017-08-14 09:48:58.646', 'TaskOne', 145031357) , (2363461032111800320, 'fail', 'John', '2017-08-14 09:50:10.726', 'TaskOne', 145031357) , (2363460389896859648, 'beginTask', 'John', '2017-08-14 09:47:05.32', 'TaskOne', 145031357) , (2363460463137751040, 'beginTask', 'John', '2017-08-14 09:47:52.166', 'TaskOne', 145031357) , (2363460556205531136, 'beginTask', 'John', '2017-08-14 09:48:12.615', 'TaskOne', 145031357) , (2363460692671205376, 'beginTask', 'John', '2017-08-14 09:48:36.155', 'TaskOne', 145031357) , (2363460852268572672, 'beginTask', 'John', '2017-08-14 09:49:12.047', 'TaskOne', 145031357) , (2363460962524327936, 'beginTask', 'John', '2017-08-14 09:49:47.951', 'TaskOne', 145031357) , (2363461162714390528, 'beginTask', 'John', '2017-08-14 09:50:23.645', 'TaskOne', 145031357) , (2363474741421064192, 'beginSession', 'John', '2017-08-14 10:44:36.042', NULL, 145031392) , (2363474885491200000, 'success', 'John', '2017-08-14 10:45:14.577', 'TaskTwo', 145031392) , (2363475342389641216, 'success', 'John', '2017-08-14 10:47:04.098', 'TaskTwo', 145031392) , (2363475473998635008, 'success', 'John', '2017-08-14 10:47:34.135', 'TaskOne', 145031392) , (2363475822079254528, 'success', 'John', '2017-08-14 10:48:53.381', 'TaskTwo', 145031392) , (2363476096949104640, 'success', 'John', '2017-08-14 10:50:07.441', 'TaskThree', 145031392) , (2363475066098266112, 'fail', 'John', '2017-08-14 10:45:53.526', 'TaskTwo', 145031392) , (2363475195152531456, 'fail', 'John', '2017-08-14 10:46:32.81', 'TaskTwo', 145031392) , (2363475654638821376, 'fail', 'John', '2017-08-14 10:48:13.71', 'TaskThree', 145031392) , (2363476247751114752, 'beginSession', 'Mike', '2017-08-14 10:50:37.67', NULL, 145030476) , (2363476335819063296, 'success', 'Mike', '2017-08-14 10:51:06.841', 'TaskOne', 145030476) , (2363476485643796480, 'success', 'Mike', '2017-08-14 10:51:41.086', 'TaskTwo', 145030476) , (2363476806063038464, 'success', 'Mike', '2017-08-14 10:52:53.174', 'TaskTwo', 145030476) , (2363477266119335936, 'success', 'Mike', '2017-08-14 10:54:32.053', 'TaskThree', 145030476) , (2363477619191631872, 'success', 'Mike', '2017-08-14 10:56:01.783', 'TaskThree', 145030476) , (2363476705131655168, 'fail', 'Mike', '2017-08-14 10:52:21.312', 'TaskThree', 145030476) , (2363476939634896896, 'fail', 'Mike', '2017-08-14 10:53:28.906', 'TaskThree', 145030476) , (2363477390937976832, 'fail', 'Mike', '2017-08-14 10:55:05.499', 'TaskThree', 145030476) , (2363476335592570880, 'beginTask', 'Mike', '2017-08-14 10:50:50.074', 'TaskOne', 145030476) , (2363476485501190144, 'beginTask', 'Mike', '2017-08-14 10:51:20.784', 'TaskTwo', 145030476) , (2363476704779333632, 'beginTask', 'Mike', '2017-08-14 10:51:54.829', 'TaskThree', 145030476) , (2363476805752659968, 'beginTask', 'Mike', '2017-08-14 10:52:34.001', 'TaskTwo', 145030476) , (2363476939496484864, 'beginTask', 'Mike', '2017-08-14 10:53:06.468', 'TaskThree', 145030476) , (2363477265938980864, 'beginTask', 'Mike', '2017-08-14 10:53:45.631', 'TaskThree', 145030476) , (2363477390635986944, 'beginTask', 'Mike', '2017-08-14 10:54:44.706', 'TaskThree', 145030476) , (2363477573427560448, 'beginTask', 'Mike', '2017-08-14 10:55:17.231', 'TaskThree', 145030476) , (2363474885214375936, 'beginTask', 'John', '2017-08-14 10:44:44.702', 'TaskTwo', 145031392) , (2363474985177161728, 'beginTask', 'John', '2017-08-14 10:45:31.133', 'TaskTwo', 145031392) , (2363475195014119424, 'beginTask', 'John', '2017-08-14 10:46:10.098', 'TaskTwo', 145031392) , (2363475342184120320, 'beginTask', 'John', '2017-08-14 10:46:45.357', 'TaskTwo', 145031392) , (2363475473616953344, 'beginTask', 'John', '2017-08-14 10:47:17.911', 'TaskOne', 145031392) , (2363475654437494784, 'beginTask', 'John', '2017-08-14 10:47:47.681', 'TaskThree', 145031392) , (2363475771776864256, 'beginTask', 'John', '2017-08-14 10:48:27.1', 'TaskTwo', 145031392) , (2363476006456762368, 'beginTask', 'John', '2017-08-14 10:49:06.151', 'TaskThree', 145031392) ; 

Con estos datos, aquí está el resultado que trato de lograr:

 userName taskName numFailuresBeforeFirstSuccess John TaskOne 3 John TaskTwo 0 John TaskThree 1 Mike TaskOne 0 Mike TaskTwo 0 Mike TaskThree 3 

Aquí hay un método:

 select e.username, e.taskname, sum(case when timestamp < first_success_ts and e.eventname = 'fail' then 1 else 0 end) as numFailuresBeforeSuccess from (select e.*, min(case when e.eventname = 'success' then e.timestamp end) over (partition by e.username, e.taskname) as first_success_ts from events e ) e group by e.username, e.taskname order by e.username, e.taskname; 

Esto calcula el primer time de éxito usando una function de window. Esto debería funcionar en ambas bases de datos (al less en SQL Server 2012+)

Una vez más, esto es TSQL y no Vertica, pero es bastante estándar SQL, siempre que Vertica soporte CTE.

 ; WITH cte1 AS ( SELECT t1.userName, t1.taskName, t1.ts , LAG(t1.ts) OVER (PARTITION BY t1.userName, t1.taskName ORDER BY t1.ts) AS PreviousTS , ROW_NUMBER() OVER (PARTITION BY t1.userName ORDER BY t1.ts) AS rn FROM #taskevents t1 WHERE t1.eventName = 'Success' ) SELECT s1.userName, s1.taskName, AVG(s1.failCount) AS avgFailCount FROM ( SELECT cte1.userName, cte1.taskName , cte1.rn, COALESCE(COUNT(t2.ts),0) AS failCount FROM cte1 LEFT OUTER JOIN #taskevents t2 ON t2.userName = cte1.userName AND t2.taskName = cte1.taskName AND t2.ts < cte1.ts AND ( t2.ts >= cte1.PreviousTS OR cte1.PreviousTS IS NULL ) AND t2.eventName = 'fail' GROUP BY cte1.userName, cte1.taskName, cte1.rn ) s1 GROUP BY s1.userName, s1.taskName ORDER BY s1.userName, s1.taskName 

Esto le da sus promedios. Elimine la consulta externa para get los datos con los que estoy trabajando. Produce resultados ligeramente diferentes de los que había enumerado, pero debería dar el promedio adecuado que usted dijo que estaba buscando. Por favor, avíseme si entendí mal los requisitos.

NOTA: en los datos de mi testing, también agregué dos personas que tuvieron fallas sin éxito, solo para verificar que se excluyeron de los resultados.

 , (2363476006456762398, 'fail', 'Steve', '2017-08-14 11:29:06.151', 'Task42', 145031342) , (2363476046456762368, 'fail', 'Joe', '2017-08-14 11:49:06.151', 'Task42', 145031399) 

==================================

RESULTADOS

 ----------------------------------- |userName| taskName |avgFailCount| ----------------------------------- | John | TaskOne | 1 | | John | TaskThree | 1 | | John | TaskTwo | 0 | | Mike | TaskOne | 0 | | Mike | TaskThree | 1 | | Mike | TaskTwo | 0 | ----------------------------------- 

=============================================== ======================

EDITAR: Para el promedio solo por la tarea:

 ; WITH cte1 AS ( SELECT t1.userName, t1.taskName, t1.ts , LAG(t1.ts) OVER (PARTITION BY t1.userName, t1.taskName ORDER BY t1.ts) AS PreviousTS , ROW_NUMBER() OVER (PARTITION BY t1.userName ORDER BY t1.ts) AS rn FROM #taskevents t1 WHERE t1.eventName = 'Success' ) SELECT s1.taskName , AVG(CAST(s1.failCount AS decimal(5,2))) AS avgFailCount FROM ( SELECT cte1.userName, cte1.taskName , cte1.rn, COALESCE(COUNT(t2.ts),0) AS failCount FROM cte1 LEFT OUTER JOIN #taskevents t2 ON t2.userName = cte1.userName AND t2.taskName = cte1.taskName AND t2.ts < cte1.ts AND ( t2.ts >= cte1.PreviousTS OR cte1.PreviousTS IS NULL ) AND t2.eventName = 'fail' GROUP BY cte1.userName, cte1.taskName, cte1.rn ) s1 GROUP BY s1.taskName ORDER BY s1.taskName 

Que te da

 -------------------------- | taskName |avgFailCount| -------------------------- | TaskOne | 1.000000 | | TaskThree | 1.333333 | | TaskTwo | 0.400000 | -------------------------- 

Que es esencialmente

 SELECT (3+1+0+0)/4.0 AS TaskOne SELECT (0+2+0+0+0)/5.0 AS TaskTwo SELECT (1+2+1)/3.0 AS TaskThree 

Derivado de los siguientes puntos de datos.

 -------------------------------- |userName| taskName |FailCount| -------------------------------- | John | TaskOne | 3 | | John | TaskOne | 1 | | John | TaskOne | 0 | | Mike | TaskOne | 0 | | John | TaskTwo | 0 | | John | TaskTwo | 2 | | John | TaskTwo | 0 | | Mike | TaskTwo | 0 | | Mike | TaskTwo | 0 | | John | TaskThree | 1 | | Mike | TaskThree | 2 | | Mike | TaskThree | 1 | -------------------------------- 

Este es un número promedio de fallas antes de un éxito, no el número promedio de fallas por bash. Eso sería un poco diferente.

 --------------------------------------------------- | task | fails | attempts | avg fails per attempt | --------------------------------------------------- | Task1| 4 | 8 | 4/8 = 0.500000 | | Task2| 2 | 7 | 2/7 = 0.285714 | | Task3| 3 | 7 | 3/7 = 0.428571 | --------------------------------------------------- 

Esta consulta:

  with F as ( select * from Evts where eventName = 'fail' ), S as ( select * from Evts E cross apply ( select count(F.eventID) numFailuresBeforeFirstSuccess from F where F.userName = E.userName and E.taskName = F.taskName and F.timestamp < E.timestamp ) K where eventName = 'success' ) select userName, taskName, numFailuresBeforeFirstSuccess from (select *, row_number() over (partition by userName, taskName order by [timestamp] desc) o from S ) S where o = 1 

produce este resultado:

 userName taskName numFailuresBeforeFirstSuccess ----------- ----------- ----------------------------- John TaskOne 4 John TaskThree 1 John TaskTwo 2 Mike TaskOne 0 Mike TaskThree 3 Mike TaskTwo 0 

La explicación anterior se aplica aquí.

Rextester Demo

    Intereting Posts