Where can I find Average Session Duration in Firebase Analytics, and how do I extract this metric through BigQuery? - google-bigquery

Where can I find the Avg. Session Duration metric in Firebase Analytics?
How can I extract Avg. Session Duration data from BigQuery?
The Avg. Session Duration metric used to be available on the Firebase Analytics dashboard, but it no longer appears there; now we only see "Engagement per User". Are Engagement per User and Avg. Session Duration the same thing? How can I get Avg. Session Duration from Firebase Analytics, and how do I query BigQuery to extract it from the Firebase export?

Engagement per User is not the same as Avg. Session Duration. Engagement per User is the total time a user spends in the app in a day, not per session.
You can find Avg. Session Duration in Firebase Analytics under Latest Release.
Here is a query for calculating avg. session length in BigQuery:
with timeline as
(
  select
    user_pseudo_id
    , event_timestamp
    , lag(event_timestamp, 1) over (partition by user_pseudo_id order by event_timestamp) as prev_event_timestamp
  from
    `YYYYY.analytics_XXXXX.events_*`
  where
    -- sliding window: how many days in the past to look at
    _table_suffix
      between format_date("%Y%m%d", date_sub(current_date, interval 10 day))
      and format_date("%Y%m%d", date_sub(current_date, interval 1 day))
)
, session_timeline as
(
  select
    user_pseudo_id
    , event_timestamp
    , case
        when
          -- a half-hour gap is the threshold for a new 'session'
          event_timestamp - prev_event_timestamp >= (30*60*1000*1000)
          or
          prev_event_timestamp is null
        then 1
        else 0
      end as is_new_session_flag
  from
    timeline
)
, marked_sessions as
(
  select
    user_pseudo_id
    , event_timestamp
    , sum(is_new_session_flag) over (partition by user_pseudo_id order by event_timestamp) as user_session_id
  from session_timeline
)
, measured_sessions as
(
  select
    user_pseudo_id
    , user_session_id
    -- session duration in seconds, rounded to 2 decimal places
    , round((max(event_timestamp) - min(event_timestamp)) / (1000 * 1000), 2) as session_duration
  from
    marked_sessions
  group by
    user_pseudo_id
    , user_session_id
  having
    -- count only sessions of at least 10 seconds
    session_duration >= 10
)
select
  count(1) as number_of_sessions
  , round(avg(session_duration), 2) as average_session_duration_in_sec
from
  measured_sessions
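Note that newer Firebase/GA4 BigQuery exports also record a ga_session_id event parameter, so sessions can be grouped directly instead of being inferred from gaps between events. A minimal sketch of that variant, assuming the same events_* export and the same thresholds used above (not part of the original answer):
select
  count(1) as number_of_sessions
  , round(avg(session_duration), 2) as average_session_duration_in_sec
from
(
  select
    user_pseudo_id
    , session_id
    -- session duration in seconds, rounded to 2 decimal places
    , round((max(event_timestamp) - min(event_timestamp)) / (1000 * 1000), 2) as session_duration
  from
  (
    select
      user_pseudo_id
      , event_timestamp
      -- ga_session_id is written by recent Firebase SDK versions (assumption)
      , (select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id
    from
      `YYYYY.analytics_XXXXX.events_*`
    where
      _table_suffix
        between format_date("%Y%m%d", date_sub(current_date, interval 10 day))
        and format_date("%Y%m%d", date_sub(current_date, interval 1 day))
  )
  group by
    user_pseudo_id
    , session_id
)
where
  -- keep the same 10-second floor as above
  session_duration >= 10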

Related

Google BigQuery realtime does not match GA report

Hello, I would like to check real-time data using the Google BigQuery realtime table.
However, simple queries do not match the GA reports. I created a query that shows the number of sessions per hour, but it is off by 10 to 30%.
Is the accuracy of the BigQuery realtime export just not that good, or am I making a mistake?
WITH noDuplicateTable AS (
  SELECT
    ARRAY_AGG(t ORDER BY exportTimeUsec DESC LIMIT 1)[OFFSET(0)].*
  FROM
    `tablename_20*` AS t
  WHERE
    _TABLE_SUFFIX = FORMAT_DATE("%y%m%d", CURRENT_DATE('Asia/Seoul'))
  GROUP BY
    t.visitKey
),
session AS (
  SELECT
    ROW_NUMBER() OVER () sessionRow,
    FORMAT_TIMESTAMP('%H', TIMESTAMP_SECONDS(time), 'Asia/Seoul') AS startTime,
    SUM(session) AS session,
    (SUM(session) - SUM(isVisit)) AS uniqueSession,
    (SUM(isVisit) / SUM(session) * 100) AS bounce,
    SUM(totalPageView) AS totalPageView
  FROM (
    SELECT
      COUNT(visitId) AS session,
      visitStartTime AS time,
      SUM(IFNULL(totals.bounces, 0)) AS isVisit,
      SUM(totals.pageviews) AS totalPageView
    FROM
      noDuplicateTable
    GROUP BY
      visitStartTime
  )
  GROUP BY startTime
)
SELECT * FROM session

Using BigQuery to do a subquery on a date array

I have a table which stores sales targets - these are typically set by month, but entered by day - which means the daily target is the month target divided by the number of days.
This is a labour-intensive way of entering the targets, so I want to recreate the table with start and end dates:
WITH targets AS (
SELECT DATE '2018-01-01' AS dateStart, DATE '2018-01-31' AS dateEnd, 'uk' AS market, NUMERIC '1550' AS quantity
UNION ALL SELECT '2018-02-01', '2018-02-28', "uk", 560
)
In my query, I need to generate a date array (dateStart to dateEnd), then for each date in the array, apply the market and divide the target by the number of dates in the array - but I can't get it working. I'm looking to do something like:
SELECT
*,
(SELECT market FROM targets WHERE dr IN GENERATE_DATE_ARRAY(targets.dateStart, targets.dateEnd, INTERVAL 1 DAY)) AS market,
(SELECT SAFE_DIVIDE(budget, COUNT(GENERATE_DATE_ARRAY(targets.dateStart, targets.dateEnd, INTERVAL 1 DAY)) FROM targets WHERE dr IN GENERATE_DATE_ARRAY(targets.dateStart, targets.dateEnd, INTERVAL 1 DAY)) AND targets.market = market AS budget
FROM UNNEST(GENERATE_DATE_ARRAY(targets.dateStart, targets.dateEnd, INTERVAL 1 DAY)) AS dr
This would mean less data entry and fewer rows in the source table (which is a Google Sheet, so limits will eventually be reached). Thanks for your help.
Below is for BigQuery Standard SQL
#standardSQL
WITH targets AS (
SELECT DATE '2018-01-01' AS dateStart, DATE '2018-01-31' AS dateEnd, 'uk' AS market, NUMERIC '1550' AS quantity
UNION ALL SELECT '2018-02-01', '2018-02-28', "uk", 560
)
SELECT market, day, quantity / days AS target
FROM targets,
UNNEST(GENERATE_DATE_ARRAY(dateStart, dateEnd)) day,
UNNEST([DATE_DIFF(dateEnd, dateStart, DAY) + 1]) days
ORDER BY market, day
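The UNNEST([DATE_DIFF(...) + 1]) part is just a trick to compute the day count inline; an equivalent form (a sketch, not from the answer above) divides directly by the date difference:
#standardSQL
WITH targets AS (
  SELECT DATE '2018-01-01' AS dateStart, DATE '2018-01-31' AS dateEnd, 'uk' AS market, NUMERIC '1550' AS quantity
  UNION ALL SELECT '2018-02-01', '2018-02-28', "uk", 560
)
SELECT market, day,
  -- daily target = monthly quantity / number of days in the range
  quantity / (DATE_DIFF(dateEnd, dateStart, DAY) + 1) AS target
FROM targets,
UNNEST(GENERATE_DATE_ARRAY(dateStart, dateEnd)) day
ORDER BY market, day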

Google BigQuery aggregate OHLC data over time window

There is a time series of trading transaction history stored in Google's BigQuery.
# Transaction history scheme
exchange_id INTEGER REQUIRED
from_id INTEGER REQUIRED
to_id INTEGER REQUIRED
price FLOAT REQUIRED
size FLOAT REQUIRED
ts TIMESTAMP REQUIRED
is_sell BOOLEAN NULLABLE
_PARTITIONTIME TIMESTAMP NULLABLE
exchange_id - platform where the transaction occurred
from_id - base symbol
to_id - quote symbol
price - trade price
size - trade quantity
I need to aggregate OHLC data over 30-second time intervals, grouped by
exchange_id, from_id, to_id. How can I do this in BigQuery?
# Required OHLC aggregated data scheme
ts TIMESTAMP REQUIRED
exchange_id INTEGER REQUIRED
from_id INTEGER REQUIRED
to_id INTEGER REQUIRED
open FLOAT REQUIRED
high FLOAT REQUIRED
low FLOAT REQUIRED
close FLOAT REQUIRED
volume FLOAT REQUIRED
_PARTITIONTIME TIMESTAMP NULLABLE
open - first price in interval
high - highest price..
low - lowest price..
close - last price..
volume - SUM of all trade sizes in the current interval
The most promising ideas were:
SELECT
TIMESTAMP_SECONDS(
UNIX_SECONDS(ts) -
60 * 1000000
) AS time,
exchange_id,
from_id,
to_id,
MIN(price) as low,
MAX(price) as high,
SUM(size) as volume
FROM
`table`
GROUP BY
time, exchange_id, from_id, to_id
ORDER BY
time
And this one:
SELECT
exchange_id,from_id,to_id,
MAX(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as high,
MIN(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as low,
SUM(size) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as volume,
FROM [table];
# returns:
1 1 4445 3808 9.0E-8 9.0E-8 300000.0
2 1 4445 3808 9.0E-8 9.0E-8 300000.0
3 1 4445 3808 9.0E-8 9.0E-8 300000.0
...
14 1 4445 3808 9.0E-8 9.0E-8 865939.3721800799
15 1 4445 3808 9.0E-8 9.0E-8 865939.3721800799
16 1 4445 3808 9.0E-8 9.0E-8 865939.3721800799
But none of this works. It seems that I am missing something important about sliding windows in BigQuery.
Below is for BigQuery Standard SQL
#standardsql
SELECT
exchange_id,
from_id,
to_id,
TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(ts), 30) * 30) time,
ARRAY_AGG(price ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] open,
MAX(price) high,
MIN(price) low,
ARRAY_AGG(price ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)] close,
SUM(size) volume
FROM `yourproject.yourdataset.yourtable`
GROUP BY 1, 2, 3, 4
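As a quick sanity check on the bucketing arithmetic (an illustration, not part of the answer above): DIV(UNIX_SECONDS(ts), 30) * 30 floors each timestamp to the start of its 30-second window, so events only two seconds apart can still land in different buckets:
#standardsql
SELECT
  TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(TIMESTAMP '2018-01-01 12:00:29+00'), 30) * 30) AS bucket_a, -- 2018-01-01 12:00:00
  TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(TIMESTAMP '2018-01-01 12:00:31+00'), 30) * 30) AS bucket_b  -- 2018-01-01 12:00:30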
I have found an elegant way to aggregate over predefined date_parts (docs). It's very helpful when you need to aggregate over Mondays or months.
DATETIME_TRUNC supports the following date_part values:
MICROSECOND
MILLISECOND
SECOND
MINUTE
HOUR
DAY
WEEK
WEEK(<WEEKDAY>)
MONTH
QUARTER
YEAR
You can use it for aggregation like this:
#standardsql
SELECT
TIMESTAMP(DATETIME_TRUNC(DATETIME(timestamp), DAY)) as timestamp,
ARRAY_AGG(open ORDER BY timestamp LIMIT 1)[SAFE_OFFSET(0)] open,
MAX(high) high,
MIN(low) low,
ARRAY_AGG(close ORDER BY timestamp DESC LIMIT 1)[SAFE_OFFSET(0)] close,
SUM(volume) volume
FROM `hcmc-project.test_bitfinex.BTC_USD__1h`
GROUP BY timestamp
ORDER BY timestamp ASC
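For the "aggregate over Mondays" case mentioned above, the same pattern works with a weekday-anchored truncation; a sketch reusing the same table and columns as the query above:
#standardsql
SELECT
  -- truncate each timestamp to the Monday that starts its week
  TIMESTAMP(DATETIME_TRUNC(DATETIME(timestamp), WEEK(MONDAY))) AS week_start,
  ARRAY_AGG(open ORDER BY timestamp LIMIT 1)[SAFE_OFFSET(0)] open,
  MAX(high) high,
  MIN(low) low,
  ARRAY_AGG(close ORDER BY timestamp DESC LIMIT 1)[SAFE_OFFSET(0)] close,
  SUM(volume) volume
FROM `hcmc-project.test_bitfinex.BTC_USD__1h`
GROUP BY week_start
ORDER BY week_start ASC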

Loop over data in BigQuery

We have been trying quite hard to loop over data in (Standard SQL) BigQuery, with no success.
I am not sure whether the issue is SQL's supported functionality, our understanding of the problem, or the way we want to do this, since we want to do it entirely within BigQuery.
Anyhow, let's say we have a table of events where each event is described by a user id and a date (there can be many events on the same date by the same user id)
id STRING
dt DATE
One thing we want to know is how many distinct users generated events within a given period of time. This is rather trivial, just a COUNT on the table with the period as constraint in the WHERE clause. For example, if we have four months as our period of time:
SELECT
COUNT(DISTINCT id) AS total
FROM
`events`
WHERE
dt BETWEEN DATE_ADD(CURRENT_DATE(), INTERVAL -4 MONTH)
AND CURRENT_DATE()
However, our issue comes when we also want this history for other days (or weeks), each with the same period length: for yesterday, the day before yesterday, and so on, back to, say, 3 months ago. So the variable here is CURRENT_DATE(), which steps back by one day (or whatever step we choose), while the interval stays the same (in our case, 4 months). We are expecting something like this (with a step of one day):
2017-07-14 2017-03-14 1760333
2017-07-13 2017-03-13 1856333
2017-07-12 2017-03-12 2031993
...
2017-04-14 2017-01-14 1999352
This is just a loop over every day, week, etc on the same table, and then a COUNT on the distinct events happening within that period of time. But we can't do 'loops' in BigQuery.
One approach we considered was a JOIN, and then a COUNT over GROUP BY intervals (using the HAVING clause to simulate the period from a given day back 4 months), but this is very inefficient and never finishes given the table's size (around 254 million records, 173 GB as of today, and growing every day).
Another approach we considered was UDFs, the idea being that we feed a list of date intervals to the function, which then applies the naive counting query to every interval and returns the interval and its count. But UDFs in BigQuery cannot access tables, so we would have to feed the whole table to the UDF, which we haven't tried but doesn't seem reasonable.
So we have no way in mind to iterate over the same data and do calculations on (overlapping) parts of it within BigQuery, and our only solution is to do this outside BigQuery (the loop, in the end).
Is there a way, or can someone think of a way, to do this all within BigQuery? Our goal is to provide this as a view inside BigQuery so that it doesn't depend on an external system triggered at whatever frequency we set (days/weeks/etc.).
Below is an example of this technique for BigQuery Standard SQL
#standardSQL
SELECT
DAY,
COUNT(CASE WHEN period = 7 THEN id END) AS days_07,
COUNT(CASE WHEN period = 14 THEN id END) AS days_14,
COUNT(CASE WHEN period = 30 THEN id END) AS days_30
FROM (
SELECT
dates.day AS DAY,
periods.period AS period,
id
FROM yourTable AS activity
CROSS JOIN (SELECT DAY FROM yourTable GROUP BY DAY) AS dates
CROSS JOIN (SELECT period FROM (SELECT 7 AS period UNION ALL
SELECT 14 AS period UNION ALL SELECT 30 AS period)) AS periods
WHERE dates.day >= activity.day
AND CAST(DATE_DIFF(dates.day, activity.day, DAY) / periods.period AS INT64) = 0
GROUP BY 1,2,3
)
GROUP BY DAY
-- ORDER BY DAY
You can play with / test this example using the dummy data below
#standardSQL
WITH data AS (
SELECT
DAY, CAST(10 * RAND() AS INT64) AS id
FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-07-13')) AS DAY
)
SELECT
DAY,
COUNT(DISTINCT CASE WHEN period = 7 THEN id END) AS days_07,
COUNT(DISTINCT CASE WHEN period = 14 THEN id END) AS days_14,
COUNT(DISTINCT CASE WHEN period = 30 THEN id END) AS days_30
FROM (
SELECT
dates.day AS DAY,
periods.period AS period,
id
FROM data AS activity
CROSS JOIN (SELECT DAY FROM data GROUP BY DAY) AS dates
CROSS JOIN (SELECT period FROM (SELECT 7 AS period UNION ALL
SELECT 14 AS period UNION ALL SELECT 30 AS period)) AS periods
WHERE dates.day >= activity.day
AND CAST(DATE_DIFF(dates.day, activity.day, DAY) / periods.period AS INT64) = 0
GROUP BY 1,2,3
)
GROUP BY DAY
ORDER BY DAY
Does it work for you?
WITH dates AS(
SELECT GENERATE_DATE_ARRAY(DATE_SUB(CURRENT_DATE(), INTERVAL 4 MONTH), CURRENT_DATE()) arr_dates
),
data AS(
SELECT 1 id, '2017-03-14' dt UNION ALL
SELECT 1 id, '2017-03-14' dt UNION ALL
SELECT 1, '2017-04-20' UNION ALL
SELECT 2, '2017-04-20' UNION ALL
SELECT 3, '2017-03-15' UNION ALL
SELECT 4, '2017-04-20' UNION ALL
SELECT 5, '2017-07-14'
)
SELECT
i_date date,
DATE_ADD(i_date, INTERVAL 4 MONTH) next_date,
(SELECT COUNT(DISTINCT id) FROM data WHERE PARSE_DATE("%Y-%m-%d", data.dt) BETWEEN i_date AND DATE_ADD(i_date, INTERVAL 4 MONTH)) total
FROM dates,
UNNEST(arr_dates) i_date
ORDER BY i_date
Where data is a simulation of your events table.
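Since the goal is to expose this as a view, either approach can be wrapped in one. A sketch of the brute-force form (cross join plus filter, looking back 4 months from each day as in the expected output), assuming a hypothetical `yourproject.yourdataset.events` table with an id column and a DATE column dt; note it scans the table once per generated date pattern, so the cost concern raised in the question still applies:
#standardSQL
CREATE OR REPLACE VIEW `yourproject.yourdataset.rolling_distinct_users` AS
SELECT
  i_date AS day,
  DATE_SUB(i_date, INTERVAL 4 MONTH) AS window_start,
  COUNT(DISTINCT e.id) AS total
-- one row of output per day over the last 3 months
FROM UNNEST(GENERATE_DATE_ARRAY(DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH), CURRENT_DATE())) AS i_date
CROSS JOIN `yourproject.yourdataset.events` e
WHERE e.dt BETWEEN DATE_SUB(i_date, INTERVAL 4 MONTH) AND i_date
GROUP BY i_date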

How do I limit a period of time [closed]

What I want to do is limit a user to depositing/withdrawing money in the account only 5 times a week; after those 5 times, the user must wait until the next week to be able to deposit again.
I have a table named depuser with the following columns: uid (user id), date (date of deposit/withdrawal), type (deposit/withdrawal), and the amount. Thanks in advance.
This query will count the # of deposits recorded in the last 7 days, based on your table structure:
SELECT COUNT(uid) AS total_deposits FROM depuser
WHERE `type` LIKE 'deposit' AND
DATE(`date`) <= NOW() AND
DATE(`date`) >= DATE_SUB(NOW(), INTERVAL 7 DAY)
From here you can compare the total_deposits value returned and make logic decisions.
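For example, the check can be folded into the insert itself; a minimal sketch in the same MySQL-style dialect as the query above, with hypothetical uid and amount values (42 and 100.00):
INSERT INTO depuser (uid, `date`, `type`, amount)
SELECT 42, NOW(), 'deposit', 100.00   -- hypothetical user id and amount
FROM DUAL
WHERE (SELECT COUNT(*) FROM depuser
       WHERE uid = 42 AND `type` = 'deposit'
         AND `date` >= DATE_SUB(NOW(), INTERVAL 7 DAY)) < 5;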
