тестовые заданиясобеседованиеаналитика данныхSQLPythonA/B-тесты

Тестовые задания аналитика данных 2026: 15 реальных примеров с разборами

2026-06-04 30 мин

Тестовые задания — это 70% решения offer/no-offer на собесе аналитика. Они показывают не «знаешь ли ты SQL», а «можешь ли решить реальную задачу». В этом гайде — 15 РЕАЛЬНЫХ заданий (анонимизированных) с собесов 2026 в Yandex/Ozon/Тинькофф/Сбер/Avito/Wildberries/Lamoda/X5/МТС/VK, с подробными разборами.

Главное про тестовые задания
Тестовое задание — это **business problem с ограничениями**. Главное — **structure approach** (clarify → analyze → recommend), а не perfect technical solution. Recruiter оценивает thinking process.

Что обычно ожидается от тестового

Формат

Структура решения

Что recruiter ищет

15 реальных заданий с разборами

Задание 1: SQL — Top sellers (Ozon-style)

Условие: В таблице \orders\ (seller_id, order_date, revenue, status). Найди топ-10 sellers по выручке за май 2026 со статусом 'completed', и для каждого покажи долю в общей выручке.

Решение:

\\\sql

WITH seller_rev AS (

SELECT seller_id, sum(revenue) AS rev

FROM orders

WHERE order_date >= '2026-05-01' AND order_date < '2026-06-01'

AND status = 'completed'

GROUP BY seller_id

)

SELECT seller_id, rev,

rev * 100.0 / sum(rev) OVER () AS pct_of_total

FROM seller_rev

ORDER BY rev DESC

LIMIT 10;

\\\

Что recruiter ценит:

Задание 2: SQL — Cohort retention (Тинькофф-style)

Условие: Посчитай retention Day-7 (% юзеров, вернувшихся через 7 дней после первого визита) по cohort месяца регистрации.

Решение:

\\\sql

WITH first_visit AS (

SELECT user_id, min(visit_date) AS first_date

FROM visits

GROUP BY user_id

),

d7_check AS (

SELECT

fv.user_id,

toStartOfMonth(fv.first_date) AS cohort,

maxIf(1, v.visit_date BETWEEN fv.first_date + 7 AND fv.first_date + 13) AS returned_d7

FROM first_visit fv

LEFT JOIN visits v ON fv.user_id = v.user_id

GROUP BY fv.user_id, fv.first_date

)

SELECT cohort,

count() AS new_users,

sum(returned_d7) AS returned,

100.0 * sum(returned_d7) / count() AS retention_d7_pct

FROM d7_check

GROUP BY cohort

ORDER BY cohort;

\\\

→ Когортный анализ retention

Задание 3: Python — Conversion funnel (Avito-style)

Условие: Из event log csv (event_type, user_id, ts), посчитай conversion funnel: search → view → contact → deal.

Решение:

\\\python

import pandas as pd

df = pd.read_csv('events.csv')

# User-level: did user perform each step?

funnel = df.pivot_table(

index='user_id',

columns='event_type',

values='ts',

aggfunc='count',

fill_value=0

)

# Conversion rates

steps = ['search', 'view', 'contact', 'deal']

funnel_counts = {step: (funnel[step] > 0).sum() for step in steps}

print("Funnel:")

for i, step in enumerate(steps):

cnt = funnel_counts[step]

if i == 0:

print(f" {step}: {cnt:,} (100%)")

else:

prev_cnt = funnel_counts[steps[i-1]]

pct = cnt / prev_cnt * 100

overall = cnt / funnel_counts[steps[0]] * 100

print(f" {step}: {cnt:,} ({pct:.1f}% from prev, {overall:.1f}% overall)")

\\\

→ Воронка конверсии в SQL

Задание 4: Python — Detect anomalies (Yandex-style)

Условие: Daily revenue time series. Detect anomalies (точки которые statistically unusual).

Решение:

\\\python

import pandas as pd

import numpy as np

df = pd.read_csv('daily_revenue.csv', parse_dates=['date'])

df = df.sort_values('date')

# Method 1: Z-score (works for stationary series)

df['z_score'] = (df['revenue'] - df['revenue'].mean()) / df['revenue'].std()

anomalies_z = df[abs(df['z_score']) > 3]

# Method 2: Rolling stats (better for trend data)

df['rolling_mean'] = df['revenue'].rolling(7).mean()

df['rolling_std'] = df['revenue'].rolling(7).std()

df['rolling_z'] = (df['revenue'] - df['rolling_mean']) / df['rolling_std']

anomalies_rolling = df[abs(df['rolling_z']) > 2]

# Method 3: Decompose (для seasonal data)

from statsmodels.tsa.seasonal import seasonal_decompose

decomp = seasonal_decompose(df['revenue'], model='additive', period=7)

residuals = decomp.resid.dropna()

threshold = residuals.std() * 3

anomalies_decomp = residuals[abs(residuals) > threshold]

\\\

Bonus answer: discuss trade-offs — z-score assumes normality, rolling adapts to trend, decompose handles seasonality.

Задание 5: A/B-test analysis (Тинькофф-style)

Условие: Запустили A/B на новой кнопке оплаты. Control 50K юзеров, treatment 50K. Conversion control 8.2%, treatment 8.7%. Стат значимо?

Решение:

\\\python

from scipy import stats

import numpy as np

n_c, p_c = 50000, 0.082

n_t, p_t = 50000, 0.087

successes_c = int(n_c * p_c)

successes_t = int(n_t * p_t)

# Two-proportion z-test

pool_p = (successes_c + successes_t) / (n_c + n_t)

se = np.sqrt(pool_p * (1 - pool_p) * (1/n_c + 1/n_t))

z = (p_t - p_c) / se

p_value = 2 * (1 - stats.norm.cdf(abs(z)))

print(f"Z-statistic: {z:.3f}")

print(f"P-value: {p_value:.4f}")

print(f"Lift: {(p_t - p_c) / p_c * 100:.2f}%")

print(f"Significant at α=0.05: {p_value < 0.05}")

# Confidence interval

se_diff = np.sqrt(p_c * (1-p_c) / n_c + p_t * (1-p_t) / n_t)

ci_low = (p_t - p_c) - 1.96 * se_diff

ci_high = (p_t - p_c) + 1.96 * se_diff

print(f"95% CI for difference: [{ci_low:.4f}, {ci_high:.4f}]")

\\\

Output: p ≈ 0.006 (significant). Lift 6.1%.

Bonus answer: mention guardrails (refund rate, customer NPS), discuss whether lift is practically significant.

→ A/B-тесты на Python

Задание 6: Метрики — Decompose DAU drop (Yandex-style)

Условие: DAU упал с 100K до 92K (-8%). Что делаешь?

Approach (не SQL, а thinking framework):

- DAU = New + Returning

- DAU = unique users × (sessions / user × 1/1 day)

- Tech: app crash rate? push notification failure?

- Marketing: paid spend cut? campaign ended?

- Product: new release broke flow? UX changed?

- External: competitor launched? holiday?

- Seasonality: week-over-week or year-over-year compare

- Daily new vs returning users (last 30 days)

- DAU by platform (iOS / Android / Web)

- DAU by region

- DAU by user cohort

- Build dashboard «DAU decomposition»

- Hypothesis-driven drilling (try one hypothesis at a time)

- Quick win + long-term root cause fix

Что recruiter ценит: structured approach, не jumping to conclusions.

Задание 7: SQL — Sessionization (Avito-style)

Условие: События от users, нужно объединить в сессии (если перерыв > 30 минут → новая сессия).

Решение:

\\\sql

WITH events_lagged AS (

SELECT user_id, event_ts,

lag(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts) AS prev_ts

FROM events

),

session_starts AS (

SELECT user_id, event_ts,

CASE

WHEN prev_ts IS NULL THEN 1

WHEN dateDiff('minute', prev_ts, event_ts) > 30 THEN 1

ELSE 0

END AS new_session

FROM events_lagged

),

sessions AS (

SELECT user_id, event_ts,

sum(new_session) OVER (PARTITION BY user_id ORDER BY event_ts) AS session_id

FROM session_starts

)

SELECT user_id, session_id,

min(event_ts) AS session_start,

max(event_ts) AS session_end,

count() AS event_count

FROM sessions

GROUP BY user_id, session_id

ORDER BY user_id, session_start;

\\\

Задание 8: Python — Customer LTV (Lamoda-style)

Условие: Из транзакций посчитай LTV каждого customer + project forward 12 месяцев.

Решение:

\\\python

import pandas as pd

# Historical LTV

df['amount'] = df['amount'].astype(float)

hist_ltv = df.groupby('user_id').agg(

historical_ltv=('amount', 'sum'),

purchase_count=('order_id', 'nunique'),

first_purchase=('order_date', 'min'),

last_purchase=('order_date', 'max'),

avg_order_value=('amount', 'mean')

).reset_index()

# Projection

hist_ltv['tenure_days'] = (hist_ltv['last_purchase'] - hist_ltv['first_purchase']).dt.days + 1

hist_ltv['daily_revenue'] = hist_ltv['historical_ltv'] / hist_ltv['tenure_days']

hist_ltv['projected_12m_ltv'] = hist_ltv['historical_ltv'] + hist_ltv['daily_revenue'] * 365

# Active customers (purchased in last 90 days)

from datetime import datetime

hist_ltv['days_since_last'] = (datetime.now() - hist_ltv['last_purchase']).dt.days

hist_ltv['is_active'] = hist_ltv['days_since_last'] <= 90

print(hist_ltv[hist_ltv['is_active']].describe())

\\\

Bonus: discuss BG/NBD probabilistic LTV model for production accuracy.

→ LTV/CAC методология

Задание 9: Продуктовый кейс (X5-style)

Условие: Перекрёсток хочет открыть Чижик в этом районе. Что данные нужны для решения?

Решение framework:

Models:

Задание 10: SQL — Self-join для friend recommendations (VK-style)

Условие: В таблице friendships (user_id, friend_id). Найди для user_id=123 топ-10 «friends of friends» которые ещё НЕ его друзья.

Решение:

\\\sql

WITH my_friends AS (

SELECT friend_id AS user_id

FROM friendships WHERE user_id = 123

),

friends_of_friends AS (

SELECT f.friend_id, count() AS mutual_friends

FROM friendships f

JOIN my_friends mf ON f.user_id = mf.user_id

WHERE f.friend_id != 123

AND f.friend_id NOT IN (SELECT user_id FROM my_friends)

GROUP BY f.friend_id

)

SELECT * FROM friends_of_friends

ORDER BY mutual_friends DESC

LIMIT 10;

\\\

Задание 11: Python — Forecasting (МТС-style)

Условие: Прогнози daily revenue на следующие 30 дней.

Решение (Prophet):

\\\python

from prophet import Prophet

import pandas as pd

df = pd.read_csv('daily_revenue.csv', parse_dates=['date'])

df.columns = ['ds', 'y'] # Prophet expects these names

model = Prophet(

yearly_seasonality=True,

weekly_seasonality=True,

daily_seasonality=False,

changepoint_prior_scale=0.05

)

model.fit(df)

future = model.make_future_dataframe(periods=30)

forecast = model.predict(future)

print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(30))

\\\

Bonus: discuss SARIMA, XGBoost для time series, ensemble approaches.

Задание 12: SQL — Window function для running totals (Сбер-style)

Условие: Для каждого транзакции рассчитай running total по user_id (sum amount по дате).

Решение:

\\\sql

SELECT

user_id, tx_date, amount,

sum(amount) OVER (

PARTITION BY user_id

ORDER BY tx_date

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

) AS running_total

FROM transactions

ORDER BY user_id, tx_date;

\\\

→ Window functions полный гайд

Задание 13: A/B-tests product (Wildberries-style)

Условие: Запустили new search ranking algorithm в A/B. CTR +5%, но GMV per searcher -1%. Что делаешь?

Analysis:

- CTR up на каких queries / categories?

- GMV down где специфически?

- Position bias — может click на cheaper items boost CTR

- Average price per clicked item

- Conversion rate (clicks → purchases)

- Add-to-cart rate

- New ranker promotes cheaper items (CTR up)

- But юзеры не покупают (GMV down)

- Wrong optimization target

- НЕ катить как есть

- Re-train на purchase data (not click data)

- Add price tier balancing

Задание 14: Behavioral case (Yandex-style)

Условие: PM запрашивает analysis с deadline 1 день. Знаешь что для proper analysis нужно 3 дня. Что делаешь?

Answer (STAR-format):

«Situation: PM хочет analysis для quarterly review tomorrow.

Task: Determine scope vs depth trade-off.

Action:

Result: PM выбирает quick version. Я деливеру + flag known limitations. Week 1 — follow-up с full analysis. PM appreciates transparency.»

→ Behavioral interview + STAR

Задание 15: SQL Performance optimization (любая компания)

Условие: Query timeout (60+ секунд). Оптимизируй.

\\\sql

-- Original (slow):

SELECT u.name, sum(o.amount)

FROM users u

JOIN orders o ON u.user_id = o.user_id

WHERE o.order_date >= '2026-01-01'

AND substring(u.email, locate('@', u.email)) = '@gmail.com'

GROUP BY u.user_id, u.name;

\\\

Решение:

\\\sql

-- Optimizations:

-- 1. Pre-filter users by email (non-sargable substring)

-- 2. Add index on (user_id, order_date) for orders

-- 3. Filter orders BEFORE join

WITH gmail_users AS (

SELECT user_id, name FROM users WHERE email LIKE '%@gmail.com'

),

recent_orders AS (

SELECT user_id, sum(amount) AS total

FROM orders

WHERE order_date >= '2026-01-01'

GROUP BY user_id

)

SELECT u.name, COALESCE(o.total, 0) AS total_revenue

FROM gmail_users u

LEFT JOIN recent_orders o ON u.user_id = o.user_id;

\\\

Why faster:

→ SQL антипаттерны

Топ-7 ошибок в тестовых заданиях

Чек-лист для тестового задания

Что дальше

Источники

Тренируй на 612 заданиях из собесов
Тестовые задания + AI разбор. Реальные кейсы РФ-компаний. Первые 5 бесплатно.
Открыть задания →