Avito — самый «двусторонний рынок» среди РФ-компаний. В Яндексе спрашивают про CUPED, в Тинькофф — credit risk, в Сбере — регуляторику. В Avito — про balance между sellers и buyers, listing quality, search ranking, network effects. В этом гайде разберу 25+ реальных вопросов с 5 раундов собеса аналитика Avito (Auto / Realty / Services / Classifieds) — с разбором.
Грейды аналитика в Avito (2026)
| Грейд | Compensation/мес РФ | Опыт | Что спрашивают |
|---|---|---|---|
| Junior | 150-220K ₽ | 0-1 год | SQL базовый, pandas, базовые classifieds-метрики |
| Middle | 220-340K ₽ | 1-3 года | Vertica/CH advanced, A/B, listing dynamics |
| Senior | 340-510K ₽ | 3-6 лет | ML/search ranking, recsys, marketplace economics |
| Lead | 510-680K+ ₽ | 6+ лет | Multi-team projects, business strategy |
5 раундов собеса Avito
| Раунд | Что | Длительность |
|---|---|---|
| 1. HR-скрининг | Мотивация, classifieds опыт, expectations | 30 мин |
| 2. SQL live | 2-3 задачи + classifieds specifics | 60 мин |
| 3. Python + A/B | pandas + статистика + A/B design | 60 мин |
| 4. Продуктовый кейс | Listing quality / search / seller behavior | 60 мин |
| 5. Финал | Behavioral + culture | 45 мин |
7 SQL-вопросов с собеса Avito
Найди listings с высокой конверсией views → contacts.
✅ Сильный ответ:
\\\sql
WITH listing_stats AS (
SELECT
listing_id,
SUM(CASE WHEN action = 'view' THEN 1 ELSE 0 END) AS views,
SUM(CASE WHEN action = 'contact' THEN 1 ELSE 0 END) AS contacts
FROM listing_actions
WHERE action_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY listing_id
HAVING SUM(CASE WHEN action = 'view' THEN 1 ELSE 0 END) >= 100
)
SELECT
listing_id,
views,
contacts,
ROUND(contacts * 100.0 / views, 2) AS conv_rate_pct
FROM listing_stats
WHERE contacts * 100.0 / views > 5
ORDER BY conv_rate_pct DESC
LIMIT 100;
\\\
Senior follow-up:
- «100 views минимум — почему? — для статистической значимости conv rate»
- «Что значит «высокая» — top 5%? Outliers? Используй percentile vs threshold»
- «Какие categories разные baseline? — Auto имеет higher conv than Services. Normalize by category»
Cohort analysis: seller retention по неделе первого listing.
✅ Сильный ответ:
\\\sql
WITH seller_first_listing AS (
SELECT
seller_id,
date_trunc('week', MIN(listing_created_at)) AS cohort_week
FROM listings
GROUP BY seller_id
),
seller_activity AS (
SELECT
sfl.seller_id,
sfl.cohort_week,
date_trunc('week', l.listing_created_at) AS active_week,
EXTRACT(week FROM AGE(l.listing_created_at, sfl.cohort_week)) AS week_offset
FROM seller_first_listing sfl
JOIN listings l ON l.seller_id = sfl.seller_id
)
SELECT
cohort_week,
COUNT(DISTINCT seller_id) FILTER (WHERE week_offset = 0) AS week_0,
COUNT(DISTINCT seller_id) FILTER (WHERE week_offset = 1) AS week_1,
COUNT(DISTINCT seller_id) FILTER (WHERE week_offset = 2) AS week_2,
COUNT(DISTINCT seller_id) FILTER (WHERE week_offset = 4) AS week_4
FROM seller_activity
GROUP BY cohort_week
ORDER BY cohort_week;
\\\
→ Cohort retention SQL deep dive
Search ranking: топ-N queries без хороших results.
✅ Сильный ответ:
\\\sql
WITH search_stats AS (
SELECT
query_text,
COUNT(*) AS search_count,
COUNT(CASE WHEN clicked_position IS NULL THEN 1 END) AS zero_click_count,
COUNT(CASE WHEN clicked_position IS NOT NULL THEN 1 END) AS clicked_count,
AVG(clicked_position) AS avg_click_position
FROM searches
WHERE search_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY query_text
HAVING COUNT(*) >= 1000
)
SELECT query_text, search_count, zero_click_count,
zero_click_count * 100.0 / search_count AS zero_click_pct,
avg_click_position
FROM search_stats
WHERE zero_click_count * 100.0 / search_count > 50
ORDER BY search_count DESC
LIMIT 50;
\\\
Senior follow-up: «Это search quality issue detection. Queries с >50% zero-click — индикатор плохого ranking или missing inventory. Avito регулярно мониторит это для improvement search.»
Funnel: search → view → contact → deal.
✅ Сильный ответ:
\\\sql
WITH funnel AS (
SELECT
user_id,
COUNT(DISTINCT CASE WHEN event = 'search' THEN session_id END) AS searches,
COUNT(DISTINCT CASE WHEN event = 'view' THEN session_id END) AS views,
COUNT(DISTINCT CASE WHEN event = 'contact' THEN session_id END) AS contacts,
COUNT(DISTINCT CASE WHEN event = 'deal' THEN session_id END) AS deals
FROM user_events
WHERE event_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY user_id
)
SELECT
SUM(CASE WHEN searches > 0 THEN 1 ELSE 0 END) AS users_searched,
SUM(CASE WHEN views > 0 THEN 1 ELSE 0 END) AS users_viewed,
SUM(CASE WHEN contacts > 0 THEN 1 ELSE 0 END) AS users_contacted,
SUM(CASE WHEN deals > 0 THEN 1 ELSE 0 END) AS users_dealt
FROM funnel;
\\\
Senior follow-up: «Это user-level funnel. Альтернативные подходы — session-level (более точный для UX analysis) или time-bound (within X minutes of search).»
Find duplicate listings (same item posted multiple times).
✅ Сильный ответ:
\\\sql
WITH listing_hashes AS (
SELECT
listing_id,
seller_id,
md5(LOWER(REGEXP_REPLACE(title || description, '\s+', '', 'g'))) AS content_hash,
created_at
FROM listings
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
seller_id,
content_hash,
COUNT(*) AS duplicate_count,
ARRAY_AGG(listing_id ORDER BY created_at) AS listing_ids
FROM listing_hashes
GROUP BY seller_id, content_hash
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC
LIMIT 100;
\\\
Senior: «Duplicate detection — anti-spam priority в classifieds. Real-world: shingle hashes (n-gram) + SimHash для near-duplicates, не exact match.»
Time-on-listing analysis (sessions, dwell time).
✅ Сильный ответ:
\\\sql
WITH session_dwell AS (
SELECT
listing_id,
session_id,
MAX(event_ts) - MIN(event_ts) AS dwell_time,
COUNT(*) AS events_in_session
FROM listing_events
WHERE event_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY listing_id, session_id
HAVING MAX(event_ts) - MIN(event_ts) > INTERVAL '5 seconds' -- filter bouncers
)
SELECT
listing_id,
COUNT(*) AS sessions,
AVG(EXTRACT(EPOCH FROM dwell_time)) AS avg_dwell_sec,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM dwell_time)) AS median_dwell_sec
FROM session_dwell
GROUP BY listing_id
HAVING COUNT(*) >= 50
ORDER BY median_dwell_sec DESC;
\\\
Senior follow-up: «Dwell time — proxy for listing quality. High dwell → interested buyers. Low dwell → bad photos/description/price. Avito uses dwell в ranking signals.»
Self-join: пары sellers продававших похожие items в одном городе.
✅ Сильный ответ:
\\\sql
SELECT
a.seller_id AS seller1,
b.seller_id AS seller2,
a.category_id,
a.city_id,
COUNT(*) AS shared_items
FROM listings a
JOIN listings b
ON a.category_id = b.category_id
AND a.city_id = b.city_id
AND a.seller_id < b.seller_id -- избегаем дублей (A,B) и (B,A)
AND ABS(a.price - b.price) / GREATEST(a.price, b.price) < 0.1 -- цены в пределах 10%
WHERE a.created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY a.seller_id, b.seller_id, a.category_id, a.city_id
HAVING COUNT(*) >= 3;
\\\
Senior follow-up: «Self-join для competitor analysis или spam detection (один seller под разными аккаунтами).»
5 Python-вопросов с собеса Avito
Реализуй listing quality score (composite metric).
✅ Сильный ответ:
\\\python
import pandas as pd
def listing_quality_score(df):
# Normalize each feature to [0, 1]
df['photo_score'] = (df['photo_count'].clip(1, 10) / 10)
df['desc_score'] = (df['description_length'].clip(50, 500) - 50) / 450
df['price_score'] = 1 - abs(df['price'] - df['category_median_price']) / df['category_median_price']
df['price_score'] = df['price_score'].clip(0, 1)
df['seller_score'] = df['seller_rating'].fillna(3) / 5
# Weighted composite
weights = {'photo': 0.25, 'desc': 0.20, 'price': 0.30, 'seller': 0.25}
df['quality_score'] = (
weights['photo'] * df['photo_score'] +
weights['desc'] * df['desc_score'] +
weights['price'] * df['price_score'] +
weights['seller'] * df['seller_score']
)
return df
\\\
Senior follow-up: «Weights — обычно подбираются через regression на ground truth (deal completion). Это proxy metric — нужно validate на A/B vs business outcome.»
Calculate seller LTV (lifetime value) per category.
✅ Сильный ответ:
\\\python
import pandas as pd
# Seller lifetime metrics
seller_ltv = (df.groupby('seller_id').agg(
total_listings=('listing_id', 'nunique'),
total_revenue=('paid_amount', 'sum'),
first_listing=('listing_created_at', 'min'),
last_listing=('listing_created_at', 'max'),
avg_price=('price', 'mean')
).reset_index())
seller_ltv['lifetime_days'] = (seller_ltv['last_listing'] - seller_ltv['first_listing']).dt.days
seller_ltv['arpu_per_month'] = seller_ltv['total_revenue'] / (seller_ltv['lifetime_days'] / 30 + 1)
seller_ltv['ltv'] = seller_ltv['total_revenue'] # уже накопленный LTV
# По категории — добавь category
\\\
Senior: «Это observed LTV. Predictive LTV — нужен ML model (gradient boosting на features + retention curve). В Avito используется и то, и то — observed для accounting, predicted для targeting.»
Detect bot traffic в search queries.
✅ Сильный ответ:
\\\python
import pandas as pd
def detect_bot_patterns(df):
# Features per user
user_stats = df.groupby('user_id').agg(
queries_per_minute=('query_id', lambda x: len(x) / max(1, (df.loc[x.index, 'ts'].max() - df.loc[x.index, 'ts'].min()).total_seconds() / 60)),
unique_categories=('category_id', 'nunique'),
avg_query_length=('query_text', lambda x: x.str.len().mean()),
unique_user_agents=('user_agent', 'nunique')
).reset_index()
# Rules
user_stats['is_likely_bot'] = (
(user_stats['queries_per_minute'] > 10) |
(user_stats['unique_user_agents'] > 5) |
(user_stats['avg_query_length'] < 3)
)
return user_stats
\\\
Senior: «Rules-based first pass. Для production — ML (Isolation Forest, Random Forest на 50+ features включая behavior patterns, network features). Botnets evolving — нужна continuous re-training.»
A/B test analysis: ratio metric (CTR with delta method).
✅ Сильный ответ:
\\\python
import numpy as np
from scipy import stats
def delta_ratio_test(clicks_a, views_a, clicks_b, views_b):
"""Delta method для разницы ratios (CTR_a vs CTR_b)"""
ctr_a = clicks_a.sum() / views_a.sum()
ctr_b = clicks_b.sum() / views_b.sum()
# Variance via delta method
var_a = (
clicks_a.var() / views_a.mean()**2
- 2 * ctr_a * np.cov(clicks_a, views_a)[0, 1] / views_a.mean()2
+ ctr_a2 * views_a.var() / views_a.mean()2
) / len(clicks_a)
var_b = (
clicks_b.var() / views_b.mean()2
- 2 * ctr_b * np.cov(clicks_b, views_b)[0, 1] / views_b.mean()2
+ ctr_b2 * views_b.var() / views_b.mean()**2
) / len(clicks_b)
diff = ctr_b - ctr_a
se_diff = np.sqrt(var_a + var_b)
z = diff / se_diff
p = 2 * (1 - stats.norm.cdf(abs(z)))
return {'lift': diff, 'p_value': p, 'ci_95': (diff - 1.96 * se_diff, diff + 1.96 * se_diff)}
\\\
Senior follow-up: «Standard t-test на user-level CTR works, но suboptimal. Delta method учитывает корреляцию между numerator и denominator → tighter CI.»
Build sessionization из event log.
✅ Сильный ответ:
\\\python
df = df.sort_values(['user_id', 'event_ts'])
df['prev_ts'] = df.groupby('user_id')['event_ts'].shift()
df['gap_minutes'] = (df['event_ts'] - df['prev_ts']).dt.total_seconds() / 60
# New session if gap > 30 min or first event
df['new_session'] = (df['gap_minutes'].isna()) | (df['gap_minutes'] > 30)
df['session_id'] = df.groupby('user_id')['new_session'].cumsum()
# Session-level aggregation
session_summary = df.groupby(['user_id', 'session_id']).agg(
session_start=('event_ts', 'min'),
session_end=('event_ts', 'max'),
n_events=('event_ts', 'count'),
n_listings_viewed=('listing_id', 'nunique'),
converted=('event_type', lambda x: 'contact' in x.values)
).reset_index()
\\\
5 продуктовых кейсов с собеса Avito (раунд 4)
Number of new sellers падает на 10%. Что делаешь?
✅ Сильный ответ:
«Декомпозиция:
New sellers = visitors × signup rate × first-listing rate
1. Visitors:
- Маркетинг траты по каналам — что изменилось?
- Sезонность (зима vs лето в realty, например)
- SEO трафик — позиция в Yandex поиска
2. Signup rate:
- Сложность регистрации (новый verification step?)
- Performance (страница медленнее grузится?)
3. First-listing rate:
- UX flow для создания listing (новый редизайн?)
- Категории с photo requirement — adoption?
Segmentation:
- By category (Auto / Realty / Services / Goods)
- By region (Moscow vs регионы)
- By acquisition channel
Action plan:
- Day 1-2: data deep dive по сегментам
- Day 3-5: hypothesis testing с PM/UX
- Week 2: A/B test или fix
- Week 3-4: rollout + monitoring»
Search relevance улучшила CR на 8% но revenue упало на 2%. Что происходит?
✅ Сильный ответ:
«Это classic relevance vs monetization trade-off в classifieds.
Гипотезы:
- Free listings ранжируются выше paid promotions → CR ↑ но revenue ↓ (paid items проигрывают)
- High-quality items приоритизируются → cheaper transactions → CR ↑ но GMV ↓
- Long-tail discovery улучшилась → niche items с lower commission
Analysis:
- Decomposition: GMV = transactions × AOV per transaction
- Если transactions ↑ а AOV ↓ — это basket size shift (cheaper items)
- Если transactions ↓ а AOV ↑ — обратная история
Action:
- Прямой trade-off: revenue важнее CR в short term, но CR predicts long-term retention (happy buyers come back)
- Compromise: re-rank chunk «paid promotions» на 1-3 позиции выше organic top
- Test: новый balanced ranking vs текущие 2 версии
Senior insight: «classifieds бизнес на marginal value per query, not just transactions. CR ↑ + LTV ↑ может оправдать short-term revenue dip.»»
Auto category: car listing creation flow редизайн. Как измеришь?
✅ Сильный ответ:
«Primary: listing creation completion rate (started → published)
Secondary metrics:
- Time to publish (efficiency)
- Quality score (photos, description completeness)
- Listing performance post-publish (views, contacts, deal rate)
- Re-edit rate (user fixes mistakes)
Guardrails:
- Crash rate / errors
- Customer support tickets ratio
- Spam/duplicate detection accuracy
A/B Design:
- 50/50 split by user_id
- Minimum 6 weeks для capture seasonality
- Sample size: power analysis на completion rate (baseline 60%, MDE +3pp → ~10K per group)
Analysis:
- Primary metric: Welch's t-test
- Funnel decomposition: где улучшилось / ухудшилось
- Heatmap interaction with photo upload step
- Qualitative: 20 user interviews to validate quantitative
Decision criteria:
- Primary +3pp & p<0.05 ✓
- No guardrail flags ✓
- Quality score not worse ✓
- → rollout 100%»
Seller-buyer matching: ML algorithm changes. Как оцениваешь impact?
✅ Сильный ответ:
«Метрика "matching success":
- Buyer side: contact rate, deal completion rate, time to deal
- Seller side: listing-to-deal rate, time-on-listing, follow-up contacts
A/B по обоим сторонам:
- Treatment: новый matching
- Control: старый matching
- Random by user_id (с careful network effects analysis)
Network effect risk:
- Если treatment лучше — buyers довольнее, но они переходят к те же sellers, что control buyers
- Это interference — нарушение SUTVA assumption
- Решение: cluster randomization (по городам/категориям) или temporal split (period A vs period B)
Metric depth:
- Short-term: contact rate, deals
- Medium-term: repeat usage, NPS
- Long-term: retention, revenue per active user
В Avito это знание critical — наивные A/B на двустороннем рынке часто mislead.»
Listing categorization: новые алгоритм auto-classification. Какие метрики?
✅ Сильный ответ:
«Auto-categorization quality:
Accuracy metrics:
- Top-1 accuracy: % правильно в самую глубокую category
- Top-3 accuracy: правильно в top-3 suggested
- Confusion matrix по top-level categories (Auto/Realty/Services/Goods)
Business impact:
- Listing publish-completion rate (faster = more publishes)
- Quality score post-publish (better category → better search match)
- Search-to-listing conversion (correct cat → correct discovery)
Edge cases:
- Ambiguous items (что в Auto, что в Services — car repair listing?)
- Niche categories (новые / rare items)
- Multi-language titles (русский + английский)
Monitoring:
- Daily accuracy drift
- Category distribution shift (PSI на ML output)
- User correction rate (% who manually change suggested)
Avito context: classifieds with 5000+ subcategories → ML categorization critical для UX.»
3 behavioral вопроса (раунд 5)
Расскажи про самый influential анализ за карьеру.
✅ Сильный ответ:
«Ситуация: в Avito category Auto, наблюдал что листинги в верхней части search ranking имели 3x более низкий CR чем 5-7 позиция.
Гипотеза: topе ranking promoted listings — это «легко продаваемые» дорогие машины. Buyers скроллят дальше к лучшим deals.
Действие:
- SQL analysis: ranking position × CR × AOV
- Подтвердил: top-3 позиции — high AOV, low CR. 5-15 позиции — high CR, lower AOV
- Предложил re-ranking: balance commission revenue с buyer satisfaction (CR)
Результат:
- A/B test показал GMV +5%, CR +12%, revenue сохранился (lift compensated by extra deals)
- Strategy adopted across all categories
- Long-term retention improvement на cohort buyers
Learning: prima facie «лучше» metric (revenue) может скрывать second-order effects (CR → retention → LTV). Always think in causal chain.»
Опиши кейс работы с UX/Design team.
✅ Сильный ответ:
«Ситуация: UX team хотел упростить listing creation form (убрать 3 «optional» fields).
Issue: ML data team полагался на эти fields для quality scoring (year of car, mileage, condition).
Подход:
- Не атаковал решение — спросил что UX оптимизируют? Time to publish.
- Принёс data: time-to-publish vs listing quality vs deal completion. Showed that slightly longer publish = significantly better deal rate (less buyer ghosting).
- Compromise: keep fields, но better UI — auto-fill from photo OCR где возможно.
- Pilot: A/B новый vs старый UI.
Результат:
- Time-to-publish -25% (UI improvements)
- Quality score stable
- Deal rate +3%
- UX team happy (their primary KPI achieved)
- Data team happy (quality preserved)
Learning: find ways to make both team's KPIs win. Не «vs», а «и».»
Случай когда твой анализ был неправильным.
✅ Сильный ответ:
«Ситуация: анализировал seller churn в category Realty. Conclusion: agencies (профессиональные риелторы) — primary driver churn.
Действие: предложил retention campaign для agencies.
Issue (через 3 недели): retention campaign не работала. Углубился — оказалось my analysis confounded by price tier: люди в premium segment churn быстрее, и agencies overrepresented в premium.
Correct conclusion: churn driver — premium tier, not agency status.
Действие:
- Re-analysis с controlled comparison (matching на price tier)
- Эскалация: «my previous recommendation was wrong, here's correct one»
- Re-pivot campaign на premium tier segments
Learning:
- Confounding variables — always check
- Communicate mistakes prominently and quickly (better than discover late)
- Multivariate analysis > simple correlations для business decisions»
Red flags на собесе Avito (не делай)
- Игнорировать двусторонний рынок — все ответы только с buyer side = red flag
- Не упоминать network effects / interference на A/B вопросах для marketplace
- Сравнивать с маркетплейсом (Ozon/Wildberries) как с classifieds — модели разные
- Не знать listing quality / search ranking basics
- Описывать solutions без trade-offs — Avito ценит «balanced» thinking
- Игнорировать spam / fraud — classifieds problem #1
- Слишком academic ML responses — Avito product-driven, не research-driven
Как готовиться к Avito
Месяц 1: classifieds metrics + SQL
- Listing quality, search ranking, seller-buyer dynamics
- 50+ SQL на cohorts, funnels, window functions
- SQL-тренажёр
Месяц 2: marketplace economics + ML basics
- Network effects, two-sided market dynamics
- ML for search ranking (LambdaRank, Learning to Rank)
- Recommender systems for sellers
Месяц 3: кейсы + behavioral
- 30+ кейсов из /cases фильтр marketplace
- AI мок-собес
- 7 STAR-историй
FAQ
Avito vs Ozon — кто сложнее?
Технически schemes одинаковые (SQL, Python, A/B). Domain отличается: Avito = двусторонний рынок физлиц, Ozon = маркетплейс с sellers как поставщиками. Если ты в Ozon working на marketplace side — Avito feels familiar.
Можно ли пройти Avito без classifieds experience?
Да, для junior/middle. Senior+ помогает understanding classifieds dynamics (но не required if strong general analytics).
Какие команды самые конкурентные?
Avito Auto (heavy ML, search), Avito Realty (complex domain), Avito Services (B2C marketplace). Менее конкурентные — Generals (Goods), Pro (B2B).
Стек технологий?
Vertica + ClickHouse + Airflow + dbt. Python (pandas, sklearn). PostgreSQL для OLTP.
Behavioral раунд глубокий?
Medium depth. Меньше чем Yandex, но больше чем Сбер. Focus на collaboration with cross-functional teams (UX, ML, PM).
Что дальше
- 521 SQL-задача с авто-проверкой
- 532 Python-задачи
- 453 кейса фильтр marketplace/classifieds
- AI мок-собес — первый сценарий бесплатно
- 150+ вопросов общего собеса
- Вопросы Ozon аналитика (маркетплейс)
- Вопросы Яндекс аналитика
- Вопросы Тинькофф аналитика
- Вопросы Сбер аналитика
- Behavioral interview + STAR
- Мок-собес: форматы
- Подробнее о компании Avito →
Сравнить Free и Pro → (1999 ₽/мес)
Источники
- Levels.fyi РФ 2026 (Avito grades, salary bands)
- Habr / avito.tech blog — engineering и аналитика material
- VLDB / KDD papers — classifieds search ranking research
- exp-platform.com — A/B testing methodology adapted to two-sided markets