Bookwise Social Activation — Hypothesis Tests

Test Population & Methodology

Mature users in test population

Each hypothesis asks: do users who cross a social threshold in their first 7 days read more in the subsequent 30-day outcome window?

In plain terms: we split users into two groups (met the threshold vs. didn't), then ask whether one group reads significantly more than the other. A small p-value means a difference this large would be unlikely if the two groups actually read the same amount. Effect size measures how large the practical difference is (0 = none, 1 = complete separation).
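A minimal sketch of how the two windows could be applied to raw session logs. It assumes each reading session is logged with a day index counted from signup (signup day = 1). The `Session` record, its fields, and the helper names are hypothetical, not Bookwise's actual schema.

```python
from dataclasses import dataclass

# Hypothetical session record; field names are illustrative, not Bookwise's schema.
@dataclass
class Session:
    user_id: str
    day: int      # day index counted from signup, assuming signup day = 1
    hours: float

FEATURE_WINDOW = range(1, 8)   # days 1-7: thresholds and week-1 sessions are measured here
OUTCOME_WINDOW = range(8, 38)  # days 8-37: the 30-day outcome window

def hours_in_window(sessions, user_id, window):
    """Total reading hours one user logged on days inside `window`."""
    return sum(s.hours for s in sessions if s.user_id == user_id and s.day in window)

def sessions_in_window(sessions, user_id, window):
    """Number of reading sessions one user logged inside `window` (the H6 metric)."""
    return sum(1 for s in sessions if s.user_id == user_id and s.day in window)

sessions = [
    Session("u1", 1, 0.5), Session("u1", 7, 1.0),    # week 1
    Session("u1", 8, 2.0), Session("u1", 37, 1.5),   # outcome window
    Session("u1", 38, 9.0),                          # after the window: ignored
]
```

Keeping both windows as explicit ranges makes the "no overlap between predictor and outcome" rule easy to check: nothing logged on days 1–7 can leak into the outcome.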

Statistical Test

Mann-Whitney U (non-parametric)

Why Non-Parametric?

Reading-hour distributions are heavily right-skewed: most users read 0–1 hours, while a few read 20+. Parametric tests such as the t-test assume approximately normal data, and with this much skew they can give misleading results.
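A pure-Python sketch of the two-sided Mann-Whitney U test, using the large-sample normal approximation with a tie-corrected variance (ties matter here because so many users read 0 hours). The function names are illustrative. In practice `scipy.stats.mannwhitneyu` runs this test; SciPy can also use an exact method for small samples and applies a continuity correction by default, so its p-values may differ slightly from this sketch.

```python
import math

def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def mann_whitney_u(met, not_met):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected variance).

    Returns (U for the `met` group, p-value)."""
    n1, n2 = len(met), len(not_met)
    pooled = list(met) + list(not_met)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: subtract sum(t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every value identical: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

Because the test only uses ranks, a 200-hour reader counts the same as a 25-hour reader once both are above everyone else, which is exactly the robustness the skewed data calls for.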

Effect Size

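Rank-biserial correlation: the probability that a random "met threshold" user outranks a random "not met" user, minus the probability of the reverse (tied pairs count toward neither). It can range from -1 to 1; values are positive when the met-threshold group reads more, and 0 means no separation. Rule of thumb: >0.3 = medium, >0.5 = large.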
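The definition translates directly into pair counting. This is a minimal sketch (the function name is illustrative); it gives the same value as the identity r = 2U / (n1 · n2) - 1, where U is the Mann-Whitney U of the met-threshold group.

```python
def rank_biserial(met, not_met):
    """Rank-biserial correlation: P(met > not_met) - P(met < not_met) over all cross-group pairs.

    Tied pairs count toward neither term. The result lies in [-1, 1] and equals
    2U / (n1 * n2) - 1, where U is the Mann-Whitney U of the `met` group."""
    greater = less = 0
    for x in met:
        for y in not_met:
            if x > y:
                greater += 1
            elif x < y:
                less += 1
    return (greater - less) / (len(met) * len(not_met))
```

Pair counting is O(n1 · n2), so for a large cohort it is cheaper to compute U once and apply the identity. With many users tied at 0 hours, those tied pairs pull r toward 0, which is one reason effect sizes on this data tend to look modest.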

Lift vs Effect Size

Lift is the ratio of group means (how many times as many hours the met-threshold group reads on average). Effect size (rank-biserial correlation) measures how consistently one group outranks the other; it is less sensitive to outliers and to unequal group sizes. The two can diverge: a small group containing a few very heavy readers can show high lift but only a moderate effect size.
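A small illustration of that divergence, using made-up reading hours (not Bookwise data). One 40-hour reader in the met-threshold group inflates its mean, while most of its users still overlap with the other group. The helper names are illustrative.

```python
from statistics import mean

def lift(met, not_met):
    """Ratio of group means; infinite if the comparison group never reads."""
    base = mean(not_met)
    return mean(met) / base if base > 0 else float("inf")

def rank_biserial(met, not_met):
    """P(met > not_met) - P(met < not_met) over all cross-group pairs (ties count as neither)."""
    greater = sum(1 for x in met for y in not_met if x > y)
    less = sum(1 for x in met for y in not_met if x < y)
    return (greater - less) / (len(met) * len(not_met))

# Made-up reading hours for illustration only (not Bookwise data).
met = [0, 1, 1, 2, 40]      # one very heavy reader
not_met = [0, 0, 1, 1, 2]

print(f"lift = {lift(met, not_met):.1f}x")                  # lift = 11.0x
print(f"effect size = {rank_biserial(met, not_met):.2f}")   # effect size = 0.32
```

An 11x lift sits alongside an effect size that is only "medium" by the rule of thumb above, because the single heavy reader drives most of the gap in means.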

H1 — Following Threshold

Users who follow ≥ N people by day 7 vs. those who don't. Outcome: reading hours in days 8–37.

Following ≥1 person in the first week is the sharpest threshold: users who follow anyone at all read 8.8x as many hours on average as those who follow no one. The effect plateaus after ~5 follows.
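Finding the "sharpest threshold" means repeating the split for several values of N. A minimal sketch, assuming each user is reduced to a (day-7 metric, outcome hours) pair; `sweep_thresholds` is a hypothetical helper, and the same loop applies to H2, H3a, H3b and H6 by swapping in a different day-7 metric.

```python
from statistics import mean

def rank_biserial(met, not_met):
    """P(met > not_met) - P(met < not_met) over all cross-group pairs (ties count as neither)."""
    greater = sum(1 for x in met for y in not_met if x > y)
    less = sum(1 for x in met for y in not_met if x < y)
    return (greater - less) / (len(met) * len(not_met))

def sweep_thresholds(users, thresholds):
    """users: list of (day7_metric, outcome_hours) pairs. Returns one row per usable threshold N."""
    rows = []
    for n in thresholds:
        met = [h for m, h in users if m >= n]
        not_met = [h for m, h in users if m < n]
        if not met or not not_met:
            continue  # threshold leaves one group empty; nothing to compare
        base = mean(not_met)
        rows.append({
            "N": n,
            "n_met": len(met),
            "n_not_met": len(not_met),
            "lift": mean(met) / base if base > 0 else float("inf"),
            "effect_size": rank_biserial(met, not_met),
        })
    return rows

# Made-up (follows_by_day_7, hours_in_days_8_to_37) pairs, for illustration only.
users = [(0, 0.0), (0, 1.0), (1, 3.0), (2, 4.0), (6, 5.0)]
for row in sweep_thresholds(users, [1, 5, 10]):
    print(row)
```

Reporting group sizes next to lift and effect size makes it visible when a high threshold leaves only a handful of users in the "met" group.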

H2 — Follower Threshold

Users who have ≥ N followers by day 7 vs. those who don't. Outcome: reading hours in days 8–37.

Gaining even 1 follower matters, but the effect is weaker than for following, which suggests that actively connecting matters more than being discovered.

H3a — Kudos Received Threshold

Users who receive ≥ N kudos by day 7 vs. those who don't. Outcome: reading hours in days 8–37.

Kudos received shows the strongest raw effect of the social metrics, but it is confounded: kudos are given on reading sessions, so a user can only receive them after reading. See H6 for the control.

H3b — Kudos Given Threshold

Users who give ≥ N kudos by day 7 vs. those who don't. Outcome: reading hours in days 8–37.

Giving kudos also predicts reading — users who engage with others' content tend to read more themselves.

H4 — Combined Social Score

Combined score = following + followers + kudos received, all measured at day 7. Users with score ≥ N vs. those below. Outcome: reading hours in days 8–37.

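The combined score aggregates following, followers, and kudos received (kudos given is not included). It is useful but doesn't outperform the individual metrics, and because it includes kudos received it inherits the H3a confound.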
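A minimal sketch of the H4 score. The `Day7Social` record is a hypothetical per-user snapshot, not Bookwise's schema. The two example users show one plausible reason a plain sum may not beat its parts: very different behaviors can land on the same score.

```python
from dataclasses import dataclass

@dataclass
class Day7Social:
    """Hypothetical per-user social snapshot at day 7; field names are illustrative."""
    following: int
    followers: int
    kudos_received: int
    kudos_given: int

def combined_score(s: Day7Social) -> int:
    # H4 definition: following + followers + kudos received. Kudos given is not part of the score.
    return s.following + s.followers + s.kudos_received

connector = Day7Social(following=4, followers=3, kudos_received=1, kudos_given=6)
kudos_only = Day7Social(following=0, followers=0, kudos_received=8, kudos_given=0)
print(combined_score(connector), combined_score(kudos_only))  # 8 8
```

The second user made no connections at all, yet scores the same as the first because kudos received (which tracks reading) fills the sum.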

H5 — Bidirectional Engagement

Users whose bidirectional engagement score, min(following, followers), is ≥ N at day 7 vs. those below. Outcome: reading hours in days 8–37.

Bidirectional engagement (both following and being followed) captures a qualitative shift: the user is participating in the community, not just observing.
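Taking the minimum is what makes the score bidirectional: a user only scores high by doing both. A minimal sketch; the function names are illustrative.

```python
def bidirectional_score(following: int, followers: int) -> int:
    """H5 score: the smaller of a user's following and follower counts at day 7.

    It is high only when the user both follows others and is followed by others."""
    return min(following, followers)

def meets_h5(following: int, followers: int, n: int) -> bool:
    """True if the user's bidirectional score reaches threshold N."""
    return bidirectional_score(following, followers) >= n

print(bidirectional_score(20, 0))  # 0: follows many people, but nobody follows back
print(bidirectional_score(3, 5))   # 3
```

A sum such as following + followers would rank the first user above the second; the minimum reverses that, matching the "participating, not just observing" reading of H5.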

H6 — Reading Momentum (Control)

Users with ≥ N reading sessions in their first 7 days vs. those with fewer. Outcome: reading hours in days 8–37. This is the baseline control.

Reading momentum is the strongest predictor: unsurprisingly, users who read in week 1 continue reading. This is the baseline control: a social metric shows independent value only if it predicts later reading beyond what week-1 reading already explains.

Summary — Best Threshold per Hypothesis

Comparing the single best-performing threshold from each hypothesis, ranked by effect size.

Kudos received (H3a) and reading momentum (H6) have the highest effect sizes, but both largely reflect existing reading behavior: H6 measures it directly, and kudos can only be received on reading sessions. The most actionable metrics are following (H1) and bidirectional engagement (H5), which represent social connections we can directly influence through experiments.
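The summary ranking can be sketched as "best row per hypothesis, then sort by effect size". This assumes each hypothesis produced a list of threshold rows like those from a threshold sweep; `best_per_hypothesis` is a hypothetical helper and the numbers below are made up, not this report's results.

```python
def best_per_hypothesis(results):
    """results maps a hypothesis label to its threshold rows, each a dict with
    at least "N" and "effect_size". Returns (label, best_row) pairs sorted by
    effect size, largest first; hypotheses with no rows are skipped."""
    best = {
        label: max(rows, key=lambda row: row["effect_size"])
        for label, rows in results.items()
        if rows
    }
    return sorted(best.items(), key=lambda item: item[1]["effect_size"], reverse=True)

# Made-up effect sizes, for illustration only (not the report's results).
results = {
    "H1": [{"N": 1, "effect_size": 0.40}, {"N": 5, "effect_size": 0.35}],
    "H2": [{"N": 1, "effect_size": 0.20}],
    "H5": [{"N": 1, "effect_size": 0.30}, {"N": 2, "effect_size": 0.33}],
}
for label, row in best_per_hypothesis(results):
    print(label, row["N"], row["effect_size"])
```

One caveat worth keeping in mind: because the winning N is chosen after looking at several thresholds, its p-value is somewhat optimistic, so confirming the chosen N on a later cohort is a cheap safeguard.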