Opened 4 years ago

Closed 4 years ago

#2162 closed change (fixed)

[meta] Send notifications to a subset of the user base

Reported by: fhd Assignee: fhd
Priority: P2 Milestone:
Module: Unknown Keywords: meta 2015q2
Cc: trev, sebastian, Kirill Blocked By: #2274, #2275, #2276, #2277, #2698, #2707, #2739, #2868, #2869
Blocking: Platform: Unknown
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

Description (last modified by fhd)

Background

Currently, notifications show up for everyone. That's sufficient for the emergency use case, but when it comes to using information notifications, we want to test the message with a smaller group first.

We need a mechanism that allows us to define test groups and their size (size based on probability will do, e.g. "0.5% of the user base"), which will receive additional notifications from the server.

What to change

We need to add a new URL handler for notification.json, that'll put users into tests and test groups based on the desired probability.

The notification identifiers and test groups should be appended to lastVersion using the following format: -someinfo/0-otherinfo/2. Here someinfo and otherinfo are notification identifiers. The number after the slash designates the test group.

Notifications are configured in the notifications repository, one file per notification with the notification identifier being determined by the file name. The format of these files should be extended as follows:

  • Optional start and end parameters should determine when the notification should become active/inactive (specified as an ISO 8601 subset in the format %Y-%m-%dT%H:%M, e.g. 2015-06-20T07:24).
  • The general notification settings can be followed up by sections determining variants of a notification. Names of the sections should be irrelevant beyond establishing an ordering. Each section should contain sample (e.g. 0.01 meaning 1%), title, message and optionally links settings for this variant.
  • If the notification has title, message and links settings before the sections these become the default notification.

Test group identifiers should be generated as following: sort section names and number them in that order. Test group 0 should be reserved for the default notification or no notification if there is no default. The probability for a user to get into test group 0 should be: prob(0) = 1 - sum(prob(1)..prob(10000)).

Example: a notification notifications/someinfo:

severity = information
start = 2015-03-17T11:00
end = 2015-03-22T11:00
target = adblockpluschrome

[message shown]
sample = 0.05
title.en-US = Some info.
message.en-US = This is some info
links = foobar

And a notification notifications/otherinfo:

severity = information
start = 2015-03-17T11:00
end = 2015-03-22T11:00
target = adblockpluschrome
title.en-US = Other info
message.en-US = This is some other info

[alternative]
sample = 0.03
title.en-US = Other info
message.en-US = Alternative message for other info.

Both notification run at the same time and have two test groups: someinfo/0, someinfo/1, otherinfo/0, otherinfo/1. Now, if the handler receives a request such as:

/notifications.json?...&lastVersion=201503160400

It will first check which notifications are currently active. On March 20, 2015 both someinfo and otherinfo notifications will be active. However, the lastVersion parameter has no test group for the someinfo notification - so it will assign the user randomly (5% for 1, 95% for 0). There is also no test identifier for the otherinfo notification - so it will assign the user randomly as well here (3% for 1, 97% for 0).

So the user will most likely get a response with lastVersion: "201503202215-otherinfo/0-someinfo/0". In that case he will get the default notification for someinfo (meaning no notification) and the default notification for otherinfo (meaning "This is some other info"). If he gets a response with lastVersion: "201503202215-otherinfo/1-someinfo/0" he will still get no notification for someinfo but "Alternative message for other info." for otherinfo.

On the next day the handler will receive a request like the following:

/notifications.json?...&lastVersion=201503202215-otherinfo/1-someinfo/0

Since both notifications are still active and no new notifications became active the response should keep the test identifiers: lastVersion: "201503212150-otherinfo/1-someinfo/0". The list of messages will be exactly the same as on the day before.

The same request on March 22 on the other hand will yield different results: the notifications expired and are no longer active. So the handler should remove the corresponding test identifiers and serve a response with lastVersion: "201503222205".

Note that the order of test identifiers has be fixed, meaning that we should sort them by notification identifier.

Caching should make this handler scale (test identifiers are the only relevant parameter here).

Ticket Status Resolution Summary Component Owner
#2274 closed fixed [Send notifications to a subset of the user base] Move notification parsing into a module Sitescripts fhd
#2275 closed fixed [Send notifications to a subset of the user base] Parse notification groups Sitescripts fhd
#2276 closed fixed [Send notifications to a subset of the user base] Handle groups in notification.json requests Sitescripts fhd
#2277 closed fixed [Send notifications to a subset of the user base] Use the new notification.json handler Infrastructure fhd
#2698 closed worksforme Notifications do not display included links Adblock-Plus-for-Firefox
#2707 closed fixed Notification start / end keys do not work Sitescripts fhd
#2739 closed invalid [Send notifications to a subset of the user base] Test notification handler with caching enabled Sitescripts
#2868 closed fixed Pull processes piling up after deploying the new notification handler Sitescripts fhd
#2869 closed fixed Keep the notifications repository on the notification servers up-to-date Infrastructure fhd


Change History (42)

comment:1 Changed 4 years ago by fhd

  • Cc trev sebastian added

Sebastian/Wladimir: Haven't set this to Ready yet, please have a look.

How tests are configured is just something I came up with this morning, let me know if you can think of a better way. Also, I'm not entirely sure about calling the whole thing "tests" to begin with, but couldn't come up with anything less confusing.

Last edited 4 years ago by fhd (previous) (diff)

comment:2 Changed 4 years ago by sebastian

  • Ready set

comment:3 follow-up: Changed 4 years ago by sebastian

This should generally work, but caching won't. Since the timestamp given in the lastVersion parameter will be different for virtually every request, but is a required parameter to determine the (new) test designator, we can't actually cache any request here. Also this logic seems to be too complex to be implemented by pure nginx configuration.

comment:4 Changed 4 years ago by fhd

Yes I think you're right. We came to the conclusion that caching is going to work last time we discussed this, but it's not as simple as I thought. And the logic ended up becoming too complex for nginx configuration indeed. We won't be generating responses on the fly luckily, but we will still need to have all notification requests go through this handler.

But I think we should first make this run, then make it fast. We have ten times as many notification servers as we would need, since every filter server serves notifications too. So I'm not entirely sure that this will actually be a problem (although I have strong suspicions). And if we need to optimise, there's still some less elegant things we can do, a native nginx plugin should really do the trick if all else fails.

comment:5 in reply to: ↑ 3 ; follow-up: Changed 4 years ago by trev

  • Keywords meta added
  • Summary changed from Send notifications to a subset of the user base to [meta] Send notifications to a subset of the user base

Replying to sebastian:

This should generally work, but caching won't. Since the timestamp given in the lastVersion parameter will be different for virtually every request, but is a required parameter to determine the (new) test designator, we can't actually cache any request here.

There are two ways to deal with that issue:

  1. Just ignore it. We get millions of requests, yet the number of possible lastVersion values is comparably low. Yesterday one of our servers received 2,456,366 requests but only 116,992 unique lastVersion values. We could get that number down again by increasing the caching interval (currently 10 minutes). The point isn't getting that script to run only the absolutely required number of times, not having it run for each and every request would already suffice.
  2. Don't use $arg_lastVersion as cache parameter but only the channel part of it. Extracting the channel into a separate nginx variable is actually fairly easy and should make sure there aren't all too many responses to be cached.

I've marked this as a meta issue - implementing this will require a number of subissues.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 4 years ago by sebastian

Replying to trev:

  1. Don't use $arg_lastVersion as cache parameter but only the channel part of it. Extracting the channel into a separate nginx variable is actually fairly easy and should make sure there aren't all too many responses to be cached.

As far as I understand it, this isn't an option, since the timestamp of the lastVersion parameter needs to be considered to identify users that have a channel assigned from a previous test and to identify those that don't have a channel assigned but were already excluded from the current test.

I've marked this as a meta issue - implementing this will require a number of subissues.

I agree that there should multiple issues, but for a meta issue this issue has too much implementation details. But on the other hand it has all the details required to write the web app serving those requests.

comment:7 in reply to: ↑ 6 Changed 4 years ago by trev

Replying to sebastian:

As far as I understand it, this isn't an option, since the timestamp of the lastVersion parameter needs to be considered to identify users that have a channel assigned from a previous test and to identify those that don't have a channel assigned but were already excluded from the current test.

No, we don't want to consider the timestamp, the channel IDs should be sufficient for that. Note that there might be more than one channel ID, e.g. if two notifications are active at the same time. Each channel ID is only valid for the respective notification. So channel IDs for notifications that are no longer active should be ignored, and if a channel ID for an active notification is missing then the user should be assigned one.

Note that caching might affect the random distributions if we make the caching interval too long, so caching only on channel IDs should be the preferable solution.

And - yes, we should be moving the implementation details into subissues. I think it should be the following:

  • Implement a web application to generate notification responses on the fly.
  • Extend the notifications repository parser for the additional parameters.
  • Make the web application consider channels (assign users to channels / keep them in the channels that are still relevant).
  • Update stats processing to consider the new lastVersion format.
  • Change notification server configuration to use the web application and cache response.
Last edited 4 years ago by trev (previous) (diff)

comment:8 Changed 4 years ago by sebastian

  • Ready unset

I don't mind. But this different from what fhd told me, and also different from what the issue description says. Please sort it out with him if you have objections, and update the description accordingly.

Last edited 4 years ago by sebastian (previous) (diff)

comment:9 follow-ups: Changed 4 years ago by fhd

  • Description modified (diff)

Yeah, Wladimir has a point - we don't have to inspect the timestamp in lastVersion at all when it comes to moving people out of the test again. We'll just do that based on the current time...

As far as caching the group designator goes: That should work, but does that really make much of a difference? The majority of the requests can't be cached that way - only a small minority of the user base will be in a test at any given time. Unless of course, we're putting people not part of a test into a special test group that means "not part of this test". I guess that's what you mean, Wladimir?

comment:10 in reply to: ↑ 9 Changed 4 years ago by trev

There is lots of confusion here...

Replying to fhd:

Yeah, Wladimir has a point - we don't have to inspect the timestamp in lastVersion at all when it comes to moving people out of the test again. We'll just do that based on the current time...

Determining whether a notification is active (and the corresponding channel IDs valid) - that should be done based on the current time, no reason to use lastVersion for that.

As far as caching the group designator goes: That should work, but does that really make much of a difference? The majority of the requests can't be cached that way - only a small minority of the user base will be in a test at any given time.

No. Once a notification becomes active, each user should be assigned a channel ID for it. Typically, it will be either the "default" channel (no notification) or the "selected" channel (notification visible). Not assigning a user to a channel would mean that on next request the assigning process runs again and probabilities will be messed up.

Note that "no channel ID whatsoever" is also a valid cache identifier.

comment:11 in reply to: ↑ 9 Changed 4 years ago by sebastian

Replying to fhd:

Yeah, Wladimir has a point - we don't have to inspect the timestamp in lastVersion at all when it comes to moving people out of the test again. We'll just do that based on the current time...

We still need a way to figure out if a given test designator belongs to a previous test (for example when the user were offline for a few days/weeks or when we schedule multiple tests subsequently). However, this can be done by making the test designators unique across all tests.

Also we need to distinguish users sending a request for the first time during the current test and users that have already been excluded from the test. So if we don't want to rely on the timestamp of the last download, we also have to assign test designators to users excluded from the test.

comment:12 Changed 4 years ago by trev

  • Description modified (diff)

comment:13 Changed 4 years ago by fhd

Yes, start and end are not necessary at all if we just use the default channel for that, as you said earlier actually.

Discussed this (and a different format for tests in the notifications repository) a bit on IRC with Wladimir, who remembers our meeting on this remarkably well. He just updated the description accordingly.

comment:14 Changed 4 years ago by trev

  • Description modified (diff)

comment:15 Changed 4 years ago by trev

  • Description modified (diff)

comment:16 Changed 4 years ago by fhd

Sounds good to me like this.

comment:17 Changed 4 years ago by sebastian

  • Ready set

comment:19 Changed 4 years ago by Kirill

  • Cc Kirill added

comment:24 Changed 4 years ago by fhd

  • Sensitive set

comment:25 Changed 4 years ago by fhd

  • Sensitive unset
  • Verified working unset

comment:26 Changed 4 years ago by fhd

  • Owner set to fhd

comment:27 Changed 4 years ago by annlee@…

  • Blocked By 1951 added

comment:28 Changed 4 years ago by fhd

  • Blocked By 2274 added

comment:29 Changed 4 years ago by fhd

  • Blocked By 2275 added

comment:30 Changed 4 years ago by fhd

  • Blocked By 2276 added

comment:31 Changed 4 years ago by fhd

  • Description modified (diff)

comment:32 Changed 4 years ago by fhd

  • Blocked By 1951 removed

comment:33 Changed 4 years ago by fhd

  • Component changed from Sitescripts to Unknown

comment:34 Changed 4 years ago by fhd

  • Blocked By 2277 added

comment:35 Changed 4 years ago by fhd

  • Keywords 2015q2 added

comment:36 Changed 4 years ago by Kirill

The probability for a user to get into test group 0 should be: prob(0) = 1 - sum(prob(1)..prob(10000)).

Technically this is wrong. If the user can get assigned to many groups, the probability to be unassigned is p(0) = (1-p(1))*(1-p(2))*(1-p(3))*...*(1-p(n))

comment:37 Changed 4 years ago by Ross

  • Blocked By 2705 added

comment:38 Changed 4 years ago by Ross

  • Blocked By 2707 added

comment:39 Changed 4 years ago by Ross

  • Blocked By 2698 added

comment:40 Changed 4 years ago by Ross

Performed first pass of testing. Have not tested Chrome yet (was not in devbuild at time).

Working:

  • Titles and messages with lots of text.
  • Titles and messages with special characters.
  • Group assignment is working as expected.
  • Probability is working as expected.
  • Alternative messages are working as expected.

Problems:

  • Links are not displayed: #2698
  • Start / end keys do not work: #2707
  • Lack of error reporting/logging: #2705

ABP 2.6.9.3953
Firefox 38.0.5 / Windows 8.1 x64

comment:41 Changed 4 years ago by fhd

  • Blocked By 2705 removed

comment:42 Changed 4 years ago by fhd

  • Description modified (diff)

Updated the date format based on the discussion in https://codereview.adblockplus.org/29321041/.

comment:43 Changed 4 years ago by Ross

  • Blocked By 2730 added

comment:44 Changed 4 years ago by Ross

  • Blocked By 2730 removed

comment:45 Changed 4 years ago by Ross

  • Blocked By 2739 added

comment:46 Changed 4 years ago by fhd

  • Blocked By 2868, 2869 added
  • Tester set to Unknown

comment:47 Changed 4 years ago by fhd

  • Resolution set to fixed
  • Status changed from new to closed

This is fully rolled out now.

Note: See TracTickets for help on using tickets.