Opened on 03/18/2015 at 05:59:46 AM
Closed on 08/12/2015 at 10:01:47 PM
#2162 closed change (fixed)
[meta] Send notifications to a subset of the user base
Reported by: | fhd | Assignee: | fhd |
---|---|---|---|
Priority: | P2 | Milestone: | |
Module: | Unknown | Keywords: | meta 2015q2 |
Cc: | trev, sebastian, Kirill | Blocked By: | #2274, #2275, #2276, #2277, #2698, #2707, #2739, #2868, #2869 |
Blocking: | Platform: | Unknown | |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description (last modified by fhd)
Background
Currently, notifications show up for everyone. That's sufficient for the emergency use case, but when it comes to using information notifications, we want to test the message with a smaller group first.
We need a mechanism that allows us to define test groups and their size (size based on probability will do, e.g. "0.5% of the user base"), which will receive additional notifications from the server.
What to change
We need to add a new URL handler for notification.json, that'll put users into tests and test groups based on the desired probability.
The notification identifiers and test groups should be appended to lastVersion using the following format: -someinfo/0-otherinfo/2. Here someinfo and otherinfo are notification identifiers. The number after the slash designates the test group.
Notifications are configured in the notifications repository, one file per notification with the notification identifier being determined by the file name. The format of these files should be extended as follows:
- Optional start and end parameters should determine when the notification should become active/inactive (specified as an ISO 8601 subset in the format %Y-%m-%dT%H:%M, e.g. 2015-06-20T07:24).
- The general notification settings can be followed up by sections determining variants of a notification. Names of the sections should be irrelevant beyond establishing an ordering. Each section should contain sample (e.g. 0.01 meaning 1%), title, message and optionally links settings for this variant.
- If the notification has title, message and links settings before the sections these become the default notification.
Test group identifiers should be generated as following: sort section names and number them in that order. Test group 0 should be reserved for the default notification or no notification if there is no default. The probability for a user to get into test group 0 should be: prob(0) = 1 - sum(prob(1)..prob(10000)).
Example: a notification notifications/someinfo:
severity = information start = 2015-03-17T11:00 end = 2015-03-22T11:00 target = adblockpluschrome [message shown] sample = 0.05 title.en-US = Some info. message.en-US = This is some info links = foobar
And a notification notifications/otherinfo:
severity = information start = 2015-03-17T11:00 end = 2015-03-22T11:00 target = adblockpluschrome title.en-US = Other info message.en-US = This is some other info [alternative] sample = 0.03 title.en-US = Other info message.en-US = Alternative message for other info.
Both notification run at the same time and have two test groups: someinfo/0, someinfo/1, otherinfo/0, otherinfo/1. Now, if the handler receives a request such as:
/notifications.json?...&lastVersion=201503160400
It will first check which notifications are currently active. On March 20, 2015 both someinfo and otherinfo notifications will be active. However, the lastVersion parameter has no test group for the someinfo notification - so it will assign the user randomly (5% for 1, 95% for 0). There is also no test identifier for the otherinfo notification - so it will assign the user randomly as well here (3% for 1, 97% for 0).
So the user will most likely get a response with lastVersion: "201503202215-otherinfo/0-someinfo/0". In that case he will get the default notification for someinfo (meaning no notification) and the default notification for otherinfo (meaning "This is some other info"). If he gets a response with lastVersion: "201503202215-otherinfo/1-someinfo/0" he will still get no notification for someinfo but "Alternative message for other info." for otherinfo.
On the next day the handler will receive a request like the following:
/notifications.json?...&lastVersion=201503202215-otherinfo/1-someinfo/0
Since both notifications are still active and no new notifications became active the response should keep the test identifiers: lastVersion: "201503212150-otherinfo/1-someinfo/0". The list of messages will be exactly the same as on the day before.
The same request on March 22 on the other hand will yield different results: the notifications expired and are no longer active. So the handler should remove the corresponding test identifiers and serve a response with lastVersion: "201503222205".
Note that the order of test identifiers has be fixed, meaning that we should sort them by notification identifier.
Caching should make this handler scale (test identifiers are the only relevant parameter here).
Attachments (0)
Change History (42)
comment:1 Changed on 03/18/2015 at 06:30:05 AM by fhd
- Cc trev sebastian added
comment:2 Changed on 03/18/2015 at 04:13:15 PM by sebastian
- Ready set
comment:3 follow-up: ↓ 5 Changed on 03/19/2015 at 07:49:04 AM by sebastian
This should generally work, but caching won't. Since the timestamp given in the lastVersion parameter will be different for virtually every request, but is a required parameter to determine the (new) test designator, we can't actually cache any request here. Also this logic seems to be too complex to be implemented by pure nginx configuration.
comment:4 Changed on 03/19/2015 at 09:51:37 AM by fhd
Yes I think you're right. We came to the conclusion that caching is going to work last time we discussed this, but it's not as simple as I thought. And the logic ended up becoming too complex for nginx configuration indeed. We won't be generating responses on the fly luckily, but we will still need to have all notification requests go through this handler.
But I think we should first make this run, then make it fast. We have ten times as many notification servers as we would need, since every filter server serves notifications too. So I'm not entirely sure that this will actually be a problem (although I have strong suspicions). And if we need to optimise, there's still some less elegant things we can do, a native nginx plugin should really do the trick if all else fails.
comment:5 in reply to: ↑ 3 ; follow-up: ↓ 6 Changed on 03/19/2015 at 12:12:42 PM by trev
- Keywords meta added
- Summary changed from Send notifications to a subset of the user base to [meta] Send notifications to a subset of the user base
Replying to sebastian:
This should generally work, but caching won't. Since the timestamp given in the lastVersion parameter will be different for virtually every request, but is a required parameter to determine the (new) test designator, we can't actually cache any request here.
There are two ways to deal with that issue:
- Just ignore it. We get millions of requests, yet the number of possible lastVersion values is comparably low. Yesterday one of our servers received 2,456,366 requests but only 116,992 unique lastVersion values. We could get that number down again by increasing the caching interval (currently 10 minutes). The point isn't getting that script to run only the absolutely required number of times, not having it run for each and every request would already suffice.
- Don't use $arg_lastVersion as cache parameter but only the channel part of it. Extracting the channel into a separate nginx variable is actually fairly easy and should make sure there aren't all too many responses to be cached.
I've marked this as a meta issue - implementing this will require a number of subissues.
comment:6 in reply to: ↑ 5 ; follow-up: ↓ 7 Changed on 03/19/2015 at 12:29:51 PM by sebastian
Replying to trev:
- Don't use $arg_lastVersion as cache parameter but only the channel part of it. Extracting the channel into a separate nginx variable is actually fairly easy and should make sure there aren't all too many responses to be cached.
As far as I understand it, this isn't an option, since the timestamp of the lastVersion parameter needs to be considered to identify users that have a channel assigned from a previous test and to identify those that don't have a channel assigned but were already excluded from the current test.
I've marked this as a meta issue - implementing this will require a number of subissues.
I agree that there should multiple issues, but for a meta issue this issue has too much implementation details. But on the other hand it has all the details required to write the web app serving those requests.
comment:7 in reply to: ↑ 6 Changed on 03/19/2015 at 12:44:38 PM by trev
Replying to sebastian:
As far as I understand it, this isn't an option, since the timestamp of the lastVersion parameter needs to be considered to identify users that have a channel assigned from a previous test and to identify those that don't have a channel assigned but were already excluded from the current test.
No, we don't want to consider the timestamp, the channel IDs should be sufficient for that. Note that there might be more than one channel ID, e.g. if two notifications are active at the same time. Each channel ID is only valid for the respective notification. So channel IDs for notifications that are no longer active should be ignored, and if a channel ID for an active notification is missing then the user should be assigned one.
Note that caching might affect the random distributions if we make the caching interval too long, so caching only on channel IDs should be the preferable solution.
And - yes, we should be moving the implementation details into subissues. I think it should be the following:
- Implement a web application to generate notification responses on the fly.
- Extend the notifications repository parser for the additional parameters.
- Make the web application consider channels (assign users to channels / keep them in the channels that are still relevant).
- Update stats processing to consider the new lastVersion format.
- Change notification server configuration to use the web application and cache response.
comment:8 Changed on 03/19/2015 at 12:49:37 PM by sebastian
- Ready unset
I don't mind. But this different from what fhd told me, and also different from what the issue description says. Please sort it out with him if you have objections, and update the description accordingly.
comment:9 follow-ups: ↓ 10 ↓ 11 Changed on 03/19/2015 at 01:06:47 PM by fhd
- Description modified (diff)
Yeah, Wladimir has a point - we don't have to inspect the timestamp in lastVersion at all when it comes to moving people out of the test again. We'll just do that based on the current time...
As far as caching the group designator goes: That should work, but does that really make much of a difference? The majority of the requests can't be cached that way - only a small minority of the user base will be in a test at any given time. Unless of course, we're putting people not part of a test into a special test group that means "not part of this test". I guess that's what you mean, Wladimir?
comment:10 in reply to: ↑ 9 Changed on 03/19/2015 at 01:18:51 PM by trev
There is lots of confusion here...
Replying to fhd:
Yeah, Wladimir has a point - we don't have to inspect the timestamp in lastVersion at all when it comes to moving people out of the test again. We'll just do that based on the current time...
Determining whether a notification is active (and the corresponding channel IDs valid) - that should be done based on the current time, no reason to use lastVersion for that.
As far as caching the group designator goes: That should work, but does that really make much of a difference? The majority of the requests can't be cached that way - only a small minority of the user base will be in a test at any given time.
No. Once a notification becomes active, each user should be assigned a channel ID for it. Typically, it will be either the "default" channel (no notification) or the "selected" channel (notification visible). Not assigning a user to a channel would mean that on next request the assigning process runs again and probabilities will be messed up.
Note that "no channel ID whatsoever" is also a valid cache identifier.
comment:11 in reply to: ↑ 9 Changed on 03/19/2015 at 01:19:45 PM by sebastian
Replying to fhd:
Yeah, Wladimir has a point - we don't have to inspect the timestamp in lastVersion at all when it comes to moving people out of the test again. We'll just do that based on the current time...
We still need a way to figure out if a given test designator belongs to a previous test (for example when the user were offline for a few days/weeks or when we schedule multiple tests subsequently). However, this can be done by making the test designators unique across all tests.
Also we need to distinguish users sending a request for the first time during the current test and users that have already been excluded from the test. So if we don't want to rely on the timestamp of the last download, we also have to assign test designators to users excluded from the test.
comment:12 Changed on 03/19/2015 at 02:16:12 PM by trev
- Description modified (diff)
comment:13 Changed on 03/19/2015 at 02:18:41 PM by fhd
Yes, start and end are not necessary at all if we just use the default channel for that, as you said earlier actually.
Discussed this (and a different format for tests in the notifications repository) a bit on IRC with Wladimir, who remembers our meeting on this remarkably well. He just updated the description accordingly.
comment:14 Changed on 03/19/2015 at 02:20:51 PM by trev
- Description modified (diff)
comment:15 Changed on 03/19/2015 at 02:25:57 PM by trev
- Description modified (diff)
comment:16 Changed on 03/19/2015 at 02:52:44 PM by fhd
Sounds good to me like this.
comment:17 Changed on 03/20/2015 at 10:09:59 AM by sebastian
- Ready set
comment:19 Changed on 03/24/2015 at 08:29:20 AM by Kirill
- Cc Kirill added
comment:24 Changed on 03/24/2015 at 09:16:02 AM by fhd
- Sensitive set
comment:25 Changed on 03/24/2015 at 02:29:50 PM by fhd
- Sensitive unset
- Verified working unset
comment:26 Changed on 03/24/2015 at 02:46:00 PM by fhd
- Owner set to fhd
comment:27 Changed on 03/30/2015 at 08:58:12 AM by annlee@adblockplus.org
- Blocked By 1951 added
comment:28 Changed on 04/04/2015 at 11:24:29 PM by fhd
- Blocked By 2274 added
comment:29 Changed on 04/04/2015 at 11:26:55 PM by fhd
- Blocked By 2275 added
comment:30 Changed on 04/04/2015 at 11:30:57 PM by fhd
- Blocked By 2276 added
comment:31 Changed on 04/04/2015 at 11:34:35 PM by fhd
- Description modified (diff)
comment:32 Changed on 04/04/2015 at 11:35:44 PM by fhd
- Blocked By 1951 removed
comment:33 Changed on 04/04/2015 at 11:48:16 PM by fhd
- Component changed from Sitescripts to Unknown
comment:34 Changed on 04/04/2015 at 11:51:48 PM by fhd
- Blocked By 2277 added
comment:35 Changed on 04/21/2015 at 09:11:46 AM by fhd
- Keywords 2015q2 added
comment:36 Changed on 06/18/2015 at 06:53:58 AM by Kirill
The probability for a user to get into test group 0 should be: prob(0) = 1 - sum(prob(1)..prob(10000)).
Technically this is wrong. If the user can get assigned to many groups, the probability to be unassigned is p(0) = (1-p(1))*(1-p(2))*(1-p(3))*...*(1-p(n))
comment:37 Changed on 06/22/2015 at 11:43:18 AM by Ross
- Blocked By 2705 added
comment:38 Changed on 06/22/2015 at 12:29:14 PM by Ross
- Blocked By 2707 added
comment:39 Changed on 06/22/2015 at 12:29:22 PM by Ross
- Blocked By 2698 added
comment:40 Changed on 06/22/2015 at 12:35:57 PM by Ross
Performed first pass of testing. Have not tested Chrome yet (was not in devbuild at time).
Working:
- Titles and messages with lots of text.
- Titles and messages with special characters.
- Group assignment is working as expected.
- Probability is working as expected.
- Alternative messages are working as expected.
Problems:
- Links are not displayed: #2698
- Start / end keys do not work: #2707
- Lack of error reporting/logging: #2705
ABP 2.6.9.3953
Firefox 38.0.5 / Windows 8.1 x64
comment:41 Changed on 06/22/2015 at 06:22:45 PM by fhd
- Blocked By 2705 removed
comment:42 Changed on 06/24/2015 at 07:56:52 AM by fhd
- Description modified (diff)
Updated the date format based on the discussion in https://codereview.adblockplus.org/29321041/.
comment:43 Changed on 06/29/2015 at 05:48:06 AM by Ross
- Blocked By 2730 added
comment:44 Changed on 06/30/2015 at 11:40:25 AM by Ross
- Blocked By 2730 removed
comment:45 Changed on 06/30/2015 at 01:12:52 PM by Ross
- Blocked By 2739 added
comment:46 Changed on 08/07/2015 at 07:08:50 AM by fhd
- Blocked By 2868, 2869 added
- Tester set to Unknown
comment:47 Changed on 08/12/2015 at 10:01:47 PM by fhd
- Resolution set to fixed
- Status changed from new to closed
This is fully rolled out now.
Sebastian/Wladimir: Haven't set this to Ready yet, please have a look.
How tests are configured is just something I came up with this morning, let me know if you can think of a better way. Also, I'm not entirely sure about calling the whole thing "tests" to begin with, but couldn't come up with anything less confusing.