Changes between Version 14 and Version 15 of Ticket #395


Timestamp:
10/07/2014 03:09:51 PM (5 years ago)
Author:
trev
Comment:

  • Ticket #395

    • Property Cc kzar added; dave@… removed
  • Ticket #395 – Description

    v14 v15  
    11=== Background === 
    2 This ticket is part of #495 (Filter Hit Statistics Tool group). We're aiming to cut down on unused filters by allowing users to opt in and share their filter usage data with us. This ticket covers the back-end database and related APIs.
     2See #495. 
    33 
    44=== What to change === 
    55Create a web application (e.g. Flask) that anonymously collects the filter hit statistics sent by Adblock Plus (as of #394). 
    66 
    7 Filter hits will be collected for a certain period of time by the browser, for example 1 week, and then be submitted to the web application via a POST request as JSON. 
    8  
    9 The submitted JSON will be a map keyed by filter. Each filter maps to a map keyed by domain, whose values are the number of hits for that filter on that domain.
     7Filter hits will be collected for a certain period of time by the browser, for example 1 week, and then be submitted to the web application via a POST request as JSON. See #394 for the exact data format. 
    108 
    119The server will then store the raw data as JSON, either in flat files or in a NoSQL or similar database, and will also store aggregated data in a MySQL database. If the volume of raw data proves too large, we might stop recording it altogether. 
     
    1715 1.) Query by filter. "For this filter, which domains matched and how often?" 
    1816 2.) Query by domain. "For this domain, which filters matched and how often?" 
    19  
    20 The server should be set up in the infrastructure repository with Puppet scripts etc. It will be a dedicated server called "hitstats". 
    21  
    22 === Questions === 
    23  - What storage should we use for the raw data? We want to balance simplicity and speed against the usefulness of the format. For example, MongoDB might be more useful for querying but could be complex to maintain or struggle under heavy load.  
    24  - What kind of authentication are we going to use to prevent outsiders from running the queries? 
    25  - In what exact format will the browser send the data to the server? 
    26  - What querying of the raw data would be useful?
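
For illustration only, here is a minimal sketch of the aggregation step and the two queries listed under "What to change". It is not the agreed implementation and rests on several assumptions: the payload shape follows the v14 wording (a map of filters to maps of domains to hit counts; the authoritative format is in #394), the standard-library `sqlite3` module stands in for the MySQL database, the HTTP layer (Flask or otherwise) is left out, and the table, function and sample filter names are hypothetical.

```python
import json
import sqlite3

# Hypothetical payload, shaped as {filter: {domain: hit_count}}.
# The exact format is defined in #394 and may differ.
SAMPLE_PAYLOAD = json.dumps({
    "||ads.example.com^": {"news.example": 12, "blog.example": 3},
    "##.banner": {"news.example": 5},
})


def open_store():
    """Create an in-memory aggregate store (sqlite3 stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits ("
        " filter TEXT NOT NULL,"
        " domain TEXT NOT NULL,"
        " hits INTEGER NOT NULL,"
        " PRIMARY KEY (filter, domain))"
    )
    return conn


def record_submission(conn, raw_json):
    """Validate one submitted JSON document and add it to the aggregates."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object keyed by filter")
    rows = []
    for filter_text, domains in data.items():
        if not isinstance(domains, dict):
            raise ValueError("expected a map of domains for each filter")
        for domain, count in domains.items():
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(count, bool) or not isinstance(count, int) or count < 0:
                raise ValueError("hit counts must be non-negative integers")
            rows.append((filter_text, domain, count))
    # Write only after the whole document validated; one transaction.
    with conn:
        for filter_text, domain, count in rows:
            cur = conn.execute(
                "UPDATE hits SET hits = hits + ? WHERE filter = ? AND domain = ?",
                (count, filter_text, domain),
            )
            if cur.rowcount == 0:
                conn.execute(
                    "INSERT INTO hits (filter, domain, hits) VALUES (?, ?, ?)",
                    (filter_text, domain, count),
                )


def query_by_filter(conn, filter_text):
    """For this filter, which domains matched and how often?"""
    cur = conn.execute(
        "SELECT domain, hits FROM hits WHERE filter = ? "
        "ORDER BY hits DESC, domain",
        (filter_text,),
    )
    return cur.fetchall()


def query_by_domain(conn, domain):
    """For this domain, which filters matched and how often?"""
    cur = conn.execute(
        "SELECT filter, hits FROM hits WHERE domain = ? "
        "ORDER BY hits DESC, filter",
        (domain,),
    )
    return cur.fetchall()
```

Calling `record_submission(conn, SAMPLE_PAYLOAD)` on a store from `open_store()` and then `query_by_filter(conn, "||ads.example.com^")` returns the matching domains with their summed hit counts. Keying the aggregate table on the (filter, domain) pair lets both queries run against the same table; whether that scales to real submission volumes is an open question, like the raw-data storage.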