Opened 4 years ago

Closed 2 years ago

#3273 closed change (rejected)

Extend telemetry data by additional information

Reported by: mario
Assignee:
Priority: Unknown
Milestone:
Module: Adblock-Plus-for-Firefox
Keywords: 2016q1
Cc: saroyanm, trev, matze
Blocked By: #394
Blocking:
Platform: Firefox
Ready: no
Confidential: no
Tester: Unknown
Verified working: no
Review URL(s):

Description (last modified by saroyanm)

Background

#495 introduces "Telemetry", formally known as "Filter Hit Statistics". #394 covers the client-side implementation, i.e. collecting telemetry data and regularly sending it to our backend after an explicit opt-in.
As soon as #394 has landed, the following changes should be implemented on the client side.

By sending additional information (described in "What to change"), the following requirements can be met:

  1. Further improve anonymization of the data.
  2. Identify filter lists where none of the filters actually hit.
  3. Identify frequently visited domains with an unusually high number of filter hits in order to improve filter rules.
  4. Identify the environment's locale.

What to change

  1. Add a new attribute to the JSON on root level called "filterListSubscriptions" which includes an array of all subscribed filter lists.
      "filterListSubscriptions": ["https://easylist-downloads.adblockplus.org/easylist.txt", "https:// ..."]
    
  2. Add a new attribute to the JSON on root level called "domains" which includes an object of all visited domains containing the number of page impressions within this domain.
      "domains": {
        "example.com": {
          "pages": 143       // Number of page impressions within this domain
        },
        "example.org": {"pages": 12}
      }
    
  3. Add a new attribute to the JSON on root level called "appLocale" which describes the browser's locale.
      "appLocale": "en-US",
    

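The three additions could be applied to an existing payload roughly as follows. This is only a sketch under assumptions, not the #394 implementation: `extendPayload`, its parameters, and the `pageImpressions` map are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: none of these names come from the #394 code.
// payload:          telemetry object as produced by the client
// subscriptionUrls: array of subscribed filter list URLs
// pageImpressions:  Map of domain -> number of page impressions
// appLocale:        locale string such as "en-US"
function extendPayload(payload, subscriptionUrls, pageImpressions, appLocale) {
  const domains = {};
  for (const [domain, pages] of pageImpressions) {
    domains[domain] = {pages};
  }
  // Return a copy instead of mutating the caller's object.
  return Object.assign({}, payload, {
    appLocale,
    filterListSubscriptions: Array.from(subscriptionUrls),
    domains
  });
}

const extended = extendPayload(
  {version: 1, filters: {}},
  ["https://easylist-downloads.adblockplus.org/easylist.txt"],
  new Map([["example.com", 143], ["example.org", 12]]),
  "en-US"
);
console.log(JSON.stringify(extended, null, 2));
```

Returning a copy keeps the original payload untouched, which may make it easier to compare the old and the extended format during review.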
This is an example of the full JSON format containing the changes described above.

{
  "version": 1,               // For the server to recognize outdated clients
  "timeSincePush": 12345,     // UTC Time interval (seconds in 1h-steps) since previous push
  "addonName": "adblockplus", // see require("info")
  "addonVersion": "2.3.4",    // see require("info")
  "application": "firefox",   // see require("info")
  "applicationVersion": "31", // see require("info")
  "platform": "gecko",        // see require("info")
  "platformVersion": "31",    // see require("info")
  "appLocale": "en-US",       // see Utils.appLocale (actually ABP locale)
  "filterListSubscriptions": ["https://easylist-downloads.adblockplus.org/easylist.txt", "https:// ..."],  // All filter list subscriptions
  "domains": {
    "example.com": {
      "pages": 143       // Number of page impressions within this domain
    },
    "example.org": {"pages": 12}
  },
  "filters": {
    "||example.com^": {
      "firstParty": {
        "example.com": {
          "hits": 12,              // Number of hits
          "latest": 123456789      // UTC Time interval of last hit (in 1h-steps)
        },
        "example.org": {"hits": 4, "latest": 987654321}
      },
      "thirdParty": {
        "example.com": {"hits": 5, "latest": 123455489}
      },
      "subscriptions": ["https://easylist-downloads.adblockplus.org/easylist.txt", "https:// ..."]  // Subscription source of filter
    },
    "example.com##foo > bar": {
      ...
    }
  }
}

Note: The format might change. For the original JSON format please consult #394.
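
As an illustration of requirement 2 (filter lists where none of the filters hit), a consumer of this JSON could compare "filterListSubscriptions" against the "subscriptions" arrays of filters that have hits. The sketch below assumes the payload shape shown above; `findSubscriptionsWithoutHits` is a hypothetical name, and the second URL in the sample is a placeholder.

```javascript
// Hypothetical analysis helper; assumes the payload shape from this ticket.
function findSubscriptionsWithoutHits(payload) {
  const withHits = new Set();
  for (const filter of Object.values(payload.filters || {})) {
    const parties = [filter.firstParty || {}, filter.thirdParty || {}];
    // A filter counts as "hit" if any first-party or third-party entry has hits.
    const hasHits = parties.some(party =>
      Object.values(party).some(entry => entry.hits > 0));
    if (hasHits) {
      for (const url of filter.subscriptions || []) {
        withHits.add(url);
      }
    }
  }
  return (payload.filterListSubscriptions || [])
    .filter(url => !withHits.has(url));
}

const samplePayload = {
  filterListSubscriptions: [
    "https://easylist-downloads.adblockplus.org/easylist.txt",
    "https://example.com/unused-list.txt" // placeholder URL for illustration
  ],
  filters: {
    "||example.com^": {
      firstParty: {"example.com": {hits: 12, latest: 123456789}},
      thirdParty: {},
      subscriptions: ["https://easylist-downloads.adblockplus.org/easylist.txt"]
    }
  }
};
const unused = findSubscriptionsWithoutHits(samplePayload);
console.log(unused);
```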

Change History (12)

comment:1 in reply to: ↑ description ; follow-up: Changed 4 years ago by Kirill

Replying to mario:

  1. Add a new attribute to the JSON on root level called "domains" which includes an object of all visited domains containing the number of pages loaded from this domain.
      "domains": {
        "example.com": {
          "pages": 143       // Number of pages loaded from this domain
        },
        "example.org": {"pages": 12}
      }
    

"Number of pages" might be misleading. We don't only need to know how many pages a user opened on a domain, but how often he did so. Maybe this is what you meant and what is clear to everyone else, but I wanted to point it out to prevent misunderstandings...

comment:2 Changed 4 years ago by mario

  • Description modified (diff)

You're right. Changed "number of pages loaded" to "number of page impressions" to make this more clear.

comment:3 Changed 4 years ago by saroyanm

  • Cc saroyanm added

comment:4 in reply to: ↑ description ; follow-up: Changed 4 years ago by saroyanm

  • Cc trev matze added

Replying to mario:

  1. Add a new attribute to the JSON on root level called "domains" which includes an object of all visited domains containing the number of page impressions within this domain.
      "domains": {
        "example.com": {
          "pages": 143       // Number of page impressions within this domain
        },
        "example.org": {"pages": 12}
      }
    

Is there a reason for storing the page views as a separate object? Also, why does one object have a "pages" key and the other not? What about:

  "domainViews": {
    "example.com": 143,    // Number of page impressions within this domain
    "example.org": 12
  }

comment:5 in reply to: ↑ 1 ; follow-up: Changed 4 years ago by saroyanm

Replying to Kirill:

Replying to mario:
"Number of pages" might be misleading. We don't only need to know how many pages a user opened on a domain, but how often he did so. Maybe this is what you meant and what is clear to everyone else, but I wanted to point it out to prevent misunderstandings...

Not sure if I understand what you mean and how impressions should be calculated. Can you please describe in a bit more detail what exactly we need to calculate?

comment:6 in reply to: ↑ 4 Changed 4 years ago by Kirill

Replying to saroyanm:

Is there a reason for storing the page views as a separate object? Also, why does one object have a "pages" key and the other not? What about:

  "domainViews": {
    "example.com": 143,    // Number of page impressions within this domain
    "example.org": 12
  }

The reason is that we had another parameter in there which got removed, but the structure stayed. I like your suggestion, but if we add parameters to domains (like last visited or something different), we would have to change the structure back to the originally proposed one. I frankly don't know what is better here, a simple or an extensible format...
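
To make the trade-off concrete: the two shapes carry the same information as long as "pages" is the only per-domain value, so converting between them is mechanical. A small sketch, with hypothetical function names `toDomainViews` and `toDomains`:

```javascript
// Extensible shape -> flat shape. Any per-domain key other than "pages"
// would be lost here, which is the cost of the simple format.
function toDomainViews(domains) {
  const domainViews = {};
  for (const [domain, entry] of Object.entries(domains)) {
    domainViews[domain] = entry.pages;
  }
  return domainViews;
}

// Flat shape -> extensible shape.
function toDomains(domainViews) {
  const domains = {};
  for (const [domain, pages] of Object.entries(domainViews)) {
    domains[domain] = {pages};
  }
  return domains;
}

const flat = toDomainViews({
  "example.com": {pages: 143},
  "example.org": {pages: 12}
});
const nested = toDomains(flat);
console.log(flat, nested);
```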

comment:7 in reply to: ↑ 5 Changed 4 years ago by Kirill

Replying to saroyanm:

Not sure if I understand what you mean and how impressions should be calculated, can you please describe a bit what exactly we need to calculate?

What we want is some kind of count of visits to a domain. Not really sure how to do it technically.
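
One possible reading of "count of visits" is a counter that is incremented once per top-level page load and keyed by the page's hostname. The sketch below assumes exactly that; `recordPageLoad` and `impressions` are hypothetical names, and when the client would call the function is not specified in this ticket.

```javascript
// Hypothetical per-domain counter of page impressions.
const impressions = new Map();

function recordPageLoad(pageUrl) {
  // URL is the standard WHATWG URL class; hostname excludes port and path.
  const domain = new URL(pageUrl).hostname;
  const count = (impressions.get(domain) || 0) + 1;
  impressions.set(domain, count);
  return count;
}

recordPageLoad("https://example.com/");
recordPageLoad("https://example.com/news");
recordPageLoad("https://example.org/");
console.log(Object.fromEntries(impressions));
```

Whether subdomains such as www.example.com should be folded into example.com is left open here; the sketch counts each hostname separately.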

comment:8 follow-up: Changed 4 years ago by trev

Actually, there just shouldn't be a separate "domains" object - this is still data related to filter hits, not some general behavior tracking. In other words:

    "||example.com^": {
      "firstParty": {
        "example.com": {
          "hits": 12,              // Number of hits
          "latest": 123456789,     // UTC Time interval of last hit (in 1h-steps)
          "pages": 12              // Number of page impressions
        },
        "example.org": {"hits": 4, "latest": 987654321, "pages": 2}
      },
    },

comment:9 in reply to: ↑ 8 Changed 4 years ago by saroyanm

Replying to trev:

Actually, there just shouldn't be a separate "domains" object - this is still data related to filter hits, not some general behavior tracking. In other words:

    "||example.com^": {
      "firstParty": {
        "example.com": {
          "hits": 12,              // Number of hits
          "latest": 123456789,     // UTC Time interval of last hit (in 1h-steps)
          "pages": 12              // Number of page impressions
        },
        "example.org": {"hits": 4, "latest": 987654321, "pages": 2}
      },
    },

What about page impressions on pages where we don't have a hit?
E.g. a user visits the example.com/no-ad page, where there is no ad: should we update the page impression count for each filter that was previously hit on other pages?
If so, that doesn't sound efficient with the current implementation. I would say we'd need to change the data structure to make it efficient in that case, e.g.:

   "example.com": {
     "firstParty": {
       "||example.com^": {
         "hits": 12,              // Number of hits
         "latest": 123456789      // UTC Time interval of last hit (in 1h-steps)
       }
     },
     "thirdParty": {
       ...
     },
     "impression": 20
   },
   "example.org": {
     "firstParty": {
       "||example.com^": { "hits": 6, "latest": 12345678 }
     },
     "impression": 30
   }

The question is when we need to update the impression count: only when we had a filter hit on the specific domain?
Maybe I just don't understand your proposed structure.
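
To illustrate the efficiency argument: with a domain-keyed structure, a page impression touches a single counter regardless of how many filters were hit on that domain before. A minimal sketch under that assumption; `stats`, `recordImpression`, and `recordHit` are hypothetical names, not existing code.

```javascript
// Hypothetical in-memory store keyed by domain, following the structure above.
const stats = {};

function getDomainEntry(domain) {
  if (!Object.prototype.hasOwnProperty.call(stats, domain)) {
    stats[domain] = {firstParty: {}, thirdParty: {}, impression: 0};
  }
  return stats[domain];
}

// One counter update per page load, independent of the number of filters.
function recordImpression(domain) {
  getDomainEntry(domain).impression++;
}

// hourStamp is assumed to be the time of the hit in 1h-steps.
function recordHit(domain, filterText, isThirdParty, hourStamp) {
  const bucket = getDomainEntry(domain)[isThirdParty ? "thirdParty" : "firstParty"];
  if (!Object.prototype.hasOwnProperty.call(bucket, filterText)) {
    bucket[filterText] = {hits: 0, latest: 0};
  }
  bucket[filterText].hits++;
  bucket[filterText].latest = hourStamp;
}

recordImpression("example.com");                            // page with an ad
recordHit("example.com", "||example.com^", false, 123456789);
recordImpression("example.com");                            // page without any hit
console.log(JSON.stringify(stats, null, 2));
```

In this sketch the impression is counted on every page load, whether or not a filter matched; counting only pages with a hit would need an extra condition at the call site.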

comment:10 Changed 4 years ago by mario

  • Keywords 2016q1 added; 2015q4 removed

comment:11 Changed 4 years ago by saroyanm

  • Description modified (diff)

Removed the first point, since we decided to implement it in the initial review.

Last edited 4 years ago by saroyanm (previous) (diff)

comment:12 Changed 2 years ago by trev

  • Resolution set to rejected
  • Status changed from new to closed

Mass-closing all bugs in the Adblock Plus for Firefox module; the codebase of Adblock Plus 3.0 belongs to the Platform and User-Interface modules. Old bugs are unlikely to still apply.
