Opened on 04/01/2014 at 01:57:38 PM
Closed on 09/16/2017 at 12:26:36 PM
#240 closed change (fixed)
Move reports.adblockplus.org to a separate server
Reported by: | trev | Assignee: | trev |
---|---|---|---|
Priority: | P2 | Milestone: | |
Module: | Infrastructure | Keywords: | |
Cc: | mathias@adblockplus.org | Blocked By: | #1495 |
Blocking: | | Platform: | Unknown |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): | https://github.com/mjhennig/adblockplus-infrastructure/pull/22 |
Description
Background
reports.adblockplus.org requires quite a bit of CPU power for its recurring tasks (report parsing and digest updates). The main server currently running them already has more than enough to do and shouldn't be handling these tasks as well.
What to change
Create a server configuration for reports.adblockplus.org in the infrastructure repository and migrate that task to a separate server.
Attachments (0)
Change History (13)
comment:1 Changed on 07/29/2014 at 06:22:49 PM by matze
- Platform set to Unknown
comment:2 Changed on 07/29/2014 at 06:23:03 PM by matze
- Cc mathias@adblockplus.org added
comment:3 Changed on 07/30/2014 at 10:18:47 AM by trev
No, nothing on the main server is configured via Puppet right now. Here is how it is currently set up:
- Reports are handled by modules under sitescripts.reports.
- There is a schema.sql file in that directory which can be used to initialize the database.
- There is also a static directory there which is the web server root.
- multiplexer.fcgi needs to run, some URLs are being handled by it.
- Some URLs are being forwarded to the pregenerated files in the data directory (configured in sitescripts.ini).
- The necessary sitescripts.ini entries can be seen in the .sitescripts.example file in the sitescripts repository root, section [reports] is relevant here. The configuration on the server is very much like this example, except that digestDays is set to 30 and defaultSubscriptionRecipient is Wladimir Palant <trev@adblockplus.org> (yes, we probably want to do something about that setting in the future).
- There is a number of cron jobs related to issue reports:
    * * * * * python -m sitescripts.reports.bin.parseNewReports
    35 * * * * python -m sitescripts.reports.bin.updateSubscriptionList
    45 * * * * python -m sitescripts.reports.bin.updateDigests
    15 0 * * * python -m sitescripts.reports.bin.removeOldReports
    20 0 * * * python -m sitescripts.reports.bin.removeOldUsers
    35 2 * * * python -m sitescripts.reports.bin.mailDigests day
    50 2 * * 0 python -m sitescripts.reports.bin.mailDigests week 0
    50 2 * * 1 python -m sitescripts.reports.bin.mailDigests week 1
    50 2 * * 2 python -m sitescripts.reports.bin.mailDigests week 2
    50 2 * * 3 python -m sitescripts.reports.bin.mailDigests week 3
    50 2 * * 4 python -m sitescripts.reports.bin.mailDigests week 4
    50 2 * * 5 python -m sitescripts.reports.bin.mailDigests week 5
    50 2 * * 6 python -m sitescripts.reports.bin.mailDigests week 6
- The current nginx configuration for the subdomain looks like this:
    access_log <snip>/access_log_reports main;
    root <snip>/reports;
    add_header Strict-Transport-Security "max-age=2592000";
    charset utf-8;

    location / {
    }

    location /submitReport {
      fastcgi_pass unix:<snip>/multiplexer-fastcgi.sock;
      include fastcgi_params;
    }

    location /updateReport {
      fastcgi_pass unix:<snip>/multiplexer-fastcgi.sock;
      include fastcgi_params;
    }

    location /showUser {
      internal;
      fastcgi_pass unix:<snip>/multiplexer-fastcgi.sock;
      include fastcgi_params;
      fastcgi_param REQUEST_URI $uri;
    }

    location /data {
      internal;
    }

    location /in_progress.html {
      internal;
    }

    location /user {
      rewrite "^/user/([\da-f]{32})$" /showUser?id=$1 last;
      return 404;
    }

    location = /digest {
      fastcgi_pass unix:<snip>/multiplexer-fastcgi.sock;
      include fastcgi_params;
    }

    location ~ "^/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}$" {
      if ($request_uri ~ "^/(([\da-f])([\da-f])([\da-f])([\da-f])[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})") {
        set $target /data/$2/$3/$4/$5/$1;
      }
      if (-f $document_root$target.html) {
        rewrite ^ $target.html last;
      }
      if (-f $document_root/data$request_uri.xml) {
        rewrite ^ /in_progress.html last;
      }
      return 404;
    }

    location ~ "^/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}\.png$" {
      if ($request_uri ~ "^/(([\da-f])([\da-f])([\da-f])([\da-f])[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})") {
        set $target /data/$2/$3/$4/$5/$1.png;
      }
      if (-f $document_root$target) {
        rewrite ^ $target last;
      }
      return 404;
    }
Note that the main server is copying files from the static directory into the web root that also contains the data directory - I guess that we don't want it to be set up like this on the new server. The web root should rather be pointing directly to the repository and /data should be an alias pointing somewhere else.
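For anyone reproducing the GUID lookup outside nginx (for example to test the new server), the two regex locations in the configuration above can be approximated in Python roughly as follows. This is only a sketch: the function names sharded_path and resolve_report are made up for illustration and are not part of sitescripts, and it assumes the file layout is exactly what the rewrite rules imply.

```python
import os
import re

# Equivalent to the GUID pattern in the nginx location blocks
# ([0-9a-f] is used instead of [\da-f] to stay ASCII-only in Python 3).
GUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\Z")


def sharded_path(guid, suffix=".html"):
    """Return the /data/a/b/c/d/<guid><suffix> URI that nginx rewrites to.

    Hypothetical helper: the first four hex digits of the GUID become
    four directory levels, mirroring "set $target /data/$2/$3/$4/$5/$1".
    """
    if not GUID_RE.match(guid):
        raise ValueError("not a report GUID: %r" % guid)
    return "/data/%s/%s/%s/%s/%s%s" % (guid[0], guid[1], guid[2], guid[3], guid, suffix)


def resolve_report(document_root, guid):
    """Mirror the checks in the GUID location block (hypothetical helper).

    Returns the internal URI that would be served, or None for a 404.
    """
    target = sharded_path(guid)
    if os.path.isfile(document_root + target):
        # Report page has already been generated.
        return target
    # The nginx rule tests $document_root/data$request_uri.xml,
    # i.e. a non-sharded /data/<guid>.xml file.
    if os.path.isfile(document_root + "/data/" + guid + ".xml"):
        return "/in_progress.html"
    return None
```

Screenshots follow the same scheme with a different suffix, e.g. sharded_path(guid, ".png").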
How the system works:
- /submitReport URL receives new reports and stores them as Python dumps on disk.
- parseNewReports runs every minute (as a cron job), processes these dumps, writes the data into the database and generates a report page.
- Each report is assigned a GUID that determines the URL under which it is accessible. If the report page hasn't been generated yet the server shows in_progress.html instead.
- Subscription authors get daily or weekly digests listing their issue reports.
- There are also static web pages containing report listings for each subscription, subscription authors can access them via the /digest URL.
- Subscription authors can update reports with a new status, this happens via /updateReport URL - this updates the database, regenerates the report page and optionally notifies the reporter as well.
- Issue reports expire after 30 days, at this point they will be completely removed from the server.
That's hopefully all you need to know, don't hesitate to ask if I missed something.
comment:4 Changed on 08/06/2014 at 11:44:57 AM by matze
- Owner set to matze
Ok, thank you. I'll look into it, importing the above information and configuration into puppet(8).
comment:5 Changed on 08/11/2014 at 01:37:50 PM by matze
@palant/trev Please provide me with an access log snippet (from access_log_reports) or an entire file with various examples of the requested URIs. It could speed up testing a lot: otherwise one would need to examine possible invocations from the plugin and the source code in order to create a service map, which is probably a bit time-consuming and not as accurate as necessary, especially since most invalid invocations simply return HTTP errors with no hint about what went wrong.
comment:6 Changed on 08/13/2014 at 08:55:03 AM by matze
- Blocked By 1203 added
comment:7 Changed on 08/13/2014 at 01:05:22 PM by matze
- Blocked By 1203 removed
comment:8 Changed on 08/14/2014 at 08:21:48 PM by trev
I cannot just post the access logs (see privacy policy). Here are the log entries for a report I just submitted myself:
x.x.x.x - - [14/Aug/2014:20:03:54 +0000] "POST /submitReport?version=1&guid=4cf0172b-af12-d24e-a762-6a10608a0536&lang=en-US HTTP/1.1" 200 1219 "-" "Mozilla/5.0 ..."
x.x.x.x - - [14/Aug/2014:20:03:57 +0000] "GET /4cf0172b-af12-d24e-a762-6a10608a0536 HTTP/1.1" 200 876 "-" "Mozilla/5.0 ..."
Opening the digest (let me know if you need help to generate digest ID and secret in the test environment):
x.x.x.x - - [14/Aug/2014:20:07:29 +0000] "GET /digest?id=...&secret=... HTTP/1.1" 302 374 "-" "Mozilla/5.0 ..."
There I have a link like "https://reports.adblockplus.org/4cf0172b-af12-d24e-a762-6a10608a0536#secret=..." (the secret doesn't show up in the logs) which allows me to update status. Actually updating it produces the following request:
x.x.x.x - - [14/Aug/2014:20:16:40 +0000] "POST /updateReport HTTP/1.1" 400 1019 "https://reports.adblockplus.org/4cf0172b-af12-d24e-a762-6a10608a0536" "Mozilla/5.0 ..."
And looking up the user's profile (referrer has been changed, I was using a different report which wasn't submitted anonymously):
x.x.x.x - - [14/Aug/2014:20:18:57 +0000] "GET /user/... HTTP/1.1" 200 1357 "https://reports.adblockplus.org/4cf0172b-af12-d24e-a762-6a10608a0536" "Mozilla/5.0 ..."
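If it helps with building the service map, log entries in this shape could be summarized with something like the sketch below. It assumes the "main" log format looks like the samples in this comment (address, timestamp, request line, status, size, referrer, user agent); the names LOG_RE, endpoint and service_map are invented for this example.

```python
import re
from collections import Counter

# Matches entries shaped like the samples above. The real "main"
# log_format on the server is an assumption here.
LOG_RE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Report pages live directly under /<guid>.
GUID_PATH_RE = re.compile(r"^/[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def endpoint(uri):
    """Collapse a request URI into a generic endpoint name."""
    path = uri.split("?", 1)[0]  # drop the query string
    if GUID_PATH_RE.match(path):
        return "/<guid>"
    if path.startswith("/user/"):
        return "/user/<id>"
    return path


def service_map(lines):
    """Count (method, endpoint, status) triples over access log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        key = (match.group("method"), endpoint(match.group("uri")),
               int(match.group("status")))
        counts[key] += 1
    return counts
```

Since only method, endpoint and status are kept, the resulting counts contain no addresses, GUIDs or user IDs.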
comment:9 Changed on 10/20/2014 at 04:43:38 PM by AAlvz
@trev @palant How is multiplexer.fcgi currently running on the server? And multiplexer.py? When submitting a report, which settings from sitescripts.ini are used?
We're looking for the current flow in the code when submitting a report because right now, when we try to submit a report, the generation of the XML that contains all the information stops. Because of this the submit report process never ends and the XML code is never fully generated unless you cancel the report submission.
comment:10 Changed on 10/23/2014 at 08:21:54 AM by matze
- Blocked By 1495 added
comment:11 Changed on 01/19/2015 at 10:27:01 AM by AAlvz
- Review URL(s) modified (diff)
- Status changed from new to reviewing
comment:12 Changed on 09/16/2017 at 12:26:01 PM by trev
- Owner changed from matze to trev
- Tester set to Unknown
comment:13 Changed on 09/16/2017 at 12:26:36 PM by trev
- Resolution set to fixed
- Status changed from reviewing to closed
This has been done in http://hub.eyeo.com/issues/3559
Is it possible that this part of our infrastructure isn't configured using puppet(8) yet? At least I haven't found any setup information in any repository so far..