Opened on 03/10/2016 at 06:31:37 PM
Closed on 03/15/2016 at 11:31:30 PM
#3774 closed change (fixed)
Support multiple mirrors for the Malware Domains List
Reported by: | matze | Assignee: | |
---|---|---|---|
Priority: | P2 | Milestone: | |
Module: | Sitescripts | Keywords: | goodfirstbug |
Cc: | kvas, sebastian, trev | Blocked By: | |
Blocking: | | Platform: | Unknown / Cross platform |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description (last modified by sebastian)
Background
The script that converts the Malware Domains List into an Adblock Plus filter list currently relies on a single mirror, i.e. mirror3.malwaredomains.com.
As of now, this mirror blocks requests sent with the default user agent string of Python's urllib module, while other mirrors don't have that issue.
Regardless of this particular issue, it would make sense to support a list of mirrors, so that when one mirror fails we automatically fall back to another mirror from that list.
What to change
Support a list of mirrors, so that when downloading the Malware Domains List fails, the next mirror in the list is tried. Try the following mirrors in this order:
- mirror3.malwaredomains.com (this is the one we were initially told to use)
- mirror1.malwaredomains.com
- mirror2.malwaredomains.com
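The fallback behaviour described above could be sketched roughly as follows. This is only an illustration, not the code that landed in sitescripts: the function names `download` and `fetch_with_fallback`, the `fetch` parameter, the timeout and the use of plain HTTP are all assumptions, and it is written for Python 3's urllib.request rather than whatever the actual script uses.

```python
from urllib.request import urlopen

# Mirror order as listed in this ticket.
MIRRORS = [
    "mirror3.malwaredomains.com",
    "mirror1.malwaredomains.com",
    "mirror2.malwaredomains.com",
]


def download(url, timeout=30):
    """Fetch a URL and return the response body as bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(mirrors, path, fetch=download):
    """Try each mirror in order and return the first successful response body.

    `path` is the file path on the mirror; it is a parameter here because
    the ticket does not state it.
    """
    errors = []
    for host in mirrors:
        url = "http://{}/{}".format(host, path.lstrip("/"))
        try:
            return fetch(url)
        except OSError as error:
            # In Python 3, URLError and HTTPError are subclasses of OSError,
            # so this also covers HTTP error responses and timeouts.
            errors.append("{}: {}".format(url, error))
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))
```

Passing the download function in as `fetch` keeps the mirror loop testable without network access; the first mirror in the list is always tried first, which matches the order requested above.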
Attachments (0)
Change History (15)
comment:1 Changed on 03/10/2016 at 07:13:47 PM by matze
- Cc kvas sebastian palant added
- Component changed from Infrastructure to Sitescripts
- Priority changed from P2 to Unknown
comment:2 Changed on 03/10/2016 at 07:25:47 PM by matze
- Owner matze deleted
comment:3 Changed on 03/10/2016 at 08:29:46 PM by sebastian
I contacted the malware domains list maintainers. Let's wait for their response.
comment:4 Changed on 03/10/2016 at 08:30:14 PM by matze
Awesome, thank you!
comment:5 Changed on 03/10/2016 at 10:19:08 PM by sebastian
They are going to try to resolve the issue. However, I learned that there are multiple mirrors, of which only the one we currently use seems to have that problem. So I guess it would make sense to have our script use a list of these mirrors, falling back to the next one if one fails. That will also be more robust against other potential network/server issues.
comment:6 Changed on 03/11/2016 at 03:14:34 PM by sebastian
- Description modified (diff)
- Keywords goodfirstbug added
- Priority changed from Unknown to P2
- Type changed from defect to change
I've updated the issue description, in order to fall back to another mirror when the request fails. This seems to be a good first bug. @kvas, do you want to have a try?
comment:7 Changed on 03/11/2016 at 03:21:39 PM by sebastian
- Summary changed from Fix malware domain list updates to Support multiple mirrors for the Malware Domains List
comment:8 Changed on 03/12/2016 at 02:06:30 PM by trev
Before changing anything we should check with the MalwareDomains maintainer - he explicitly asked us to use that mirror and not the others.
comment:9 Changed on 03/12/2016 at 02:07:02 PM by trev
- Cc trev added; palant removed
comment:10 Changed on 03/12/2016 at 02:12:08 PM by sebastian
Well, as indicated above, I talked to them. And they pointed out that there are multiple mirrors. However, implementing fallback logic was my idea. But they didn't object.
comment:11 Changed on 03/12/2016 at 02:14:59 PM by matze
I guess we can make sure the one explicitly asked for is always the first one tried, just to be on the safe side. If that one fails I don't believe anybody expects us to wait, not when there are working alternatives.
comment:12 Changed on 03/12/2016 at 05:23:40 PM by sebastian
- Description modified (diff)
comment:13 Changed on 03/14/2016 at 05:06:39 PM by kvas
- Blocked By 3799 added
comment:14 Changed on 03/15/2016 at 11:10:15 PM by abpbot
A commit referencing this issue has landed:
https://hg.adblockplus.org/sitescripts/rev/e33e438e49cc
comment:15 Changed on 03/15/2016 at 11:31:30 PM by kvas
- Blocked By 3799 removed
- Resolution set to fixed
- Status changed from new to closed
The HTTP server behind the malware domain list source blocks our requests via Python's urllib:
I suppose we can easily work around this issue by changing the User-Agent header sent, based on the fact that wget(1) from the same host succeeds in downloading the ZIP archive:
In order to avoid similar issues in the future, we may want to contact the publisher and ask for reliable access, or at least invest some time in finding or developing a more flexible approach.
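For illustration, the User-Agent workaround mentioned above might look something like this with Python 3's urllib.request. The helper name `build_request` and the user agent value are made up for this example; the ticket does not say which string, if any, was eventually used.

```python
from urllib.request import Request


def build_request(url, user_agent="example-agent/1.0"):
    """Build a request that sends a custom User-Agent header.

    The default value is a placeholder, not a string taken from the ticket.
    """
    return Request(url, headers={"User-Agent": user_agent})
```

The resulting object can be passed to `urllib.request.urlopen()` in place of a plain URL string.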