Opened on 08/25/2015 at 09:20:19 AM
Closed on 10/12/2015 at 11:34:00 AM
#2951 closed defect (fixed)
abpcrawler output filenames all start with "None" instead of domain
Reported by: | philll | Assignee: | trev |
---|---|---|---|
Priority: | P4 | Milestone: | |
Module: | Extensions-for-Adblock-Plus | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Platform: | Firefox |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description
Environment
Debian Jessie
abpcrawler 387:0d9a4db7d073
Firefox 40
How to reproduce
- Execute the crawler with the attached urls.txt file
Observed behaviour
All output file names start with "None".
Expected behaviour
All output file names should start with the respective domain.
Attachments (1)
Change History (6)
Changed on 08/25/2015 at 09:21:12 AM by philll
comment:1 Changed on 08/25/2015 at 12:22:53 PM by philll
- Resolution set to invalid
- Status changed from new to closed
comment:2 Changed on 08/25/2015 at 08:23:49 PM by trev
- Resolution invalid deleted
- Status changed from closed to reopened
I looked into this a bit more. The problem is that we open the URLs in Firefox, yet the host name is determined in Python. Firefox tries various things to get a "proper" URL; Python doesn't. Ideally we would use the same logic in both cases, but accessing the Firefox logic isn't really feasible. So I think adding the scheme when there is none should be a good enough solution.
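For illustration, here is a minimal sketch of the behaviour described above and of the proposed workaround. It is not the actual abpcrawler patch: it assumes Python 3's `urllib.parse` (the crawler itself may use a different module), and the helper names `ensure_scheme` and `hostname_for` as well as the default scheme `http` are hypothetical.

```python
from urllib.parse import urlparse


def ensure_scheme(url, default_scheme="http"):
    """Prepend a scheme if the URL has none (hypothetical helper).

    Firefox guesses a scheme for bare host names; urlparse does not,
    so a bare host name is treated as a path and has no host part.
    """
    if "://" not in url:
        return default_scheme + "://" + url
    return url


def hostname_for(url):
    """Return the host name used to build an output file name."""
    return urlparse(ensure_scheme(url)).hostname


# Without a scheme, urlparse finds no host, which is how "None"
# can end up at the start of a file name.
print(urlparse("example.com").hostname)
# With the scheme added first, the host name is recovered.
print(hostname_for("example.com"))
```

Checking for `://` rather than relying on the parsed scheme avoids misreading entries such as `example.com:8080/path`, where the text before the colon could otherwise be taken for a scheme.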
comment:3 Changed on 08/25/2015 at 08:23:57 PM by trev
- Owner set to trev
comment:4 Changed on 08/25/2015 at 08:26:08 PM by trev
- Component changed from Unknown to Extensions-for-Adblock-Plus
- Platform changed from Unknown / Cross platform to Firefox
- Priority changed from Unknown to P4
- Ready set
- Review URL(s) modified (diff)
- Status changed from reopened to reviewing
comment:5 Changed on 10/12/2015 at 11:34:00 AM by trev
- Resolution set to fixed
- Status changed from reviewing to closed
The URL list in use specified only host names, not full URLs.