Opened on 04/15/2016 at 10:06:45 AM

Last modified on 10/03/2019 at 07:23:36 PM

#3938 new change

Configure the environment to automatically run abpcrawler.

Reported by: sergz Assignee:
Priority: P2 Milestone:
Module: Infrastructure Keywords: abpcrawler
Cc: matze, fred, fhd, nicole, philll, TobiasHilleke Blocked By:
Blocking: #3936 Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

Description (last modified by sergz)

Background

To automate some quality-related tasks, we need to run the crawler periodically.

What to change

  • Prepare a single Linux-based virtual machine with enough disk space to store the gathered information.
  • Configure it to run abpcrawler at a regular interval.
  • To be continued...

If this takes longer than expected, convert the current issue into a meta issue.
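
As a rough illustration of the second item, a periodic run could be driven by a small wrapper like the one below. This is only a sketch under assumptions: the ticket does not say how abpcrawler is launched, so `CRAWLER_CMD` is a hypothetical placeholder, and the names `has_enough_space` and `run_periodically` are invented for this example. A cron job or systemd timer would probably serve equally well.

```python
import shutil
import subprocess
import sys
import time

# Hypothetical placeholder: the ticket does not specify the real abpcrawler
# command line, so this just starts a no-op Python process.
CRAWLER_CMD = [sys.executable, "-c", "print('crawler placeholder')"]


def has_enough_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


def run_periodically(cmd, interval_s, output_dir, required_bytes,
                     max_runs=None, sleep=time.sleep):
    """Run `cmd` every `interval_s` seconds, skipping a cycle when disk space is low.

    Returns the number of cycles performed (only finite when `max_runs` is set).
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        if has_enough_space(output_dir, required_bytes):
            # check=False: a failed crawl should not stop future cycles.
            subprocess.run(cmd, check=False)
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_s)
    return runs
```

For example, `run_periodically(CRAWLER_CMD, 24 * 3600, "/path/to/output", 1024**3)` would attempt one run per day as long as at least 1 GiB is free; the interval, path, and threshold here are illustrative values, not requirements from this ticket.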

Relevant issues


Attachments (0)

Change History (5)

comment:1 Changed on 04/15/2016 at 12:58:08 PM by sergz

  • Description modified (diff)

comment:2 Changed on 04/22/2016 at 01:50:40 PM by sergz

Please find the results below to estimate the hardware requirements.

I ran the crawler on 1000 randomly picked URLs from the easylist commit log. BTW, the total number of extracted URLs is a bit more than 73k.
Some results:
Number of files in the output folder:

  • json - 1000
  • xml - 957
  • jpg - 954

The size of the output folder is 662M.
Beyond a certain point (> 200 URLs) the average file sizes are:

  • json - ~50K
  • xml - ~170K
  • jpg - ~500K

Just for reference, the standard deviation of the file sizes grows with the number of processed URLs only for xml files; for json and jpg it is fairly constant, although quite large.
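
To turn these numbers into a disk-space figure, one could extrapolate linearly from the sample run. The sketch below assumes that output size scales proportionally with the number of URLs, that "662M" means mebibytes, and that a single run would cover all ~73k extracted URLs; none of these assumptions has been verified, and the function name `estimate_output_gb` is made up for illustration.

```python
# Figures taken from the measurements in this comment.
SAMPLE_URLS = 1000       # URLs crawled in the test run
SAMPLE_OUTPUT_MB = 662   # size of the output folder (assumed to be MiB)
TOTAL_URLS = 73_000      # "a bit more than 73k" extracted URLs


def estimate_output_gb(total_urls, sample_urls=SAMPLE_URLS,
                       sample_mb=SAMPLE_OUTPUT_MB):
    """Linearly extrapolate the output size (in GiB) for one crawl of `total_urls`."""
    per_url_mb = sample_mb / sample_urls
    return total_urls * per_url_mb / 1024


if __name__ == "__main__":
    print(f"Estimated output per full run: {estimate_output_gb(TOTAL_URLS):.1f} GB")
```

Under these assumptions a full run comes to a bit under 50 GB, so storage for several retained runs would need to be sized accordingly.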

Firefox is actually eating a lot of memory. I've observed that with 2 GB RAM (and 2 GB swap) the resident memory usage is about 1 GB regardless of the number of tabs (I tried with 2, 4, 8, 30). However, it sometimes grows up to 1.8 GB and it seems it can get even bigger; I guess it depends on the tab content and GC.

Firefox 45 (release).
Ubuntu 15.10, x86_64, 2GB RAM, 2 Cores.

comment:3 Changed on 04/22/2016 at 02:19:11 PM by trev

Please note that results for memory usage only apply to your specific configuration. If you give Firefox 1 GB it will likely use less memory, with 4 GB it will likely use more. Garbage collection depends on memory pressure.

comment:4 Changed on 05/06/2016 at 09:20:18 AM by fhd

  • Priority changed from Unknown to P2

IMHO this is very important, setting to P2.

comment:5 Changed on 12/21/2017 at 11:31:08 AM by fhd

  • Cc trev removed
