Opened on 01/18/2017 at 09:48:22 AM

Closed on 03/23/2017 at 08:02:04 PM

Last modified on 03/30/2017 at 12:27:22 PM

#4814 closed change (fixed)

Collect the Acceptable Ads Committee application form data

Reported by: jobp Assignee: jsonesen
Priority: Unknown Milestone:
Module: Sitescripts Keywords:
Cc: vickyyu, ben, kvas, jsonesen Blocked By:
Blocking: #4815 Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29374647/

Description (last modified by kvas)

Background

Also see: https://issues.adblockplus.org/ticket/4815

If things go well, we hope to receive quite some applications which are submitted through "https://acceptableads.com/en/committee/apply".

In order for us to not having to manually add all those applications to some database / list, it would be great if this data can be collected outside the emails that we are already getting.

What to change

Instead of only receiving an email, we would like the data submitted to be collected in a CSV file.

  1. Configurable Filename and Path (optional)
    • <form_name>.csv=<log_full_path>
    • this functionality already exists, and just needs to be configured as such and documented
  1. Make Template Option Field Optional
    • this would require changes in two methods make_handler() and handler()
    • handler() break out the logic that evaluates whether a field is mandatory or not into a new method
    • make_handler() get rid of statically coded strings in relation to template and implement the new mandatory option checker
  1. If the CSV Doesnt Exist Then Write New File With Header
    • use filesystem and check for file presence if exists get reader and check fieldnames
    • if fieldnames match then append to file, if not write error log
  1. Add Timestamp Field To CSV
    • already appended prior to being sent to the sendmail method
    • may have issues with order here, but everything else is an ordered dict.
  1. Add Field Name Validation
    • if log exists but field names do not match, log error and write to new csv

Attachments (0)

Change History (21)

comment:1 Changed on 01/18/2017 at 09:52:51 AM by jobp

  • Description modified (diff)

comment:2 Changed on 01/18/2017 at 10:02:07 AM by matze

  • Blocking 4815 added
  • Cc kvas jsonesen added
  • Component changed from Unknown to Sitescripts

@kvas @jsonesen

If the handler would not only render and send the e-mail but also allow for storing the submitted data in a CSV file or similar, this should be enough for this one already.

comment:3 Changed on 01/23/2017 at 02:00:49 PM by jsonesen

  • Owner set to jsonesen

comment:4 Changed on 01/25/2017 at 12:15:49 PM by jsonesen

So I think this will be done by implementing a write function to the end of the form_handler method which will append to a file which is basically "form_name.csv" since each site can have multiple configurations of forms and fields. Does this sound ok?

comment:5 follow-up: Changed on 01/25/2017 at 01:08:07 PM by kvas

Hi Jon,

Your plan looks good. Here are a few suggestions about how to make it greatTM ;)

  • It would be good to make the filename and path to the file configurable. Something like this (see #4377 for more detail on how formmail2 is configured):
    <name>.csv_log = <csv-log-path>
    
  • In the same change we could make <name>.template configuration variable optional and only render and send the email if it is present (we should probably check that at least one of .csv_log and .template is present though to prevent accidental misconfiguration of a form that does nothing at all). This is useful to deal with the situations when a form gets too many requests and overwhelms the email system (I remember there was some talk about this in the ops meeting).
  • If the CSV file doesn't exist yet, it should be created and the first line with the field names should be written into it. This way the file will be easier to understand once it's taken out of the server where it originates. The fields should be ordered the same way as their order in the configuration file (ConfigParser seems to preserve order so we can use that). When we output the data, sorting of the fields should also be in the configuration file order.
  • Add a field called timestamp to the title line (see previous item of the list) and to each data line. We're already passing a timestamp to the mail template so we can use it here as well.

We can also think about what to do when the configuration changes and the list of fields becomes different. In this case the field names at the beginning of the CSV file will not match the field values in the new lines. We could detect this and start writing to another file or something like this. IMO it's too fancy for this ticket so I would just mention it in the docs and hope that whoever configures the script has enough sanity to prevent the field name confusion.

comment:6 in reply to: ↑ 5 ; follow-up: Changed on 01/25/2017 at 02:07:29 PM by jsonesen

Replying to kvas:

Hi Jon,

Your plan looks good. Here are a few suggestions about how to make it greatTM ;)

  • It would be good to make the filename and path to the file configurable. Something like this (see #4377 for more detail on how formmail2 is configured):
    <name>.csv_log = <csv-log-path>
    

Sure, this seems like a good idea. I still wonder if having it as CSV files on the server is the best solution. Since this data will need to be made accessible.


  • In the same change we could make <name>.template configuration variable optional and only render and send the email if it is present (we should probably check that at least one of .csv_log and .template is present though to prevent accidental misconfiguration of a form that does nothing at all). This is useful to deal with the situations when a form gets too many requests and overwhelms the email system (I remember there was some talk about this in the ops meeting).


yeah I agree here.

  • If the CSV file doesn't exist yet, it should be created and the first line with the field names should be written into it. This way the file will be easier to understand once it's taken out of the server where it originates. The fields should be ordered the same way as their order in the configuration file (ConfigParser seems to preserve order so we can use that). When we output the data, sorting of the fields should also be in the configuration file order.

Wouldn't this cause problems automating the ingestion of the csv to a sheet, or database?

  • Add a field called timestamp to the title line (see previous item of the list) and to each data line. We're already passing a timestamp to the mail template so we can use it here as well.

no problem

We can also think about what to do when the configuration changes and the list of fields becomes different. In this case the field names at the beginning of the CSV file will not match the field values in the new lines. We could detect this and start writing to another file or something like this. IMO it's too fancy for this ticket so I would just mention it in the docs and hope that whoever configures the script has enough sanity to prevent the field name confusion.

This leads me to believe that we would be better of with a database solution.

comment:7 in reply to: ↑ 6 Changed on 01/25/2017 at 03:36:50 PM by kvas

Replying to jsonesen:

Replying to kvas:
...

  • If the CSV file doesn't exist yet, it should be created and the first line with the field names should be written into it. This way the file will be easier to understand once it's taken out of the server where it originates. The fields should be ordered the same way as their order in the configuration file (ConfigParser seems to preserve order so we can use that). When we output the data, sorting of the fields should also be in the configuration file order.

Wouldn't this cause problems automating the ingestion of the csv to a sheet, or database?

Spreadsheets usually have a mode of import when the first line becomes headers. Even if not, it just becomes first row so that also works. I'm not sure about database importing tools but if we use a Python script to do the import csv.DictReader will take care of this nicely. In general having the field names in the first line seems pretty common in CSV files so I don't think this will be a problem.

We can also think about what to do when the configuration changes and the list of fields becomes different. In this case the field names at the beginning of the CSV file will not match the field values in the new lines. We could detect this and start writing to another file or something like this. IMO it's too fancy for this ticket so I would just mention it in the docs and hope that whoever configures the script has enough sanity to prevent the field name confusion.

This leads me to believe that we would be better of with a database solution.

Well, the database will require a schema change and data migration in this scenario so it's also not for free. Of course if it's a NoSQL database like MongoDB, things will mostly just work, but we don't use MongoDB anywhere at the moment and I'm not sure we want to start just because of this ticket.

In general using a database does have some advantages with regards to how the data is accessed but it requires more work to set it up and creates additional dependencies. I'd just go with CSV for now, since it's simple and does the job. Then in the future we could upgrade this code to a more hi-tech solution if we see any issues with the CSV approach.

comment:8 Changed on 02/02/2017 at 01:55:08 PM by jsonesen

  • Description modified (diff)

comment:9 Changed on 02/02/2017 at 02:15:07 PM by jsonesen

  • Description modified (diff)

comment:10 Changed on 02/02/2017 at 02:19:10 PM by jsonesen

  • Description modified (diff)

comment:11 Changed on 02/02/2017 at 02:24:12 PM by jsonesen

  • Description modified (diff)

comment:12 Changed on 02/06/2017 at 05:29:12 PM by jsonesen

  • Description modified (diff)

comment:13 Changed on 02/08/2017 at 05:44:19 PM by kvas

  • Description modified (diff)

comment:14 Changed on 03/18/2017 at 09:16:46 AM by jsonesen

  • Review URL(s) modified (diff)

comment:15 Changed on 03/18/2017 at 09:17:01 AM by jsonesen

  • Status changed from new to reviewing

comment:16 Changed on 03/20/2017 at 09:55:31 AM by abpbot

A commit referencing this issue has landed:
Issue 4814 - Adds csv log to formmail2

comment:17 Changed on 03/20/2017 at 09:58:29 AM by jsonesen

I am sorry, this was an accidental push I meant to exclude the revision for the csv_log in the push

comment:18 Changed on 03/23/2017 at 07:53:28 PM by abpbot

A commit referencing this issue has landed:
Issue 4814 - Adds csv logging to formmail2

comment:19 Changed on 03/23/2017 at 08:02:04 PM by jsonesen

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:20 Changed on 03/30/2017 at 11:33:49 AM by vickyyu

Hey guys, I was just wondering if we can make use of the outcome of this ticket from a non-technical perspective?

comment:21 Changed on 03/30/2017 at 12:27:22 PM by kvas

Hi Vicky, this still needs to be configured on the AAC website to be used. I've pinged the ops guys a few days ago on IRC but AFAIK it didn't happen yet. But it's coming real soon now ;)

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from jsonesen.
 
Note: See TracTickets for help on using tickets.