Opened on 12/02/2016 at 05:44:03 PM

Closed on 07/07/2017 at 01:27:19 PM

#4687 closed change (fixed)

[cms] Context function for retrieving pages metadata

Reported by: juliandoucette Assignee: jsonesen
Priority: P2 Milestone:
Module: Sitescripts Keywords:
Cc: saroyanm, kvas, jsonesen, philll Blocked By:
Blocking: #4847, #5043 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29370597/
https://codereview.adblockplus.org/29472555/

Description (last modified by jsonesen)

Background

  • The new Help Center website categorizes pages by taxonomy like [product, platform, category, tag],
  • Several of the page templates required to build this website list pages based on these taxonomies,
  • Therefore we must provide a method for adblockplus/cms to implement these lists using Jinja2.

What to change

Note: from here on we'll use the terms:

  • field name to refer to each type of taxonomy (e.g. product, platform, etc.),
  • field value to refer to the value of a field for a particular page (e.g. product=adblock-plus), and
  • page metadata to refer to the collection of all the fields and values of one page (including the page field which contains the id of the page).

Provide the following @contextfunction:

def get_pages_metadata(filters={}):
    """Return metadata dicts for all pages matching the filters

    Args:
        filters: Mapping of field names to desired field values or lists of them
            Example: {'product': 'adblock-plus', 'platform': ['firefox', 'chrome']}
    Returns:
        A list of metadata dicts for pages matching all the filters (if no filters are given, returns all pages)
            Example: [{page: 'acceptable-ads', product: 'adblock-plus', platform: 'firefox'}]
    Raises:
        TypeError: if filters is not a dict mapping strings to strings or lists of strings
    """

Since this method is being used as a context function in order to have access to the mercurial source object, unit tests should be added to this object's api in order to ensure that it remains stable.

Attachments (0)

Change History (30)

comment:1 Changed on 12/02/2016 at 05:45:44 PM by juliandoucette

@jsonesen I added an extra requirement to the input (List of tuples which provides a default value to filter by) to the requirements. Please let me know if this is OK with you. We can simplify these requirements if needed.

comment:2 Changed on 12/02/2016 at 05:47:10 PM by juliandoucette

  • Description modified (diff)

comment:3 Changed on 12/02/2016 at 05:55:40 PM by juliandoucette

  • Description modified (diff)

comment:4 Changed on 12/02/2016 at 05:57:31 PM by juliandoucette

  • Description modified (diff)

comment:5 Changed on 12/02/2016 at 06:00:24 PM by juliandoucette

  • Description modified (diff)

comment:6 Changed on 12/05/2016 at 02:25:22 PM by kvas

It seems to me that the current API is a bit complex: the taxonomies parameter is trying to do too many things at the same time. May I suggest the following change, which I think makes it more clear:

def index(taxonomies, filters=None):
    """Index pages by taxonomy.
 
    Args:
        taxonomies: A list of taxonomies
            List of strings: ["product", "platform"]
        filters: Filters that will be used to select pages
            Dictionary: {'product': 'adblock-plus'}
    Returns:
        A list of pages matching any provided filters including page name and given fields (in taxonomies)
            List of dicts: [{page: "acceptable-ads", product: "adblock-plus", platform: "firefox"}]
    Raises:
        TypeError: if taxonomies is not a list
        TypeError: if filters is not a dict
    """

What do you think?

comment:7 Changed on 12/05/2016 at 02:43:14 PM by jsonesen

I implemented the tuple requirement and in fact I think you have a point here, as there ends up being additional complexity in order to determine if the filter exists or not in the list and this is done using isinstance() and in fact heavy type checking like this is normally not considered very pythonic. So I would agree that adding a filters parameter probably makes more sense here, although I think that the suggestion Julian made regarding a filtering type behavior is good it does make a bit more sense to do less 'inference' with the code and just make it an argument so that there is not additional implementation details to be documented regarding our fancy use of a filter tuple

comment:8 Changed on 12/05/2016 at 04:23:41 PM by jsonesen

Another detail that has been left out is the case where the metadata section contains a comma separated list, i think handling this requires splitting the metadata string and then checking the filter. For example is you use the filter to get pages tagged 'popular' i need to split the tags' section and then check over that to make sure that 'popular' tag exists, simply saying if tag == 'popular': is not sufficient because as is if 'tag' has 'popular, cool, awesome' it is represented as {'tags': 'popular, cool, awesome'}

comment:9 follow-up: Changed on 12/05/2016 at 04:43:58 PM by juliandoucette

It seems to me that the current API is a bit complex

I think my API is simpler to use.

I'll leave it to you guys to determine what is/isn't "pythonic" and/or too complex.

Another detail that has been left out is the case where the metadata section contains a comma separated list

Acknowledged.

We could:

  • Allow regular expressions
  • Allow a separate format EG: {property: "tags", contains: "popular"} vs {property: "category", value: "mobile"}

*whatever works* ;)

Last edited on 12/05/2016 at 04:45:55 PM by juliandoucette

comment:10 in reply to: ↑ 9 Changed on 12/05/2016 at 05:18:15 PM by kvas

Replying to juliandoucette:

I think my API is simpler to use.

I'll leave it to you guys to determine what is/isn't "pythonic" and/or too complex.

Speaking of simplicity, I just thought, maybe we just alway return all the "taxonomies" for each page. I don't see how they'd bother the user and for implementation purposes we'd need to load the page metadata anyway and after that it's actually easier to return all of them rather than just some. Does this make sense?

Another detail that has been left out is the case where the metadata section contains a comma separated list

We could:

  • Allow regular expressions
  • Allow a separate format EG: {property: "tags", contains: "popular"} vs {property: "category", value: "mobile"}

*whatever works* ;)

What if we treat all of those things as essentially tags (each taxonomy would be a separate kind of tags) and then if a page has all the tags contained in the filter it matches. Otherwise it doesn't. So if we have, for example:

page1 = {'keywords': ['foo', 'bar'], 'category': 'spam'}
page2 = {'keywords': 'bar', 'category': 'ham'}

then

index({'keywords': 'bar'})                       # returns page1 and page2
index({'keywords': ['foo', 'bar']})              # returns page1
index({'keywords': 'bar', 'category': 'ham'})    # returns page2

Would this work for you, Julian?

comment:11 follow-up: Changed on 12/05/2016 at 05:46:26 PM by juliandoucette

Speaking of simplicity, I just thought, maybe we just alway return all the "taxonomies" for each page. I don't see how they'd bother the user and for implementation purposes we'd need to load the page metadata anyway and after that it's actually easier to return all of them rather than just some. Does this make sense?

IIRC we decided to use a whitelist for a reason. Perhaps @jsonesen can elaborate and you two can determine what is best?

Would this work for you, Julian?

Yes.

comment:12 in reply to: ↑ 11 Changed on 12/05/2016 at 07:15:25 PM by kvas

  • Description modified (diff)

Replying to juliandoucette:

IIRC we decided to use a whitelist for a reason. Perhaps @jsonesen can elaborate and you two can determine what is best?

Cool, will do.

Would this work for you, Julian?

Yes.

Great!

One more thing. The name index is a bit non-descriptive. What if we call it something like get_pages_metadata? I took the liberty to already in the ticket description but let me know if you disagree.

comment:13 Changed on 12/05/2016 at 08:00:40 PM by juliandoucette

One more thing. The name index is a bit non-descriptive. What if we call it something like get_pages_metadata? I took the liberty to already in the ticket description but let me know if you disagree.

@jsonesen's prototype function was named "indexes". That's where I got "index" from.

I have no objections to changing the name. Choose whatever you think fits best.

comment:14 Changed on 12/06/2016 at 05:08:31 PM by kvas

  • Priority changed from Unknown to P2
  • Ready set
  • Summary changed from Provide global index function in adblockplus/cms to [cms] Context function for retrieving pages metadata

comment:15 Changed on 01/09/2017 at 08:17:24 AM by jsonesen

  • Review URL(s) modified (diff)

comment:16 Changed on 01/09/2017 at 08:17:48 AM by jsonesen

  • Owner set to jsonesen

comment:17 Changed on 01/30/2017 at 10:14:20 AM by jsonesen

  • Description modified (diff)

comment:18 Changed on 02/09/2017 at 12:35:54 PM by saroyanm

Any rough estimation when this will be ready, pushed ?
I think that this might be needed for acceptableads.com/committee launch that we have a deadline until March. We probably will be able to make a use of this functionality.

comment:19 Changed on 02/09/2017 at 12:37:55 PM by saroyanm

  • Blocking 4847 added

comment:20 follow-up: Changed on 02/09/2017 at 12:40:06 PM by jsonesen

As discussed this may later be added as a default global however as of now that is not the case. Therefore, since the codereview has been given the LGTM by vasily it is ready now. So you can either download the patch set from the review or I would be happy to send the files via email. Additionally if there is a repo for the help center I would be happy to provide a patch set for it.

comment:21 in reply to: ↑ 20 Changed on 02/09/2017 at 12:49:07 PM by saroyanm

Replying to jsonesen:

As discussed this may later be added as a default global however as of now that is not the case. Therefore, since the codereview has been given the LGTM by vasily it is ready now. So you can either download the patch set from the review or I would be happy to send the files via email. Additionally if there is a repo for the help center I would be happy to provide a patch set for it.

I see, didn't noticed that it can already be applied to website repository, we don't have a repository for Help Center AFAIK, but as I mentioned we might need this for acceptableads.com/committee launch, Stakeholders would like to have a Blog functionality there, which probably will not make until the deadline, but I think we can make a use of this functionality and create a static Blog using it, I'll add you to all related ticket and let you know in case of a question, I'm going to test implementation myself later today, downloading the patch should be enough for now I guess.

comment:22 follow-up: Changed on 02/09/2017 at 12:57:44 PM by jsonesen

Sounds good to me, we can use these tickets as a case for adding this as a default global. Additionally, there is example pages on that patch set and some tests which should provide an example of how to utilize the global, if you have any questions or would like to do a call on this then feel free to reach out

comment:23 in reply to: ↑ 22 Changed on 02/09/2017 at 12:59:31 PM by saroyanm

Replying to jsonesen:

Sounds good to me, we can use these tickets as a case for adding this as a default global. Additionally, there is example pages on that patch set and some tests which should provide an example of how to utilize the global, if you have any questions or would like to do a call on this then feel free to reach out

Awesome!

comment:24 Changed on 03/10/2017 at 12:00:37 PM by abpbot

A commit referencing this issue has landed:
Issue 4687 - Add Context Function get_pages_metadata to Test Site

comment:25 Changed on 03/11/2017 at 12:04:17 PM by kvas

Just to clarify, this was landed by mistake, the idea for this change was that it will be part of Help Center initially and then if we find it generally useful, it could migrate to the CMS as a globally available global. But the time hasn't come yet so for now I rolled that commit back.

Last edited on 03/11/2017 at 12:05:06 PM by kvas

comment:26 Changed on 06/21/2017 at 10:54:05 AM by kvas

  • Blocking 5043 added

comment:27 Changed on 06/23/2017 at 10:09:56 AM by jsonesen

  • Review URL(s) modified (diff)

comment:28 Changed on 06/23/2017 at 10:09:59 AM by jsonesen

  • Status changed from new to reviewing

comment:29 Changed on 07/07/2017 at 11:09:28 AM by abpbot

A commit referencing this issue has landed:
Issue 4687 - Add global get_pages_metadata to template converters

comment:30 Changed on 07/07/2017 at 01:27:19 PM by kvas

  • Resolution set to fixed
  • Status changed from reviewing to closed

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from jsonesen.
 
Note: See TracTickets for help on using tickets.