Opened 3 years ago

Closed 2 years ago

#4687 closed change (fixed)

[cms] Context function for retrieving pages metadata

Reported by: juliandoucette Assignee: jsonesen
Priority: P2 Milestone:
Module: Sitescripts Keywords:
Cc: saroyanm, kvas, jsonesen, philll Blocked By:
Blocking: #4847, #5043 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29370597/
https://codereview.adblockplus.org/29472555/

Description (last modified by jsonesen)

Background

  • The new Help Center website categorizes pages by taxonomy like [product, platform, category, tag],
  • Several of the page templates required to build this website list pages based on these taxonomies,
  • Therefore we must provide a method for adblockplus/cms to implement these lists using Jinja2.

What to change

Note: from here on we'll use the terms:

  • field name to refer to each type of taxonomy (e.g. product, platform, etc.),
  • field value to refer to the value of a field for a particular page (e.g. product=adblock-plus), and
  • page metadata to refer to the collection of all the fields and values of one page (including the page field which contains the id of the page).

Provide the following @contextfunction:

def get_pages_metadata(filters={}):
    """Return metadata dicts for all pages matching the filters

    Args:
        filters: Mapping of field names to desired field values or lists of them
            Example: {'product': 'adblock-plus', 'platform': ['firefox', 'chrome']}
    Returns:
        A list of metadata dicts for pages matching all the filters (if no filters are given, returns all pages)
            Example: [{page: 'acceptable-ads', product: 'adblock-plus', platform: 'firefox'}]
    Raises:
        TypeError: if filters is not a dict mapping strings to strings or lists of strings
    """

Since this method is being used as a context function in order to have access to the mercurial source object, unit tests should be added to this object's api in order to ensure that it remains stable.

Change History (30)

comment:1 Changed 3 years ago by juliandoucette

@jsonesen I added an extra requirement to the input (List of tuples which provides a default value to filter by) to the requirements. Please let me know if this is OK with you. We can simplify these requirements if needed.

comment:2 Changed 3 years ago by juliandoucette

  • Description modified (diff)

comment:3 Changed 3 years ago by juliandoucette

  • Description modified (diff)

comment:4 Changed 3 years ago by juliandoucette

  • Description modified (diff)

comment:5 Changed 3 years ago by juliandoucette

  • Description modified (diff)

comment:6 Changed 3 years ago by kvas

It seems to me that the current API is a bit complex: the taxonomies parameter is trying to do too many things at the same time. May I suggest the following change, which I think makes it more clear:

def index(taxonomies, filters=None):
    """Index pages by taxonomy.
 
    Args:
        taxonomies: A list of taxonomies
            List of strings: ["product", "platform"]
        filters: Filters that will be used to select pages
            Dictionary: {'product': 'adblock-plus'}
    Returns:
        A list of pages matching any provided filters including page name and given fields (in taxonomies)
            List of dicts: [{page: "acceptable-ads", product: "adblock-plus", platform: "firefox"}]
    Raises:
        TypeError: if taxonomies is not a list
        TypeError: if filters is not a dict
    """

What do you think?

comment:7 Changed 3 years ago by jsonesen

I implemented the tuple requirement and in fact I think you have a point here, as there ends up being additional complexity in order to determine if the filter exists or not in the list and this is done using isinstance() and in fact heavy type checking like this is normally not considered very pythonic. So I would agree that adding a filters parameter probably makes more sense here, although I think that the suggestion Julian made regarding a filtering type behavior is good it does make a bit more sense to do less 'inference' with the code and just make it an argument so that there is not additional implementation details to be documented regarding our fancy use of a filter tuple

comment:8 Changed 3 years ago by jsonesen

Another detail that has been left out is the case where the metadata section contains a comma separated list, i think handling this requires splitting the metadata string and then checking the filter. For example is you use the filter to get pages tagged 'popular' i need to split the tags' section and then check over that to make sure that 'popular' tag exists, simply saying if tag == 'popular': is not sufficient because as is if 'tag' has 'popular, cool, awesome' it is represented as {'tags': 'popular, cool, awesome'}

comment:9 follow-up: Changed 3 years ago by juliandoucette

It seems to me that the current API is a bit complex

I think my API is simpler to use.

I'll leave it to you guys to determine what is/isn't "pythonic" and/or too complex.

Another detail that has been left out is the case where the metadata section contains a comma separated list

Acknowledged.

We could:

  • Disallow tags (comma separated lists)
  • Allow regular expressions
  • Allow a separate format EG: {property: "tags", contains: "popular"} vs {property: "category", value: "mobile"}

*whatever works* ;)

Version 0, edited 3 years ago by juliandoucette (next)

comment:10 in reply to: ↑ 9 Changed 3 years ago by kvas

Replying to juliandoucette:

I think my API is simpler to use.

I'll leave it to you guys to determine what is/isn't "pythonic" and/or too complex.

Speaking of simplicity, I just thought, maybe we just alway return all the "taxonomies" for each page. I don't see how they'd bother the user and for implementation purposes we'd need to load the page metadata anyway and after that it's actually easier to return all of them rather than just some. Does this make sense?

Another detail that has been left out is the case where the metadata section contains a comma separated list

We could:

  • Allow regular expressions
  • Allow a separate format EG: {property: "tags", contains: "popular"} vs {property: "category", value: "mobile"}

*whatever works* ;)

What if we treat all of those things as essentially tags (each taxonomy would be a separate kind of tags) and then if a page has all the tags contained in the filter it matches. Otherwise it doesn't. So if we have, for example:

page1 = {'keywords': ['foo', 'bar'], 'category': 'spam'}
page2 = {'keywords': 'bar', 'category': 'ham'}

then

index({'keywords': 'bar'})                       # returns page1 and page2
index({'keywords': ['foo', 'bar']})              # returns page1
index({'keywords': 'bar', 'category': 'ham'})    # returns page2

Would this work for you, Julian?

comment:11 follow-up: Changed 3 years ago by juliandoucette

Speaking of simplicity, I just thought, maybe we just alway return all the "taxonomies" for each page. I don't see how they'd bother the user and for implementation purposes we'd need to load the page metadata anyway and after that it's actually easier to return all of them rather than just some. Does this make sense?

IIRC we decided to use a whitelist for a reason. Perhaps @jsonesen can elaborate and you two can determine what is best?

Would this work for you, Julian?

Yes.

comment:12 in reply to: ↑ 11 Changed 3 years ago by kvas

  • Description modified (diff)

Replying to juliandoucette:

IIRC we decided to use a whitelist for a reason. Perhaps @jsonesen can elaborate and you two can determine what is best?

Cool, will do.

Would this work for you, Julian?

Yes.

Great!

One more thing. The name index is a bit non-descriptive. What if we call it something like get_pages_metadata? I took the liberty to already in the ticket description but let me know if you disagree.

comment:13 Changed 3 years ago by juliandoucette

One more thing. The name index is a bit non-descriptive. What if we call it something like get_pages_metadata? I took the liberty to already in the ticket description but let me know if you disagree.

@jsonesen's prototype function was named "indexes". That's where I got "index" from.

I have no objections to changing the name. Choose whatever you think fits best.

comment:14 Changed 3 years ago by kvas

  • Priority changed from Unknown to P2
  • Ready set
  • Summary changed from Provide global index function in adblockplus/cms to [cms] Context function for retrieving pages metadata

comment:15 Changed 3 years ago by jsonesen

  • Review URL(s) modified (diff)

comment:16 Changed 3 years ago by jsonesen

  • Owner set to jsonesen

comment:17 Changed 3 years ago by jsonesen

  • Description modified (diff)

comment:18 Changed 3 years ago by saroyanm

Any rough estimation when this will be ready, pushed ?
I think that this might be needed for acceptableads.com/committee launch that we have a deadline until March. We probably will be able to make a use of this functionality.

comment:19 Changed 3 years ago by saroyanm

  • Blocking 4847 added

comment:20 follow-up: Changed 3 years ago by jsonesen

As discussed this may later be added as a default global however as of now that is not the case. Therefore, since the codereview has been given the LGTM by vasily it is ready now. So you can either download the patch set from the review or I would be happy to send the files via email. Additionally if there is a repo for the help center I would be happy to provide a patch set for it.

comment:21 in reply to: ↑ 20 Changed 3 years ago by saroyanm

Replying to jsonesen:

As discussed this may later be added as a default global however as of now that is not the case. Therefore, since the codereview has been given the LGTM by vasily it is ready now. So you can either download the patch set from the review or I would be happy to send the files via email. Additionally if there is a repo for the help center I would be happy to provide a patch set for it.

I see, didn't noticed that it can already be applied to website repository, we don't have a repository for Help Center AFAIK, but as I mentioned we might need this for acceptableads.com/committee launch, Stakeholders would like to have a Blog functionality there, which probably will not make until the deadline, but I think we can make a use of this functionality and create a static Blog using it, I'll add you to all related ticket and let you know in case of a question, I'm going to test implementation myself later today, downloading the patch should be enough for now I guess.

comment:22 follow-up: Changed 3 years ago by jsonesen

Sounds good to me, we can use these tickets as a case for adding this as a default global. Additionally, there is example pages on that patch set and some tests which should provide an example of how to utilize the global, if you have any questions or would like to do a call on this then feel free to reach out

comment:23 in reply to: ↑ 22 Changed 3 years ago by saroyanm

Replying to jsonesen:

Sounds good to me, we can use these tickets as a case for adding this as a default global. Additionally, there is example pages on that patch set and some tests which should provide an example of how to utilize the global, if you have any questions or would like to do a call on this then feel free to reach out

Awesome!

comment:24 Changed 2 years ago by abpbot

A commit referencing this issue has landed:
Issue 4687 - Add Context Function get_pages_metadata to Test Site

comment:25 Changed 2 years ago by kvas

Just to clarify, this was landed by mistake, the idea for this change was that it will be part of Help Center initially and then if we find it generally useful, it could migrate to the CMS as a globally available global. But the time hasn't come yet so for now I rolled that commit back.

Last edited 2 years ago by kvas (previous) (diff)

comment:26 Changed 2 years ago by kvas

  • Blocking 5043 added

comment:27 Changed 2 years ago by jsonesen

  • Review URL(s) modified (diff)

comment:28 Changed 2 years ago by jsonesen

  • Status changed from new to reviewing

comment:29 Changed 2 years ago by abpbot

A commit referencing this issue has landed:
Issue 4687 - Add global get_pages_metadata to template converters

comment:30 Changed 2 years ago by kvas

  • Resolution set to fixed
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.