Opened 3 years ago

Last modified 2 years ago

#4687 closed change

Provide global index function in adblockplus/cms — at Version 12

Reported by: juliandoucette Assignee:
Priority: P2 Milestone:
Module: Sitescripts Keywords:
Cc: saroyanm, kvas, jsonesen, philll Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29370597/
https://codereview.adblockplus.org/29472555/

Description (last modified by kvas)

Background

  • The new Help Center website categorizes pages by taxonomy like [product, platform, category, tag],
  • Several of the page templates required to build this website list pages based on these taxonomies,
  • Therefore we must provide a method for adblockplus/cms to implement these lists using Jinja2.

What to change

Note: from here on we'll use the terms:

  • field name to refer to each type of taxonomy (e.g. product, platform, etc.),
  • field value to refer to the value of a field for a particular page (e.g. product=adblock-plus), and
  • page metadata to refer to the collection of all the fields and values of one page (including the page field which contains the id of the page).

Provide the following @contextfunction:

def get_pages_metadata(filters={}):
    """Return metadata dicts for all pages matching the filters

    Args:
        filters: Mapping of field names to desired field values or lists of them
            Example: {'product': 'adblock-plus', 'platform': ['firefox', 'chrome']}
    Returns:
        A list of metadata dicts for pages matching all the filters (if no filters are given, returns all pages)
            Example: [{page: 'acceptable-ads', product: 'adblock-plus', platform: 'firefox'}]
    Raises:
        TypeError: if filters is not a dict mapping strings to strings or lists of strings
    """

Change History (12)

comment:1 Changed 3 years ago by juliandoucette

@jsonesen I added an extra requirement to the input (List of tuples which provides a default value to filter by) to the requirements. Please let me know if this is OK with you. We can simplify these requirements if needed.

comment:2 Changed 3 years ago by juliandoucette

  • Description modified (diff)

comment:3 Changed 3 years ago by juliandoucette

  • Description modified (diff)

comment:4 Changed 3 years ago by juliandoucette

  • Description modified (diff)

comment:5 Changed 3 years ago by juliandoucette

  • Description modified (diff)

comment:6 Changed 3 years ago by kvas

It seems to me that the current API is a bit complex: the taxonomies parameter is trying to do too many things at the same time. May I suggest the following change, which I think makes it more clear:

def index(taxonomies, filters=None):
    """Index pages by taxonomy.
 
    Args:
        taxonomies: A list of taxonomies
            List of strings: ["product", "platform"]
        filters: Filters that will be used to select pages
            Dictionary: {'product': 'adblock-plus'}
    Returns:
        A list of pages matching any provided filters including page name and given fields (in taxonomies)
            List of dicts: [{page: "acceptable-ads", product: "adblock-plus", platform: "firefox"}]
    Raises:
        TypeError: if taxonomies is not a list
        TypeError: if filters is not a dict
    """

What do you think?

comment:7 Changed 3 years ago by jsonesen

I implemented the tuple requirement and in fact I think you have a point here, as there ends up being additional complexity in order to determine if the filter exists or not in the list and this is done using isinstance() and in fact heavy type checking like this is normally not considered very pythonic. So I would agree that adding a filters parameter probably makes more sense here, although I think that the suggestion Julian made regarding a filtering type behavior is good it does make a bit more sense to do less 'inference' with the code and just make it an argument so that there is not additional implementation details to be documented regarding our fancy use of a filter tuple

comment:8 Changed 3 years ago by jsonesen

Another detail that has been left out is the case where the metadata section contains a comma separated list, i think handling this requires splitting the metadata string and then checking the filter. For example is you use the filter to get pages tagged 'popular' i need to split the tags' section and then check over that to make sure that 'popular' tag exists, simply saying if tag == 'popular': is not sufficient because as is if 'tag' has 'popular, cool, awesome' it is represented as {'tags': 'popular, cool, awesome'}

comment:9 follow-up: Changed 3 years ago by juliandoucette

It seems to me that the current API is a bit complex

I think my API is simpler to use.

I'll leave it to you guys to determine what is/isn't "pythonic" and/or too complex.

Another detail that has been left out is the case where the metadata section contains a comma separated list

Acknowledged.

We could:

  • Allow regular expressions
  • Allow a separate format EG: {property: "tags", contains: "popular"} vs {property: "category", value: "mobile"}

*whatever works* ;)

Last edited 3 years ago by juliandoucette (previous) (diff)

comment:10 in reply to: ↑ 9 Changed 3 years ago by kvas

Replying to juliandoucette:

I think my API is simpler to use.

I'll leave it to you guys to determine what is/isn't "pythonic" and/or too complex.

Speaking of simplicity, I just thought, maybe we just alway return all the "taxonomies" for each page. I don't see how they'd bother the user and for implementation purposes we'd need to load the page metadata anyway and after that it's actually easier to return all of them rather than just some. Does this make sense?

Another detail that has been left out is the case where the metadata section contains a comma separated list

We could:

  • Allow regular expressions
  • Allow a separate format EG: {property: "tags", contains: "popular"} vs {property: "category", value: "mobile"}

*whatever works* ;)

What if we treat all of those things as essentially tags (each taxonomy would be a separate kind of tags) and then if a page has all the tags contained in the filter it matches. Otherwise it doesn't. So if we have, for example:

page1 = {'keywords': ['foo', 'bar'], 'category': 'spam'}
page2 = {'keywords': 'bar', 'category': 'ham'}

then

index({'keywords': 'bar'})                       # returns page1 and page2
index({'keywords': ['foo', 'bar']})              # returns page1
index({'keywords': 'bar', 'category': 'ham'})    # returns page2

Would this work for you, Julian?

comment:11 follow-up: Changed 3 years ago by juliandoucette

Speaking of simplicity, I just thought, maybe we just alway return all the "taxonomies" for each page. I don't see how they'd bother the user and for implementation purposes we'd need to load the page metadata anyway and after that it's actually easier to return all of them rather than just some. Does this make sense?

IIRC we decided to use a whitelist for a reason. Perhaps @jsonesen can elaborate and you two can determine what is best?

Would this work for you, Julian?

Yes.

comment:12 in reply to: ↑ 11 Changed 3 years ago by kvas

  • Description modified (diff)

Replying to juliandoucette:

IIRC we decided to use a whitelist for a reason. Perhaps @jsonesen can elaborate and you two can determine what is best?

Cool, will do.

Would this work for you, Julian?

Yes.

Great!

One more thing. The name index is a bit non-descriptive. What if we call it something like get_pages_metadata? I took the liberty to already in the ticket description but let me know if you disagree.

Note: See TracTickets for help on using tickets.