Opened on 03/18/2014 at 11:26:47 AM

Closed on 09/09/2014 at 06:18:03 PM

Last modified on 11/20/2014 at 03:13:58 PM

#170 closed change (fixed)

Replacing Mercurial subrepositories

Reported by: trev Assignee: trev
Priority: P4 Milestone:
Module: Infrastructure Keywords:
Cc: fhd Blocked By: #1366
Blocking: #1360, #1362, #1363, #1364, #1588 Platform: Unknown
Ready: yes Confidential: no
Tester: Verified working: no
Review URL(s):

http://codereview.adblockplus.org/5168251361296384/

Description (last modified by trev)

Background

We are currently using Mercurial's subrepositories quite heavily. This has a number of downsides however:

  • Repository mirrors on GitHub aren't really functional - Git doesn't know how to check out the subrepositories.
  • A push always tries to push all subrepositores, even if no changes were made - this can take a while, e.g. for the adblockplusie or infrastructure repositories that have extensive dependencies.
  • We use relative paths for our subrepositories. That fails pretty badly if a repository isn't cloned out from our server but rather some mirror (e.g. an existing local clone).
  • Somebody having write access (clone via SSH) to a repository also needs write access to all subrepositores because these will be cloned via SSH as well. Even if you go there and clone via HTTPS manually you won't be able to push.
  • Developers need to have multiple clones of the same repository. Originally I tried using only one clone via symbolic links but that resulted in lots of unintended .hgsubstate changes.
  • An invalid subrepository state (.hgsubstate referring to a revision that doesn't exist on the server) is very hard to repair - Mercurial will refuse to update to that revision, some voodoo dance is required to change it nevertheless.

Altogether, subrepositories aren't unproblematic which is why even the official documentation says:

This is considered a feature of last resort.

So far we used it mostly because the alternative (untracked dependencies) is even worse. I've thought about other alternatives, it essentially boils down to implementing our own subrepositories mechanism.

What to change

Here is what I think can work:

  • Each repository contains a dependencies file formatted like this:
    _root = hg:https://hg.adblockplus.org/ git:https://github.com/adblockplus/
    _self = buildtools/ensure_dependencies.py
    buildtools = buildtools 1.0.5
    third_party/googletest = googletest hg:bd26b148913a git:0476e154db5f
    
  • There is an ensure_dependencies.py script in these repositories that does the following when executed:
    • Parse dependencies file to extract the list of subrepositories.
    • For each subrepository check whether the corresponding directory already exists.
    • If the directory doesn't exist then clone the subrepository there. To determine the repository path, first get the repository path for the current directory (either Mercurial or Git) and assume that the subrepository is available under the same address. If finding repository path fails or cloning from that URL doesn't work, use the fallback root URL specified in dependencies file.
    • If the directory exists and is neither a Mercurial nor a Git repository - assume that it is the correct revision. Otherwise try to update it to the specified tag/revision.
    • If the update fails (no such tag/revision) - pull the repository and try again. If it still fails - report failure.
    • Recurse into all subrepositories to process the dependencies file there as well.
    • If the _self setting is present: verify that ensure_dependencies.py version running is identical to the one references. If not, replace and re-run it.
  • Make build.py call ensure_dependencies.py as the very first thing and fail if ensure_dependencies.py fails. We can also integrate this call into the other build systems or at least document in README.md that it needs to be performed.

Downsides of this approach:

  • We will have a non-trivial script that needs to be copied into all repositories - this is exactly what I meant to avoid when I created the buildtools repository and made build.py in the actual repositories a trivial dummy. Let's hope that we won't need to change ensure_dependencies.py often.
  • Changing versions in dependencies file needs to happen manually, Mercurial will no longer take care of that if you simply update the subrepository. Also, it will make testing your changes a bit more complicated - once you commit your changes to a subrepository you need to update the dependencies file (using tip as subrepo revision is probably the best approach during development).
  • Everybody has to remember to push changes to a subrepository before changing the dependency - this will no longer happen automatically. Failing to do so will mean that builds will no longer work for everybody else (should hopefully be a rare case that is easily repaired).
  • We either have to tag all relevant revisions in our subrepositories so that we can refer to them or use two different revisions: one for Mercurial and one for Git.

Attachments (0)

Change History (22)

comment:1 Changed on 03/18/2014 at 11:40:25 AM by philll

by fhd:

I moved 4 posts to a new topic: Switching to Git

This is about not relying on VCS specific features to handle our dependencies anymore, which I think is important. Switching to git is a different topic altogether, let's discuss it in the other thread.

Yep, let's do this. I think we've outgrown subrepositories and I'd feel better if we don't rely on VCS specific features for this to begin with.

Note that Chromium has essentially the same thing, called gtool. We might want to look into using it, it seems to be rather general purpose.

We will have a non-trivial script that needs to be copied into all repositories - this is exactly what I meant to avoid when I created the buildtools repository and made build.py in the actual repositories a trivial dummy. Let's hope that we won't need to change ensureDeps.py often.

We could have a simple bootstrapping script that e.g. fetches/updates the real script and invokes it.

by palant:

Note that Chromium has essentially the same thing, called gtool. We might want to look into using it, it seems to be rather general purpose.

Note that Chromium has essentially the same thing, called gtool. We might want to look into using it, it seems to be rather general purpose.


We could have a simple bootstrapping script that e.g. fetches/updates the real script and invokes it.

The ugly part here is: how do we update the real script then if we ever need to? Also, this will break offline builds, e.g. building with just our source code archive won't be possible.


by fhd:

You probably mean gclient.

Yup.


The ugly part here is: how do we update the real script then if we ever need to?

If the script is simple, it's less likely that we have to update it. It doesn't solve the issue completely, but it's better then copying around the actual, more complex script IMO.


Also, this will break offline builds, e.g. building with just our source code archive won't be possible.

Won't we have the same issue with this whole mechanism to begin with? If we create a source archive, we can include everything one needs.


If the script is simple, it's less likely that we have to update it. It doesn't solve the issue completely, but it's better then copying around the actual, more complex script IMO.

I think you didn't get my objection. What if that "actual, more complex" script needs to change - how do we make sure that people get the update? They already downloaded it and we probably don't want to redownload on each build. Note that we might need to change something in a way that isn't backwards compatible - which means that we have to ensure that whoever is building an old version of Adblock Plus still has the old version while everybody else gets the new version.


by fhd:

Yeah, I did get that wrong then. We could always run this through the wrapper script, and that could then update the real scripts, but I agree that it's not great.

Why not have this as an external tool in its own repository then? That's how Chromium does it with depot_tools. Arguably complicates the setup a bit, but I think it's preferable to having a copy of a non-trivial script in every repository and keeping them in sync manually.

by palant:

Why not have this as an external tool in its own repository then?

Wouldn't getting the right version of that repository require all the same logic then? Chromium asks people to clone depot_tools manually which is suboptimal to say the least (e.g. it doesn't resolve the versioning problems I mentioned above).

We can automate keeping this script in sync but IMHO there is no way around having it in most of our repositories.


by fhd:

Wouldn't getting the right version of that repository require all the same logic then? Chromium asks people to clone depot_tools manually which is suboptimal to say the least (e.g. it doesn't resolve the versioning problems I mentioned above).

Yeah, I was suggesting we have people clone it manually, and it's up to them to keep it up to date manually as well of course. I'm not sure what I prefer, I think it depends on how complex the script is to begin with. If it's just one file with a few hundred LOC, copying sounds reasonable. depot_tools is obviously more involved.


by palant:

I think that a few hundred LOC should definitely do (ideally below two hundred). gclient currently has around 4000 lines in three files, that looks like an overkill for us. Besides, it only supports Git and SVN and apparently isn't meant for the scenario where the same repository can be available via different SCMs/servers.

by fhd:

Then copying it would be the nicest way I guess. I would still prefer to have a central repository for this though, where we do the actual meaningful commits. In all the repositories using it, we can then just say "Update ensureDeps.py".

by palant:

Yep. We can have the master copy in the buildtools repository or create a new repository for it. I'm not quite sure which is preferable but having it in buildtools would have the advantage that consistency can be easily verified automatically for the repositories listing buildtools as dependency (which is most of them) - it should be identical to the one in the listed buildtools version. Then again, a new single-purpose repository can be listed as a dependency everywhere.

comment:2 Changed on 03/18/2014 at 11:42:06 AM by philll

  • Cc trev added
  • Reporter changed from philll to trev

@trev: could you please add a what to change section with what we actually want to do here?

comment:3 Changed on 03/18/2014 at 12:42:21 PM by trev

  • Description modified (diff)

comment:4 Changed on 03/18/2014 at 01:42:27 PM by trev

  • Cc trev removed

comment:5 Changed on 04/23/2014 at 10:21:39 AM by fhd

Since one of the goals here is to make our code work when checked out via Git, we should also figure out how to create a .gitignore. We should most definitely generate it, no big deal since we'll need a bootstrap script anyway.

comment:6 Changed on 04/23/2014 at 10:23:41 AM by fhd

  • Cc fhd added

comment:7 Changed on 04/30/2014 at 12:00:04 PM by fhd

  • Priority changed from Unknown to P4
  • Ready set

comment:8 Changed on 08/25/2014 at 03:15:26 PM by trev

  • Description modified (diff)
  • Owner set to trev
  • Platform set to Unknown

I changed the specification somewhat:

  • Relaxed the requirement to use tags, it's likely not going to work. The alternative is to specific two revision IDs, one for Mercurial and another for Git.
  • Allowed to specify two different roots: one for Mercurial and one for Git. So the root matching the current repository can be used.
  • Prefixed the root setting with _ to distinguish it from the directory settings.
  • Added a _self setting to make sure ensureDeps.py will keep itself updated automatically.

comment:9 Changed on 09/01/2014 at 08:17:05 PM by trev

  • Review URL(s) modified (diff)

Created review for a work-in-progress implementation. From what I can tell, there are only two outstanding issues:

  • The Git pull command doesn't work (still need to figure out how to do this).
  • The self-update mechanism hasn't been implemented yet.

comment:10 Changed on 09/03/2014 at 01:19:50 PM by trev

  • Description modified (diff)

Modified the description as per discussion in review: the script should be called ensure_dependencies.py and the corresponding file dependencies (was: ensureDeps.py and .sub).

comment:11 Changed on 09/03/2014 at 09:55:49 PM by trev

  • Status changed from new to reviewing

comment:12 Changed on 09/09/2014 at 06:18:03 PM by trev

  • Resolution set to fixed
  • Status changed from reviewing to closed

The script has been pushed: https://hg.adblockplus.org/buildtools/rev/c657c4f718b7

Marking this as fixed, will file follow-up issues to start using it.

comment:13 Changed on 09/11/2014 at 05:26:11 PM by trev

  • Blocking 1360 added

comment:14 Changed on 09/11/2014 at 05:43:26 PM by trev

  • Blocked By 1361 added

comment:15 Changed on 09/11/2014 at 05:55:44 PM by trev

  • Blocking 1362 added

comment:16 Changed on 09/11/2014 at 05:56:12 PM by trev

  • Blocked By 1361 removed

comment:17 Changed on 09/11/2014 at 06:34:22 PM by trev

  • Blocking 1363 added

comment:18 Changed on 09/11/2014 at 06:40:05 PM by trev

  • Blocking 1364 added

comment:19 Changed on 09/11/2014 at 07:33:27 PM by trev

  • Blocked By 1366 added

comment:20 Changed on 09/18/2014 at 11:19:02 AM by matze

  • Blocked By 1366 removed

comment:21 Changed on 09/18/2014 at 11:19:15 AM by matze

  • Blocked By 1366 added

comment:22 Changed on 11/20/2014 at 03:13:58 PM by trev

  • Blocking 1588 added

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from trev.
 
Note: See TracTickets for help on using tickets.