Opened 5 years ago

Closed 4 years ago

#2123 closed change (incomplete)

ensure_dependencies.py should allow absolute URLs for dependencies

Reported by: kzar Assignee:
Priority: Unknown Milestone:
Module: Automation Keywords:
Cc: trev, sebastian Blocked By:
Blocking: Platform: Unknown
Ready: no Confidential: no
Tester: Verified working: no
Review URL(s):

Description (last modified by kzar)

Background

Currently dependencies files specify one repository root URL and then a commit hash + repository name for each dependency.

Sometimes during development it's necessary to make changes in multiple repositories that interact. It would be very useful to be able to specify the entire repository URL for a dependency to test these interactions properly.

Otherwise the developer is forced to either change the global _root repository URL for all dependencies and maintain forks of them all, or manually clone and check out her version of the dependency that's under development - fighting ensure_dependencies every step of the way. (This is made even worse by the fact that a lot of the build scripts call ensure_dependencies automatically!)

What to change

Improve the ensure_dependencies.py script to allow absolute URLs to be specified for dependencies when required.

For dependencies where the absolute URL has been specified the global _root URL should be ignored. For all other dependencies the global _root URL should still be used.

Change History (12)

comment:1 Changed 5 years ago by kzar

For example I'm working on #616, trying to test my changes to adblockpluschrome that depend on my earlier changes to adblockplus. My changes to adblockplus haven't been pushed yet so I need to use them from my forked repository. (I am also using a forked repository for adblockpluschrome. For both repositories my work is in a development branch.)

Here's what I had to do:

  • Cloned adblockpluschrome from my fork repository, checked out my desired branch.
  • Run the build script so that ensure_dependencies is triggered and the dependencies are all downloaded from the official adblockplus repositories.
  • Manually set the remote origin's URL for the adblockplus dependency to point to my fork repository.
  • Manually pulled my desired development branch for adblockplus from my fork repository.
  • Edited the dependencies file for the adblockpluschrome repository to specify my branch name as the desired commit for the adblockplus dependency
  • Ran ./build.py -t chrome devenv, loaded the result and tested my changes.

Which really sucks and is not at all obvious. (My first few attempts resulted in frustration when I attempted to build the Chrome extension only to find that the build script had called the ensure_dependencies script in turn and messed up my carefully set up dependency repository.)

Last edited 5 years ago by kzar (previous) (diff)

comment:2 Changed 5 years ago by kzar

  • Description modified (diff)

comment:3 Changed 5 years ago by kzar

  • Summary changed from ensure_dependencies.py should allow per-dependency repository root URLs to ensure_dependencies.py should allow absolute URLs for dependencies

comment:4 Changed 5 years ago by kzar

  • Description modified (diff)

comment:5 Changed 5 years ago by kzar

  • Description modified (diff)

comment:6 Changed 4 years ago by sebastian

  • Cc sebastian added
  • Verified working unset

This is actually already supposed to work because of following code (both os.path.join and urljoin just return the second argument if it's absolute):

  if os.path.exists(roots[type]):
    url = os.path.join(roots[type], sourcename)
  else:
    url = urlparse.urljoin(roots[type], sourcename)

Also note that when you have local modifications in one of the dependency repositories, ensure_dependencies.py fails but the build continues. So you usually don't need to tweak the dependencies file to test cross-reprository changes during development.

Last edited 4 years ago by sebastian (previous) (diff)

comment:7 follow-up: Changed 4 years ago by kzar

I often don't have unstaged changes in a working directory, I have commits in a branch of my fork of the repository. We can't assume changes that a dev is trying to test are uncommitted.

I didn't realise absolute URLs were even supposed to be supported yet, is there some documentation of the dependency file format and script usage past the minimal output of ./ensure_dependencies.py -h?

I've just tried to use an absolute repository URL for my fork, with my branch as a revision:

adblockplus = git@github.com:kzar/adblockplus.git git:616-generic-hide
Failed with a message "fatal: repository 'https://github.com/adblockplus/git@github.com:kzar/adblockplus.git/' not found".

adblockplus = https://github.com/kzar/adblockplus git:616-generic-hide
Failed with a message "Exception: Failed to resolve revision 616-generic-hide"

Finally as a last resort I tried the HTTP repository URL with the commit hash for the head of my branch:

adblockplus = https://github.com/kzar/adblockplus git:8e08a24c7ae889066ca9039e0660622a8eda0447

That worked but kind of sucks, I'm not able to push changes to my repository that way and my local copy is left in a bad state ("HEAD detached at 8e08a24"). I'll have to fix both of those manually before getting much work done anyway.

It seems to me like we should do the following:

  • Support absolute SSH Git URLs like git@github.com:kzar/adblockplus.git
  • Have ensure_dependencies.py fetch all remote branches if specified revision can't be found / or maybe just always.
  • Write some more documentation with examples of how to write dependency files, or at least make the documentation easier to find if it already exists.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 4 years ago by sebastian

Replying to kzar:

I often don't have unstaged changes in a working directory, I have commits in a branch of my fork of the repository. We can't assume changes that a dev is trying to test are uncommitted.

Well, since you are apparently fine with tweaking the dependencies file, you could also just remove the line for the repo you have local commits in that you want to keep. But if you prefer to keep your WIP commits in forked remote repositories, tracked by ensure_dependencies.py, fair enough.

I didn't realise absolute URLs were even supposed to be supported yet, is there some documentation of the dependency file format and script usage past the minimal output of ./ensure_dependencies.py -h?

This is an implicit feature. The URL given for the repository is resolved relative to _root. Hence if an absolute URL is given it overrides the base URL. This isn't special about ensure_dependencies.py, but how URL resolution works everywhere. In fact, there isn't even code in ensure_dependencies.py to make this work. This behavior is leaked from os.path.join() and urljoin().

I've just tried to use an absolute repository URL for my fork, with my branch as a revision:

adblockplus = git@github.com:kzar/adblockplus.git git:616-generic-hide
Failed with a message "fatal: repository 'https://github.com/adblockplus/git@github.com:kzar/adblockplus.git/' not found".

That is because this is a GIT/SSH-specific URL with omitted protocol part. More generic URL parsers (like Python's urlparse module) can't make assumptions about the protocol here. Hence git@github.com:kzar is recognized as part of the pathname. However, simply adding ssh:// to the URL should fix it.

adblockplus = https://github.com/kzar/adblockplus git:616-generic-hide
Failed with a message "Exception: Failed to resolve revision 616-generic-hide"

This looks like that 616-generic-hide is a branch that is only known by the remote. Calling git branch --track 616-generic-hide should fix that.

Finally as a last resort I tried the HTTP repository URL with the commit hash for the head of my branch:

adblockplus = https://github.com/kzar/adblockplus git:8e08a24c7ae889066ca9039e0660622a8eda0447

That worked but kind of sucks, I'm not able to push changes to my repository that way

According to GitHub documentation you should be able to push as well via HTTPS, though you get prompted for your credentials.

and my local copy is left in a bad state ("HEAD detached at 8e08a24").

That is as well, because your local repository doesn't know of the remote branch this commit originates from.

It seems to me like we should do the following:

  • Support absolute SSH Git URLs like git@github.com:kzar/adblockplus.git
  • Have ensure_dependencies.py fetch all remote branches if specified revision can't be found / or maybe just always.

As outlined above, those changes don't seem to be required to accomplish what you want. They would merely avoid some pitfalls, which IMO not ensure_depencies.py but GIT is to blame for. I rather don't want to introduce too much complexity to address those. But if those changes turn out to be trivial, I'm fine with it.

  • Write some more documentation with examples of how to write dependency files, or at least make the documentation easier to find if it already exists.

Sure, properly documenting the dependencies file format wouldn't hurt. So far there is only an example in ensure_dependency.py's source code. So feel free to write documentation.

Last edited 4 years ago by sebastian (previous) (diff)

comment:9 in reply to: ↑ 8 ; follow-up: Changed 4 years ago by kzar

Replying to sebastian:

Well, since you are apparently fine with tweaking the dependencies file, you could also just remove the line for the repo you have local commits in that you want to keep.

Not a bad idea, I'll do it that way for now until some of these other issues are resolved.

That is because this is a GIT/SSH-specific URL with omitted protocol part. More generic URL parsers (like Python's urlparse module) can't make assumptions about the protocol here. Hence git@github.com:kzar is recognized as part of the pathname. However, simply adding ssh:// to the URL should fix it.

So I had to change git@github.com:kzar/adblockplus.git to ssh://git@github.com/kzar/adblockplus.git to make it work. Could we not have the script transform URLs like that for us? (Just prepend "ssh://" and replace the first ":" with "/" if there's a ":" before the first "/" and the characters in between it and the first "/" are not all numbers? It would be pretty simple and I'm happy to file an issue + write the code.)

This looks like that 616-generic-hide is a branch that is only known by the remote. Calling git branch --track 616-generic-hide should fix that.

Yep, I realise that. The problem is that I think ensure_dependencies.py should create the tracking branch in that situation. (Remember it has often just done a fresh clone of the dependency.) Otherwise I have to run the script, have it fail, then go fix the tracking branch in the dependency and then re-run the script again. That's a pretty crappy workflow IMHO.

and my local copy is left in a bad state ("HEAD detached at 8e08a24").

That is as well, because your local repository doesn't know of the remote branch this commit originates from.

Nope, even if I have the branch tracked locally and I check out the commit hash for the HEAD of that branch I end up with a detached head. Maybe that's a difference between Mercurial and Git? I don't know.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 4 years ago by sebastian

Replying to kzar:

So I had to change git@github.com:kzar/adblockplus.git to ssh://git@github.com/kzar/adblockplus.git to make it work. Could we not have the script transform URLs like that for us? (Just prepend "ssh://" and replace the first ":" with "/" if there's a ":" before the first "/" and the characters in between it and the first "/" are not all numbers? It would be pretty simple and I'm happy to file an issue + write the code.)

I would prefer to use urlsplit() rather than parsing parts of the URL ourselves. Theoretically you could specify the default protocol with its second argument:

url = urlsplit(url, "ssh").geturl()

However, it seems to handle "ssh" incorrect. So following code it is:

if not urlsplit(url).scheme:
  url = "ssh://" + url

I wouldn't mind having that code in there. So feel free to write a patch.

This looks like that 616-generic-hide is a branch that is only known by the remote. Calling git branch --track 616-generic-hide should fix that.

Yep, I realise that. The problem is that I think ensure_dependencies.py should create the tracking branch in that situation. (Remember it has often just done a fresh clone of the dependency.) Otherwise I have to run the script, have it fail, then go fix the tracking branch in the dependency and then re-run the script again. That's a pretty crappy workflow IMHO.

Feel free to look into it. Thing is we already fetch all remotes (--all) and tags (--tags), but there is no option to fetch all branches. So we could either discover the remote branches with git branch --remotes and subscribe all of them with git branch --track before performing git fetch. Or probably simpler and faster, passing the revision to Git.pull() explicitly requesting that revision with git fetch.

Last edited 4 years ago by sebastian (previous) (diff)

comment:11 in reply to: ↑ 10 Changed 4 years ago by kzar

Replying to sebastian:

Very helpful thanks, I've opened issues #2310 #2311 for both of those ideas. Happy for this to be closed now.


comment:12 Changed 4 years ago by sebastian

  • Resolution set to incomplete
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.