Git Workflow

From Docswiki

Group software and papers in the process of being written are stored on the University's GitLab repositories [1]. You should be able to log in via Raven, but someone with privileges will need to add you to the two projects. This page describes a typical workflow for retrieving, modifying and updating the repositories. It is not, however, a comprehensive guide to Git. For that, consult your favourite web search engine.

Setting up SSH access

To access GitLab without typing your username and password every time, set up an SSH key. In a terminal on your desktop, type

 $ ssh-keygen -t ed25519 -C "GitLab"

It will ask you where to save the key: the default location should be fine. If it then asks whether to overwrite an existing key, you have already done this step and you probably don't want to overwrite it. Next you are prompted for a passphrase. You can press ENTER to leave it blank, although that does mean that anyone who breaks into your computer can get access to GitLab with no further effort. You now have a file (default location ~/.ssh/id_ed25519.pub) that contains your public key. Copy the entire contents of this file.

On the GitLab website, click on your user icon in the top right and select 'Settings' and then 'SSH Keys' from the left menu. Paste the contents of your public key file into the box. Put something useful in the title, like the name of your desktop (yes, you should probably do this for each machine you want to access GitLab from, rather than copying keys between machines). Optionally, you can insert an expiry date, such as the date your funding runs out. Click the 'Add key' button.

To test that you now have access, in a terminal type

 $ ssh -T git@gitlab.developers.cam.ac.uk

After you accept the server's host key (answer yes when asked whether you want to continue connecting), you should see a welcome message and then the connection will close.

Some users have reported problems with using ed25519. RSA is available as an alternative. Generate an RSA key with

 $ ssh-keygen -t rsa -C "GitLab"

which will be saved by default at ~/.ssh/id_rsa.pub. Follow the same steps as above to add your public key to GitLab.
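If you are not sure whether you already have a key, a small shell function can save you from overwriting one by accident. This is a sketch that assumes the default key locations above; the name show_gitlab_key is made up for this page. It prints whichever public key exists (ed25519 first, then RSA) and only runs ssh-keygen when there is none, so if you have both keys and want the RSA one, copy ~/.ssh/id_rsa.pub by hand.

```shell
# show_gitlab_key: hypothetical helper that prints your public key so you
# can paste it into GitLab. ssh-keygen is only run when no key exists yet,
# so an existing key is never overwritten.
show_gitlab_key() {
    ssh_dir="$HOME/.ssh"
    if [ -f "$ssh_dir/id_ed25519.pub" ]; then
        cat "$ssh_dir/id_ed25519.pub"
    elif [ -f "$ssh_dir/id_rsa.pub" ]; then
        cat "$ssh_dir/id_rsa.pub"
    else
        # Prompts for a passphrase exactly as described above.
        mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" &&
            ssh-keygen -t ed25519 -C "GitLab" -f "$ssh_dir/id_ed25519" &&
            cat "$ssh_dir/id_ed25519.pub"
    fi
}
```

Paste the function into your terminal (or your ~/.bashrc), run show_gitlab_key, and copy the output into the 'SSH Keys' box on GitLab.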

Installing git LFS

Our software repository (git@gitlab.developers.cam.ac.uk:ch/wales/softwarewales.git) uses git Large File Storage (LFS) to manage some of the larger files. You must have the git LFS addon installed and initialised before cloning the software repository. If you do not, the clone will appear to succeed, but you will be missing some files. If you are using a department managed workstation, or any cluster other than rogue or nest, you will need to load a newer version of git:

 $ module load git/2.0.0

Replace 2.0.0 with the newest version of git available (module avail git lists them). Or, on your personal Ubuntu machine, run

 $ sudo apt-get install git-lfs

to install the necessary package. Whatever computer you are using, then run

 $ git lfs install

to inform git about the new LFS addon. This command only needs to be run once on each machine you intend to clone the software repository on. If you get the error message

 Error: failed to call git rev-parse --git-dir: exit status 128 : fatal: .git

then try

 $ git lfs install --skip-repo

instead. After you have cloned the repository, you should inspect the LFS files to make sure the clone worked correctly. You can see a list of the files in .gitattributes in the repository root directory. A good file to check might be THESES/PHD/ChrisWhittlestonPhD.pdf. If this file is a PDF of several megabytes, then the LFS succeeded. If it is a small plaintext file containing a URL, the LFS clone did not succeed.

If you have already cloned the repository before installing the LFS addon, you will need to clone again (a pull will not suffice).
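The check can also be done from the terminal. Git LFS pointer files are small text files whose first line is version https://git-lfs.github.com/spec/v1, so a minimal sketch just looks for that line. The function name check_lfs_file is hypothetical.

```shell
# check_lfs_file FILE: hypothetical helper reporting whether FILE holds real
# content or only an LFS pointer (the stub left when LFS was not set up).
check_lfs_file() {
    [ -f "$1" ] || { echo "missing: $1" >&2; return 2; }
    # LFS pointer files begin with this line; real content does not.
    if head -n 1 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'; then
        echo "POINTER: $1 is an LFS pointer; the LFS download did not happen"
        return 1
    fi
    echo "OK: $1 ($(wc -c < "$1") bytes)"
}
```

For example, run check_lfs_file THESES/PHD/ChrisWhittlestonPhD.pdf from the repository root.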

Initial Checkout

You need to fetch the repository. In Git terms, this is called cloning. You should only need to carry out this step once for each repository. In your favourite web browser, navigate to the project page on GitLab, e.g. software. Click the blue 'Clone' button on the right and copy the link under 'Clone with SSH' to the clipboard (don't use the 'Clone with HTTPS' link, or you will have to type your username and password every time). In a terminal, change to a suitable location, such as your home directory, and type

 $ git clone git@gitlab.developers.cam.ac.uk:ch/wales/softwarewales.git

replacing the address with what you just copied. Git will download the repository. Once it has finished, check that you now have lots of new directories with the contents of the repository.

You should also tell Git your name and email address. Git will record these in the commit logs so other users will know who to complain to when a commit breaks everything. Run

 $ git config --global user.name "An Other"
 $ git config --global user.email "ao123@cam.ac.uk"

replacing the name and email address in quotes as appropriate.

Submodules

Our software repository has become quite large as external potentials have been added. Submodules offer a way to compartmentalise the repository and speed up clones and updates. Self-contained third-party potentials are good candidates for splitting off into submodules. We will use GDML (Gradient-Domain Machine Learning) as an example. On an initial checkout, you will see a GDML/ directory in the root of the repository, but the directory will be empty. Most users do not need GDML, so they will not care, or even notice, that GDML has not been cloned.

Let us suppose that you actually need GDML. You can tell git that it is required by running

 $ git submodule init GDML

Then the next time you run

 $ git submodule update

the contents of GDML will be checked out. If GDML is subsequently updated, run the update command again to get the newest version. If at some later point you decide that you have finished with GDML and no longer need it checked out, run

 $ git submodule deinit GDML

and the GDML directory will be emptied. If the submodule you are interested in has its own submodules, add the --recursive flag to the update command (git submodule update --recursive). If you decide that you want all the submodules, run

 $ git submodule update --init --recursive

and all submodules will be checked out. This command is not recommended and you must have a good reason why you need all the submodules before you consider running it.
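To see which submodules you currently have checked out, use git submodule status, which marks uninitialised submodules with a leading '-'. The helper below is a hypothetical convenience wrapper that lists just those. The usage comments also show git submodule update --init, which combines the init and update steps above for a single submodule.

```shell
# list_uninitialised_submodules: hypothetical helper printing the path of
# every submodule that is not checked out. In `git submodule status` output
# a leading '-' marks an uninitialised submodule.
list_uninitialised_submodules() {
    git submodule status | awk '/^-/ {print $2}'
}

# Usage, from the repository root:
#   list_uninitialised_submodules       # e.g. GDML on a fresh clone
#   git submodule update --init GDML    # init and update in one command
```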

Creating a new submodule

If you are creating an interface to a new large external potential, it may be appropriate to add the external potential as a submodule. It is appropriate if the potential is quite large (more than a few MB), is being placed in the root of the repository, and is not likely to require much in the way of changes after the interface is set up. Instead of adding the external potential to the softwarewales repository, create a new repository under Wales Group on GitLab and place the files there. Within softwarewales run

 $ git submodule add git@gitlab.developers.cam.ac.uk:ch/wales/my_new_repository.git

where my_new_repository is replaced with whatever you named the new repository. You can also get the URL by going to the new repository's page on GitLab, clicking the 'Clone' button and copying the 'Clone with SSH' link. This command adds an entry for the submodule to the .gitmodules file and stages both that file and the new directory; these changes then need to be committed and pushed.
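If you do this more than once, the steps can be wrapped up as below. This is a sketch rather than a group tool: add_submodule is a made-up name, and it commits anything else you had already staged along with the submodule, so run it on a clean working copy (ideally on your own branch).

```shell
# add_submodule URL: hypothetical wrapper; run it from the root of
# softwarewales. The submodule directory is named after the last part of
# URL, minus ".git".
add_submodule() {
    url="$1"
    name=$(basename "$url" .git)
    git submodule add "$url" || return 1
    # `git submodule add` has already staged .gitmodules and the new
    # directory, so they only need committing; show them first.
    git status --short
    git commit -m "Add $name as a submodule" || return 1
    echo "Now run 'git push' to send the commit to GitLab."
}
```

For example, add_submodule git@gitlab.developers.cam.ac.uk:ch/wales/my_new_repository.git, followed by git push.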

Turning an existing subdirectory into a submodule is also possible, but is slightly more complicated and is considered an advanced topic. Google is your friend. Do not mess with the GitLab repository until you are satisfied you have made the correct adjustments locally.

Basic Workflow

Details for specific cases are below, but first, we mention the most important commands that you'll be running all the time. Imagine you've just arrived in the morning and it's time to start working on myfile.f90. The first command to run is

 $ git pull

This command contacts the remote repository on GitLab and fetches any commits that people may have made. Run this command frequently, and at least before every commit you make. One notable difference from updating in svn is that git will not merge other people's changes with files you have changed since your last commit. If other people have changed files you are working on, the pull will fail with an informative message. In this case, run

 $ git stash

which sets aside your local changes. Try the pull again, which should now succeed. Then run

 $ git stash pop

to reapply your local changes to the updated repository. The merge will usually happen automatically, but sometimes you will need to resolve conflicts yourself.
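The stash, pull, pop sequence can be bundled into one shell function, sketched below under the hypothetical name pull_with_stash. It only stashes when there are uncommitted changes to tracked files, because running git stash pop on a clean working copy would instead reapply an older, unrelated stash.

```shell
# pull_with_stash: hypothetical helper for the stash / pull / pop sequence.
pull_with_stash() {
    # Only stash when tracked files have uncommitted changes; otherwise
    # `git stash pop` would reapply an older, unrelated stash.
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        # If the pull fails, your changes stay in the stash until you run
        # `git stash pop` yourself.
        git stash && git pull && git stash pop
    else
        git pull
    fi
}
```

If stash pop reports conflicts, resolve them as usual; the stash entry is kept until you drop it.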

Now you edit myfile.f90 and want to commit your changes. Run

 $ git add myfile.f90

Now the file is, in Git terminology, staged for commit. You haven't committed anything yet. You can add other files to the staging area too. Once your commit is ready, run

 $ git commit -m "Informative message."

Replace 'Informative message' with a brief message describing what changes are in your commit. At this point, you have updated your local repository and entered a commit in the permanent record. However, the commit hasn't gone to GitLab yet. To send it to GitLab (called the remote by Git), run

 $ git push

You can send multiple commits at once. This workflow should encourage you to commit often. Maybe you write a new function. Put in a commit. Then you add some stuff to keywords.f90 for the new functionality. Do another commit. Next you find a bug and fix it. Do another commit.

Note that the remote can act as the backup of your work, so you should push any new commits at least once a day.
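To check what you have committed but not yet pushed (before you leave for the day, say), compare your branch with its copy on GitLab. In the sketch below, unpushed is a hypothetical function name; @{u} is Git's shorthand for the current branch's upstream, such as origin/master, so the branch must already have been pushed with an upstream set.

```shell
# unpushed: hypothetical helper listing commits on the current branch that
# are not on GitLab yet. '@{u}' is the branch's upstream (e.g.
# origin/master), so this fails on a branch that was never pushed with -u.
unpushed() {
    git log --oneline '@{u}..HEAD'
}
```

If unpushed prints anything, run git push.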

Checking out an older version of the master branch

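There are all sorts of useful git tools for checking how individual files and branches have changed; see the documentation for git diff, for example. To check out the master branch as it was on a given date, use

 $ git checkout $(git rev-list -1 --before="Feb 11 2024" master)

This leaves you in a 'detached HEAD' state, which is fine for looking at or compiling the old version. Run git checkout master to get back to the current code.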
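The date lookup can be wrapped in a function that fails politely when there is no commit that old, instead of silently running git checkout with no argument. This is a sketch; checkout_at is a hypothetical name. Note that git rev-list --before looks at each commit's committer date, which can differ from the author date that git log shows by default (for example after a rebase).

```shell
# checkout_at DATE [BRANCH]: hypothetical helper that checks out the newest
# commit on BRANCH (default master) committed before DATE.
checkout_at() {
    branch="${2:-master}"
    commit=$(git rev-list -1 --before="$1" "$branch")
    if [ -z "$commit" ]; then
        echo "No commit on $branch before $1" >&2
        return 1
    fi
    git checkout "$commit"    # detached HEAD; `git checkout master` to return
}
```

For example, checkout_at "Feb 11 2024".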

Writing a Paper

Writing a paper is slightly simpler than editing the group code (Discuss...) because we aren't worrying about multiple branches. Each paper is a separate repository. To start a new paper, go to GitLab and create a new repository by clicking the blue 'New Project' button on your home screen. Create a blank project and make sure the project URL indicates it is under ch/wales rather than in your user space. Clone the new repository and start writing the paper in the blank directory. Each session of editing should involve

  1. git pull
  2. make some edits
  3. git pull
  4. git add all the edited files
  5. git commit with a helpful message
  6. git push
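Steps 3 to 6 can be run as one shell function once you have finished editing. The sketch below uses the hypothetical name paper_sync. It relies on git add -u, which stages every tracked file you have modified but not new files, so add a new figure or section file with git add first. If the pull refuses because a co-author has changed a file you are editing, use the stash, pull, pop sequence from Basic Workflow instead.

```shell
# paper_sync "message": hypothetical helper for steps 3 to 6 of an editing
# session. `git add -u` stages modified tracked files only.
paper_sync() {
    if [ -z "$1" ]; then
        echo 'usage: paper_sync "commit message"' >&2
        return 1
    fi
    git pull || return 1              # step 3
    git add -u                        # step 4
    git commit -m "$1" || return 1    # step 5
    git push                          # step 6
}
```

For example, paper_sync "Rewrite introduction".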

Simples. All authors will be editing the same branch (the master branch), so you'll see other authors' updates straight away. This approach keeps things easy, but if two authors are working at exactly the same time, there may be some merging to do. Reduce the amount of merging by committing, pulling and pushing often.

Do not add intermediate LaTeX files (.aux, .log, etc.) to the repository. Do not add your .dvi/.ps/.pdf documents either (except perhaps for proofs created by the journal). When it comes to revisions and resubmissions, do not create a new subdirectory for the new version. Git keeps the whole history so it is always possible to revert to a previous version.
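One way to keep these files out is a .gitignore file in the repository root, committed like any other file. The function below (hypothetical name write_paper_gitignore) writes a starting list of patterns for a paper whose main file is NAME.tex; it is a sketch, not a group standard. PDF and PostScript output is ignored only for the main document, because figures are often PDFs and do need to be committed.

```shell
# write_paper_gitignore NAME: hypothetical helper writing a .gitignore in the
# current directory for a paper whose main file is NAME.tex (default main).
write_paper_gitignore() {
    doc="${1:-main}"
    cat > .gitignore <<EOF
# LaTeX intermediate files
*.aux
*.log
*.bbl
*.blg
*.out
*.toc
*.fls
*.fdb_latexmk
*.synctex.gz
# compiled output of the main document only
*.dvi
/$doc.ps
/$doc.pdf
EOF
}
```

For example, run write_paper_gitignore main in the paper's directory, then git add .gitignore and commit it.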

You will also want to clone the bibliography repository, but you only need one copy of this, not one for each paper. A suitable directory structure might be to have a papers/ directory under your home directory that contains a directory for the bibliography repository and a directory for each paper you are currently working on.

Working on the group code

You've just been talking to David and you've come up with an exciting new feature to add to GMIN. It's going to take several days of coding, during which you'll want to back up your work on the remote, but you don't want to interfere with other people using GMIN. The solution is to create a new branch. A branch is your own version controlled copy of the code that you can edit at will without messing GMIN up for anyone else. All development should occur on branches. To create a new branch, run

 $ git checkout -b exciting_feature

This command both creates the branch and switches your working copy to it. Initially, your new branch is the same as the master branch you cut it from. However, the branch does not yet exist on GitLab. To create it, run

 $ git push --set-upstream origin exciting_feature

Now go ahead and edit files, making commits and pushing them to GitLab frequently.
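If you create branches often, the two commands can be combined. In this sketch (new_branch is a hypothetical name) the branch is cut from an up-to-date master, rather than from whatever happens to be checked out at the time.

```shell
# new_branch NAME: hypothetical helper that cuts NAME from an up-to-date
# master and creates it on GitLab straight away.
new_branch() {
    git checkout master && git pull &&
        git checkout -b "$1" &&
        git push --set-upstream origin "$1"
}
```

For example, new_branch exciting_feature.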

When your feature is complete and you have checked it works and that you haven't broken anything else, it's time to get it into the master branch. Several steps are required. Firstly, it's quite likely that other people have changed master since you cut off your branch. You need to test that your changes function with the new changes to master, so first you need to merge in master:

 $ git checkout master
 $ git pull
 $ git checkout exciting_feature
 $ git merge master
 $ git push

The pull command makes sure that your local copy of master is up to date. The merge command merges the changes that have been made on the master branch into your branch, usually creating a merge commit, which you then push to the remote copy of your branch. If the merge reports conflicts, resolve them and commit before pushing.
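The five commands can also be run as one function while your feature branch is checked out. This is a sketch under the hypothetical name update_from_master. Because each step is joined with &&, it stops at the first failure: if the merge reports conflicts, nothing is pushed, and you finish the merge and push by hand.

```shell
# update_from_master: hypothetical helper that merges an up-to-date master
# into the currently checked-out feature branch and pushes the result.
update_from_master() {
    branch=$(git symbolic-ref --short HEAD) || return 1
    if [ "$branch" = master ]; then
        echo "Check out your feature branch first." >&2
        return 1
    fi
    git checkout master && git pull &&
        git checkout "$branch" && git merge master && git push
}
```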

Most users do not have permission to edit the master branch. To get your new feature in, you have to create a merge request. Go to the project page on GitLab. From the drop-down menu of branches, select exciting_feature. Click the blue 'Create merge request' button in the top right. You now have a page in which you can give your merge request a title and description. You should assign the request to yourself and anyone else who has worked on this branch. Choose the person who is going to review for you. Make sure you tick the boxes to delete the feature branch after the request is accepted. These options help keep the remote repository and history clean. You can edit the commit message for the one commit that will be created: by default it will be the name of your branch.

The person you select as reviewer will get a notification and a copy of your changes. They will look through your changes to make sure they follow the group coding standards (here) and that you haven't broken anything. If there are any issues, they may request that you make some changes, which you can then commit to the exciting_feature branch. The merge request will be updated and the reviewer will get a notification. However, the reviewer will not be doing extensive testing, so it remains your responsibility to follow the coding standards and make sure everything works. You should make sure your changes compile with nagfor before submitting the merge request, as that is the most particular compiler.

Once your code has passed the review, the reviewer will click the Merge button and your branch will be merged into master. An 'Approve' from the reviewer won't usually be enough: you probably don't have permission to write to master, and hence cannot merge your new branch into master yourself. Once it has been merged, you can clean up your repository with the following commands

 $ git checkout master
 $ git pull
 $ git branch -d exciting_feature

These commands switch your working copy back to master, update your local copy of the repository, and delete the branch you made. You might like to check that your new feature is in the master branch files before deleting your branch. If something has gone wrong and you delete your branch before the changes are in master, it is possible to recover, as the commits won't actually be deleted from the remote for a few weeks, but the recovery is an advanced topic that is best avoided.
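To avoid deleting a branch before its commits are in master, you can let Git do the check. The sketch below (finish_branch is a hypothetical name) uses git merge-base --is-ancestor, which succeeds only if the branch tip is contained in master. If your merge request was squashed into a single commit, the original commits are not in master and the function will refuse; in that case check the files by hand as described above before deleting the branch with git branch -D.

```shell
# finish_branch NAME: hypothetical clean-up helper. It deletes the local
# branch only once every commit on it is reachable from master.
finish_branch() {
    git checkout master && git pull || return 1
    if git merge-base --is-ancestor "$1" master; then
        git branch -d "$1"
    else
        echo "$1 has commits that are not in master; not deleting it." >&2
        return 1
    fi
}
```

For example, finish_branch exciting_feature.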

Large Files

If any new file you are adding is large (>10MB), it should be stored on git LFS rather than as a normal file. Fortunately, this is easy to do for new files. If you have already committed a large file and would now like to move it to LFS, you have a very complicated process ahead. The best instructions the author could find when doing this in the initial repository migration were in the question at https://stackoverflow.com/questions/60995429/gradually-converting-a-repo-to-use-git-lfs-for-certain-files-one-file-at-a-time, with the caveat that the bfg utility does not work and it was necessary to use git filter-branch as described at https://dalibornasevic.com/posts/2-permanently-remove-files-and-folders-from-a-git-repository instead. Note that this process will delete the history of your existing file, which will only appear from the most recent commit onwards. It may be possible to adjust this with a git rebase, but the author has not investigated.

Anyway, if you haven't yet committed your large file, you simply need to run

 $ git lfs track "<path-to-file>"

which informs git that this file is to be uploaded using LFS. This command edits the file .gitattributes, which will also need to be added to your commit. If you make a mistake, you can edit .gitattributes manually. You can see some examples as well as the current list of files that are uploaded with LFS in .gitattributes. As you can see from inspecting the file, it is also possible to use wildcards ('*') to specify multiple files at once. Be careful with your rules though: the rule is applied over all files, so if you add a rule for 'myfile.f90' and somewhere else in the repository there is another file with the same name, it will now also be uploaded with LFS. This can be useful for specifying, for example, all .mp4 files in the whole repository, with "*.mp4". However, if you really want just a specific file, specify the path from the repository root, for example "OPTIM/source/myfile.f90".
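You can ask Git which rule applies to a given path with git check-attr. The helper below is a hypothetical wrapper; it prints filter: lfs for paths that will be stored in LFS and filter: unspecified for the rest. The paths do not need to exist yet, so you can test a rule before adding the large file.

```shell
# lfs_status PATH...: hypothetical helper reporting, for each path, whether
# an LFS rule in .gitattributes matches it ("filter: lfs") or not
# ("filter: unspecified").
lfs_status() {
    git check-attr filter -- "$@"
}
```

For example, after git lfs track "OPTIM/source/myfile.f90", running lfs_status OPTIM/source/myfile.f90 GMIN/source/myfile.f90 should report lfs for the first path only (the GMIN path is just an illustration).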

Useful commands to know

 $ git status

At any point, this command will show you what branch you are on, what files you have modified and staged and your local position compared to the remote. Use it often.

 $ git branch

List your local branches; the current one is marked with *. Add -a to include the remote branches on GitLab as well.

 $ git diff

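Show the changes in your working copy that have not yet been staged, for all files. Add a file name to show only the differences for a specific file. Use git diff --staged to see what is staged for the next commit, or git diff HEAD to see all changes since the last commit.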

 $ git log

Display the commit history. Add a file or directory name afterwards to only show the commits that affected that file, or any file in the directory.

 $ git reset HEAD myfile.f90

Unstage myfile.f90 that you accidentally staged for the next commit, but actually don't want to commit just yet. The working copy of the file is not altered.

 $ git checkout -- myfile.f90

Revert myfile.f90 that you've completely messed up to what it was at the last commit. Changes to your working copy are lost.

 $ git reset --hard

Throw away all working and staged changes to tracked files, reverting the current state to the last commit. Untracked files are left alone.

 $ git reset --hard 909a3cac63ae8782b258ebb8c27af361b555bff6

Throw away all working and staged changes, reverting the current state to that of the commit specified. The long hex number is a commit hash. It is not human readable, but you can copy the relevant one from the commit log.

 $ git clean -f

Throw away all untracked files. They will be deleted. Untracked directories are only removed if you also add -d. Run with -n rather than -f to see which files would be deleted, but without actually doing anything.

 $ git fetch -p && for branch in $(git for-each-ref --format='%(refname:short) %(upstream:track)' refs/heads | awk '$2 == "[gone]" {print $1}'); do git branch -D "$branch"; done

Delete every local branch whose upstream branch has been deleted on GitLab. This command is useful to periodically clean up local branches after they have been merged and deleted on GitLab. Branches you have never pushed have no upstream and are left alone. Warning: -D force-deletes, so any commits you made on such a branch after it was deleted on GitLab will be lost.
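Before deleting anything, you may want to see which branches the clean-up would remove. This dry-run sketch (gone_branches is a hypothetical name) lists local branches whose upstream branch has been deleted on GitLab, without deleting them.

```shell
# gone_branches: hypothetical dry run that lists local branches whose
# upstream branch no longer exists on the remote. Nothing is deleted.
gone_branches() {
    git fetch -p -q &&
        git for-each-ref --format='%(refname:short) %(upstream:track)' refs/heads |
        awk '$2 == "[gone]" {print $1}'
}
```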