---
marp: true
title: Version control with git for scientists
author: P.Y. Barriat
#description: https://dev.to/nikolab/complete-list-of-github-markdown-emoji-markup-5aia
backgroundImage: url('assets/back.png')
_backgroundImage: url('assets/garde.png')
footer: 15/05/2023 | Version control with Git
_footer: ""
paginate: true
_paginate: false
---

Version control with `git` for scientists
===

![h:150](assets/git.png)

### PY Barriat

##### May 15th, 2023

###### some parts inspired by slides from CISM

---

# Discuss :speech_balloon:

### How do you manage different file versions :question:

### How do you work with collaborators on the same files :question:

### ![h:250](assets/01.png)

---

# Notions of code versioning

## Track the history and evolution of the project

think of it as a series of snapshots (**commits**) of your code

### Benefits

* possibility to go back in time :calendar:
  > tracking bugs
  > recovering from mistakes
* Information about the modification :clipboard:
  > who, when, why

---

- team work :busts_in_silhouette:
  * **Simultaneous** work on a project
    > No need to send an email to say "I'm working on that file" (Dropbox organization)
  * **Asynchronous** synchronisation
    > Allows working offline (unlike an Overleaf project)
    > Needs conflict resolution

### Different usage

* local
* client-server (Subversion)
* distributed (Git)

---

## Workflow

**Testing new ideas** (and an easy way to throw them out) :construction:

**Multiple versions** of the code

- Stable (1.x.y)
- Debug (1.x.y+1)
- Next "feature" release (1.x+1.0)
- Next "huge" release (2.0.0)

---

# Open-Source Code

### Compare Repositories

![bg left 100%](assets/open_source_code.png)

---

# What is `git` ?
#### Version control system

- Manage different versions of files
- Collaborate with yourself
- Collaborate with other people

#### Why use `git`

> *"Always remember your first collaborator is your future self, and your past self doesn't answer emails"*
> Christie Bahlai :wink:

---

# `git` workflow

Your local repository consists of **three areas** maintained by `git`

- the first one is your **Working Directory** which holds the actual files
- the second one is the **INDEX** which acts as a staging area
- and finally the **HEAD** which points to the last commit you've made

![h:250](assets/06.png)

---

![bg 65%](assets/git_workflow.png)

---

# Windows : use `git` in VSCode

[Visual Studio Code](https://code.visualstudio.com/) is one of the most popular and powerful text editors used by software engineers today

Free and available for [macOS](https://www.youtube.com/watch?v=8CJXB4Nu1wo), [Windows](https://www.youtube.com/watch?v=AdeWO-n9O2Q), and [Linux](https://code.visualstudio.com/Download)

> Linux: `wget` the **deb** package then `dpkg -i code*.deb`

## Prerequisite

To use `git` in VSCode, first make sure you have `git` [installed on your computer](https://git-scm.com/download/win)

> keep all the default options during the install process

---

## Configure `git` in VSCode

- Open Settings (JSON) and add these properties at the end of the file :

```json
{
    "terminal.integrated.profiles.windows": {
        "Git Bash": {
            "path": "C:\\Program Files\\Git\\bin\\bash.exe"
        }
    },
    "terminal.integrated.defaultProfile.windows": "Git Bash"
}
```

- Reopen VS Code
- Open a new terminal

---

# Getting started with `git`

**clone** a remote repository

> create a local working copy of a remote repository

```bash
git clone https://gogs.elic.ucl.ac.be/TECLIM/Git_Training.git
```

**add** & **commit**

> you can propose changes (add them to the **INDEX**)

```bash
git add <filename>
```

> you can commit these changes (to the **HEAD**)

```bash
git commit -m "Commit message"
```

---

# commit

`git` versioning is a succession
of snapshots of your files at key times of their development

each snapshot is called a **commit**, which is :

- all the files at a given time
- a unique name (SHA-1)
- metadata
  > who created, when, info
- pointer(s) to the previous commit(s)

---

Your changes are now in the **HEAD** of your local working copy.

**push**

> to send those changes to your remote repository

```bash
git push
```

**pull**

> to update your local working directory to the newest commit, to fetch and merge remote changes

```bash
git pull
```

---

# `git` diff

```mermaid
sequenceDiagram
    %%autonumber
    participant Workspace
    participant INDEX
    %%Note right of Workspace: Text in note
    Workspace-->INDEX: git diff
    INDEX-->HEAD: git diff --cached
    Workspace-->HEAD: git diff HEAD
```

---

# `git` undo

In case you did something wrong (which for sure never happens :wink:)

```mermaid
sequenceDiagram
    %%autonumber
    participant Workspace
    participant INDEX
    Note over Workspace,INDEX: wrong modification of a file<br>in your workspace
    INDEX->>Workspace: git checkout -- file
    %%HEAD-->Workspace: .
    Note over Workspace,HEAD: wrong modification of a file<br>that you put in your index
    HEAD->>INDEX: git reset HEAD file
    INDEX->>Workspace: git checkout -- file
```

---

# Simple Git Exercises

First, configure your environment (just once) :construction:

> on your laptop, on your ELIC account, etc

```bash
git config --global user.name "Your Name"
git config --global user.email "foo@bar.be"
git config --global color.ui auto
git config --global core.editor "vim"
git config --list
```

Now, clone https://gogs.elic.ucl.ac.be/TECLIM/Git_Training.git

> These are very simple exercises to learn to manipulate git.
> In each folder, simply run `./create.sh` and follow the guide :sunglasses:

---

# `git` branches

- a **branch** is a pointer to a commit (it represents a history)
- a **branch** can point to another commit
  > it can move !
- a **branch** is a way to organize your work and working histories
- since commits know which commits they are based on, a **branch** represents a commit and what came before it
- a branch is cheap: you can have multiple **branches** in the same repository and switch your working dir from one **branch** state to another

---

# branches demo

```bash
git commit
git checkout -b newbranch   # create newbranch and switch to it
git checkout newbranch      # already on newbranch: this is a no-op
git commit
git commit
git checkout master         # back to the default branch
git commit
git commit
```

> default branch: **master**

```mermaid
graph LR;
    A[fffc93b] -->|commit| B(fc7f81f)
    B -->|commit| D(6fd1a5a)
    D -->|commit| E[newbranch<br>187e6ab ]
    B ---->|commit| Z(6ff4c2e)
    Z -->|commit| Y[master<br>c0f502f ]
```

---

- create a new branch : `git checkout -b newbranch`
- switch to a branch : `git checkout newbranch`
- delete a branch : `git branch -d newbranch`
- list all branches : `git branch -a`
  > see both local and remote branches

### branches are cheap : use them often :+1:

> branches allow short/long term parallel development

---

# merging branches

the point of branches is that you can **merge** them

> bring into one branch the modifications made in another one

```bash
git merge bx
git branch -d bx
git commit
```

```mermaid
graph LR;
    A(fffc93b<br>b_x) -->E[187e6ab<br>b_x merged<br>in b_y]
    D[6fd1a5a<br>b_y] -->E
    E -->|commit| Y[c0f502f<br>b_y ]
```

---

# Difference between `git` & `GitHub` ?

`git` is the version control system : **service**

> git runs locally if you don't use GitHub

`GitHub` is the hosting service : **website**

> on which you can publish (push) your git repositories and collaborate with other people

---

# `GitHub`

- It provides a backup of your files
- It gives you a visual interface for navigating your repos
- It gives other people a way to navigate your repos
- It makes repo collaboration easy (e.g., multiple people contributing to the same project)
- It provides a lightweight issue tracking system

---

# ... and `GitLab` vs `GitHub` vs others

`GitLab` is an alternative to `GitHub`

> `GitLab` is free for unlimited private projects. `GitHub` has also offered free private repositories since 2019

And for **ELIC**, `Gogs` does the job: https://gogs.elic.ucl.ac.be/

- shares the same features
  > dashboard, file browser, issue tracking, groups support, webhooks, etc
- easy to install, cross-platform friendly
- uses little memory, uses little CPU power
- ... and 100% free :smile:

---

# What is `git` good for ?

#### Local

> Backup, reproducibility

![h:300](assets/02.png)

---

#### Client-Server

> Backup, reproducibility, collaboration

![h:500](assets/03.png)

---

![bg 95%](assets/04.png)

---

# `git` conflict :boom:

### multiple versions of files are great

- not always easy to know how to merge them
- conflicts will happen (same line modified by both users)

### conflicts need to be resolved manually ! :fearful:

- boring task
- need to understand why a conflict is present !
- **do not be afraid of conflicts !** :muscle:
  > Do not try to avoid them at all cost !
- stay in sync as much as possible and keep lines short

---

#### Distributed

> Backup, reproducibility, collaboration, transparency

![h:500](assets/05.png)

---

```mermaid
sequenceDiagram
    autonumber
    participant Workspace
    participant INDEX
    participant HEAD
    participant Remote Repository
    Remote Repository->>HEAD: clone
    HEAD->>Workspace: checkout
    %%Note over Workspace,Remote Repository: "clone" = clone + checkout
    Workspace->>INDEX: add
    INDEX->>HEAD: commit
    Remote Repository->>HEAD: fetch
    HEAD->>Workspace: merge
    Remote Repository->>Workspace: pull
    %%Note over Workspace,Remote Repository: pull = fetch + merge
    HEAD->>Remote Repository: push
```

---

# Conclusion

- **versioning** is crucial for both small and large projects :exclamation:
- avoid Dropbox for papers / projects :confounded:
- make meaningful commits
- write meaningful messages
- `git` is more complicated but it is the standard :smiley:

---

# Version control with Git for scientists :chart_with_upwards_trend:
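
---

# Backup : a conflict, step by step

The conflict slide says that conflicts happen when the same line is modified on both sides and that they must be resolved manually. The script below is a minimal sketch of that situation in a throwaway repository. The file name `params.txt`, its content and the commit messages are made up for illustration, and the script assumes `git` and `mktemp` are available.

```bash
#!/usr/bin/env bash
# Sketch: provoke and resolve a merge conflict in a scratch repository.
set -euo pipefail

repo="$(mktemp -d)"                      # hypothetical scratch location
cd "$repo"
git init --quiet
git config user.name "Your Name"         # local identity, only for this demo
git config user.email "foo@bar.be"
main="$(git symbolic-ref --short HEAD)"  # "master" or "main", depending on your setup

echo "temperature = 10" > params.txt
git add params.txt
git commit --quiet -m "Add initial parameter"

# Change the line on a new branch ...
git checkout --quiet -b newbranch
echo "temperature = 20" > params.txt
git commit --quiet -am "Set temperature to 20"

# ... and change the same line on the default branch.
git checkout --quiet "$main"
echo "temperature = 30" > params.txt
git commit --quiet -am "Set temperature to 30"

# The merge stops because both branches modified the same line.
if git merge newbranch > /dev/null 2>&1; then
    echo "unexpected: merge succeeded"
else
    git status --short                   # params.txt is listed as "UU" (unmerged)
fi

# Resolve manually: write the agreed content, stage it, conclude the merge.
echo "temperature = 25" > params.txt
git add params.txt
git commit --quiet -m "Merge newbranch, settle on temperature 25"

git log --oneline --graph
```

> in real life you would open the file, look at the `<<<<<<<` / `>>>>>>>` markers and edit it by hand instead of overwriting it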