[PATCH] Documentation: pull, push, packing repository and working with others.

Describe where you can pull from with a bit more detail.
Clarify description of pushing.
Add a section on packing repositories.
Add a section on recommended workflow for the project lead, subsystem maintainers and individual developers.
Move "Tag" section around to make the flow of example simpler to follow.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-15 11:40:56 -07:00
commit 3eb5128a10
parent e7c1ca4273
1 changed files with 281 additions and 72 deletions
--- a/Documentation/tutorial.txt
+++ b/Documentation/tutorial.txt
@@ -453,6 +453,55 @@ With that, you should now be having some inkling of what git does, and
 can explore on your own.
 [ Side note: most likely, you are not directly using the core
  git Plumbing commands, but using Porcelain like Cogito on top
  of it.  Cogito works a bit differently and you usually do not
  have to run "git-update-cache" yourself for changed files (you
  do tell underlying git about additions and removals via
  "cg-add" and "cg-rm" commands).  Just before you make a commit
  with "cg-commit", Cogito figures out which files you modified,
  and runs "git-update-cache" on them for you.  ]
 	Tagging a version
 	-----------------
 In git, there are two kinds of tags, a "light" one, and a "signed tag".
 A "light" tag is technically nothing more than a branch, except we put
 it in the ".git/refs/tags/" subdirectory instead of calling it a "head".
 So the simplest form of tag involves nothing more than
 	cat .git/HEAD > .git/refs/tags/my-first-tag
 after which point you can use this symbolic name for that particular
 state. You can, for example, do
 	git diff my-first-tag
 to diff your current state against that tag (which at this point will
 obviously be an empty diff, but if you continue to develop and commit
 stuff, you can use your tag as an "anchor-point" to see what has changed
 since you tagged it).
 A "signed tag" is actually a real git object, and contains not only a
 pointer to the state you want to tag, but also a small tag name and
 message, along with a PGP signature that says that yes, you really did
 that tag. You create these signed tags with
 	git tag <tagname>
 which will sign the current HEAD (but you can also give it another
 argument that specifies the thing to tag, ie you could have tagged the
 current "mybranch" point by using "git tag <tagname> mybranch").
 You normally only do signed tags for major releases or things
 like that, while the light-weight tags are useful for any marking you
 want to do - any time you decide that you want to remember a certain
 point, just create a private tag for it, and you have a nice symbolic
 name for the state at that point.
 	Copying archives
 	-----------------
@@ -729,117 +778,277 @@ simply do
 and optionally give a branch-name for the remote end as a second
 argument.
-[ Todo: fill in real examples ]
+The "remote" repository can even be on the same machine.  One of
 the following notations can be used to name the repository to
 pull from:
 	Rsync URL
 		rsync://remote.machine/path/to/repo.git/
-	Tagging a version
+	HTTP(s) URL
-	-----------------
+		http://remote.machine/path/to/repo.git/
-In git, there's two kinds of tags, a "light" one, and a "signed tag".
+	GIT URL
 		git://remote.machine/path/to/repo.git/
 		remote.machine:/path/to/repo.git/
-A "light" tag is technically nothing more than a branch, except we put
+	Local directory
-it in the ".git/refs/tags/" subdirectory instead of calling it a "head".
+		/path/to/repo.git/
 So the simplest form of tag involves nothing more than
-	cat .git/HEAD > .git/refs/tags/my-first-tag
+[ Side Note: currently, HTTP transport is slightly broken in
  that when the remote repository is "packed" it does not always
  work.  But we have not talked about packing repositories yet, so
  let's not worry too much about it for now.  ]
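As a rough illustration of the notations listed above, the
following shell sketch classifies a repository location by its
prefix.  The function name classify_repo is hypothetical and is
not part of git; it only mirrors the list in this tutorial and
does not check that the repository actually exists.

```shell
# classify_repo is a hypothetical helper, not a git command.
# It maps a repository location to the notation names used above.
classify_repo() {
    case "$1" in
        rsync://*)           echo "Rsync URL" ;;
        http://*|https://*)  echo "HTTP(s) URL" ;;
        git://*)             echo "GIT URL" ;;
        /*)                  echo "Local directory" ;;
        *:/*)                echo "GIT URL" ;;  # host:/path form, listed under GIT URL above
        *)                   echo "unknown" ;;
    esac
}

classify_repo rsync://remote.machine/path/to/repo.git/
classify_repo remote.machine:/path/to/repo.git/
classify_repo /path/to/repo.git/
```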
-after which point you can use this symbolic name for that particular
+[ Digression: you could do without using any branches at all, by
-state. You can, for example, do
+  keeping as many local repositories as you would like to have
-
+  branches, and merging between them with "git pull", just like
-	git diff my-first-tag
+  you merge between branches.  The advantage of this approach is
-
+  that it lets you keep a set of files for each "branch" checked
-to diff your current state against that tag (which at this point will
+  out and you may find it easier to switch back and forth if you
-obviously be an empty diff, but if you continue to develop and commit
+  juggle multiple lines of development simultaneously.  Of
-stuff, you can use your tag as a "anchor-point" to see what has changed
+  course, you will pay the price of more disk usage to hold
-since you tagged it.
+  multiple working trees, but disk space is cheap these days.  ]
 A "signed tag" is actually a real git object, and contains not only a
 pointer to the state you want to tag, but also a small tag name and
 message, along with a PGP signature that says that yes, you really did
 that tag. You create these signed tags with
 	git tag <tagname>
 which will sign the current HEAD (but you can also give it another
 argument that specifies the thing to tag, ie you could have tagged the
 current "mybranch" point by using "git tag <tagname> mybranch").
 You normally only do signed tags for major releases or things
 like that, while the light-weight tags are useful for any marking you
 want to do - any time you decide that you want to remember a certain
 point, just create a private tag for it, and you have a nice symbolic
 name for the state at that point.
 	Publishing your work
 	--------------------
-We already talked about using somebody else's work from a remote
+So we can use somebody else's work from a remote repository; but
-repository, in the "merging external work" section.  It involved
+how can _you_ prepare a repository to let other people pull from
-fetching the work from a remote repository; but how would _you_
+it?
 prepare a repository so that other people can fetch from it?
-Your real work happens in your working directory with your
+You do your real work in your working directory that has your
 primary repository hanging under it as its ".git" subdirectory.
-You _could_ make it accessible remotely and ask people to pull
+You _could_ make that repository accessible remotely and ask
-from it, but in practice that is not the way things are usually
+people to pull from it, but in practice that is not the way
-done.  A recommended way is to have a public repository, make it
+things are usually done.  A recommended way is to have a public
-reachable by other people, and when the changes you made in your
+repository, make it reachable by other people, and when the
-primary working directory are in good shape, update the public
+changes you made in your primary working directory are in good
-repository with it.
+shape, update the public repository from it.  This is often
 called "pushing".
 [ Side note: this public repository could further be mirrored,
  and that is how kernel.org git repositories are done.  ]
-Publishing the changes from your private repository to your
+Publishing the changes from your local (private) repository to
-public repository requires you to have write privilege on the
+your remote (public) repository requires a write privilege on
-machine that hosts your public repository, and it is internally
+the remote machine.  You need to have an SSH account there to
-done via an SSH connection.
+run a single command, "git-receive-pack".
-First, you need to create an empty repository to push to on the
+First, you need to create an empty repository on the remote
-machine that houses your public repository.  This needs to be
+machine that will house your public repository.  This empty
 repository will be populated and be kept up-to-date by pushing
 into it later.  Obviously, this repository creation needs to be
 done only once.
 [ Digression: "git push" uses a pair of programs,
  "git-send-pack" on your local machine, and "git-receive-pack"
  on the remote machine.  The communication between the two over
  the network internally uses an SSH connection.  ]
 Your private repository's GIT directory is usually .git, but
-often your public repository is named "<projectname>.git".
+your public repository is often named after the project name,
-Let's create such a public repository for project "my-git".
+i.e. "<project>.git".  Let's create such a public repository for
-After logging into the remote machine, create an empty
+project "my-git".  After logging into the remote machine, create
-directory:
+an empty directory:
 	mkdir my-git.git
-Then, initialize that directory with git-init-db, but this time,
+Then, make that directory into a GIT repository by running
-since it's name is not usual ".git", we do things a bit
+git-init-db, but this time, since its name is not the usual
-differently:
+".git", we do things slightly differently:
 	GIT_DIR=my-git.git git-init-db
 Make sure this directory is available for others you want your
-changes to be pulled by.  Also make sure that you have the
+changes to be pulled by via the transport of your choice.  Also
-'git-receive-pack' program on the $PATH.
+you need to make sure that you have the "git-receive-pack"
 program on the $PATH.
-[ Side note: many installations of sshd does not invoke your
+[ Side note: many installations of sshd do not invoke your shell
-  shell as the login shell when you directly run programs; what
+  as the login shell when you directly run programs; what this
-  this means is that if your login shell is bash, only .bashrc
+  means is that if your login shell is bash, only .bashrc is
-  is read bypassing .bash_profile.  As a workaround, make sure
+  read and not .bash_profile.  As a workaround, make sure
-  .bashrc sets up $PATH so that 'git-receive-pack' program can
+  .bashrc sets up $PATH so that you can run 'git-receive-pack'
-  be run.  ]
+  program.  ]
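One way to apply the workaround above is a guard near the top of
.bashrc on the remote machine.  This is only a sketch: the
variable GIT_BIN_DIR and its value $HOME/bin are assumptions
standing in for wherever "git-receive-pack" is installed on your
system.

```shell
# Assumed install location of git-receive-pack; adjust for your machine.
GIT_BIN_DIR="$HOME/bin"

# Prepend the directory only if it is not already on $PATH, so that
# sourcing .bashrc repeatedly does not keep growing $PATH.
case ":$PATH:" in
    *":$GIT_BIN_DIR:"*) ;;
    *) PATH="$GIT_BIN_DIR:$PATH" ;;
esac
export PATH
```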
-Your 'public repository' is ready to accept your changes.  Now,
+Your "public repository" is now ready to accept your changes.
-come back to the machine you have your private repository.  From
+Come back to the machine where you have your private repository.  From
 there, run this command:
 	git push <public-host>:/path/to/my-git.git master
 This synchronizes your public repository to match the named
-branch head (i.e. refs/heads/master in this case) and objects
+branch head (i.e. "master" in this case) and objects reachable
-reachable from them in your current repository.
+from them in your current repository.
 As a real example, this is how I update my public git
 repository.  Kernel.org mirror network takes care of the
-propagation to other publically visible machines:
+propagation to other publicly visible machines:
 	git push master.kernel.org:/pub/scm/git/git.git/ 
-[ to be continued.. cvsimports, pushing and pulling ]
+[ Digression: your GIT "public" repository people can pull from
  is different from a public CVS repository that gives read-write
  access to multiple developers.  It is a copy of _your_ primary
  repository published for others to use, and you should not
  push into it from more than one repository (this means, not
  just disallowing other developers to push into it, but also
  you should push into it from a single repository of yours).
  Sharing the result of work done by multiple people is always
  done by pulling (i.e. fetching and merging) from public
  repositories of those people.  Typically this is done by the
  "project lead" person, and the resulting repository is
  published as the public repository of the "project lead" for
  everybody to base further changes on.  ]
 	Packing your repository
 	-----------------------
 Earlier, we saw that one file under .git/objects/??/ directory
 is stored for each git object you create.  This representation
 is convenient and efficient to create atomically and safely, but
 not so to transport over the network.  Since git objects are
 immutable once they are created, there is a way to optimize the
 storage by "packing them together".  The command
 	git repack
 will do it for you.  If you followed the tutorial examples, you
 would have accumulated about 17 objects in .git/objects/??/
 directories by now.  "git repack" tells you how many objects it
 packed, and stores the packed file in .git/objects/pack
 directory.
 [ Side Note: you will see two files, pack-*.pack and pack-*.idx,
  in .git/objects/pack directory.  They are closely related to
  each other, and if you ever copy them by hand to a different
  repository for whatever reason, you should make sure you copy
  them together.  The former holds all the data from the objects
  in the pack, and the latter holds the index for random
  access.  ]
 If you are paranoid, running "git-verify-pack" command would
 detect if you have a corrupt pack, but do not worry too much.
 Our programs are always perfect ;-).
 Once you have packed objects, you do not need to leave the
 unpacked objects that are contained in the pack file anymore.
 	git prune-packed
 would remove them for you.
 You can try running "find .git/objects -type f" before and after
 you run "git prune-packed" if you are curious.
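If you would rather see counts than a file listing, the
comparison above could be wrapped in two small helpers.  The
function names are hypothetical and not part of git; they rely
only on the directory layout described in this section
(.git/objects/??/ for unpacked objects, .git/objects/pack for
packs).

```shell
# Hypothetical helpers, not git commands.  Both take the GIT directory
# as an optional argument and default to ".git".

# Count unpacked objects, i.e. regular files under objects/??/.
count_loose_objects() {
    n=0
    for f in "${1:-.git}"/objects/??/*; do
        if [ -f "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}

# Count pack files under objects/pack.
count_packs() {
    n=0
    for f in "${1:-.git}"/objects/pack/pack-*.pack; do
        if [ -f "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}
```

Running count_loose_objects before and after "git prune-packed"
should show the first number drop, while count_packs stays the
same.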
 [ Side Note: as we already mentioned, "git pull" is broken for
  some transports dealing with packed repositories right now, so
  do not run "git prune-packed" if you plan to give "git pull"
  access via HTTP transport for now.  ]
 If you run "git repack" again at this point, it will say
 "Nothing to pack".  Once you continue your development and
 accumulate the changes, running "git repack" again will create a
 new pack, that contains objects created since you packed your
 archive the last time.  We recommend that you pack your project
 soon after the initial import (unless you are starting your
 project from scratch), and then run "git repack" every once in a
 while, depending on how active your project is.
 When a repository is synchronized via "git push" and "git pull",
 objects packed in the source repository are usually stored
 unpacked in the destination, unless rsync transport is used.
 	Working with Others
 	-------------------
 A recommended work cycle for a "project lead" is like this:
 (1) Prepare your primary repository on your local machine. Your
     work is done there.
 (2) Prepare a public repository accessible to others.
 (3) Push into the public repository from your primary
     repository.
 (4) "git repack" the public repository.  This establishes a big
     pack that contains the initial set of objects.
 (5) Keep working in your primary repository, and push your
     changes to the public repository.  Your changes include
     your own, patches you receive via e-mail, and merges resulting
     from pulling the "public" repositories of your "subsystem
     maintainers".
     You can repack this private repository whenever you feel
     like.
 (6) Every once in a while, "git repack" the public repository.
     Go back to step (5) and continue working.
 A recommended work cycle for a "subsystem maintainer" that
 works on that project and has their own "public repository" is like
 this:
 (1) Prepare your work repository, by running "git clone" on the
     public repository of the "project lead".
 (2) Prepare a public repository accessible to others.
 (3) Copy over the packed files from "project lead" public
     repository to your public repository by hand; this part is
     currently not automated.
 (4) Push into the public repository from your primary
     repository.
 (5) Keep working in your primary repository, and push your
     changes to your public repository, and ask your "project
     lead" to pull from it.  Your changes include your own,
     patches you receive via e-mail, and merges resulting from
     pulling the "public" repositories of your "project lead"
     and possibly your "sub-subsystem maintainers".
     You can repack this private repository whenever you feel
     like.
 (6) Every once in a while, "git repack" the public repository.
     Go back to step (5) and continue working.
 A recommended work cycle for an "individual developer" who does
 not have a "public" repository is somewhat different.  It goes
 like this:
 (1) Prepare your work repositories, by running "git clone" on the
     public repository of the "project lead" (or "subsystem
     maintainer", if you work on a subsystem).
 (2) Copy .git/refs/heads/master to .git/refs/heads/upstream.
 (3) Do your work there.  Make commits.
 (4) Run "git fetch" from the public repository of your upstream
     every once in a while.  This does only the first half of
     "git pull" but does not merge.  The head of the public
     repository is stored in .git/FETCH_HEAD.  Copy it to
     .git/refs/heads/upstream.
 (5) Use "git cherry" to see which ones of your patches were
     accepted, and/or use "git rebase" to port your unmerged
     changes forward to the updated upstream.
 (6) Use "git format-patch upstream" to prepare patches for
     e-mail submission to your upstream and send them out.
     Go back to step (3) and continue. 
 [Side Note: I think Cogito calls this upstream "origin".
 Somebody care to confirm or deny?  ]
 [ to be continued.. cvsimports ]