Metadata-Version: 2.1
Name: grokmirror
Version: 2.0.11
Summary: Smartly mirror git repositories that use grokmirror
Home-page: https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git
Author: Konstantin Ryabitsev
Author-email: konstantin@linuxfoundation.org
License: GPLv3+
Download-URL: https://www.kernel.org/pub/software/network/grokmirror/grokmirror-2.0.11.tar.xz
Project-URL: Source, https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git
Project-URL: Tracker, https://github.com/mricon/grokmirror/issues
Description: GROKMIRROR
        ==========
        --------------------------------------------
        Framework to smartly mirror git repositories
        --------------------------------------------
        
        :Author:    konstantin@linuxfoundation.org
        :Date:      2020-09-18
        :Copyright: The Linux Foundation and contributors
        :License:   GPLv3+
        :Version:   2.0.0
        
        DESCRIPTION
        -----------
        Grokmirror was written to make replicating large git repository
        collections more efficient. Grokmirror uses the manifest file published
        by the origin server in order to figure out which repositories to clone,
        and to track which repositories require updating. The process is
        lightweight and efficient both for the primary and for the replicas.
        
        CONCEPTS
        --------
        The origin server publishes a JSON-formatted manifest file containing
        information about all git repositories that it carries. The format of
        the manifest file is as follows::
        
            {
              "/path/to/bare/repository.git": {
                "description": "Repository description",
                "head":        "ref: refs/heads/branchname",
                "reference":   "/path/to/reference/repository.git",
                "forkgroup":   "forkgroup-guid",
                "modified":    timestamp,
                "fingerprint": sha1sum(git show-ref),
                "symlinks": [
                    "/location/to/symlink",
                    ...
                ],
              }
              ...
            }
        
        The manifest file is usually gzip-compressed to conserve bandwidth.
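        
        As an illustration of the format above, here is a minimal sketch of
        reading a manifest in Python. The function names ``load_manifest`` and
        ``summarize`` are hypothetical and are not part of grokmirror. The sketch
        also assumes that a ``.gz`` suffix indicates gzip compression.
        
        ```python
        import gzip
        import json
        
        
        def load_manifest(path):
            """Read a manifest file, handling optional gzip compression."""
            # Assumption: a ".gz" suffix means the file is gzip-compressed.
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt", encoding="utf-8") as fh:
                return json.load(fh)
        
        
        def summarize(manifest):
            """Return (repo path, modified, fingerprint) tuples, sorted by path."""
            return [
                (path, info.get("modified"), info.get("fingerprint"))
                for path, info in sorted(manifest.items())
            ]
        ```
        
        Calling ``summarize(load_manifest("manifest.js.gz"))`` would list every
        repository together with the two fields that indicate whether it changed.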
        
        Each time one of the git repositories receives a commit, a git hook
        automatically updates the manifest file, so the manifest.js file
        should always contain the most up-to-date information about the state
        of all repositories.
        
        The mirroring clients will poll the manifest.js file and download the
        updated manifest if it is newer than the locally stored copy (using
        ``Last-Modified`` and ``If-Modified-Since`` HTTP headers). After
        downloading the updated manifest.js file, the mirrors will parse it to
        find out which repositories have been updated and which new repositories
        have been added.
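        
        The polling step described above can be sketched roughly as follows.
        This is not grok-pull's actual code: the helper names are hypothetical,
        and treating a changed fingerprint or a newer ``modified`` timestamp as
        "updated" is an assumption made for illustration.
        
        ```python
        import email.utils
        import urllib.request
        
        
        def conditional_request(url, local_mtime=None):
            """Build a GET request that asks only for a newer manifest."""
            req = urllib.request.Request(url)
            if local_mtime is not None:
                # HTTP dates are always expressed in GMT.
                stamp = email.utils.formatdate(local_mtime, usegmt=True)
                req.add_header("If-Modified-Since", stamp)
            return req
        
        
        def diff_manifests(old, new):
            """Return (added, updated) repository paths between two manifests."""
            added = sorted(set(new) - set(old))
            # Assumption: a changed fingerprint or newer timestamp means "updated".
            updated = sorted(
                path for path in set(new) & set(old)
                if new[path].get("fingerprint") != old[path].get("fingerprint")
                or new[path].get("modified", 0) > old[path].get("modified", 0)
            )
            return added, updated
        ```
        
        A server that honors the header answers ``304 Not Modified`` when the
        local copy is current, in which case there is nothing to download or
        parse.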
        
        Object Storage Repositories
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Grokmirror 2.0 introduces the concept of "object storage repositories",
        which aims to optimize how repository forks are stored on disk and
        served to the cloning clients.
        
        When grok-fsck runs, it will automatically recognize related
        repositories by analyzing their root commits. If it finds two or more
        related repositories, it will set up a unified "object storage" repo and
        fetch all refs from each related repository into it.
        
        For example, you can have the upstream linux.git::
        
          torvalds/linux.git:
            refs/heads/master
            refs/tags/v5.0-rc3
            ...
        
        and its fork::
        
          maintainer/linux.git:
            refs/heads/master
            refs/heads/devbranch
            refs/tags/v5.0-rc3
            ...
        
        Grok-fsck will set up an object storage repository and fetch all refs from
        both repositories::
        
          objstore/[random-guid-name].git
             refs/virtual/[sha1-of-torvalds/linux.git:12]/heads/master
             refs/virtual/[sha1-of-torvalds/linux.git:12]/tags/v5.0-rc3
             ...
             refs/virtual/[sha1-of-maintainer/linux.git:12]/heads/master
             refs/virtual/[sha1-of-maintainer/linux.git:12]/heads/devbranch
             refs/virtual/[sha1-of-maintainer/linux.git:12]/tags/v5.0-rc3
             ...
        
        Then both torvalds/linux.git and maintainer/linux.git will be configured
        to use objstore/[random-guid-name].git via objects/info/alternates
        and repacked to just contain metadata and no objects.
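        
        To make the ref naming scheme concrete, here is a small sketch of how a
        sibling's refs could be mapped into the object storage repository. It
        assumes that ``[sha1-of-...:12]`` means the first 12 hex digits of the
        SHA-1 of the repository path. Exactly which form of the path grokmirror
        hashes is not stated here, so treat the helper names and the computed
        values as illustrative only.
        
        ```python
        import hashlib
        
        
        def virtual_ref_prefix(repo_path):
            """Return the refs/virtual/ namespace for one sibling repository."""
            # Assumption: first 12 hex digits of the SHA-1 of the repo path.
            digest = hashlib.sha1(repo_path.encode("utf-8")).hexdigest()
            return "refs/virtual/" + digest[:12]
        
        
        def virtual_ref(repo_path, ref):
            """Map a sibling ref such as refs/heads/master into the objstore."""
            if not ref.startswith("refs/"):
                raise ValueError("expected a fully qualified ref: " + ref)
            return virtual_ref_prefix(repo_path) + "/" + ref[len("refs/"):]
        ```
        
        Because each sibling gets its own namespace, identically named branches
        such as ``refs/heads/master`` in both repositories do not collide.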
        
        The alternates repository will be repacked with "delta islands" enabled,
        which should help optimize clone operations for each "sibling"
        repository.
        
        Please see the example grokmirror.conf for more details about
        configuring objstore repositories.
        
        
        ORIGIN SETUP
        ------------
        Install grokmirror on the origin server using your preferred method.
        
        **IMPORTANT: Only bare git repositories are supported.**
        
        You will need to add a hook to each of your repositories that updates
        the manifest whenever the repository is modified. This can be either a
        post-receive hook or a post-update hook. The hook must call the
        following command::
        
            /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
                -t /var/lib/gitolite3/repositories -n `pwd`
        
        The **-m** flag is the path to the manifest.js file. The git process
        must be able to write to it and to the directory the file is in
        (grok-manifest creates a manifest.js.randomstring file first, and then
        moves it in place of the old one for atomicity).
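        
        The write-then-rename pattern mentioned above can be sketched as follows.
        This is a generic illustration of the technique, not grok-manifest's own
        implementation. The function name and the ``0o644`` file mode are
        assumptions.
        
        ```python
        import gzip
        import json
        import os
        import tempfile
        
        
        def write_manifest_atomically(manifest, path):
            """Write a gzipped manifest so readers never see a partial file."""
            dirname = os.path.dirname(os.path.abspath(path))
            # The temporary file must be in the same directory (and therefore
            # on the same filesystem) for the final rename to be atomic.
            fd, tmp_path = tempfile.mkstemp(
                prefix=os.path.basename(path) + ".", dir=dirname
            )
            try:
                with os.fdopen(fd, "wb") as raw:
                    with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
                        gz.write(json.dumps(manifest, indent=2).encode("utf-8"))
                # mkstemp creates the file as 0600; relax it so that the web
                # server can read it (assumed mode, adjust as needed).
                os.chmod(tmp_path, 0o644)
                os.replace(tmp_path, path)
            except BaseException:
                os.unlink(tmp_path)
                raise
        ```
        
        This is why write access to the directory is required: the rename
        replaces the directory entry rather than modifying the old file.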
        
        The **-t** flag tells grokmirror which toplevel disk path to trim from
        the start of each repository path recorded in the manifest.
        
        The **-n** flag tells grokmirror to use the current timestamp instead of
        the exact timestamp of the commit (much faster this way).
        
        Before enabling the hook, you will need to generate the manifest.js for
        all your git repositories. In order to do that, run the same command,
        but omit the -n flag and the \`pwd\` argument. E.g.::
        
            /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
                -t /var/lib/gitolite3/repositories
        
        The last step is to set up automatic purging of deleted repositories
        from the manifest. As this can't be added to a git hook, you can either
        run grok-manifest with the ``--purge`` flag from cron::
        
            /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
                -t /var/lib/gitolite3/repositories -p
        
        Or add it to your gitolite's ``D`` command using the ``--remove`` flag::
        
            /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
                -t /var/lib/gitolite3/repositories -x $repo.git
        
        If you would like grok-manifest to honor the ``git-daemon-export-ok``
        magic file and only add to the manifest those repositories specifically
        marked as exportable, pass the ``--check-export-ok`` flag. See
        ``git-daemon(1)`` for more info on the ``git-daemon-export-ok`` file.
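        
        To illustrate what honoring the magic file amounts to, here is a rough
        sketch that walks a toplevel directory and yields only repositories
        containing ``git-daemon-export-ok``. The bare-repository detection is a
        simple heuristic chosen for this example and is not grok-manifest's own
        logic. ``exportable_repos`` is a hypothetical name.
        
        ```python
        import os
        
        
        def exportable_repos(toplevel):
            """Yield manifest-style paths of repos marked as exportable."""
            for root, dirs, files in os.walk(toplevel):
                # Heuristic (assumption): a directory that holds both
                # "objects" and "refs" is treated as a bare repository.
                if "objects" in dirs and "refs" in dirs:
                    dirs[:] = []  # do not descend into the repository
                    if "git-daemon-export-ok" in files:
                        # Trim the toplevel, as the -t flag does.
                        rel = os.path.relpath(root, toplevel)
                        yield "/" + rel.replace(os.sep, "/")
        ```
        
        With a toplevel of ``/var/lib/gitolite3/repositories``, a marked
        repository at ``.../repositories/utils/foo.git`` would be reported as
        ``/utils/foo.git``.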
        
        You will need an HTTP server to serve the manifest file.
        
        REPLICA SETUP
        -------------
        Install grokmirror on the replica using your preferred method.
        
        Locate grokmirror.conf and modify it to reflect your needs. The default
        configuration file is heavily commented to explain what each option
        does.
        
        Make sure the user "mirror" (or whichever user you specified) is able to
        write to the toplevel and log locations specified in grokmirror.conf.
        
        You can run grok-pull manually, from cron, or as a systemd-managed
        daemon (see contrib). If you run it more frequently than once every
        few hours, you should definitely run it as a daemon in order to
        improve performance.
        
        GROK-FSCK
        ---------
        Git repositories should be routinely repacked and checked for
        corruption. This utility will perform the necessary optimizations and
        report any problems to the email defined via fsck.report_to ('root' by
        default). It should run weekly from cron or from a systemd timer (see
        contrib).
        
        Please examine the example grokmirror.conf file for various things you
        can tweak.
        
        FAQ
        ---
        Why is it called "grok mirror"?
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Because it's developed at kernel.org and "grok" is a mirror of "korg".
        Also, because it groks git mirroring.
        
        Why not just use rsync?
        ~~~~~~~~~~~~~~~~~~~~~~~
        Rsync is extremely inefficient for the purpose of mirroring git trees
        that mostly consist of a lot of small files that very rarely change.
        Since rsync must calculate checksums on each file during each run, it
        mostly results in a lot of disk thrashing.
        
        Additionally, if several repositories share objects between each other,
        unless the disk paths are exactly the same on both the remote and local
        mirror, this will result in broken git repositories.
        
        It is also a bit silly, considering git provides its own extremely
        efficient mechanism for specifying what changed between revision X and
        revision Y.
        
Keywords: git,mirroring,repositories
Platform: UNKNOWN
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
