Introduction

How to setup an Arch linux mirror based on the article of the Arch Linux wiki. The mirror has the follwing features:

HTTP
HTTPS
RSync

Introduction
Tier 1 vs. tier 2 Mirror
Pick an a tier 1 mirror
Sync script
Think about the (sub-) domain
Setup Webserver
Setup rsyncd
Open ports for Rsync and HTTP(S):
Create a feature request
Use you own local mirror:
Update 17.09.2022

Tier 1 vs. tier 2 Mirror

https://wiki.archlinux.org/title/DeveloperWiki:NewMirrors#For_the_mirror_administrator

Pick an a tier 1 mirror

Select a mirror which is located where your server is.

Sync script

Location of the mirror data

mkdir -p /srv/http/mirror/archlinux

We need to change the owner of the directory to "http" so, the web server is allowed to access those files. Nginx uses the user `http``

chown -R http:http /srv/http/mirror/

The bash script

Download the "official" script provided in Arch Linux wiki

Adjust the variables. For me it results in:

#!/bin/bash
#
########
#
# Copyright © 2014-2019 Florian Pritz <bluewind@xinu.at>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.
#
########
#
# This is a simple mirroring script. To save bandwidth it first checks a
# timestamp via HTTP and only runs rsync when the timestamp differs from the
# local copy. As of 2016, a single rsync run without changes transfers roughly
# 6MiB of data which adds up to roughly 250GiB of traffic per month when rsync
# is run every minute. Performing a simple check via HTTP first can thus save a
# lot of traffic.

# Directory where the repo is stored locally. Example: /srv/repo
target="/srv/http/mirror/archlinux"

# Lockfile path
lock="/tmp/syncrepo-archlinux.lck"

# If you want to limit the bandwidth used by rsync set this.
# Use 0 to disable the limit.
# The default unit is KiB (see man rsync /--bwlimit for more)
bwlimit=0

# The source URL of the mirror you want to sync from.
# If you are a tier 1 mirror use rsync.archlinux.org, for example like this:
# rsync://rsync.archlinux.org/ftp_tier1
# Otherwise chose a tier 1 mirror from this list and use its rsync URL:
# https://www.archlinux.org/mirrors/
source_url='rsync://mirror.23m.com/archlinux/'

# An HTTP(S) URL pointing to the 'lastupdate' file on your chosen mirror.
# If you are a tier 1 mirror use: https://rsync.archlinux.org/lastupdate
# Otherwise use the HTTP(S) URL from your chosen mirror.
lastupdate_url='https://mirror.pagenotfound.de/archlinux/lastupdate'

#### END CONFIG

[ ! -d "${target}" ] && mkdir -p "${target}"

exec 9>"${lock}"
flock -n 9 || exit

# Cleanup any temporary files from old run that might remain.
# Note: You can skip this if you have rsync newer than 3.2.3
# not affected by https://github.com/WayneD/rsync/issues/192
find "${target}" -name '.~tmp~' -exec rm -r {} +

rsync_cmd() {
        local -a cmd=(rsync -rlptH --safe-links --delete-delay --delay-updates
                "--timeout=600" "--contimeout=60" --no-motd)

        if stty &>/dev/null; then
                cmd+=(-h -v --progress)
        else
                cmd+=(--quiet)
        fi

        if ((bwlimit>0)); then
                cmd+=("--bwlimit=$bwlimit")
        fi

        "${cmd[@]}" "$@"
}


# if we are called without a tty (cronjob) only run when there are changes
#if ! tty -s && [[ -f "$target/lastupdate" ]] && diff -b <(curl -Ls "$lastupdate_url") "$target/lastupdate" >/dev/null; then
#       # keep lastsync file in sync for statistics generated by the Arch Linux website
#       rsync_cmd "$source_url/lastsync" "$target/lastsync"
#       exit 0
#fi

rsync_cmd \
        --exclude='*.links.tar.gz*' \
        --exclude='/other' \
        --exclude='/sources' \
        "${source_url}" \
        "${target}"

echo "Last sync was $(date -d @$(cat ${target}/lastsync))"

Put bash script above to /srv/http/bin/syncrepo-archlinux.sh Make it executable by running:

chmod +x /srv/http/bin/syncrepo-archlinux.sh

Crontab

Edit the Crontab for the user http:

crontab -e -u http

Add the following line to the crontab to run the sync repo script every hour at minute 16

SHELL=/bin/bash
16 * * * * /bin/bash /srv/http/bin/syncrepo-archlinux.sh

Think about the (sub-) domain

I use mirror.pagenotfound.de for the subdomain.

Setup Webserver

Nginx

I created a Nginx "site" here: /etc/nginx/conf.d/mirror.pagenotfound.de.conf

# Exclude access log entries of the uptime check calls from Arch Linux
map $http_user_agent $excluded_ua {
    ~Python-urllib 0;
    ~archweb 0;
    default 1;
}

server {
  access_log /var/log/nginx/mirror.pagenotfound.de.log combined if=$excluded_ua;
  error_log /var/log/nginx/mirror.pagenotfound.de.error.log;

  listen 80;
  listen [::]:80;
  
  # AUR Nginx-quic: nginx-quic
  listen 443 http3 ;
  listen 443 ssl http2;  

  listen [::]:443 http3 ;
  listen [::]:443 ssl http2;

  server_name mirror.pagenotfound.de;

  # SSL stuff
  ssl_certificate /root/.acme.sh/pagenotfound.de/fullchain.cer;
  ssl_certificate_key /root/.acme.sh/pagenotfound.de/pagenotfound.de.key;
  ssl_dhparam /etc/nginx/ssl/dhparam_pnf.pem;
  ssl_protocols TLSv1.3 TLSv1.2;
  ssl_prefer_server_ciphers on;
  ssl_ecdh_curve secp521r1:secp384r1;
  ssl_ciphers EECDH+AESGCM:EECDH+AES256;#

  # Security stuff
  add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
  add_header X-Frame-Options DENY;
  add_header X-Content-Type-Options nosniff;
  add_header X-XSS-Protection "1; mode=block"; 

  # AUR Fancy index: nginx-mod-fancyindex
  fancyindex on;       
  fancyindex_exact_size off;
  root /srv/http/mirror;

  # Don't allow mirror stuff on search engines
  location = /robots.txt {
    add_header Content-Type text/plain;
    access_log off;
    return 200 "User-agent: *\nDisallow: /\n";
  }

  location = /favicon.ico { 
      log_not_found off; 
      access_log off; 
  } 

  location ~ "\.(sig)$" {
      log_not_found off;
      access_log off;
  }
}

Restart Nginx:

systemctl restart nginx

Caddy

https://mirror.pagenotfound.de {

  tls /etc/caddy/certs/fullchain.cer    /etc/caddy/certs/pagenotfound.de.key

   root * /srv/http/mirror
   file_server browse

   handle /robots.txt {
     header  Content-type "text/plain"

     respond 200 {
body "User-agent: *
Disallow: /"
        close
   }
  }
}

Restart Caddy:

systemctl restart caddy

See if it works: https://mirror.pagenotfound.de/

Setup rsyncd

If you want to support Rsync, use the follwing config and put it to /etc/rsyncd.conf:

uid = nobody
gid = nobody
use chroot = yes
max connections = 4
syslog facility = local5
pid file = /run/rsyncd.pid
read only = yes

[archlinux]
  path = /srv/http/mirror/archlinux
  comment = Archlinux Mirror
  read only = yes

Enable and start the rsync server:

systemctl enable rsyncd
systemctl start rsyncd

Open ports for Rsync and HTTP(S):

ufw allow 80 # http
ufw allow 443 # https
ufw allow 443/udp # http3
ufw allow 873 # Rsync

Create a feature request

As described here create a new ticket. Mine was this one here.

Use you own local mirror:

Edit your /etc/pacman.d/mirrorlist files:

cat /etc/pacman.d/mirrorlist | grep -v "#"                                                                                                                   ─╯
Server = https://mirror.pagenotfound.de/archlinux/$repo/os/$arch

On the server itself, you could use that:

cat /etc/pacman.d/mirrorlist | grep -v "#"
Server = file:///srv/http/mirror/archlinux/$repo/os/$arch

Update 17.09.2022

I had a bad performing rsync deamon. I found out, that Rsyncd tried to resolve the PTR records of the clients through DNS. But there are big delays resolving these kind of DNS queries. Until I find out what the issue is, I disable DNSSEC temporarily in /etc/systemd/resolve.conf:

cat /etc/systemd/resolved.conf | grep DNSSEC
DNSSEC=false