Introduction
How to setup an Arch linux mirror based on the article of the Arch Linux wiki. The mirror has the follwing features:
- HTTP
- HTTPS
- RSync
- Introduction
- Tier 1 vs. tier 2 Mirror
- Pick an a tier 1 mirror
- Sync script
- Think about the (sub-) domain
- Setup Webserver
- Setup rsyncd
- Open ports for Rsync and HTTP(S):
- Create a feature request
- Use you own local mirror:
- Update 17.09.2022
Tier 1 vs. tier 2 Mirror
Pick an a tier 1 mirror
Select a mirror which is located where your server is.
Sync script
Location of the mirror data
mkdir -p /srv/http/mirror/archlinux
We need to change the owner of the directory to "http" so, the web server is allowed to access those files. Nginx uses the user `http``
chown -R http:http /srv/http/mirror/
The bash script
Download the "official" script provided in Arch Linux wiki
Adjust the variables. For me it results in:
#!/bin/bash
#
########
#
# Copyright © 2014-2019 Florian Pritz <bluewind@xinu.at>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.
#
########
#
# This is a simple mirroring script. To save bandwidth it first checks a
# timestamp via HTTP and only runs rsync when the timestamp differs from the
# local copy. As of 2016, a single rsync run without changes transfers roughly
# 6MiB of data which adds up to roughly 250GiB of traffic per month when rsync
# is run every minute. Performing a simple check via HTTP first can thus save a
# lot of traffic.
# Directory where the repo is stored locally. Example: /srv/repo
target="/srv/http/mirror/archlinux"
# Lockfile path
lock="/tmp/syncrepo-archlinux.lck"
# If you want to limit the bandwidth used by rsync set this.
# Use 0 to disable the limit.
# The default unit is KiB (see man rsync /--bwlimit for more)
bwlimit=0
# The source URL of the mirror you want to sync from.
# If you are a tier 1 mirror use rsync.archlinux.org, for example like this:
# rsync://rsync.archlinux.org/ftp_tier1
# Otherwise chose a tier 1 mirror from this list and use its rsync URL:
# https://www.archlinux.org/mirrors/
source_url='rsync://mirror.23m.com/archlinux/'
# An HTTP(S) URL pointing to the 'lastupdate' file on your chosen mirror.
# If you are a tier 1 mirror use: https://rsync.archlinux.org/lastupdate
# Otherwise use the HTTP(S) URL from your chosen mirror.
lastupdate_url='https://mirror.pagenotfound.de/archlinux/lastupdate'
#### END CONFIG
[ ! -d "${target}" ] && mkdir -p "${target}"
exec 9>"${lock}"
flock -n 9 || exit
# Cleanup any temporary files from old run that might remain.
# Note: You can skip this if you have rsync newer than 3.2.3
# not affected by https://github.com/WayneD/rsync/issues/192
find "${target}" -name '.~tmp~' -exec rm -r {} +
rsync_cmd() {
local -a cmd=(rsync -rlptH --safe-links --delete-delay --delay-updates
"--timeout=600" "--contimeout=60" --no-motd)
if stty &>/dev/null; then
cmd+=(-h -v --progress)
else
cmd+=(--quiet)
fi
if ((bwlimit>0)); then
cmd+=("--bwlimit=$bwlimit")
fi
"${cmd[@]}" "$@"
}
# if we are called without a tty (cronjob) only run when there are changes
#if ! tty -s && [[ -f "$target/lastupdate" ]] && diff -b <(curl -Ls "$lastupdate_url") "$target/lastupdate" >/dev/null; then
# # keep lastsync file in sync for statistics generated by the Arch Linux website
# rsync_cmd "$source_url/lastsync" "$target/lastsync"
# exit 0
#fi
rsync_cmd \
--exclude='*.links.tar.gz*' \
--exclude='/other' \
--exclude='/sources' \
"${source_url}" \
"${target}"
echo "Last sync was $(date -d @$(cat ${target}/lastsync))"
Put bash script above to /srv/http/bin/syncrepo-archlinux.sh
Make it executable by running:
chmod +x /srv/http/bin/syncrepo-archlinux.sh
Crontab
Edit the Crontab for the user http
:
crontab -e -u http
Add the following line to the crontab to run the sync repo script every hour at minute 16
SHELL=/bin/bash
16 * * * * /bin/bash /srv/http/bin/syncrepo-archlinux.sh
Think about the (sub-) domain
I use mirror.pagenotfound.de
for the subdomain.
Setup Webserver
Nginx
I created a Nginx "site" here: /etc/nginx/conf.d/mirror.pagenotfound.de.conf
# Exclude access log entries of the uptime check calls from Arch Linux
map $http_user_agent $excluded_ua {
~Python-urllib 0;
~archweb 0;
default 1;
}
server {
access_log /var/log/nginx/mirror.pagenotfound.de.log combined if=$excluded_ua;
error_log /var/log/nginx/mirror.pagenotfound.de.error.log;
listen 80;
listen [::]:80;
# AUR Nginx-quic: nginx-quic
listen 443 http3 ;
listen 443 ssl http2;
listen [::]:443 http3 ;
listen [::]:443 ssl http2;
server_name mirror.pagenotfound.de;
# SSL stuff
ssl_certificate /root/.acme.sh/pagenotfound.de/fullchain.cer;
ssl_certificate_key /root/.acme.sh/pagenotfound.de/pagenotfound.de.key;
ssl_dhparam /etc/nginx/ssl/dhparam_pnf.pem;
ssl_protocols TLSv1.3 TLSv1.2;
ssl_prefer_server_ciphers on;
ssl_ecdh_curve secp521r1:secp384r1;
ssl_ciphers EECDH+AESGCM:EECDH+AES256;#
# Security stuff
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
add_header X-Frame-Options DENY;
add_header X-Content-Type-Options nosniff;
add_header X-XSS-Protection "1; mode=block";
# AUR Fancy index: nginx-mod-fancyindex
fancyindex on;
fancyindex_exact_size off;
root /srv/http/mirror;
# Don't allow mirror stuff on search engines
location = /robots.txt {
add_header Content-Type text/plain;
access_log off;
return 200 "User-agent: *\nDisallow: /\n";
}
location = /favicon.ico {
log_not_found off;
access_log off;
}
location ~ "\.(sig)$" {
log_not_found off;
access_log off;
}
}
Restart Nginx:
systemctl restart nginx
Caddy
https://mirror.pagenotfound.de {
tls /etc/caddy/certs/fullchain.cer /etc/caddy/certs/pagenotfound.de.key
root * /srv/http/mirror
file_server browse
handle /robots.txt {
header Content-type "text/plain"
respond 200 {
body "User-agent: *
Disallow: /"
close
}
}
}
Restart Caddy:
systemctl restart caddy
See if it works: https://mirror.pagenotfound.de/
Setup rsyncd
If you want to support Rsync, use the follwing config and put it to /etc/rsyncd.conf
:
uid = nobody
gid = nobody
use chroot = yes
max connections = 4
syslog facility = local5
pid file = /run/rsyncd.pid
read only = yes
[archlinux]
path = /srv/http/mirror/archlinux
comment = Archlinux Mirror
read only = yes
Enable and start the rsync server:
systemctl enable rsyncd
systemctl start rsyncd
Open ports for Rsync and HTTP(S):
ufw allow 80 # http
ufw allow 443 # https
ufw allow 443/udp # http3
ufw allow 873 # Rsync
Create a feature request
As described here create a new ticket. Mine was this one here.
Use you own local mirror:
Edit your /etc/pacman.d/mirrorlist
files:
cat /etc/pacman.d/mirrorlist | grep -v "#" ─╯
Server = https://mirror.pagenotfound.de/archlinux/$repo/os/$arch
On the server itself, you could use that:
cat /etc/pacman.d/mirrorlist | grep -v "#"
Server = file:///srv/http/mirror/archlinux/$repo/os/$arch
Update 17.09.2022
I had a bad performing rsync deamon. I found out, that Rsyncd tried to resolve the PTR records of the clients through DNS. But there are big delays resolving these kind of DNS queries. Until I find out what the issue is, I disable DNSSEC temporarily in /etc/systemd/resolve.conf
:
cat /etc/systemd/resolved.conf | grep DNSSEC
DNSSEC=false