2. Squid

Squid is a fully featured HTTP/1.0 proxy that offers a rich access control, authorization and logging environment for developing web proxy and content-serving applications.

2.1 Installation

Let's start with the location of the cache server in the network: according to the documentation, the most suitable place is in the DMZ; this should keep the cache server secure while still allowing it to peer with other, outside caches (such as the ISP's).

The documentation also recommends setting a DNS name for the cache server (such as "cache.mydomain.tld" or "proxy.mydomain.tld") as soon as possible: a simple DNS entry can save many hours later on. Configuring client machines to access the cache server by IP address is asking for a long, painful transition if that address ever changes.

Squid installation is as simple as it can be; you only have to add the Squid package. Available flavors are "ldap" (allowing for LDAP authentication) and "snmp" (including SNMP support).

# export PKG_PATH=/path/to/your/favourite/OpenBSD/mirror
# pkg_add squid-x.x.STABLExx-snmp.tgz
squid-x.x.STABLExx-snmp: complete
--- squid-x.x.STABLExx-snmp -------------------
NOTES ON OpenBSD POST-INSTALLATION OF SQUID x.x

The local (OpenBSD) differences are:
configuration files are in              /etc/squid
sample configuration files are in       /usr/local/share/examples/squid
error message files are in              /usr/local/share/squid/errors
sample error message files are in       /usr/local/share/examples/squid/errors
icons are in                            /usr/local/share/squid/icons
sample icons are in                     /usr/local/share/examples/squid/icons
the cache is in                         /var/squid/cache
logs are stored in                      /var/squid/logs
the ugid squid runs as is               _squid:_squid

Please remember to initialize the cache by running "squid -z" before
trying to run Squid for the first time.

You can also edit /etc/rc.local so that Squid is started automatically:

    if [ -x /usr/local/sbin/squid ]; then
        echo -n ' squid';       /usr/local/sbin/squid
    fi

#

2.2 Base configuration

Squid configuration relies on several dozen parameters, and thus can quickly turn into a very tricky task. The best approach is therefore probably to start with a very basic configuration and then tweak the options, one by one, to meet your specific needs, while making sure that everything keeps working as expected.

Actually, only a few parameters need to be set to get Squid up and running (theoretically, you could even run Squid with an empty configuration file): for all the options you don't explicitly set, the default values are assumed. However, at least one setting must be changed: the default configuration file denies access to all browsers, which is a bit... too strict!

Our first configuration will be very simple: we will place the proxy server in the DMZ (172.16.240.0/24 in our network layout) and allow requests only from the LAN (172.16.0.0/24). No parent proxy at the ISP is taken into account.

The main Squid configuration file is /etc/squid/squid.conf. Let's have a look at it.

The http_port option sets the port(s) that Squid will listen on for incoming HTTP requests. There are three forms: port alone (e.g. "http_port 3128"), hostname with port (e.g. "http_port proxy.kernel-panic.it:3128"), and IP address with port (e.g. "http_port 172.16.240.151:3128"); you can specify multiple socket addresses, each on a separate line. If your Squid machine is multi-homed and directly accessible from the internet, it is strongly recommended that you force Squid to bind the socket to the internal address. This way, Squid will only be visible from the internal network and won't proxy the whole world! Squid's default HTTP port is 3128, but many administrators prefer using a port which is easier to remember, such as 8080.

http_port   3128

The cache_dir parameter allows you to specify the path, size and depth of the directories where the cache swap files will be stored. Squid allows you to have multiple cache_dir lines in your configuration file.

cache_dir   ufs /var/squid/cache 100 16 256

The above line selects the ufs storage scheme and sets the cache directory pathname to /var/squid/cache, with a size of 100MB and 16 first-level subdirectories, each containing 256 second-level subdirectories. The cache directory must exist and be writable by the Squid process, and its size should not exceed 80% of the whole disk. For further details, please refer to the documentation.
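
As a quick sanity check on those numbers, the sketch below computes how many second-level directories a cache_dir line implies and the largest cache size that stays within the 80% limit mentioned above. The function names and the 2000MB disk size are made up for illustration; they are not part of Squid.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Squid.

# Total number of second-level directories: first-level count * second-level count.
cache_l2_dirs() {
    echo $(( $1 * $2 ))
}

# Largest cache_dir size (in MB) within 80% of a disk of the given size (in MB).
cache_max_mb() {
    echo $(( $1 * 80 / 100 ))
}

cache_l2_dirs 16 256     # the values used in the cache_dir line above
cache_max_mb 2000        # assumed 2000MB partition, for illustration only
```

With the values from the example, Squid ends up with 16 x 256 = 4096 second-level directories, and a 100MB cache is far below the limit for any reasonably sized partition.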

The cache_mgr parameter contains the e-mail address of the Squid administrator, which will appear at the end of the error pages; e.g.:

cache_mgr   webmaster@kernel-panic.it

The cache_effective_user and cache_effective_group options allow you to set the UID and GID that Squid will drop its privileges to once it has bound to the incoming network port. The package installation has already created the _squid user and group.

cache_effective_user    _squid
cache_effective_group   _squid

The ftp_user option sets the e-mail address that Squid will use as the password for anonymous FTP login. It's a good practice to use an existing address:

ftp_user    webmaster@kernel-panic.it

The following options set the paths to the log files; the format of the access log file, which logs every request received by the cache, can be specified by using a logformat directive (please refer to the documentation for a detailed list of the available format codes):

# Define the access log format
logformat squid  %ts.%03tu %6tr %>a %Ss/%03Hs %<st %rm %ru %un %Sh/%<A %mt
# Log client request activities ('squid' is the name of the log format to use)
access_log       /var/squid/logs/access.log squid

# Log information about the cache's behavior
cache_log        /var/squid/logs/cache.log
# Log the activities of the storage manager
cache_store_log  /var/squid/logs/store.log

And now we come to one of the trickiest parts of the configuration: Access Control Lists. The simplest way to restrict access is to accept requests only from the internal network. Such basic access control can be enough in small networks, especially if you don't wish to use features like username/password authentication or URL filtering.

ACLs are usually split into two parts: acl lines, which start with the acl keyword and define classes, and ACL operators (such as http_access), which allow or deny requests based on those classes. When an operator line lists several classes, all of them must match. ACL operators are checked from top to bottom and the first match wins. Below is a very basic ruleset:

# Classes
acl  all           src    all               # Any IP address
acl  localhost     src    127.0.0.0/8       # Localhost
acl  lan           src    172.16.0.0/24     # LAN where authorized clients reside
acl  manager       proto  cache_object      # Cache object protocol
acl  to_localhost  dst    127.0.0.0/8       # Requests to localhost
acl  SSL_ports     port   443               # https port
acl  Safe_ports    port   80 21 443         # http, ftp, https ports
acl  CONNECT       method CONNECT           # SSL CONNECT method

# Only allow cachemgr access from localhost
http_access  allow  manager localhost
http_access  deny   manager

# Deny requests to unknown ports
http_access  deny   !Safe_ports

# Deny CONNECT to other than SSL ports
http_access  deny   CONNECT !SSL_ports

# Prevent access to local web applications from remote users
http_access  deny   to_localhost

# Allow access from the local network
http_access  allow  lan

# Default deny (this must be the last rule)
http_access  deny   all
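
To make the "first match wins" rule more concrete, here is a small sketch in plain sh that mimics how the http_access lines above are evaluated. It is only a model: the first_match function is hypothetical, the request is described by the list of classes it belongs to (Squid works this out itself from the acl lines), and the final fallback to "deny" is a simplification that is never reached with this ruleset.

```shell
#!/bin/sh
# Hypothetical model of http_access evaluation; not part of Squid.
# $1    = space-separated list of classes the request belongs to
# stdin = rules, one per line: "<allow|deny> <class> [<class> ...]"
# A class prefixed with "!" matches when the request is NOT in that class.
first_match() {
    request_classes=" $1 "
    while read -r action conditions; do
        [ -n "$action" ] || continue          # skip empty lines
        matched=yes
        for cond in $conditions; do           # all classes on a line must match
            case $cond in
                \!*) name=${cond#\!}
                     case $request_classes in *" $name "*) matched=no ;; esac ;;
                *)   case $request_classes in *" $cond "*) ;; *) matched=no ;; esac ;;
            esac
        done
        if [ "$matched" = yes ]; then         # the first matching rule decides
            echo "$action"
            return 0
        fi
    done
    echo deny                                 # simplification, see above
}

rules='allow manager localhost
deny manager
deny !Safe_ports
deny CONNECT !SSL_ports
deny to_localhost
allow lan
deny all'

# Plain HTTP request to port 80 from a LAN client
printf '%s\n' "$rules" | first_match "all lan Safe_ports"
# The same request coming from outside the LAN
printf '%s\n' "$rules" | first_match "all Safe_ports"
```

The first request falls through every deny rule and is accepted by "allow lan"; the second one matches nothing until the final "deny all". Moving "allow lan" above "deny !Safe_ports" would let LAN clients reach any port, which is why the order of the rules matters.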

2.3 Starting Squid

Now our cache server is almost ready for a first run; there is just one last step to go. We first need to create the cache swap directories where Squid will store cached pages. The "squid -z" command will create all the required directories, according to the cache_dir parameter in squid.conf (see above), as the user and group specified by the cache_effective_user and cache_effective_group parameters.

# /usr/local/sbin/squid -z
2009/10/30 18:04:35| Creating Swap Directories
#
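
Since a typo in squid.conf would get in the way of this step, it may be worth letting Squid parse the configuration file first. The sketch below wraps both steps in a small function; prepare_cache is a made-up name, and the binary path defaults to the one used in this document.

```shell
#!/bin/sh
# Hypothetical wrapper: validate squid.conf, then create the swap directories.
# $1 = path to the squid binary (defaults to the location used in this document)
prepare_cache() {
    squid_bin=${1:-/usr/local/sbin/squid}
    # "-k parse" asks Squid to parse the configuration file and report errors;
    # "-z" is only run if the parse step exits successfully
    "$squid_bin" -k parse && "$squid_bin" -z
}
```

Running prepare_cache as root should then either report the configuration problem or create the swap directories as shown above.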

We are now ready to start Squid. Starting it in debug mode (-d 1 flag) and in the foreground (-N flag) will make it easier to see whether everything is working fine.

# /usr/local/sbin/squid -d 1 -N
2009/10/30 18:05:19| Starting Squid Cache version 2.7.STABLE6 for i386-unknown-openbsd4.6...
[ ... ]
2009/10/30 18:05:19| Accepting proxy HTTP connections at 0.0.0.0, port 3128, FD 10.
2009/10/30 18:05:19| Accepting ICP messages at 0.0.0.0, port 3130, FD 11.
2009/10/30 18:05:19| Accepting SNMP messages on port 3401, FD 12.
2009/10/30 18:05:19| WCCP Disabled.
2009/10/30 18:05:19| Ready to serve requests.
2009/10/30 18:05:22| Done scanning /var/squid/cache (0 entries)
2009/10/30 18:05:22| Finished rebuilding storage from disk.
2009/10/30 18:05:22|         0 Entries scanned
2009/10/30 18:05:22|         0 Invalid entries.
2009/10/30 18:05:22|         0 With invalid flags.
2009/10/30 18:05:22|         0 Objects loaded.
2009/10/30 18:05:22|         0 Objects expired.
2009/10/30 18:05:22|         0 Objects cancelled.
2009/10/30 18:05:22|         0 Duplicate URLs purged.
2009/10/30 18:05:22|         0 Swapfile clashes avoided.
2009/10/30 18:05:22|   Took 2.9 seconds (   0.0 objects/sec).
2009/10/30 18:05:22| Beginning Validation Procedure
2009/10/30 18:05:22|   Completed Validation Procedure
2009/10/30 18:05:22|   Validated 0 Entries
2009/10/30 18:05:22|   store_swap_size = 0k
2009/10/30 18:05:22| storeLateRelease: released 0 objects

Once you get the "Ready to serve requests" message, you should be able to use the cache server. As it starts up, Squid also scans the cache store: the first time you should see all zeros, as above, because the cache store is still empty.

Now, to make sure everything is working fine, we will configure our browser to use our fresh new proxy and we will try to access our favourite web site. In the /var/squid/logs/access.log file, you should see something like:

/var/squid/logs/access.log
1242419601.435   6735 172.16.0.13 TCP_MISS/200 11810 GET http://www.kernel-panic.it/ - DIRECT/62.149.140.23 text/html
1242419849.536     14 172.16.0.13 TCP_HIT/200 11820 GET http://www.kernel-panic.it/ - NONE/- text/html
[...]

For a detailed description of each field in the access.log file, please refer to the documentation. In short, TCP_MISS means that the requested page could not be served from the cache (either it was not present or it had expired), while TCP_HIT means that the page was served from the cache. The second field is the time (in milliseconds) that Squid took to service the request: as you can see, it is much shorter when the page is cached. The fifth field is the size of the reply in bytes: cached pages may be a little larger because of the extra headers added by Squid.
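
Building on the field layout just described, a few lines of awk are enough to get a rough idea of how often the cache is being hit. This is only a sketch: hit_summary is a hypothetical helper, it assumes the log format defined earlier (result code in the fourth field), and it counts every result code containing "HIT" as a hit.

```shell
#!/bin/sh
# Hypothetical helper: count cache hits in a Squid access log.
# Reads the files given as arguments, or standard input if there are none.
hit_summary() {
    awk '{
             split($4, result, "/")            # e.g. "TCP_HIT/200" -> "TCP_HIT", "200"
             total++
             if (result[1] ~ /HIT/) hits++
         }
         END { printf "hits=%d total=%d\n", hits, total }' "$@"
}

# The two sample entries shown above
hit_summary <<'EOF'
1242419601.435   6735 172.16.0.13 TCP_MISS/200 11810 GET http://www.kernel-panic.it/ - DIRECT/62.149.140.23 text/html
1242419849.536     14 172.16.0.13 TCP_HIT/200 11820 GET http://www.kernel-panic.it/ - NONE/- text/html
EOF
```

For the two sample entries this prints "hits=1 total=2". On a live server the function would be called on the log file itself, e.g. "hit_summary /var/squid/logs/access.log".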

If everything is working fine, we can stop Squid:

# /usr/local/sbin/squid -k shutdown

and configure the system to start it on boot.

/etc/rc.local
if [ -x /usr/local/sbin/squid ]; then
    echo -n ' squid'
    /usr/local/sbin/squid
fi

You may also wish to start Squid through the RunCache script, which automatically restarts it on failure and logs both to the /var/squid/squid.out file and to syslog. Just remember to background it with an "&", or it will hang the system at boot time.
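
A possible /etc/rc.local entry for this variant is sketched below. The RunCache path is an assumption (check where the package installed the script on your system); the trailing "&" is the important part.

```shell
# /etc/rc.local fragment (sketch)
# Assumed location of the RunCache script; adjust it to your installation
runcache=/usr/local/bin/RunCache

if [ -x "$runcache" ]; then
    echo -n ' squid'
    # Run in the background so that the boot sequence can continue
    "$runcache" &
fi
```

If the script is not found at that path, the fragment does nothing, so the boot sequence is not affected.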